Design Concepts

Search Concepts

The pyesgf.search interface to ESGF search reflects the typical workflow of a user navigating through the sets of facets categorising available data.

Keyword classification

The keyword arguments described in the ESGF Search API have a wide veriety of roles within the search workflow. To reflect this pyesgf.search classifies these keywords into system, spatiotemporal and facet keywords. Responsibility for these keywords are distributes across several classes.

System keywords

API keyword

class

Notes

limit

SearchConnection

Set in SearchConnection:send_query() method or transparently through SearchContext

offset

SearchConnection

Set in SearchConnection:send_query() method or transparently through SearchContext

shards

SearchConnection

Set in constructor

distrib

SearchConnection

Set in constructor

latest

SearchContext

Set in constructor

facets

SearchContext

Set in constructor

fields

SearchContext

Set in constructor

replica

SearchContext

Set in constructor

type

SearchContext

Create contexts with the right type using ResultSet.file_context(), etc.

from

SearchContext

Set in constructor. Use “from_timestamp” in the context API.

to

SearchContext

Set in constructor. Use “to_timestamp” in the context API.

fields

n/a

Managed internally

format

n/a

Managed internally

id

n/a

Managed internally

Temporal keywords

Temporal keywords are supported for Dataset search. The terms “from_timestamp” and “to_timestamp” should be used with values following the format “YYYY-MM-DDThh:mm:ssZ”.

Spatial keywords

Spatial keywords are not yet supported by pyesgf.search however the API does have placeholders for these keywords anticipating future implementation:

Facet keywords

All other keywords are considered to be search facets. The keyword “query” is dealt with specially as a freetext facet.

Main Classes

SearchConnection

SearchConnection instances represent a connection to an ESGF Search web service. This stores the service URL and also service-level parameters like distrib and shards.

SearchContext

SearchContext represents the constraints on a given search. This includes the type of records you are searching for (File or Dataset), the list of possible facets with or without facet counts (depending on how the instance is created), currently selected facets/search-terms. Instances can return the number of hits and facet-counts associated with the current search.

SearchContext objects can be created in several ways:

  1. From a SearchConnection object using the method SearchConnection.new_context()

  2. By further constraining an existing FacetContext object. E.g. new_context = context.constrain(institute=’IPSL’).

  3. From a Result object using one of it’s foo_context() methods to create a context for searching for results related to the Result.

  4. Future development may implement project-specific factory. E.g. CMIP5FacetContext().

ResultSet

ResultSet instances are returned by the SearchContext.search() method and represent the results from a query. They supports transparent paging of results with a client-side cache.

Result

Result instances represent the result record in the SOLr response. They are subclassed to represent records of different types: FileResult and DatasetResult. Results have various properties exposing information about the objects they represent. e.g. dataset_id, checksum, filename, size, etc.