API Reference

Search API

An interface to the ESGF Search API

Module pyesgf.search.connection

Defines the class representing connections to the ESGF Search API. To perform a search create a SearchConnection instance then use new_context() to create a search context.

Warning

Prior to v0.1.1 the url parameter expected the full URL of the search endpoint up to the query string. It now expects url to omit the final endpoint name, e.g. https://esgf-node.llnl.gov/esg-search/search should be changed to https://esgf-node.llnl.gov/esg-search in client code. The current implementation detects the presence of /search and corrects the URL to retain backward compatibility, but this feature may not remain in future versions.
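The backward-compatibility correction described in this warning can be pictured with a small helper. This is only a sketch: normalize_search_url is a hypothetical name that is not part of pyesgf, and the library's own check may differ in detail.

```python
def normalize_search_url(url):
    """Strip a trailing '/search' endpoint name, if present.

    Hypothetical helper for illustration; not pyesgf's implementation.
    """
    url = url.rstrip("/")
    suffix = "/search"
    if url.endswith(suffix):
        url = url[: -len(suffix)]
    return url


old_style = "https://esgf-node.llnl.gov/esg-search/search"
new_style = normalize_search_url(old_style)
print(new_style)
```

Client code that already passes the shorter form is left unchanged by such a helper, because "esg-search" does not end in "/search".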

class pyesgf.search.connection.SearchConnection(url, distrib=True, cache=None, timeout=120, expire_after=datetime.timedelta(seconds=3600), session=None, verify=True, context_class=None)[source]
Variables
  • url – The URL to the Search API service. This should be the URL of the ESGF search service excluding the final endpoint name. Usually this is http://<hostname>/esg-search

  • distrib – Boolean stating whether searches through this connection are distributed, i.e. whether the Search service distributes the query to other search peers. See also the documentation for the facets argument to pyesgf.search.context.SearchContext in relation to distributed searches.

  • cache – Path to the sqlite cache file. Cached responses expire after expire_after (one hour by default).

  • timeout – Time (in seconds) before query returns an error. Default: 120s.

  • expire_after – Time delta after which the cache expires. Default: 1 hour.

  • session – A requests.Session object. Optional.

  • verify – Boolean that determines whether the query should be sent over a verified channel.
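The constructor arguments listed above might be combined as in the following sketch. The helper open_connection, its connection_class argument and the chosen timeout and cache lifetime are assumptions made for illustration. connection_class exists only so the sketch can be exercised without a network; by default the documented SearchConnection class is imported lazily.

```python
from datetime import timedelta


def open_connection(hostname, cache_path=None, connection_class=None):
    """Create a search connection using the documented keyword arguments.

    Hypothetical helper: `connection_class` lets a stand-in class be
    supplied; otherwise pyesgf's SearchConnection is imported on demand.
    """
    if connection_class is None:
        from pyesgf.search.connection import SearchConnection as connection_class
    return connection_class(
        "https://%s/esg-search" % hostname,   # no trailing '/search'
        distrib=False,                        # query this node only
        cache=cache_path,                     # optional sqlite cache file
        timeout=60,                           # seconds before the query fails
        expire_after=timedelta(minutes=30),   # cache lifetime
    )
```

With pyesgf installed, open_connection("esgf-node.llnl.gov") would return a SearchConnection on which new_context() can be called.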

get_shard_list()[source]

Return the list of all available shards. A subset of the returned list can be supplied to send_query() to limit the query to selected shards.

Shards are described by hostname and mapped to Solr shard descriptions internally.

Returns

The list of available shards.

new_context(context_class=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None, search_type=None, **constraints)[source]

Returns a pyesgf.search.context.SearchContext class for performing faceted searches.

See SearchContext.__init__() for documentation on the arguments.

Send a query to the “search” endpoint. See send_query() for details.

Returns

The json document for the search results

send_wget(query_dict, shards=None)[source]

Send a query to the “wget” endpoint. See send_query() for details.

Returns

A string containing the script.

pyesgf.search.connection.create_single_session(cache=None, expire_after=datetime.timedelta(seconds=3600), **kwargs)[source]

Simple helper function to start a requests or requests_cache session.

cache, if specified, is the filename of a threadsafe sqlite database. expire_after specifies how long the cache should be kept.

pyesgf.search.connection.query_keyword_type(keyword)[source]

Returns the keyword type of a search query keyword.

Possible values are ‘system’, ‘freetext’, ‘facet’, ‘temporal’ and ‘geospatial’. If the keyword is unknown it is assumed to be a facet keyword.
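The fallback rule (unknown keywords are treated as facets) can be sketched as a lookup with a default. The keyword groups below are assumptions chosen for illustration, not the table pyesgf actually uses, and keyword_type_sketch is a hypothetical name.

```python
# Illustrative keyword groups; these sets are assumptions for the sketch,
# not the table that pyesgf ships.
KEYWORD_TYPES = {
    "system": {"limit", "offset", "fields", "format", "type", "distrib"},
    "freetext": {"query"},
    "temporal": {"start", "end", "from", "to"},
    "geospatial": {"lat", "lon", "bbox", "location", "radius", "polygon"},
}


def keyword_type_sketch(keyword):
    """Return the group a keyword belongs to, defaulting to 'facet'."""
    for kind, keywords in KEYWORD_TYPES.items():
        if keyword in keywords:
            return kind
    return "facet"


print(keyword_type_sketch("query"))       # a free-text keyword
print(keyword_type_sketch("experiment"))  # unknown, so treated as a facet
```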

Module pyesgf.search.context

Defines the SearchContext class which represents each ESGF search query.

class pyesgf.search.context.AggregationSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)[source]
class pyesgf.search.context.DatasetSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)[source]
class pyesgf.search.context.FileSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)[source]
class pyesgf.search.context.SearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)[source]

Instances of this class represent the state of a current search. It exposes what facets are available to select and the facet counts if they are available.

Subclasses of this class can restrict the search options, for instance FileSearchContext, DatasetSearchContext or CMIP5SearchContext.

SearchContext instances are connected to SearchConnection instances. You normally create SearchContext instances in one of two ways:

  1. Calling SearchConnection.new_context()

  2. Calling SearchContext.constrain()

Variables
  • constraints – A dictionary of facet constraints currently in effect. constraints[facet_name] = [value, value, ...]

  • facets – A string containing a comma-separated list of facets to be returned (for example 'source_id,ensemble_id'). If set, it selects which facet counts are included in the facet_counts dictionary. By default all available facets are included. With distributed searches (where the SearchConnection instance was created with distrib=True), requesting all facets may leave some results missing for server-side reasons, so a warning message is issued in that case.

Property facet_counts

A dictionary of available hits with each facet value for the search as currently constrained. This property returns a dictionary of dictionaries where facet_counts[facet][facet_value] == hit_count

Property hit_count

The total number of hits available with current constraints.

constrain(**constraints)[source]

Return a new instance with the additional constraints.
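Because constrain() returns a new instance, the context it is called on keeps its original constraints. The stand-in class below sketches that pattern under the constraints[facet_name] = [value, ...] layout. MiniContext is hypothetical and is not pyesgf's SearchContext, and the facet values are illustrative.

```python
class MiniContext:
    """Stand-in for a search context; not pyesgf's SearchContext."""

    def __init__(self, constraints=None):
        # constraints[facet_name] = [value, value, ...]
        self.constraints = {k: list(v) for k, v in (constraints or {}).items()}

    def constrain(self, **constraints):
        """Return a new instance with the extra constraints added."""
        merged = {k: list(v) for k, v in self.constraints.items()}
        for facet, value in constraints.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            merged.setdefault(facet, []).extend(values)
        return MiniContext(merged)


base = MiniContext({"project": ["CMIP5"]})
narrowed = base.constrain(experiment="rcp45")
print(base.constraints)      # unchanged by the call above
print(narrowed.constraints)  # carries both constraints
```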

get_download_script(**constraints)[source]

Download a script for downloading all files in the set of results.

Parameters

constraints – Further constraints for this query. Equivalent to calling self.constrain(**constraints).get_download_script()

Returns

A string containing the script
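Since the script is returned as a string, the caller decides where to store it. A minimal sketch of writing it to disk follows. save_download_script is a hypothetical helper, and the executable bit only has an effect on POSIX systems.

```python
import os
import stat


def save_download_script(script_text, path):
    """Write the script text to `path` and mark it executable by the owner."""
    with open(path, "w") as handle:
        handle.write(script_text)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    return path
```

With a real context this could be called as save_download_script(ctx.get_download_script(), "download.sh"), where ctx is a search context and the file name is an arbitrary choice.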

get_facet_options()[source]

Return a dictionary of facet counts filtered to remove all facets that are completely constrained. This method is similar to the property facet_counts except facet values which are not relevant for further constraining are removed.
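One plausible reading of "completely constrained" is sketched below. It assumes that a facet value matching every hit cannot narrow the search, and that a facet left with fewer than two values offers no real choice. The function name and the sample counts are hypothetical, and the library's own filtering may differ.

```python
def facet_options_sketch(facet_counts, hit_count):
    """Drop facet values that cannot narrow the search any further.

    Assumption for this sketch: a value whose count equals the total hit
    count matches every result, and a facet left with fewer than two
    values offers no real choice.
    """
    options = {}
    for facet, counts in facet_counts.items():
        useful = {value: n for value, n in counts.items() if n < hit_count}
        if len(useful) > 1:
            options[facet] = useful
    return options


# Hypothetical counts in the facet_counts[facet][facet_value] layout.
facet_counts = {
    "project": {"CMIP5": 10},
    "model": {"ModelA": 6, "ModelB": 4},
}
print(facet_options_sketch(facet_counts, hit_count=10))
```

Here the project facet is dropped because its only value already matches all ten hits, while model is kept because either value would narrow the search.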

search(batch_size=50, ignore_facet_check=False, **constraints)[source]

Perform the search with current constraints returning a set of results.

Parameters
  • batch_size – The number of results to get per HTTP request.

  • ignore_facet_check – Do not make an extra HTTP request to populate facet_counts and hit_count.

  • constraints – Further constraints for this query. Equivalent to calling self.constrain(**constraints).search()

Returns

A ResultSet for this query

Module pyesgf.search.results

Search results are retrieved through the ResultSet class. This class hides paging of large result sets behind a client-side cache. Subclasses of BaseResult represent results of different Solr record types.

class pyesgf.search.results.AggregationResult(json, context)[source]

A result object for ESGF aggregations. Properties from BaseResult are inherited.

Property aggregation_id

The aggregation id

class pyesgf.search.results.BaseResult(json, context)[source]

Base class for results.

Subclasses represent different search types such as File and Dataset.

Variables
  • json – The original json representation of the result.

  • context – The SearchContext which generated this result.

Property urls

A dictionary of the form {service: [(url, mime_type), ...], ...}
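The shape of this dictionary can be illustrated by grouping raw URL entries by service. The sketch assumes entries of the form 'url|mime_type|service', which is how ESGF search records commonly encode them. The sample entries and the group_urls helper are hypothetical.

```python
from collections import defaultdict


def group_urls(url_entries):
    """Group 'url|mime_type|service' strings by service name."""
    grouped = defaultdict(list)
    for entry in url_entries:
        url, mime_type, service = entry.split("|")
        grouped[service].append((url, mime_type))
    return dict(grouped)


# Hypothetical record entries, for illustration only.
entries = [
    "http://example.org/thredds/fileServer/tas.nc|application/netcdf|HTTPServer",
    "http://example.org/thredds/dodsC/tas.nc.html|application/opendap-html|OPENDAP",
]
urls = group_urls(entries)
first_http_url = urls["HTTPServer"][0][0]
print(first_http_url)
```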

Property opendap_url

The url of an OPeNDAP endpoint for this result if available

Property las_url

The url of an LAS endpoint for this result if available

Property download_url

The url for downloading the result by HTTP if available

Property gridftp_url

The url for downloading the result by GridFTP if available

Property globus_url

The url for downloading the result by Globus if available (including endpoint)

Property index_node

The index node where the metadata is stored. Calls to *_context() will optimise queries to only address this node.

class pyesgf.search.results.DatasetResult(json, context)[source]

A result object for ESGF datasets.

Property dataset_id

The Solr dataset_id, which is unique throughout the system.

aggregation_context()[source]

Return a SearchContext for searching for aggregations within this dataset.

file_context()[source]

Return a SearchContext for searching for files within this dataset.

Property number_of_files

Returns file count as reported by the dataset record.
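Moving from dataset results to their files typically chains search() and file_context(). The generator below is a sketch that relies only on attributes documented in this reference (search(), file_context(), dataset_id, filename and download_url). The name iter_file_urls is hypothetical, and context is expected to behave like a dataset-level SearchContext.

```python
def iter_file_urls(context):
    """Yield (dataset_id, filename, download_url) for every file found.

    `context` is expected to behave like a dataset-level SearchContext.
    """
    for dataset in context.search():
        # Each dataset result can open a file-level context of its own.
        for file_result in dataset.file_context().search():
            yield dataset.dataset_id, file_result.filename, file_result.download_url
```

Each dataset triggers its own file search, so on large result sets this issues many HTTP requests.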

class pyesgf.search.results.FileResult(json, context)[source]

A result object for ESGF files. Properties from BaseResult are inherited.

Property file_id

The identifier for the file

Property checksum

The checksum of the file

Property checksum_type

The algorithm used for generating the checksum

Property filename

The filename

Property size

The file size in bytes
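The checksum and checksum_type properties can be used to validate a downloaded file. The helper below is a sketch, not part of pyesgf. It assumes that checksum_type names an algorithm hashlib recognises once lower-cased (for example 'SHA256') and that checksum is a hexadecimal digest.

```python
import hashlib


def verify_checksum(path, checksum, checksum_type, chunk_size=1 << 20):
    """Return True if the file at `path` matches the expected checksum.

    Assumes `checksum_type` is a hashlib algorithm name once lower-cased
    and `checksum` is a hex digest.
    """
    digest = hashlib.new(checksum_type.lower())
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == checksum.lower()
```

For a FileResult named result, a call might look like verify_checksum(local_path, result.checksum, result.checksum_type), where local_path is wherever the caller saved the download.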

class pyesgf.search.results.ResultSet(context, batch_size=50, eager=True)[source]
Variables
  • context – The search context object used to generate this result set.

Property batch_size

The number of results that will be requested from esgf-search as one call. This must be set on creation and cannot change.
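The idea of hiding paging behind a client-side cache with a fixed batch size can be sketched as follows. PagedResults and fake_fetch are hypothetical. fetch stands in for one HTTP request taking an offset and a limit, and nothing here reflects how ResultSet is actually implemented.

```python
class PagedResults:
    """Fetch results in fixed-size batches and cache each batch.

    `fetch` is a hypothetical callable taking (offset, limit) and
    returning a list; it stands in for one HTTP request.
    """

    def __init__(self, fetch, total, batch_size=50):
        self._fetch = fetch
        self._total = total
        self._batch_size = batch_size   # fixed at creation
        self._batches = {}              # batch index -> list of results

    @property
    def batch_size(self):
        # Read-only: there is deliberately no setter.
        return self._batch_size

    def __len__(self):
        return self._total

    def __getitem__(self, index):
        if not 0 <= index < self._total:
            raise IndexError(index)
        batch_index, position = divmod(index, self._batch_size)
        if batch_index not in self._batches:
            offset = batch_index * self._batch_size
            self._batches[batch_index] = self._fetch(offset, self._batch_size)
        return self._batches[batch_index][position]


calls = []


def fake_fetch(offset, limit):
    """Pretend server holding the integers 0..119."""
    calls.append((offset, limit))
    return list(range(offset, min(offset + limit, 120)))


results = PagedResults(fake_fetch, total=120, batch_size=50)
first = results[3]    # triggers the request for batch 0
second = results[60]  # triggers the request for batch 1
```

Further lookups inside an already fetched batch are served from the cache, so a larger batch size trades fewer requests for bigger responses.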

ESGF Security API

pyesgf provides a simplified interface to obtaining ESGF credentials.

Module pyesgf.logon

Manage the client’s interaction with ESGF’s security system. Using this module requires installing the MyProxyClient library.

To obtain ESGF credentials create a LogonManager instance and supply it with logon details:

>>> lm = LogonManager()
>>> lm.is_logged_on()
False
>>> lm.logon(username, password, myproxy_hostname, bootstrap=True)
>>> lm.is_logged_on()
True

Logon parameters that aren’t specified will be prompted for at the terminal by default. The LogonManager object also writes a .httprc file configuring OPeNDAP access through the NetCDF API.

The option bootstrap=True is needed on the first run.
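A script that runs repeatedly might wrap these calls so that bootstrapping is requested only when the caller knows it is the first run. The helper below is a sketch. ensure_logged_on and its first_run flag are hypothetical, and manager is expected to behave like a LogonManager.

```python
def ensure_logged_on(manager, username, password, hostname, first_run=False):
    """Log on only if needed, bootstrapping trustroots on the first run.

    `manager` is expected to behave like a LogonManager; `first_run` is a
    flag the caller supplies, since this sketch does not detect it.
    Returns True if a logon was performed, False if already logged on.
    """
    if manager.is_logged_on():
        return False
    manager.logon(username, password, hostname,
                  bootstrap=first_run, interactive=False)
    return True
```

Passing interactive=False keeps the helper from prompting at the terminal, which suits unattended jobs.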

You can use your OpenID to log on instead. The logon details will be deduced from the OpenID where possible:

>>> lm.logoff()
>>> lm.is_logged_on()
False
>>> lm.logon_with_openid(openid, password, bootstrap=True)
>>> lm.is_logged_on()
True

class pyesgf.logon.LogonManager(esgf_dir='/home/docs/.esg', dap_config='/home/docs/.dodsrc', verify=True)[source]

Manages ESGF credentials and security configuration files.

Also integrates with NetCDF’s secure OPeNDAP configuration.

logoff(clear_trustroots=False)[source]

Remove any obtained credentials from the ESGF environment.

Parameters

clear_trustroots – If True also remove trustroots.

logon(username=None, password=None, hostname=None, bootstrap=False, update_trustroots=True, interactive=True)[source]

Obtain ESGF credentials from the specified MyProxy service.

If interactive == True then any missing parameters of password, username or hostname will be prompted for at the terminal.

Parameters
  • interactive – Whether to ask for input at the terminal for any missing information, i.e. username, password or hostname.

  • bootstrap – Whether to bootstrap the trustroots for this MyProxy service.

  • update_trustroots – Whether to update the trustroots for this MyProxy service.

logon_with_openid(openid, password=None, bootstrap=False, update_trustroots=True, interactive=True)[source]

Obtains ESGF credentials by detecting the MyProxy parameters from the user's OpenID. Some ESGF-compatible OpenIDs do not contain enough information to obtain credentials. In this case the user is prompted for missing information if interactive == True, otherwise an exception is raised.

Parameters

openid – OpenID to log in with.

See logon() for the parameters interactive, bootstrap and update_trustroots.