API Reference
Search API
An interface to the ESGF Search API
Module pyesgf.search.connection¶
Defines the class representing connections to the ESGF Search API. To
perform a search create a SearchConnection instance then use
new_context() to create a search context.
Warning
Prior to v0.1.1 the url parameter expected the full URL of the
search endpoint up to the query string. This has now been changed
to expect url to omit the final endpoint name,
e.g. https://esgf-node.llnl.gov/esg-search/search should be changed
to https://esgf-node.llnl.gov/esg-search in client code. The
current implementation detects the presence of /search and
corrects the URL to retain backward compatibility, but this feature
may not remain in future versions.
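The backward-compatible URL correction described in this warning can be sketched as follows. The helper name normalise_search_url is hypothetical and not part of pyesgf; it only illustrates stripping a trailing /search endpoint name so that client code passes the base URL.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_search_url(url: str) -> str:
    """Hypothetical helper: strip a trailing '/search' endpoint name."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if path.endswith("/search"):
        # Drop only the final endpoint name, keeping e.g. '/esg-search'.
        path = path[: -len("/search")]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(normalise_search_url("https://esgf-node.llnl.gov/esg-search/search"))
# https://esgf-node.llnl.gov/esg-search
```

Note that '/esg-search' itself is left alone, because it ends in '-search' rather than '/search'.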
- class pyesgf.search.connection.SearchConnection(url, distrib=True, cache=None, timeout=120, expire_after=datetime.timedelta(seconds=3600), session=None, verify=True, context_class=None)
- Variables:
url – The URL to the Search API service. This should be the URL of the ESGF search service excluding the final endpoint name. Usually this is http://<hostname>/esg-search
distrib – Boolean stating whether searches through this connection are distributed, i.e. whether the Search service distributes the query to other search peers. See also the documentation for the facets argument to pyesgf.search.context.SearchContext in relation to distributed searches.
cache – Path to a sqlite cache file. Cached responses expire after expire_after (default: 1 hour).
timeout – Time (in seconds) before a query returns an error. Default: 120s.
expire_after – Time delta after which the cache expires. Default: 1 hour.
session – A requests.Session object. Optional.
verify – Boolean determining whether queries are sent over a verified channel, i.e. whether the server's SSL certificate is checked.
- get_shard_list()
Return the list of all available shards. A subset of the returned list can be supplied to ‘send_query()’ to limit the query to selected shards.
Shards are described by hostname and mapped to Solr shard descriptions internally.
- Returns:
the list of available shards
- new_context(context_class=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None, search_type=None, **constraints)
Returns a pyesgf.search.context.SearchContext instance for performing faceted searches. See SearchContext.__init__() for documentation on the arguments.
- pyesgf.search.connection.create_single_session(cache=None, expire_after=datetime.timedelta(seconds=3600), **kwargs)
Simple helper function to start a requests or requests_cache session.
cache, if specified, is the filename of a thread-safe sqlite database; expire_after specifies how long the cache should be kept.
- pyesgf.search.connection.query_keyword_type(keyword)
Returns the keyword type of a search query keyword.
Possible values are ‘system’, ‘freetext’, ‘facet’, ‘temporal’ and ‘geospatial’. If the keyword is unknown, it is assumed to be a facet keyword.
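The classification rule, including the fallback to ‘facet’ for unknown keywords, amounts to a dictionary lookup with a default. The keyword table below is a small assumed sample for illustration only, not pyesgf's actual table, and classify_keyword is a hypothetical name.

```python
# Assumed sample of keyword types, for illustration only; this is NOT
# pyesgf's real lookup table.
SAMPLE_KEYWORD_TYPES = {
    "type": "system",
    "limit": "system",
    "offset": "system",
    "distrib": "system",
    "query": "freetext",
    "start": "temporal",
    "end": "temporal",
    "bbox": "geospatial",
}


def classify_keyword(keyword: str) -> str:
    """Return the keyword type, treating any unknown keyword as a facet."""
    return SAMPLE_KEYWORD_TYPES.get(keyword, "facet")


print(classify_keyword("project"))  # facet
```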
Module pyesgf.search.context¶
Defines the SearchContext class which represents each ESGF search
query.
- class pyesgf.search.context.AggregationSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)
- class pyesgf.search.context.DatasetSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)
- class pyesgf.search.context.FileSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)
- class pyesgf.search.context.SearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)
Instances of this class represent the state of a current search. It exposes what facets are available to select and the facet counts if they are available.
Subclasses of this class can restrict the search options, for instance FileSearchContext, DatasetSearchContext or CMIP5SearchContext.
SearchContext instances are connected to SearchConnection instances. You normally create SearchContext instances via one of:
1. Calling SearchConnection.new_context()
2. Calling SearchContext.constrain()
- Variables:
constraints – A dictionary of facet constraints currently in effect: constraints[facet_name] = [value, value, ...]
facets – A string containing a comma-separated list of facets to be returned (for example 'source_id,ensemble_id'). If set, this will be used to select which facet counts to include, as returned in the facet_counts dictionary. Defaults to including all available facets, but with distributed searches (where the SearchConnection instance was created with distrib=True), some results may be missing for server-side reasons when requesting all facets, so a warning message will be issued. This contains further details.
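To picture how a constraints dictionary of this shape could map onto an HTTP query, the sketch below encodes one with the standard library. It assumes each facet value becomes a separate, repeated query parameter (how pyesgf builds its requests internally is not shown here), and the facet names and values are illustrative only.

```python
from urllib.parse import urlencode

# A constraints dictionary in the shape described above:
# constraints[facet_name] = [value, value, ...]
constraints = {
    "project": ["CMIP6"],
    "variable_id": ["tas", "pr"],
}

# doseq=True emits one key=value pair per list element, so a facet with
# several values becomes a repeated query parameter.
query_string = urlencode(constraints, doseq=True)
print(query_string)  # project=CMIP6&variable_id=tas&variable_id=pr
```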
- Property facet_counts:
A dictionary of available hits with each facet value for the search as currently constrained. This property returns a dictionary of dictionaries where facet_counts[facet][facet_value] == hit_count
- Property hit_count:
The total number of hits available with current constraints.
- get_download_script(**constraints)
Download a script for downloading all files in the set of results.
- Parameters:
constraints – Further constraints for this query. Equivalent to calling self.constrain(**constraints).get_download_script()
- Returns:
A string containing the script
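Because get_download_script() returns the script as a string rather than writing a file, client code has to save it before it can be run. A minimal sketch, where save_script is a hypothetical helper and the script text is a placeholder (the real script's contents come from the ESGF node):

```python
import os
import stat
import tempfile


def save_script(script: str, path: str) -> None:
    """Write script text to path and add the owner-execute permission bit."""
    with open(path, "w") as fh:
        fh.write(script)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)


# Placeholder standing in for the string returned by get_download_script().
script_text = "#!/bin/bash\necho 'placeholder download script'\n"
target = os.path.join(tempfile.mkdtemp(), "download.sh")
save_script(script_text, target)
```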
- get_facet_options()
Return a dictionary of facet counts filtered to remove all facets that are completely constrained. This method is similar to the property facet_counts except that facet values which are not relevant for further constraining are removed.
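One plausible reading of "completely constrained" is sketched below: a facet value whose count equals hit_count matches every current result, so selecting it cannot narrow the search, and a facet left with no narrowing values is dropped. This rule and the name facet_options are assumptions for illustration, not a copy of pyesgf's code.

```python
from typing import Dict

FacetCounts = Dict[str, Dict[str, int]]


def facet_options(facet_counts: FacetCounts, hit_count: int) -> FacetCounts:
    """Keep only facet values that would narrow the current search.

    Assumption: a value whose count equals the total hit count matches every
    result, so selecting it cannot constrain the search further.
    """
    options: FacetCounts = {}
    for facet, counts in facet_counts.items():
        narrowing = {value: n for value, n in counts.items() if n != hit_count}
        if narrowing:  # drop facets left with nothing to choose from
            options[facet] = narrowing
    return options


counts = {
    "project": {"CMIP6": 120},
    "variable_id": {"tas": 80, "pr": 40},
}
print(facet_options(counts, hit_count=120))
# {'variable_id': {'tas': 80, 'pr': 40}}
```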
- search(batch_size=50, ignore_facet_check=False, **constraints)
Perform the search with current constraints, returning a set of results.
- Batch_size:
The number of results to get per HTTP request.
- Ignore_facet_check:
Do not make an extra HTTP request to populate facet_counts and hit_count.
- Parameters:
constraints – Further constraints for this query. Equivalent to calling self.constrain(**constraints).search()
- Returns:
A ResultSet for this query
Module pyesgf.search.results¶
Search results are retrieved through the ResultSet class. This class
hides paging of large result sets behind a client-side cache. Subclasses of
Result represent results of different Solr record types.
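The paging idea described above (fetching batch_size results per request and keeping each batch in a client-side cache) can be sketched like this. PagedResults and fake_fetch are hypothetical names; this is an illustration of the technique, not pyesgf's ResultSet implementation, and fake_fetch stands in for one HTTP request to the search service.

```python
from typing import Callable, Dict, List


class PagedResults:
    """Lazily fetch results in fixed-size batches and cache each batch."""

    def __init__(self, fetch: Callable[[int, int], List[dict]], total: int,
                 batch_size: int = 50) -> None:
        self._fetch = fetch  # fetch(offset, limit) -> list of result records
        self._total = total
        self.batch_size = batch_size
        self._cache: Dict[int, List[dict]] = {}  # batch index -> records

    def __len__(self) -> int:
        return self._total

    def __getitem__(self, index: int) -> dict:
        if not 0 <= index < self._total:
            raise IndexError(index)
        batch = index // self.batch_size
        if batch not in self._cache:
            # One request per batch, however many of its items are read.
            self._cache[batch] = self._fetch(batch * self.batch_size, self.batch_size)
        return self._cache[batch][index % self.batch_size]


# Fake backend that records each (offset, limit) request it receives.
requests_made = []


def fake_fetch(offset: int, limit: int) -> List[dict]:
    requests_made.append((offset, limit))
    return [{"id": i} for i in range(offset, min(offset + limit, 120))]


results = PagedResults(fake_fetch, total=120, batch_size=50)
ids = [r["id"] for r in results]  # iterates via __getitem__ until IndexError
```

Reading all 120 records here triggers three requests, at offsets 0, 50 and 100; re-reading a record afterwards is served from the cache.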
- class pyesgf.search.results.AggregationResult(json, context)
- A result object for ESGF aggregations. Properties from
BaseResult are inherited.
- Property aggregation_id:
The aggregation id
- class pyesgf.search.results.BaseResult(json, context)
Base class for results.
Subclasses represent different search types such as File and Dataset.
- Variables:
json – The original json representation of the result.
context – The SearchContext which generated this result.
- Property urls:
a dictionary of the form {service: [(url, mime_type), ...], ...}
- Property opendap_url:
The url of an OPeNDAP endpoint for this result if available
- Property las_url:
The url of an LAS endpoint for this result if available
- Property download_url:
The url for downloading the result by HTTP if available
- Property gridftp_url:
The url for downloading the result by Globus if available
- Property globus_url:
The url for downloading the result by Globus if available (including endpoint)
- Property index_node:
The index node where the metadata is stored. Calls to *_context() will optimise queries to only address this node.
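The urls dictionary can be pictured as the result of grouping raw URL entries by service. The sketch assumes each raw entry is a 'url|mime_type|service' string, as ESGF Solr records commonly store them; build_urls and first_url are hypothetical helpers, and the host data.example.org is made up. first_url mirrors the idea behind properties such as download_url and opendap_url: return a URL if the service is available, otherwise None.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Urls = Dict[str, List[Tuple[str, str]]]


def build_urls(raw_urls: List[str]) -> Urls:
    """Group 'url|mime_type|service' strings into {service: [(url, mime_type)]}."""
    urls: Urls = defaultdict(list)
    for entry in raw_urls:
        # Split from the right so only the last two '|' separate the fields.
        url, mime_type, service = entry.rsplit("|", 2)
        urls[service].append((url, mime_type))
    return dict(urls)


def first_url(urls: Urls, service: str) -> Optional[str]:
    """Return the first URL offered for service, or None if unavailable."""
    entries = urls.get(service)
    return entries[0][0] if entries else None


raw = [
    "https://data.example.org/thredds/fileServer/f.nc|application/netcdf|HTTPServer",
    "https://data.example.org/thredds/dodsC/f.nc.html|application/opendap-html|OPENDAP",
]
urls = build_urls(raw)
print(first_url(urls, "HTTPServer"))
# https://data.example.org/thredds/fileServer/f.nc
```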
- class pyesgf.search.results.DatasetResult(json, context)
A result object for ESGF datasets.
- Property dataset_id:
The Solr dataset_id, which is unique throughout the system.
- aggregation_context()
Return a SearchContext for searching for aggregations within this dataset.
- property number_of_files
Returns file count as reported by the dataset record.
- class pyesgf.search.results.FileResult(json, context)
- A result object for ESGF files. Properties from
BaseResult are inherited.
- Property file_id:
The identifier for the file
- Property checksum:
The checksum of the file
- Property checksum_type:
The algorithm used for generating the checksum
- Property filename:
The filename
- Property size:
The file size in bytes
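A downloaded file can be checked against the checksum and checksum_type properties along the lines of the sketch below. It assumes checksum_type holds an algorithm name that hashlib accepts (such as 'SHA256' or 'MD5') and that checksum is a hex digest; verify_checksum is a hypothetical helper, and a temporary file stands in for the local copy of the file.

```python
import hashlib
import tempfile


def verify_checksum(path: str, checksum: str, checksum_type: str) -> bool:
    """Compare a local file's digest with the checksum reported for it."""
    digest = hashlib.new(checksum_type.lower())
    with open(path, "rb") as fh:
        # Read in 1 MiB blocks so large files are not loaded into memory at once.
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest() == checksum.lower()


# Demo: a temporary file stands in for a downloaded FileResult.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example data")
expected = hashlib.sha256(b"example data").hexdigest()
ok = verify_checksum(tmp.name, expected, "SHA256")
```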
- class pyesgf.search.results.ResultSet(context, batch_size=50, eager=True)
- Variables:
context – The search context object used to generate this resultset
- Property batch_size:
The number of results that will be requested from esgf-search as one call. This must be set on creation and cannot change.
ESGF Security API¶
pyesgf provides a simplified interface to obtaining ESGF credentials.
Module pyesgf.logon¶
Manage the client’s interaction with ESGF’s security system. Using this module requires installing the MyProxyClient library.
To obtain ESGF credentials create a LogonManager instance and supply
it with logon details:
>>> lm = LogonManager()
>>> lm.is_logged_on()
False
>>> lm.logon(username, password, myproxy_hostname, bootstrap=True)
>>> lm.is_logged_on()
True
Logon parameters that aren’t specified will be prompted for at the terminal
by default. The LogonManager object also writes a .httprc file
configuring OPeNDAP access through the NetCDF API.
The option bootstrap=True is needed on the first run.
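The kind of NetCDF OPeNDAP configuration mentioned above can be illustrated with the sketch below, which builds an rc-file body using netCDF-C's HTTP.* keys. The file names under esgf_dir (credentials.pem, certificates, .dods_cookies), the chosen key values and the helper dap_config_text are assumptions; the file LogonManager actually writes may differ.

```python
import os


def dap_config_text(esgf_dir: str) -> str:
    """Build an rc-file body pointing netCDF's HTTP layer at ESGF credentials.

    The paths under esgf_dir are assumptions for illustration only.
    """
    cert = os.path.join(esgf_dir, "credentials.pem")
    lines = [
        f"HTTP.COOKIEJAR={os.path.join(esgf_dir, '.dods_cookies')}",
        "HTTP.SSL.VALIDATE=1",
        f"HTTP.SSL.CERTIFICATE={cert}",  # client certificate
        f"HTTP.SSL.KEY={cert}",  # private key stored in the same PEM file
        f"HTTP.SSL.CAPATH={os.path.join(esgf_dir, 'certificates')}",
    ]
    return "\n".join(lines) + "\n"


print(dap_config_text(os.path.expanduser("~/.esg")))
```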
You can use your OpenID to log on instead. The logon details will be deduced from the OpenID where possible:
>>> lm.logoff()
>>> lm.is_logged_on()
False
>>> lm.logon_with_openid(openid, password, bootstrap=True)
>>> lm.is_logged_on()
True
- class pyesgf.logon.LogonManager(esgf_dir='/home/docs/.esg', dap_config='/home/docs/.dodsrc', verify=True)
Manages ESGF credentials and security configuration files.
Also integrates with NetCDF’s secure OPeNDAP configuration.
- logoff(clear_trustroots=False)
Remove any obtained credentials from the ESGF environment.
- Parameters:
clear_trustroots – If True also remove trustroots.
- logon(username=None, password=None, hostname=None, bootstrap=False, update_trustroots=True, interactive=True)
Obtain ESGF credentials from the specified MyProxy service.
If interactive == True then any missing parameters of password, username or hostname will be prompted for at the terminal.
- Parameters:
interactive – Whether to ask for input at the terminal for any missing information. I.e. username, password or hostname.
bootstrap – Whether to bootstrap the trustroots for this MyProxy service.
update_trustroots – Whether to update the trustroots for this MyProxy service.
- logon_with_openid(openid, password=None, bootstrap=False, update_trustroots=True, interactive=True)
Obtains ESGF credentials by detecting the MyProxy parameters from the user's OpenID. Some ESGF-compatible OpenIDs do not contain enough information to obtain credentials. In this case the user is prompted for missing information if interactive == True; otherwise an exception is raised.
- Parameters:
openid – OpenID to log in with.
See logon() for the parameters interactive, bootstrap and update_trustroots.
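As a rough illustration of deducing logon details from an OpenID, the sketch below splits an OpenID of the common ESGF form https://<host>/esgf-idp/openid/<username> into a hostname and a username. split_openid is hypothetical: it does not contact the OpenID provider (logon_with_openid() detects the MyProxy parameters from the OpenID itself), and it gives wrong answers for OpenIDs with other layouts.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_openid(openid: str) -> Tuple[Optional[str], str]:
    """Guess (hostname, username) from an OpenID URL by its path layout."""
    parts = urlsplit(openid)
    # The username is assumed to be the last non-empty path segment.
    username = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return parts.hostname, username


print(split_openid("https://esgf-node.llnl.gov/esgf-idp/openid/jdoe"))
# ('esgf-node.llnl.gov', 'jdoe')
```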