API Reference
Search API
An interface to the ESGF Search API
Module pyesgf.search.connection
Defines the class representing connections to the ESGF Search API. To perform a search, create a SearchConnection instance, then use new_context() to create a search context.
Warning
Prior to v0.1.1 the url parameter expected the full URL of the search endpoint up to the query string. It now expects url to omit the final endpoint name, e.g. https://esgf-node.llnl.gov/esg-search/search should be changed to https://esgf-node.llnl.gov/esg-search in client code. The current implementation detects the presence of /search and corrects the URL to retain backward compatibility, but this feature may not remain in future versions.
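Client code that still holds old-style URLs can be migrated without relying on the automatic correction. The following is a minimal sketch of such a migration step; normalize_search_url is a hypothetical helper written for illustration and is not the library's own implementation.

```python
def normalize_search_url(url: str) -> str:
    """Return the service URL without a trailing '/search' endpoint name.

    Hypothetical helper for client code; it is not part of pyesgf.
    """
    trimmed = url.rstrip("/")
    suffix = "/search"
    if trimmed.endswith(suffix):
        trimmed = trimmed[: -len(suffix)]
    return trimmed


old_style = "https://esgf-node.llnl.gov/esg-search/search"
new_style = normalize_search_url(old_style)
print(new_style)
```

URLs that already omit the endpoint name pass through unchanged, so the helper can be applied to every stored URL.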
- class pyesgf.search.connection.SearchConnection(url, distrib=True, cache=None, timeout=120, expire_after=datetime.timedelta(seconds=3600), session=None, verify=True, context_class=None)
- Variables
url – The URL to the Search API service. This should be the URL of the ESGF search service excluding the final endpoint name. Usually this is http://<hostname>/esg-search
distrib – Boolean stating whether searches through this connection are distributed, i.e. whether the Search service distributes the query to other search peers. See also the documentation for the facets argument to pyesgf.search.context.SearchContext in relation to distributed searches.
cache – Path to sqlite cache file. The cache expires after the interval given by expire_after (1 hour by default).
timeout – Time (in seconds) before query returns an error. Default: 120s.
expire_after – Time delta after cache expires. Default: 1 hour.
session – A requests.session object. Optional.
verify – Boolean that determines whether queries are sent over a verified channel.
- get_shard_list()
Returns the list of all available shards. A subset of the returned list can be supplied to send_query() to limit the query to selected shards.
Shards are described by hostname and mapped to Solr shard descriptions internally.
- Returns
the list of available shards
- new_context(context_class=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None, search_type=None, **constraints)
Returns a pyesgf.search.context.SearchContext instance for performing faceted searches. See SearchContext.__init__() for documentation on the arguments.
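The connection and context steps can be combined in a small helper. This is a sketch only, assuming pyesgf is installed and the search node is reachable; the name open_context and the constraint shown in the usage comment are illustrative choices, not part of the library.

```python
def open_context(base_url, facets, **constraints):
    """Create a SearchConnection and return a new search context (sketch)."""
    # Imported lazily so that defining this helper does not itself
    # require pyesgf to be installed.
    from pyesgf.search.connection import SearchConnection

    # base_url excludes the final endpoint name, e.g. ".../esg-search".
    conn = SearchConnection(base_url, distrib=True)
    return conn.new_context(facets=facets, **constraints)


# Example call (assumed facet and constraint names; contacts the node
# once the returned context is queried):
# ctx = open_context("https://esgf-node.llnl.gov/esg-search",
#                    "source_id,ensemble_id", project="CMIP6")
```

Passing an explicit facets string follows the advice given for distributed searches in the SearchContext documentation.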
- pyesgf.search.connection.create_single_session(cache=None, expire_after=datetime.timedelta(seconds=3600), **kwargs)
Simple helper function to start a requests or requests_cache session.
cache, if specified, is a filename to a threadsafe sqlite database. expire_after specifies how long the cache should be kept.
- pyesgf.search.connection.query_keyword_type(keyword)
Returns the keyword type of a search query keyword.
Possible values are ‘system’, ‘freetext’, ‘facet’, ‘temporal’ and ‘geospatial’. If the keyword is unknown it is assumed to be a facet keyword.
Module pyesgf.search.context
Defines the SearchContext class, which represents each ESGF search query.
- class pyesgf.search.context.AggregationSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)
- class pyesgf.search.context.DatasetSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)
- class pyesgf.search.context.FileSearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)
- class pyesgf.search.context.SearchContext(connection, constraints, search_type=None, latest=None, facets=None, fields=None, from_timestamp=None, to_timestamp=None, replica=None, shards=None)
Instances of this class represent the state of a current search. It exposes what facets are available to select and the facet counts if they are available.
Subclasses of this class can restrict the search options, for instance FileSearchContext, DatasetSearchContext or CMIP5SearchContext.
SearchContext instances are connected to SearchConnection instances. You normally create SearchContext instances in one of two ways:
1. Calling SearchConnection.new_context()
2. Calling SearchContext.constrain()
- Variables
constraints – A dictionary of facet constraints currently in effect.
constraints[facet_name] = [value, value, ...]
facets – A string containing a comma-separated list of facets to be returned (for example 'source_id,ensemble_id'). If set, this will be used to select which facet counts to include, as returned in the facet_counts dictionary. Defaults to including all available facets, but with distributed searches (where the SearchConnection instance was created with distrib=True), some results may be missing for server-side reasons when requesting all facets, so a warning message will be issued. This contains further details.
- Property facet_counts
A dictionary of available hits with each facet value for the search as currently constrained. This property returns a dictionary of dictionaries where
facet_counts[facet][facet_value] == hit_count
- Property hit_count
The total number of hits available with current constraints.
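The nested shape of facet_counts lends itself to ordinary dictionary handling. The sketch below ranks the values of one facet by hit count; the data is made up for illustration, and top_values is a hypothetical helper rather than a pyesgf function.

```python
# Illustrative data only; real values come from SearchContext.facet_counts.
facet_counts = {
    "source_id": {"model-a": 12, "model-b": 30},
    "ensemble_id": {"r1": 25, "r2": 17},
}


def top_values(counts, facet, n=1):
    """Return the n (value, hit_count) pairs with the most hits, highest first."""
    ranked = sorted(counts[facet].items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


print(top_values(facet_counts, "source_id"))
```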
- get_download_script(**constraints)
Download a script for downloading all files in the set of results.
- Parameters
constraints – Further constraints for this query. Equivalent to calling
self.constrain(**constraints).get_download_script()
- Returns
A string containing the script
- get_facet_options()
Return a dictionary of facet counts filtered to remove all facets that are completely constrained. This method is similar to the property facet_counts except facet values which are not relevant for further constraining are removed.
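One plausible reading of this filtering can be sketched in plain Python. The rule used here is an assumption, not a confirmed description of the library's behaviour: a value whose count equals the total hit count cannot narrow the search, and a facet left with fewer than two values offers no further choice. The function name and sample data are illustrative.

```python
def filter_facet_options(facet_counts, hit_count):
    """Drop facet values and facets that cannot narrow the search further.

    Assumption: a value whose count equals hit_count matches every current
    hit, and a facet with fewer than two remaining values offers no choice.
    """
    options = {}
    for facet, counts in facet_counts.items():
        useful = {value: n for value, n in counts.items() if n < hit_count}
        if len(useful) > 1:
            options[facet] = useful
    return options


# Made-up counts: every hit belongs to one project, so that facet is dropped.
sample_counts = {
    "project": {"CMIP5": 42},
    "source_id": {"model-a": 12, "model-b": 30},
}
print(filter_facet_options(sample_counts, hit_count=42))
```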
- search(batch_size=50, ignore_facet_check=False, **constraints)
Perform the search with current constraints returning a set of results.
- batch_size
The number of results to get per HTTP request.
- ignore_facet_check
Do not make an extra HTTP request to populate facet_counts and hit_count.
- Parameters
constraints – Further constraints for this query. Equivalent to calling
self.constrain(**constraints).search()
- Returns
A ResultSet for this query
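As a rough illustration of calling search(), the helper below collects a few dataset ids from a context. It is a sketch that assumes the context yields DatasetResult objects; first_dataset_ids is a hypothetical name, and the context is passed in so the helper works with any object exposing the same search() method.

```python
from itertools import islice


def first_dataset_ids(context, limit=5, batch_size=50):
    """Return up to `limit` dataset ids from a dataset search context."""
    # ignore_facet_check=True skips the extra HTTP request that would
    # populate facet_counts and hit_count.
    results = context.search(batch_size=batch_size, ignore_facet_check=True)
    # islice stops after `limit` results, so later batches are not requested.
    return [result.dataset_id for result in islice(results, limit)]
```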
Module pyesgf.search.results
Search results are retrieved through the ResultSet class. This class hides paging of large result sets behind a client-side cache. Subclasses of Result represent results of different Solr record types.
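The idea of hiding paging behind a client-side cache can be illustrated with a small stand-alone class. This is a conceptual sketch, not the ResultSet implementation: PagedResults, fake_fetch and the total of 120 items are all invented for the example.

```python
from collections.abc import Sequence


class PagedResults(Sequence):
    """Sequence that fetches fixed-size batches on demand and caches them."""

    def __init__(self, fetch_batch, total, batch_size=50):
        self._fetch_batch = fetch_batch  # callable: (offset, limit) -> list
        self._total = total
        self._batch_size = batch_size
        self._cache = {}  # page index -> list of items

    def __len__(self):
        return self._total

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("only integer indices are supported in this sketch")
        if index < 0:
            index += self._total
        if not 0 <= index < self._total:
            raise IndexError(index)
        page, offset = divmod(index, self._batch_size)
        if page not in self._cache:
            # One request per page; later lookups on the page hit the cache.
            self._cache[page] = self._fetch_batch(
                page * self._batch_size, self._batch_size
            )
        return self._cache[page][offset]


calls = []


def fake_fetch(offset, limit):
    """Stand-in for one HTTP request; records the requested offset."""
    calls.append(offset)
    return list(range(offset, min(offset + limit, 120)))


results = PagedResults(fake_fetch, total=120, batch_size=50)
print(results[0], results[49], results[119], calls)
```

Only the first and third pages are fetched here, because the second lookup is served from the page already cached.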
- class pyesgf.search.results.AggregationResult(json, context)
A result object for ESGF aggregations. Properties from BaseResult are inherited.
- Property aggregation_id
The aggregation id
- class pyesgf.search.results.BaseResult(json, context)
Base class for results.
Subclasses represent different search types such as File and Dataset.
- Variables
json – The original json representation of the result.
context – The SearchContext which generated this result.
- Property urls
A dictionary of the form {service: [(url, mime_type), ...], ...}
- Property opendap_url
The url of an OPeNDAP endpoint for this result if available
- Property las_url
The url of an LAS endpoint for this result if available
- Property download_url
The url for downloading the result by HTTP if available
- Property gridftp_url
The url for downloading the result by Globus if available
- Property globus_url
The url for downloading the result by Globus if available (including endpoint)
- Property index_node
The index node where the metadata is stored. Calls to *_context() will optimise queries to only address this node.
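Because urls maps each service to a list of (url, mime_type) pairs, picking an endpoint is a dictionary lookup. The sketch below uses placeholder data; the service names, URLs and MIME types are assumptions for illustration, and first_url is a hypothetical helper.

```python
def first_url(urls, service):
    """Return the first URL listed for `service`, or None if absent."""
    entries = urls.get(service, [])
    return entries[0][0] if entries else None


# Illustrative data; service names and URLs are placeholders.
sample_urls = {
    "HTTPServer": [("https://data.example.org/file.nc", "application/netcdf")],
    "OPENDAP": [
        ("https://data.example.org/dodsC/file.nc.html", "application/opendap-html")
    ],
}
print(first_url(sample_urls, "HTTPServer"))
print(first_url(sample_urls, "GridFTP"))
```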
- class pyesgf.search.results.DatasetResult(json, context)
A result object for ESGF datasets.
- Property dataset_id
The solr dataset_id which is unique throughout the system.
- aggregation_context()
Return a SearchContext for searching for aggregations within this dataset.
- Property number_of_files
Returns file count as reported by the dataset record.
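A dataset result can be used as the starting point for a narrower search. The following sketch assumes `dataset` is a DatasetResult and that the context it returns yields AggregationResult objects; aggregation_ids is a hypothetical helper name.

```python
def aggregation_ids(dataset):
    """Return the aggregation ids found within one dataset result."""
    # aggregation_context() gives a SearchContext scoped to this dataset.
    context = dataset.aggregation_context()
    return [aggregation.aggregation_id for aggregation in context.search()]
```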
- class pyesgf.search.results.FileResult(json, context)
A result object for ESGF files. Properties from BaseResult are inherited.
- Property file_id
The identifier for the file
- Property checksum
The checksum of the file
- Property checksum_type
The algorithm used for generating the checksum
- Property filename
The filename
- Property size
The file size in bytes
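The checksum and checksum_type properties can be used to validate a downloaded file. This is a minimal sketch under one assumption: that checksum_type, once lowercased, is an algorithm name accepted by hashlib. verify_checksum is a hypothetical helper, and the temporary file stands in for a real download.

```python
import hashlib
import os
import tempfile


def verify_checksum(path, checksum, checksum_type, chunk_size=1 << 20):
    """Return True if the file at `path` matches the expected checksum.

    Assumption: checksum_type, lowercased, is a name hashlib accepts.
    """
    digest = hashlib.new(checksum_type.lower())
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == checksum.lower()


# Demonstration with a temporary file standing in for a downloaded file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
expected = hashlib.sha256(b"example bytes").hexdigest()
matches = verify_checksum(tmp.name, expected, "SHA256")
os.remove(tmp.name)
print(matches)
```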
- class pyesgf.search.results.ResultSet(context, batch_size=50, eager=True)
- Variables
context – The search context object used to generate this resultset
- Property batch_size
The number of results that will be requested from esgf-search in one call. This must be set on creation and cannot change.
ESGF Security API
pyesgf provides a simplified interface for obtaining ESGF credentials.
Module pyesgf.logon
Manage the client’s interaction with ESGF’s security system. Using this module requires installing the MyProxyClient library.
To obtain ESGF credentials, create a LogonManager instance and supply it with logon details:
>>> from pyesgf.logon import LogonManager
>>> lm = LogonManager()
>>> lm.is_logged_on()
False
>>> lm.logon(username, password, myproxy_hostname, bootstrap=True)
>>> lm.is_logged_on()
True
Logon parameters that aren’t specified will be prompted for at the terminal by default. The LogonManager object also writes a .httprc file configuring OPeNDAP access through the NetCDF API.
The option bootstrap=True is needed on the first run.
You can use your OpenID to log on instead. The logon details will be deduced from the OpenID where possible:
>>> lm.logoff()
>>> lm.is_logged_on()
False
>>> lm.logon_with_openid(openid, password, bootstrap=True)
>>> lm.is_logged_on()
True
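In scripts that run unattended, the same calls can be wrapped so that no terminal prompt appears. The sketch below assumes `manager` is a LogonManager instance (or any object with the same methods); ensure_logged_on and its first_run flag are hypothetical names introduced for illustration.

```python
def ensure_logged_on(manager, username, password, hostname, first_run=False):
    """Log on through `manager` unless credentials are already present.

    Returns True if the manager reports being logged on afterwards.
    """
    if not manager.is_logged_on():
        manager.logon(
            username,
            password,
            hostname,
            bootstrap=first_run,  # bootstrap the trustroots on the first run
            interactive=False,    # never prompt at the terminal
        )
    return manager.is_logged_on()
```

With interactive=False all three logon details have to be supplied by the caller, since nothing will be prompted for.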
- class pyesgf.logon.LogonManager(esgf_dir='/home/docs/.esg', dap_config='/home/docs/.dodsrc', verify=True)
Manages ESGF credentials and security configuration files.
Also integrates with NetCDF’s secure OPeNDAP configuration.
- logoff(clear_trustroots=False)
Remove any obtained credentials from the ESGF environment.
- Parameters
clear_trustroots – If True also remove trustroots.
- logon(username=None, password=None, hostname=None, bootstrap=False, update_trustroots=True, interactive=True)
Obtain ESGF credentials from the specified MyProxy service.
If interactive == True then any missing parameters of password, username or hostname will be prompted for at the terminal.
- Parameters
interactive – Whether to ask for input at the terminal for any missing information. I.e. username, password or hostname.
bootstrap – Whether to bootstrap the trustroots for this MyProxy service.
update_trustroots – Whether to update the trustroots for this MyProxy service.
- logon_with_openid(openid, password=None, bootstrap=False, update_trustroots=True, interactive=True)
Obtains ESGF credentials by detecting the MyProxy parameters from the user's OpenID. Some ESGF compatible OpenIDs do not contain enough information to obtain credentials. In this case the user is prompted for missing information if interactive == True, otherwise an exception is raised.
- Parameters
openid – OpenID to log in with.
See logon() for parameters interactive, bootstrap and update_trustroots.