Examples of pyesgf.search usage

Prelude:

[ ]:
from pyesgf.search import SearchConnection
conn = SearchConnection('https://esgf.ceda.ac.uk/esg-search',
                        distrib=True)

Warning: don’t use default search with facets=*.

This behavior is kept for backward-compatibility, but ESGF indexes might not successfully perform a distributed search when this option is used, so some results may be missing. For full results, it is recommended to pass a list of facets of interest when instantiating a context object. For example,

ctx = conn.new_context(facets='project,experiment_id')

Only the facets that you specify will be present in the facet_counts dictionary.

This warning is displayed at most once per context object, when a distributed search is performed using the facets=* default. To suppress it, set the environment variable ESGF_PYCLIENT_NO_FACETS_STAR_WARNING to any value, or explicitly use conn.new_context(facets='*').
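As a minimal sketch of the environment-variable option, the variable can be set from Python itself. The value "1" is an arbitrary choice, since any value is said to work. Setting it before pyesgf is imported is a precaution assumed here, not a documented requirement.

```python
import os

# Any value suppresses the warning; "1" is an arbitrary choice.
# Set it before running any search (here, before pyesgf is even imported,
# purely as a precaution).
os.environ["ESGF_PYCLIENT_NO_FACETS_STAR_WARNING"] = "1"

# Confirm the variable is visible to the current process.
suppressed = "ESGF_PYCLIENT_NO_FACETS_STAR_WARNING" in os.environ
print(suppressed)
```

Passing an explicit list of facets, as in the examples that follow, avoids the warning without any environment changes.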

[ ]:
facets='project,experiment_family'

Find how many datasets match the query "humidity", broken down by experiment family:

[ ]:
ctx = conn.new_context(project='CMIP5', query='humidity', facets=facets)
ctx.hit_count
[ ]:
ctx.facet_counts['experiment_family']
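The per-facet result can be ranked to see which values dominate. The sketch below assumes that ctx.facet_counts['experiment_family'] is a dictionary mapping each facet value to a dataset count. The helper name rank_facet_values is hypothetical, and the numbers are placeholders for illustration, not real ESGF results.

```python
def rank_facet_values(counts):
    """Return (value, count) pairs sorted by descending count, then by name."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder numbers for illustration only, not real ESGF results.
example_counts = {"Historical": 12, "RCP": 30, "Control": 12}

for value, count in rank_facet_values(example_counts):
    print(f"{value}: {count}")
```

With a live context, the same helper could be called as rank_facet_values(ctx.facet_counts['experiment_family']).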

Search using a partial ESGF dataset ID (and get first download URL):

[ ]:
conn = SearchConnection('https://esgf.ceda.ac.uk/esg-search', distrib=False)
ctx = conn.new_context(facets=facets)
dataset_id_pattern = "cmip5.output1.MOHC.HadGEM2-CC.historical.mon.atmos.Amon.*"
results = ctx.search(query="id:%s" % dataset_id_pattern)
len(results)
[ ]:
files = results[0].file_context().search()
len(files)
[ ]:
download_url = files[0].download_url
print(download_url)

Find the OpenDAP URL for an aggregated dataset:

[ ]:
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
ctx = conn.new_context(project='CMIP5', model='MPI-ESM-LR', experiment='decadal2000', time_frequency='day')
print('Hits: {}, Realms: {}, Ensembles: {}'.format(
    ctx.hit_count,
    ctx.facet_counts['realm'],
    ctx.facet_counts['ensemble']))
[ ]:
ctx = ctx.constrain(realm='atmos', ensemble='r1i1p1')
ctx.hit_count
[ ]:
result = ctx.search()[0]
agg_ctx = result.aggregation_context()
agg = agg_ctx.search()[0]
print(agg.opendap_url)

Find download URLs for all files in a dataset:

[ ]:
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
ctx = conn.new_context(project='obs4MIPs')
ctx.hit_count

[ ]:
ds = ctx.search()[0]
files = ds.file_context().search()
len(files)
[ ]:
for f in files:
    print(f.download_url)
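Once the download URLs are listed, a local filename is usually needed for each one. The sketch below derives it from the last path component of the URL using only the standard library. The helper name local_filename and the example URL are hypothetical, and it assumes that download URLs end in the file name.

```python
import posixpath
from urllib.parse import urlparse


def local_filename(download_url):
    """Return the last path component of a URL, ignoring query and fragment."""
    return posixpath.basename(urlparse(download_url).path)


# Hypothetical URL used only to illustrate the helper.
example_url = "https://data.example.org/thredds/fileServer/some/path/tas_example.nc"
print(local_filename(example_url))
```

In the loop above, local_filename(f.download_url) would give a name to save each file under.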

Define a search for datasets that includes a temporal range:

[ ]:
conn = SearchConnection('https://esgf.ceda.ac.uk/esg-search', distrib=False)
ctx = conn.new_context(
    project="CMIP5", model="HadGEM2-ES",
    time_frequency="mon", realm="atmos", ensemble="r1i1p1", latest=True,
    from_timestamp="2100-12-30T23:23:59Z", to_timestamp="2200-01-01T00:00:00Z")
ctx.hit_count

Or do the same thing by searching without temporal constraints and then applying the constraint:

[ ]:
ctx = conn.new_context(
    project="CMIP5", model="HadGEM2-ES",
    time_frequency="mon", realm="atmos", ensemble="r1i1p1", latest=True)
ctx.hit_count
[ ]:
ctx = ctx.constrain(
    from_timestamp="2100-12-30T23:23:59Z", to_timestamp="2200-01-01T00:00:00Z")
ctx.hit_count
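The timestamp strings used above follow the pattern YYYY-MM-DDTHH:MM:SSZ. As a sketch, they can be generated from datetime objects instead of being typed by hand. The helper name esgf_timestamp is hypothetical, and treating naive datetimes as UTC is an assumption made here.

```python
from datetime import datetime, timezone


def esgf_timestamp(dt):
    """Format a datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start = esgf_timestamp(datetime(2100, 12, 30, 23, 23, 59))
end = esgf_timestamp(datetime(2200, 1, 1))
print(start, end)
```

The resulting strings can be passed as from_timestamp and to_timestamp in either of the two forms shown above.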