How to read data from STAC#

This notebook shows how to read the data in from a STAC asset using xarray and a little hidden helper library called xpystac.

tl;dr#

For any PySTAC object that can be represented as an ndimensional dataset you can read the data using the following command:

xr.open_dataset(object)

Dependencies#

There are lots of optional dependencies depending on where and how the data you are interested in are stored. Here are some of the libraries that you will probably need:

  • dask - to delay data loading until access

  • fsspec - to access data from remote storage

  • pystac - STAC object structures

  • xarray, rioxarray - data structures

  • xpystac, stackstac - helper for loading pystac into xarray objects

[1]:
!pip install adlfs dask 'fsspec[http]' planetary_computer --quiet
!pip install stackstac xarray xpystac zarr --quiet

Despite all these install instructions, the import block is very straightforward

[2]:
import xarray as xr

import pystac

Examples#

Here are a few examples of the different types of objects that you can open in xarray.

COGs#

Read all the data from the COGs referenced by the assets on an item.

[3]:
landsat_item = pystac.Item.from_file(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-c2-l2/items/LC09_L2SP_088084_20230408_02_T2"
)
xr.open_dataset(landsat_item)
[3]:
<xarray.Dataset>
Dimensions:                      (time: 1, y: 7802, x: 7762, band: 19)
Coordinates: (12/32)
  * time                         (time) datetime64[ns] 2023-04-08T23:37:51.63...
    id                           (time) <U31 ...
  * x                            (x) float64 3.774e+05 3.774e+05 ... 6.102e+05
  * y                            (y) float64 -3.713e+06 ... -3.947e+06
    proj:shape                   object ...
    sci:doi                      <U16 ...
    ...                           ...
    raster:bands                 (band) object ...
    classification:bitfields     (band) object ...
    common_name                  (band) object ...
    center_wavelength            (band) object ...
    full_width_half_max          (band) object ...
    epsg                         int64 ...
Dimensions without coordinates: band
Data variables: (12/19)
    qa                           (time, y, x) float64 ...
    red                          (time, y, x) float64 ...
    blue                         (time, y, x) float64 ...
    drad                         (time, y, x) float64 ...
    emis                         (time, y, x) float64 ...
    emsd                         (time, y, x) float64 ...
    ...                           ...
    swir16                       (time, y, x) float64 ...
    swir22                       (time, y, x) float64 ...
    coastal                      (time, y, x) float64 ...
    qa_pixel                     (time, y, x) float64 ...
    qa_radsat                    (time, y, x) float64 ...
    qa_aerosol                   (time, y, x) float64 ...
Attributes:
    spec:        RasterSpec(epsg=32656, bounds=(377370.0, -3947130.0, 610230....
    crs:         epsg:32656
    transform:   | 30.00, 0.00, 377370.00|\n| 0.00,-30.00,-3713070.00|\n| 0.0...
    resolution:  30.0

Zarr#

Read from an asset that references data stored in zarr

[4]:
daymet_collection = pystac.Collection.from_file(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-daily-hi"
)
daymet_asset = daymet_collection.assets["zarr-abfs"]

xr.open_dataset(daymet_asset)
[4]:
<xarray.Dataset>
Dimensions:                  (time: 14965, y: 584, x: 284, nv: 2)
Coordinates:
    lat                      (y, x) float32 ...
    lon                      (y, x) float32 ...
  * time                     (time) datetime64[ns] 1980-01-01T12:00:00 ... 20...
  * x                        (x) float32 -5.802e+06 -5.801e+06 ... -5.519e+06
  * y                        (y) float32 -3.9e+04 -4e+04 ... -6.21e+05 -6.22e+05
Dimensions without coordinates: nv
Data variables:
    dayl                     (time, y, x) float32 ...
    lambert_conformal_conic  int16 ...
    prcp                     (time, y, x) float32 ...
    srad                     (time, y, x) float32 ...
    swe                      (time, y, x) float32 ...
    time_bnds                (time, nv) datetime64[ns] ...
    tmax                     (time, y, x) float32 ...
    tmin                     (time, y, x) float32 ...
    vp                       (time, y, x) float32 ...
    yearday                  (time) int16 ...
Attributes:
    Conventions:       CF-1.6
    Version_data:      Daymet Data Version 4.0
    Version_software:  Daymet Software Version 4.0
    citation:          Please see http://daymet.ornl.gov/ for current Daymet ...
    references:        Please see http://daymet.ornl.gov/ for current informa...
    source:            Daymet Software Version 4.0
    start_year:        1980

Reference file#

If the collection has a reference file we can use that

[5]:
cmip6_collection = pystac.Collection.from_file(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/nasa-nex-gddp-cmip6"
)
cmip6_asset = cmip6_collection.assets["ACCESS-CM2.historical"]

xr.open_dataset(cmip6_asset)
[5]:
<xarray.Dataset>
Dimensions:  (time: 23741, lat: 600, lon: 1440)
Coordinates:
  * lat      (lat) float64 -59.88 -59.62 -59.38 -59.12 ... 89.38 89.62 89.88
  * lon      (lon) float64 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9
  * time     (time) datetime64[us] 1950-01-01T12:00:00 ... 2014-12-31T12:00:00
Data variables:
    hurs     (time, lat, lon) float32 ...
    huss     (time, lat, lon) float32 ...
    pr       (time, lat, lon) float32 ...
    rlds     (time, lat, lon) float32 ...
    rsds     (time, lat, lon) float32 ...
    sfcWind  (time, lat, lon) float32 ...
    tas      (time, lat, lon) float32 ...
    tasmax   (time, lat, lon) float32 ...
    tasmin   (time, lat, lon) float32 ...
Attributes: (12/22)
    Conventions:           CF-1.7
    activity:              NEX-GDDP-CMIP6
    cmip6_institution_id:  CSIRO-ARCCSS
    cmip6_license:         CC-BY-SA 4.0
    cmip6_source_id:       ACCESS-CM2
    contact:               Dr. Rama Nemani: rama.nemani@nasa.gov, Dr. Bridget...
    ...                    ...
    scenario:              historical
    source:                BCSD
    title:                 ACCESS-CM2, r1i1p1f1, historical, global downscale...
    tracking_id:           16d27564-470f-41ea-8077-f4cc3efa5bfe
    variant_label:         r1i1p1f1
    version:               1.0