How to read data from STAC#
This notebook shows how to read the data in from a STAC asset using xarray and a little hidden helper library called xpystac.
tl;dr#
For any PySTAC object that can be represented as an ndimensional dataset you can read the data using the following command:
xr.open_dataset(object)
Dependencies#
There are lots of optional dependencies depending on where and how the data you are interested in are stored. Here are some of the libraries that you will probably need:
dask - to delay data loading until access
fsspec - to access data from remote storage
pystac - STAC object structures
xarray, rioxarray - data structures
xpystac, stackstac - helper for loading pystac into xarray objects
[1]:
!pip install adlfs dask 'fsspec[http]' planetary_computer --quiet
!pip install stackstac xarray xpystac zarr --quiet
Despite all these install instructions, the import block is very straightforward
[2]:
import xarray as xr
import pystac
Examples#
Here are a few examples of the different types of objects that you can open in xarray.
COGs#
Read all the data from the COGs referenced by the assets on an item.
[3]:
landsat_item = pystac.Item.from_file(
"https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-c2-l2/items/LC09_L2SP_088084_20230408_02_T2"
)
xr.open_dataset(landsat_item)
[3]:
<xarray.Dataset> Dimensions: (time: 1, y: 7802, x: 7762, band: 19) Coordinates: (12/32) * time (time) datetime64[ns] 2023-04-08T23:37:51.63... id (time) <U31 ... * x (x) float64 3.774e+05 3.774e+05 ... 6.102e+05 * y (y) float64 -3.713e+06 ... -3.947e+06 proj:shape object ... sci:doi <U16 ... ... ... raster:bands (band) object ... classification:bitfields (band) object ... common_name (band) object ... center_wavelength (band) object ... full_width_half_max (band) object ... epsg int64 ... Dimensions without coordinates: band Data variables: (12/19) qa (time, y, x) float64 ... red (time, y, x) float64 ... blue (time, y, x) float64 ... drad (time, y, x) float64 ... emis (time, y, x) float64 ... emsd (time, y, x) float64 ... ... ... swir16 (time, y, x) float64 ... swir22 (time, y, x) float64 ... coastal (time, y, x) float64 ... qa_pixel (time, y, x) float64 ... qa_radsat (time, y, x) float64 ... qa_aerosol (time, y, x) float64 ... Attributes: spec: RasterSpec(epsg=32656, bounds=(377370.0, -3947130.0, 610230.... crs: epsg:32656 transform: | 30.00, 0.00, 377370.00|\n| 0.00,-30.00,-3713070.00|\n| 0.0... resolution: 30.0
Zarr#
Read from an asset that references data stored in zarr
[4]:
daymet_collection = pystac.Collection.from_file(
"https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-daily-hi"
)
daymet_asset = daymet_collection.assets["zarr-abfs"]
xr.open_dataset(daymet_asset)
[4]:
<xarray.Dataset> Dimensions: (time: 14965, y: 584, x: 284, nv: 2) Coordinates: lat (y, x) float32 ... lon (y, x) float32 ... * time (time) datetime64[ns] 1980-01-01T12:00:00 ... 20... * x (x) float32 -5.802e+06 -5.801e+06 ... -5.519e+06 * y (y) float32 -3.9e+04 -4e+04 ... -6.21e+05 -6.22e+05 Dimensions without coordinates: nv Data variables: dayl (time, y, x) float32 ... lambert_conformal_conic int16 ... prcp (time, y, x) float32 ... srad (time, y, x) float32 ... swe (time, y, x) float32 ... time_bnds (time, nv) datetime64[ns] ... tmax (time, y, x) float32 ... tmin (time, y, x) float32 ... vp (time, y, x) float32 ... yearday (time) int16 ... Attributes: Conventions: CF-1.6 Version_data: Daymet Data Version 4.0 Version_software: Daymet Software Version 4.0 citation: Please see http://daymet.ornl.gov/ for current Daymet ... references: Please see http://daymet.ornl.gov/ for current informa... source: Daymet Software Version 4.0 start_year: 1980
Reference file#
If the collection has a reference file we can use that
[5]:
cmip6_collection = pystac.Collection.from_file(
"https://planetarycomputer.microsoft.com/api/stac/v1/collections/nasa-nex-gddp-cmip6"
)
cmip6_asset = cmip6_collection.assets["ACCESS-CM2.historical"]
xr.open_dataset(cmip6_asset)
[5]:
<xarray.Dataset> Dimensions: (time: 23741, lat: 600, lon: 1440) Coordinates: * lat (lat) float64 -59.88 -59.62 -59.38 -59.12 ... 89.38 89.62 89.88 * lon (lon) float64 0.125 0.375 0.625 0.875 ... 359.1 359.4 359.6 359.9 * time (time) datetime64[us] 1950-01-01T12:00:00 ... 2014-12-31T12:00:00 Data variables: hurs (time, lat, lon) float32 ... huss (time, lat, lon) float32 ... pr (time, lat, lon) float32 ... rlds (time, lat, lon) float32 ... rsds (time, lat, lon) float32 ... sfcWind (time, lat, lon) float32 ... tas (time, lat, lon) float32 ... tasmax (time, lat, lon) float32 ... tasmin (time, lat, lon) float32 ... Attributes: (12/22) Conventions: CF-1.7 activity: NEX-GDDP-CMIP6 cmip6_institution_id: CSIRO-ARCCSS cmip6_license: CC-BY-SA 4.0 cmip6_source_id: ACCESS-CM2 contact: Dr. Rama Nemani: rama.nemani@nasa.gov, Dr. Bridget... ... ... scenario: historical source: BCSD title: ACCESS-CM2, r1i1p1f1, historical, global downscale... tracking_id: 16d27564-470f-41ea-8077-f4cc3efa5bfe variant_label: r1i1p1f1 version: 1.0