{ "cells": [ { "cell_type": "markdown", "id": "be2c57c1-798a-4eaf-b2b8-41c261b657d1", "metadata": {}, "source": [ "# How to read data from STAC\n", "\n", "This notebook shows how to read the data in from a STAC asset using [xarray](https://docs.xarray.dev/en/stable/) and a little hidden helper library called [xpystac](https://pypi.org/project/xpystac/).\n", "\n", "## tl;dr\n", "\n", "For any PySTAC asset that can be represented as an xarray dataset you can read the data using the following command:\n", "\n", "```python\n", "xr.open_dataset(asset)\n", "```\n", "\n", "If you want to load multiple assets from the same item and/or many items at once use:\n", "\n", "\n", "```python\n", "odc.stac.load([items])\n", "```\n", "\n", "## Dependencies\n", "\n", "There are lots of optional dependencies depending on where and how the data you are interested in are stored. Here are some of the libraries that you will probably need:\n", "\n", "- dask - to delay data loading until access\n", "- pystac - STAC object structures\n", "- xarray, rioxarray - data structures\n", "- xpystac, odc-stac - helper for loading pystac into xarray objects" ] }, { "cell_type": "markdown", "id": "ad3fb6dc-3529-47bd-a5b3-f5260f23db88", "metadata": {}, "source": [ "Despite all these install instructions, the import block is very straightforward" ] }, { "cell_type": "code", "execution_count": 1, "id": "2a8afebd-b397-4e7a-b448-0f59cc030e66", "metadata": { "execution": { "iopub.execute_input": "2025-08-21T19:55:58.808745Z", "iopub.status.busy": "2025-08-21T19:55:58.808345Z", "iopub.status.idle": "2025-08-21T19:56:01.983744Z", "shell.execute_reply": "2025-08-21T19:56:01.983228Z", "shell.execute_reply.started": "2025-08-21T19:55:58.808709Z" } }, "outputs": [], "source": [ "import odc.stac\n", "import planetary_computer\n", "import rioxarray\n", "import xarray as xr\n", "\n", "import pystac" ] }, { "cell_type": "markdown", "id": "6b24745c-b2d5-43d6-9c7e-66458b3a88e3", "metadata": {}, "source": [ "## Examples\n", "\n", "Here are a few examples of the different types of objects that you can open in xarray." ] }, { "cell_type": "markdown", "id": "30da7cfd-2861-4095-b15b-9952a7d824d9", "metadata": {}, "source": [ "### COGs\n", "\n", "Read all the data from the COGs referenced by the assets on an item." ] }, { "cell_type": "code", "execution_count": 2, "id": "c77432e6-8b0d-44d2-a947-ec74a529b8cb", "metadata": { "execution": { "iopub.execute_input": "2025-08-21T19:56:03.945808Z", "iopub.status.busy": "2025-08-21T19:56:03.945372Z", "iopub.status.idle": "2025-08-21T19:56:05.448439Z", "shell.execute_reply": "2025-08-21T19:56:05.448060Z", "shell.execute_reply.started": "2025-08-21T19:56:03.945773Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 2GB\n",
       "Dimensions:      (y: 7801, x: 7761, time: 1)\n",
       "Coordinates:\n",
       "  * y            (y) float64 62kB -3.713e+06 -3.713e+06 ... -3.947e+06\n",
       "  * x            (x) float64 62kB 3.774e+05 3.774e+05 ... 6.102e+05 6.102e+05\n",
       "    spatial_ref  int32 4B 32656\n",
       "  * time         (time) datetime64[ns] 8B 2023-04-08T23:37:51.630731\n",
       "Data variables: (12/19)\n",
       "    qa           (time, y, x) int16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    red          (time, y, x) uint16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    blue         (time, y, x) uint16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    drad         (time, y, x) int16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    emis         (time, y, x) int16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    emsd         (time, y, x) int16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    ...           ...\n",
       "    swir16       (time, y, x) uint16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    swir22       (time, y, x) uint16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    coastal      (time, y, x) uint16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    qa_pixel     (time, y, x) uint16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    qa_radsat    (time, y, x) uint16 121MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>\n",
       "    qa_aerosol   (time, y, x) uint8 61MB dask.array<chunksize=(1, 1024, 1024), meta=np.ndarray>
" ], "text/plain": [ " Size: 2GB\n", "Dimensions: (y: 7801, x: 7761, time: 1)\n", "Coordinates:\n", " * y (y) float64 62kB -3.713e+06 -3.713e+06 ... -3.947e+06\n", " * x (x) float64 62kB 3.774e+05 3.774e+05 ... 6.102e+05 6.102e+05\n", " spatial_ref int32 4B 32656\n", " * time (time) datetime64[ns] 8B 2023-04-08T23:37:51.630731\n", "Data variables: (12/19)\n", " qa (time, y, x) int16 121MB dask.array\n", " red (time, y, x) uint16 121MB dask.array\n", " blue (time, y, x) uint16 121MB dask.array\n", " drad (time, y, x) int16 121MB dask.array\n", " emis (time, y, x) int16 121MB dask.array\n", " emsd (time, y, x) int16 121MB dask.array\n", " ... ...\n", " swir16 (time, y, x) uint16 121MB dask.array\n", " swir22 (time, y, x) uint16 121MB dask.array\n", " coastal (time, y, x) uint16 121MB dask.array\n", " qa_pixel (time, y, x) uint16 121MB dask.array\n", " qa_radsat (time, y, x) uint16 121MB dask.array\n", " qa_aerosol (time, y, x) uint8 61MB dask.array" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "landsat_item = pystac.Item.from_file(\n", " \"https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-c2-l2/items/LC09_L2SP_088084_20230408_02_T2\",\n", ")\n", "\n", "ds = odc.stac.load(\n", " [landsat_item], chunks={\"x\": 1024, \"y\": 1024}, patch_url=planetary_computer.sign\n", ")\n", "ds" ] }, { "cell_type": "markdown", "id": "db92558d-4d8c-4afd-873e-76de202048b7", "metadata": {}, "source": [ "Let's prove that we really can access the data within the COGs." ] }, { "cell_type": "code", "execution_count": 3, "id": "add6347a-df37-495e-9794-7d86795d18f8", "metadata": { "execution": { "iopub.execute_input": "2025-08-21T19:56:05.449425Z", "iopub.status.busy": "2025-08-21T19:56:05.449167Z", "iopub.status.idle": "2025-08-21T19:56:10.768178Z", "shell.execute_reply": "2025-08-21T19:56:10.767351Z", "shell.execute_reply.started": "2025-08-21T19:56:05.449403Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2.68 s, sys: 459 ms, total: 3.14 s\n", "Wall time: 5.31 s\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray 'blue' ()> Size: 8B\n",
       "array(13940.01841915)\n",
       "Coordinates:\n",
       "    spatial_ref  int32 4B 32656
" ], "text/plain": [ " Size: 8B\n", "array(13940.01841915)\n", "Coordinates:\n", " spatial_ref int32 4B 32656" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "ds.blue.mean().compute()" ] }, { "cell_type": "markdown", "id": "65ed7b6d-2e24-46cf-a61f-bc17df20588c", "metadata": {}, "source": [ "For more control over the individual COGs referenced by the assets, you can grab the href off the asset, sign it and then use it directly." ] }, { "cell_type": "code", "execution_count": 4, "id": "5873b326-820b-4eec-8ba3-c3207bf8f9ce", "metadata": { "execution": { "iopub.execute_input": "2025-08-21T19:56:10.769208Z", "iopub.status.busy": "2025-08-21T19:56:10.768974Z", "iopub.status.idle": "2025-08-21T19:56:11.425713Z", "shell.execute_reply": "2025-08-21T19:56:11.424874Z", "shell.execute_reply.started": "2025-08-21T19:56:10.769187Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray (band: 1, y: 122, x: 122)> Size: 30kB\n",
       "[14884 values with dtype=uint16]\n",
       "Coordinates:\n",
       "  * band         (band) int64 8B 1\n",
       "  * x            (x) float64 976B 3.783e+05 3.802e+05 ... 6.074e+05 6.093e+05\n",
       "  * y            (y) float64 976B -3.714e+06 -3.716e+06 ... -3.946e+06\n",
       "    spatial_ref  int64 8B 0\n",
       "Attributes:\n",
       "    AREA_OR_POINT:  Point\n",
       "    _FillValue:     0\n",
       "    scale_factor:   1.0\n",
       "    add_offset:     0.0
" ], "text/plain": [ " Size: 30kB\n", "[14884 values with dtype=uint16]\n", "Coordinates:\n", " * band (band) int64 8B 1\n", " * x (x) float64 976B 3.783e+05 3.802e+05 ... 6.074e+05 6.093e+05\n", " * y (y) float64 976B -3.714e+06 -3.716e+06 ... -3.946e+06\n", " spatial_ref int64 8B 0\n", "Attributes:\n", " AREA_OR_POINT: Point\n", " _FillValue: 0\n", " scale_factor: 1.0\n", " add_offset: 0.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "landsat_asset = landsat_item.assets[\"blue\"]\n", "landsat_asset_href = planetary_computer.sign(landsat_asset).href\n", "\n", "da = rioxarray.open_rasterio(landsat_asset_href, overview_level=5)\n", "da" ] }, { "cell_type": "code", "execution_count": 5, "id": "156a1c68-2d3f-48f1-8833-92d014b148c1", "metadata": { "execution": { "iopub.execute_input": "2025-08-21T19:56:11.426154Z", "iopub.status.busy": "2025-08-21T19:56:11.426030Z", "iopub.status.idle": "2025-08-21T19:56:11.622313Z", "shell.execute_reply": "2025-08-21T19:56:11.621036Z", "shell.execute_reply.started": "2025-08-21T19:56:11.426142Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3.44 ms, sys: 0 ns, total: 3.44 ms\n", "Wall time: 186 ms\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.DataArray ()> Size: 8B\n",
       "array(14472.21996775)\n",
       "Coordinates:\n",
       "    spatial_ref  int64 8B 0
" ], "text/plain": [ " Size: 8B\n", "array(14472.21996775)\n", "Coordinates:\n", " spatial_ref int64 8B 0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "da.mean()" ] }, { "cell_type": "markdown", "id": "b2eceda5-7dc7-482c-a95a-7f24f7698933", "metadata": {}, "source": [ "### Zarr\n", "\n", "Read from an asset that references data stored in zarr" ] }, { "cell_type": "code", "execution_count": 6, "id": "5176680b-3d58-4c72-bf58-d80051f6de75", "metadata": { "execution": { "iopub.execute_input": "2025-08-21T19:56:11.624714Z", "iopub.status.busy": "2025-08-21T19:56:11.624335Z", "iopub.status.idle": "2025-08-21T19:56:22.617465Z", "shell.execute_reply": "2025-08-21T19:56:22.616757Z", "shell.execute_reply.started": "2025-08-21T19:56:11.624682Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 69GB\n",
       "Dimensions:                  (time: 14965, y: 584, x: 284, nv: 2)\n",
       "Coordinates:\n",
       "    lat                      (y, x) float32 663kB dask.array<chunksize=(584, 284), meta=np.ndarray>\n",
       "    lon                      (y, x) float32 663kB dask.array<chunksize=(584, 284), meta=np.ndarray>\n",
       "  * time                     (time) datetime64[ns] 120kB 1980-01-01T12:00:00 ...\n",
       "  * x                        (x) float32 1kB -5.802e+06 ... -5.519e+06\n",
       "  * y                        (y) float32 2kB -3.9e+04 -4e+04 ... -6.22e+05\n",
       "Dimensions without coordinates: nv\n",
       "Data variables:\n",
       "    dayl                     (time, y, x) float32 10GB dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    lambert_conformal_conic  int16 2B ...\n",
       "    prcp                     (time, y, x) float32 10GB dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    srad                     (time, y, x) float32 10GB dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    swe                      (time, y, x) float32 10GB dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    time_bnds                (time, nv) datetime64[ns] 239kB dask.array<chunksize=(365, 2), meta=np.ndarray>\n",
       "    tmax                     (time, y, x) float32 10GB dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    tmin                     (time, y, x) float32 10GB dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    vp                       (time, y, x) float32 10GB dask.array<chunksize=(365, 584, 284), meta=np.ndarray>\n",
       "    yearday                  (time) int16 30kB dask.array<chunksize=(365,), meta=np.ndarray>\n",
       "Attributes:\n",
       "    Conventions:       CF-1.6\n",
       "    Version_data:      Daymet Data Version 4.0\n",
       "    Version_software:  Daymet Software Version 4.0\n",
       "    citation:          Please see http://daymet.ornl.gov/ for current Daymet ...\n",
       "    references:        Please see http://daymet.ornl.gov/ for current informa...\n",
       "    source:            Daymet Software Version 4.0\n",
       "    start_year:        1980
" ], "text/plain": [ " Size: 69GB\n", "Dimensions: (time: 14965, y: 584, x: 284, nv: 2)\n", "Coordinates:\n", " lat (y, x) float32 663kB dask.array\n", " lon (y, x) float32 663kB dask.array\n", " * time (time) datetime64[ns] 120kB 1980-01-01T12:00:00 ...\n", " * x (x) float32 1kB -5.802e+06 ... -5.519e+06\n", " * y (y) float32 2kB -3.9e+04 -4e+04 ... -6.22e+05\n", "Dimensions without coordinates: nv\n", "Data variables:\n", " dayl (time, y, x) float32 10GB dask.array\n", " lambert_conformal_conic int16 2B ...\n", " prcp (time, y, x) float32 10GB dask.array\n", " srad (time, y, x) float32 10GB dask.array\n", " swe (time, y, x) float32 10GB dask.array\n", " time_bnds (time, nv) datetime64[ns] 239kB dask.array\n", " tmax (time, y, x) float32 10GB dask.array\n", " tmin (time, y, x) float32 10GB dask.array\n", " vp (time, y, x) float32 10GB dask.array\n", " yearday (time) int16 30kB dask.array\n", "Attributes:\n", " Conventions: CF-1.6\n", " Version_data: Daymet Data Version 4.0\n", " Version_software: Daymet Software Version 4.0\n", " citation: Please see http://daymet.ornl.gov/ for current Daymet ...\n", " references: Please see http://daymet.ornl.gov/ for current informa...\n", " source: Daymet Software Version 4.0\n", " start_year: 1980" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "daymet_collection = pystac.Collection.from_file(\n", " \"https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-daily-hi\"\n", ")\n", "daymet_asset = daymet_collection.assets[\"zarr-abfs\"]\n", "daymet_asset_signed = planetary_computer.sign(daymet_asset)\n", "\n", "ds = xr.open_dataset(daymet_asset_signed, chunks={})\n", "ds" ] }, { "cell_type": "markdown", "id": "fd4e0c53-90b0-4276-9caf-9014aa0a31f9", "metadata": {}, "source": [ "### Reference file\n", "\n", "If the collection has a reference file we can use that.\n", "\n", "
\n", " \n", "This will not work for kerchunk>=0.2.8 and xpystac<=0.2.0. Track the xpystac ticket here: [xpystac #48](https://github.com/stac-utils/xpystac/issues/48)\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": 7, "id": "e69db448-c493-4972-ab1d-a8fb3f8ab71f", "metadata": { "execution": { "iopub.execute_input": "2025-08-21T19:56:22.618362Z", "iopub.status.busy": "2025-08-21T19:56:22.618022Z", "iopub.status.idle": "2025-08-21T19:56:33.446768Z", "shell.execute_reply": "2025-08-21T19:56:33.446285Z", "shell.execute_reply.started": "2025-08-21T19:56:22.618347Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 738GB\n",
       "Dimensions:  (time: 23741, lat: 600, lon: 1440)\n",
       "Coordinates:\n",
       "  * lat      (lat) float64 5kB -59.88 -59.62 -59.38 -59.12 ... 89.38 89.62 89.88\n",
       "  * lon      (lon) float64 12kB 0.125 0.375 0.625 0.875 ... 359.4 359.6 359.9\n",
       "  * time     (time) datetime64[us] 190kB 1950-01-01T12:00:00 ... 2014-12-31T1...\n",
       "Data variables:\n",
       "    hurs     (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "    huss     (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "    pr       (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "    rlds     (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "    rsds     (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "    sfcWind  (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "    tas      (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "    tasmax   (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "    tasmin   (time, lat, lon) float32 82GB dask.array<chunksize=(1, 600, 1440), meta=np.ndarray>\n",
       "Attributes: (12/22)\n",
       "    Conventions:           CF-1.7\n",
       "    activity:              NEX-GDDP-CMIP6\n",
       "    cmip6_institution_id:  CSIRO-ARCCSS\n",
       "    cmip6_license:         CC-BY-SA 4.0\n",
       "    cmip6_source_id:       ACCESS-CM2\n",
       "    contact:               Dr. Rama Nemani: rama.nemani@nasa.gov, Dr. Bridget...\n",
       "    ...                    ...\n",
       "    scenario:              historical\n",
       "    source:                BCSD\n",
       "    title:                 ACCESS-CM2, r1i1p1f1, historical, global downscale...\n",
       "    tracking_id:           16d27564-470f-41ea-8077-f4cc3efa5bfe\n",
       "    variant_label:         r1i1p1f1\n",
       "    version:               1.0
" ], "text/plain": [ " Size: 738GB\n", "Dimensions: (time: 23741, lat: 600, lon: 1440)\n", "Coordinates:\n", " * lat (lat) float64 5kB -59.88 -59.62 -59.38 -59.12 ... 89.38 89.62 89.88\n", " * lon (lon) float64 12kB 0.125 0.375 0.625 0.875 ... 359.4 359.6 359.9\n", " * time (time) datetime64[us] 190kB 1950-01-01T12:00:00 ... 2014-12-31T1...\n", "Data variables:\n", " hurs (time, lat, lon) float32 82GB dask.array\n", " huss (time, lat, lon) float32 82GB dask.array\n", " pr (time, lat, lon) float32 82GB dask.array\n", " rlds (time, lat, lon) float32 82GB dask.array\n", " rsds (time, lat, lon) float32 82GB dask.array\n", " sfcWind (time, lat, lon) float32 82GB dask.array\n", " tas (time, lat, lon) float32 82GB dask.array\n", " tasmax (time, lat, lon) float32 82GB dask.array\n", " tasmin (time, lat, lon) float32 82GB dask.array\n", "Attributes: (12/22)\n", " Conventions: CF-1.7\n", " activity: NEX-GDDP-CMIP6\n", " cmip6_institution_id: CSIRO-ARCCSS\n", " cmip6_license: CC-BY-SA 4.0\n", " cmip6_source_id: ACCESS-CM2\n", " contact: Dr. Rama Nemani: rama.nemani@nasa.gov, Dr. Bridget...\n", " ... ...\n", " scenario: historical\n", " source: BCSD\n", " title: ACCESS-CM2, r1i1p1f1, historical, global downscale...\n", " tracking_id: 16d27564-470f-41ea-8077-f4cc3efa5bfe\n", " variant_label: r1i1p1f1\n", " version: 1.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cmip6_collection = pystac.Collection.from_file(\n", " \"https://planetarycomputer.microsoft.com/api/stac/v1/collections/nasa-nex-gddp-cmip6\"\n", ")\n", "cmip6_asset = cmip6_collection.assets[\"ACCESS-CM2.historical\"]\n", "cmip6_asset_signed = planetary_computer.sign(cmip6_asset)\n", "\n", "ds = xr.open_dataset(cmip6_asset_signed, chunks={}, patch_url=planetary_computer.sign)\n", "ds" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }