{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quickstart\n", "\n", "This notebook is a quick introduction to using PySTAC for reading an existing STAC catalog. For more in-depth examples check out the other tutorials.\n", "\n", "## Dependencies\n", "\n", "- PySTAC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading a Catalog\n", "\n", "[A STAC Catalog](https://github.com/radiantearth/stac-spec/tree/master/catalog-spec) is used to group other STAC objects like Items, Collections, or even other Catalogs.\n", "\n", "We will be using a small example catalog adapted from the [example Landsat Collection](https://github.com/geotrellis/geotrellis-server/tree/977bad7a64c409341479c281c8c72222008861fd/stac-example/catalog/landsat-stac-collection) in the [GeoTrellis](https://geotrellis.io) repository. All STAC Items and Collections can be found in the [docs/example-catalog](https://github.com/stac-utils/pystac/tree/main/docs/example-catalog) directory of this repo; all Assets are hosted in the Landsat S3 bucket.\n", "\n", "First, we import the PySTAC classes we will be working with." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import shutil\n", "import tempfile\n", "from pathlib import Path\n", "\n", "from pystac import Catalog, get_stac_version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we read the example catalog and print some basic metadata." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ID: landsat-stac-collection-catalog\n", "Title: STAC for Landsat data\n", "Description: STAC for Landsat data\n" ] } ], "source": [ "root_catalog = Catalog.from_file(\"./example-catalog/catalog.json\")\n", "print(f\"ID: {root_catalog.id}\")\n", "print(f\"Title: {root_catalog.title or 'N/A'}\")\n", "print(f\"Description: {root_catalog.description or 'N/A'}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note that we do not print the \"stac_version\" here. PySTAC automatically updates any Catalogs to the most recent supported STAC version and will automatically write this to the JSON object during serialization.*\n", "\n", "Let's confirm the latest STAC Spec version supported by PySTAC." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.0.0\n" ] } ], "source": [ "print(get_stac_version())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crawling Child Catalogs/Collections\n", "\n", "[STAC Collections](https://github.com/radiantearth/stac-spec/tree/master/collection-spec) are used to group related Items and provide aggregate or summary metadata for those Items.\n", "\n", "STAC Catalogs may have many nested layers of Catalogs or Collections within the top-level Catalog. Our example catalog has one Collection within the main Catalog at [landsat-8-l1/collection.json](./example-catalog/landsat-8-l1/collection.json). We can list the Collections in a given Catalog using the [Catalog.get_collections](https://pystac.readthedocs.io/en/latest/api.html#pystac.Catalog.get_collections) method. This method returns an iterable of PySTAC [Collection](https://pystac.readthedocs.io/en/latest/api.html#collection) instances, which we will turn into a `list`."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of collections: 1\n", "Collection IDs:\n", "- landsat-8-l1\n" ] } ], "source": [ "collections = list(root_catalog.get_collections())\n", "\n", "print(f\"Number of collections: {len(collections)}\")\n", "print(\"Collection IDs:\")\n", "for collection in collections:\n", "    print(f\"- {collection.id}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's grab that Collection as a PySTAC [Collection](https://pystac.readthedocs.io/en/latest/api.html#collection) instance using the [Catalog.get_child method](https://pystac.readthedocs.io/en/latest/api.html#pystac.Catalog.get_child) so we can look at it in more detail. This method gets a child Catalog or Collection by ID, so we'll use the Collection ID that we printed above. Since this method returns `None` if no child exists with the given ID, we'll check to make sure we actually got the `Collection`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "collection = root_catalog.get_child(\"landsat-8-l1\")\n", "assert collection is not None" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Crawling Items\n", "\n", "[STAC Items](https://github.com/radiantearth/stac-spec/tree/master/item-spec) are the fundamental building blocks of a STAC Catalog. Each Item represents a single spatiotemporal resource (e.g. a satellite scene).\n", "\n", "Both Catalogs and Collections may have Items associated with them. Let's crawl our catalog, starting at the root, to see what Items we have. With the `recursive=True` option, the [Catalog.get_items method](https://pystac.readthedocs.io/en/latest/api.html#pystac.Catalog.get_items) lists all Items associated with a Catalog and all of its sub-Catalogs."
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of items: 4\n", "- LC80140332018166LGN00\n", "- LC80150322018141LGN00\n", "- LC80150332018189LGN00\n", "- LC80300332018166LGN00\n" ] } ], "source": [ "items = list(root_catalog.get_items(recursive=True))\n", "\n", "print(f\"Number of items: {len(items)}\")\n", "for item in items:\n", "    print(f\"- {item.id}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These IDs are not very descriptive; in the next section, we will take a look at how we can access the rich metadata associated with each Item." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Item Metadata\n", "\n", "Items can have *a lot* of metadata. This can be a bit overwhelming at first, but we can break the metadata fields down into a few categories:\n", "\n", "- Core Item Metadata\n", "- Common Metadata\n", "- STAC Extensions\n", "\n", "We will walk through each of these metadata categories in the following sections.\n", "\n", "First, let's grab one of the Items using the [Catalog.get_items method](https://pystac.readthedocs.io/en/latest/api.html#pystac.Catalog.get_items). We will use `recursive=True` to recursively crawl all child Catalogs and/or Collections to find the Item." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "item = next(root_catalog.get_items(\"LC80140332018166LGN00\", recursive=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Core Item Metadata\n", "\n", "The core Item metadata fields include spatiotemporal information and the ID of the collection to which the Item belongs. These fields are all at the top level of the Item JSON and we can access them through attributes on the [PySTAC Item](https://pystac.readthedocs.io/en/latest/api.html#item) instance."
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'type': 'Polygon',\n", " 'coordinates': [[[-76.12180471942207, 39.95810181489563],\n", " [-73.94910518227414, 39.55117185146004],\n", " [-74.49564725552679, 37.826064511480496],\n", " [-76.66550404911956, 38.240699151776084],\n", " [-76.12180471942207, 39.95810181489563]]]}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.geometry" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[-76.66703, 37.82561, -73.94861, 39.95958]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.bbox" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2018, 6, 15, 15, 39, 9, tzinfo=tzutc())" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.datetime" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'landsat-8-l1'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.collection_id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want the actual `Collection` instance instead of just the ID, we can use the [Item.get_collection](https://pystac.readthedocs.io/en/latest/api.html#pystac.Item.get_collection) method." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", "
" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.get_collection()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Common Metadata\n", "\n", "Certain fields that are commonly used in Items, but may also be found in other objects (e.g. Assets) are defined in the [Common Metadata](https://github.com/radiantearth/stac-spec/blob/master/item-spec/common-metadata.md) section of the spec. These include licensing and instrument information, descriptions of datetime ranges, and some other common fields. These properties can be found as attributes of the `Item.common_metadata` property, which is an instance of the [CommonMetadata class](https://pystac.readthedocs.io/en/latest/api.html#pystac.CommonMetadata)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['OLI_TIRS']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.common_metadata.instruments" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'landsat-8'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.common_metadata.platform" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "30" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.common_metadata.gsd" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### STAC Extensions\n", "\n", "[STAC Extensions](https://stac-extensions.github.io/) are a mechanism for providing additional metadata not covered by the core STAC Spec. We can see which STAC Extensions are implemented by this particular Item by examining the list of extension URIs in the `stac_extensions` field." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['https://stac-extensions.github.io/eo/v1.1.0/schema.json',\n", " 'https://stac-extensions.github.io/view/v1.0.0/schema.json',\n", " 'https://stac-extensions.github.io/projection/v1.1.0/schema.json']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.stac_extensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This Item implements the [Electro-Optical](https://github.com/stac-extensions/eo), [View Geometry](https://github.com/stac-extensions/view), and [Projection](https://github.com/stac-extensions/projection) Extensions. \n", "\n", "We can also check if a specific extension is implemented using [ext.has](https://pystac.readthedocs.io/en/latest/api.html#pystac.item.ext.has) with the name of that extension." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.ext.has(\"eo\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.ext.has(\"raster\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access fields associated with the extension as attributes on the extension instance. For instance, the [\"eo:cloud_cover\" field](https://github.com/stac-extensions/eo#item-properties-or-asset-fields) defined in the Electro-Optical Extension can be accessed using the [item.ext.eo.cloud_cover](https://pystac.readthedocs.io/en/latest/api.html#pystac.extensions.eo.EOExtension.cloud_cover) attribute." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "22" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.ext.eo.cloud_cover" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also access the cloud cover field directly in the Item properties." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "22" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.properties[\"eo:cloud_cover\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access the Item's assets through the `assets` attribute, which is a dictionary:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "index: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/index.html (text/html)\n", "thumbnail: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_thumb_large.jpg (image/jpeg)\n", "B1: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B1.TIF (image/tiff)\n", "B2: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B2.TIF (image/tiff)\n", "B3: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B3.TIF (image/tiff)\n", "B4: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B4.TIF (image/tiff)\n", "B5: 
https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B5.TIF (image/tiff)\n", "B6: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B6.TIF (image/tiff)\n", "B7: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B7.TIF (image/tiff)\n", "B8: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B8.TIF (image/tiff)\n", "B9: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B9.TIF (image/tiff)\n", "B10: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B10.TIF (image/tiff)\n", "B11: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B11.TIF (image/tiff)\n", "ANG: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_ANG.txt (text/plain)\n", "MTL: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_MTL.txt (text/plain)\n", "BQA: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_BQA.TIF (image/tiff)\n" ] } ], "source": [ "for asset_key, asset in item.assets.items():\n", "    print(f\"{asset_key}: {asset.href} ({asset.media_type})\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `to_dict()` method to convert an
Asset, or any PySTAC object, into a dictionary:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'href': 'https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B3.TIF',\n", " 'type': 'image/tiff',\n", " 'title': 'Band 3 (green)',\n", " 'eo:bands': [{'name': 'B3',\n", " 'full_width_half_max': 0.06,\n", " 'center_wavelength': 0.56,\n", " 'common_name': 'green'}],\n", " 'roles': []}" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "asset = item.assets[\"B3\"]\n", "asset.to_dict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we use the eo extension to get the band information for the asset:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bands = asset.ext.eo.bands\n", "bands" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'name': 'B3',\n", " 'full_width_half_max': 0.06,\n", " 'center_wavelength': 0.56,\n", " 'common_name': 'green'}" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bands[0].to_dict()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Writing STAC Objects\n", "\n", "We can also use PySTAC to create and/or update STAC objects and write them to disk. This Quickstart Tutorial will introduce you to some very basic concepts in writing STAC objects; for a more thorough tutorial, please see the [\"How to create STAC Catalogs\"](./tutorials/how-to-create-stac-catalogs.ipynb) tutorial.\n", "\n", "Suppose there was a mistake in the cloud cover value that we looked at earlier and that we would also like to change the value of the `instruments` field, which is currently `['OLI_TIRS']`. 
We can update these values using the same attributes and properties as before, then save the entire catalog to our local drive." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "new_catalog = root_catalog.clone()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Fetch the Item from the cloned catalog, which is the one we will save\n", "item_to_update = next(new_catalog.get_items(\"LC80140332018166LGN00\", recursive=True))\n", "\n", "# Update the cloud cover\n", "item_to_update.ext.eo.cloud_cover = 30\n", "\n", "# Set the instruments field\n", "item_to_update.common_metadata.instruments = [\"LANDSAT\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can examine the Item properties directly to verify that the changes have taken effect." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New Cloud Cover: 30\n", "New Instruments: ['LANDSAT']\n" ] } ], "source": [ "print(f\"New Cloud Cover: {item_to_update.properties['eo:cloud_cover']}\")\n", "print(f\"New Instruments: {item_to_update.properties['instruments']}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will write this updated catalog to a temporary directory on our local drive using the [Catalog.normalize_and_save](https://pystac.readthedocs.io/en/latest/api.html#pystac.Catalog.normalize_and_save) method."
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# Create a temporary directory\n", "tmp_dir = tempfile.mkdtemp()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Catalog saved to: /tmp/tmp9bmp70k9/catalog.json\n" ] } ], "source": [ "# Normalize all HREFs and save the catalog\n", "new_catalog.normalize_and_save(tmp_dir)\n", "print(f\"Catalog saved to: {new_catalog.get_self_href()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can open up the Item that we just updated to verify that the new values were written to disk." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "item_path = Path(tmp_dir) / \"landsat-8-l1\" / \"LC80140332018166LGN00\" / \"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we clean up the temporary directory." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "shutil.rmtree(tmp_dir, ignore_errors=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" }, "vscode": { "interpreter": { "hash": "28618a729221ed2dc6301bcedf20e90b9d193b9b884dd15c675da71a09b73fa8" } } }, "nbformat": 4, "nbformat_minor": 4 }