Quickstart

This notebook is a quick introduction to using PySTAC for reading an existing STAC catalog. For more in-depth examples, check out the other tutorials.

Dependencies

  • PySTAC

Reading a Catalog

A STAC Catalog is used to group other STAC objects like Items, Collections, or even other Catalogs.

We will be using a small example catalog adapted from the example Landsat Collection in the GeoTrellis repository. All STAC Items and Collections can be found in the docs/example-catalog directory of this repo; all Assets are hosted in the Landsat S3 bucket.

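First, we import the PySTAC class and function we will be working with, along with a few standard-library helpers that we will need when writing files later on.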

[1]:
import shutil
import tempfile
from pathlib import Path

from pystac import Catalog, get_stac_version

Next, we read the example catalog and print some basic metadata.

[2]:
root_catalog = Catalog.from_file("./example-catalog/catalog.json")
print(f"ID: {root_catalog.id}")
print(f"Title: {root_catalog.title or 'N/A'}")
print(f"Description: {root_catalog.description or 'N/A'}")
ID: landsat-stac-collection-catalog
Title: STAC for Landsat data
Description: STAC for Landsat data

Note that we do not print the “stac_version” here. PySTAC automatically updates any Catalogs to the most recent supported STAC version and will automatically write this to the JSON object during serialization.

Let’s confirm the latest STAC Spec version supported by PySTAC.

[3]:
print(get_stac_version())
1.0.0

Crawling Child Catalogs/Collections

STAC Collections are used to group related Items and provide aggregate or summary metadata for those Items.

STAC Catalogs may have many nested layers of Catalogs or Collections within the top-level Catalog. Our example catalog has one Collection within the main Catalog at landsat-8-l1/collection.json. We can list the Collections in a given Catalog using the Catalog.get_collections method. This method returns an iterable of PySTAC Collection instances, which we will turn into a list.

[4]:
collections = list(root_catalog.get_collections())

print(f"Number of collections: {len(collections)}")
print("Collection IDs:")
for collection in collections:
    print(f"- {collection.id}")
Number of collections: 1
Collection IDs:
- landsat-8-l1
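
In the JSON itself, this nesting is expressed through links: a Catalog points to its sub-Catalogs and Collections with links whose rel is "child". The following is a minimal standard-library sketch of that idea, not PySTAC's implementation. The catalog_json dictionary is a hypothetical, trimmed-down stand-in for the example catalog (its link list is an assumption), and child_hrefs is a helper name invented for this illustration.

```python
# Hypothetical, trimmed-down catalog JSON. Only the fields needed for the
# illustration are included, and the exact link list is an assumption.
catalog_json = {
    "type": "Catalog",
    "id": "landsat-stac-collection-catalog",
    "description": "STAC for Landsat data",
    "links": [
        {"rel": "self", "href": "./catalog.json"},
        {"rel": "root", "href": "./catalog.json"},
        {"rel": "child", "href": "./landsat-8-l1/collection.json"},
    ],
}


def child_hrefs(stac_json: dict) -> list[str]:
    """Return the hrefs of all links that point to child Catalogs/Collections."""
    return [
        link["href"]
        for link in stac_json.get("links", [])
        if link.get("rel") == "child"
    ]


print(child_hrefs(catalog_json))
```

PySTAC resolves these links for us and returns object instances, so in normal use there is no need to read the links by hand.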

Let’s grab that Collection as a PySTAC Collection instance using the Catalog.get_child method so we can look at it in more detail. This method gets a child Catalog or Collection by ID, so we’ll use the Collection ID that we printed above. Since this method returns None if no child exists with the given ID, we’ll check to make sure we actually got the Collection.

[5]:
collection = root_catalog.get_child("landsat-8-l1")
assert collection is not None

Crawling Items

STAC Items are the fundamental building blocks of a STAC Catalog. Each Item represents a single spatiotemporal resource (e.g. a satellite scene).

Both Catalogs and Collections may have Items associated with them. Let’s crawl our catalog, starting at the root, to see what Items we have. The Catalog.get_items method provides a convenient way of recursively listing all Items associated with a Catalog and all of its sub-Catalogs by including the recursive=True option.

[6]:
items = list(root_catalog.get_items(recursive=True))

print(f"Number of items: {len(items)}")
for item in items:
    print(f"- {item.id}")
Number of items: 4
- LC80140332018166LGN00
- LC80150322018141LGN00
- LC80150332018189LGN00
- LC80300332018166LGN00
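
To make the effect of recursive=True concrete, here is a rough standard-library sketch of a recursive crawl. It is an illustration of the idea only, not how PySTAC is implemented: the tree dictionary is a hypothetical in-memory stand-in for resolved STAC objects (assuming all four Items sit under the landsat-8-l1 Collection), and iter_item_ids is a name made up for this example.

```python
from typing import Iterator

# Hypothetical in-memory tree standing in for a resolved catalog. We assume
# the root Catalog has no Items of its own and one child Collection.
tree = {
    "id": "landsat-stac-collection-catalog",
    "items": [],
    "children": [
        {
            "id": "landsat-8-l1",
            "items": [
                "LC80140332018166LGN00",
                "LC80150322018141LGN00",
                "LC80150332018189LGN00",
                "LC80300332018166LGN00",
            ],
            "children": [],
        }
    ],
}


def iter_item_ids(node: dict, recursive: bool = False) -> Iterator[str]:
    """Yield Item IDs attached to this node, then optionally descend."""
    yield from node["items"]
    if recursive:
        for child in node["children"]:
            yield from iter_item_ids(child, recursive=True)


direct = list(iter_item_ids(tree))
everything = list(iter_item_ids(tree, recursive=True))
print(len(direct), len(everything))
```

Under these assumptions, the non-recursive call finds nothing at the root, while the recursive call reaches the Items one level down.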

These IDs are not very descriptive; in the next section, we will take a look at how we can access the rich metadata associated with each Item.

Item Metadata

Items can have a lot of metadata. This can be a bit overwhelming at first, but we can break the metadata fields down into a few categories:

  • Core Item Metadata

  • Common Metadata

  • STAC Extensions

We will walk through each of these metadata categories in the following sections.

First, let’s grab one of the Items by passing its ID to the Catalog.get_items method. We will use recursive=True to recursively crawl all child Catalogs and/or Collections to find the Item.

[7]:
item = next(root_catalog.get_items("LC80140332018166LGN00", recursive=True))

Core Item Metadata

The core Item metadata fields include spatiotemporal information and the ID of the collection to which the Item belongs. These fields are all at the top level of the Item JSON and we can access them through attributes on the PySTAC Item instance.

[8]:
item.geometry
[8]:
{'type': 'Polygon',
 'coordinates': [[[-76.12180471942207, 39.95810181489563],
   [-73.94910518227414, 39.55117185146004],
   [-74.49564725552679, 37.826064511480496],
   [-76.66550404911956, 38.240699151776084],
   [-76.12180471942207, 39.95810181489563]]]}
[9]:
item.bbox
[9]:
[-76.66703, 37.82561, -73.94861, 39.95958]
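
The geometry and bbox shown above are related but not identical. As a small standard-library sketch (an illustration only, not something PySTAC does for us here), we can compute the tight bounds of the polygon and check that the stored bbox encloses them. The helper name polygon_bounds is invented for this example, and it assumes a single Polygon that does not cross the antimeridian.

```python
# Values copied from the outputs of item.geometry and item.bbox above.
geometry = {
    "type": "Polygon",
    "coordinates": [
        [
            [-76.12180471942207, 39.95810181489563],
            [-73.94910518227414, 39.55117185146004],
            [-74.49564725552679, 37.826064511480496],
            [-76.66550404911956, 38.240699151776084],
            [-76.12180471942207, 39.95810181489563],
        ]
    ],
}
stored_bbox = [-76.66703, 37.82561, -73.94861, 39.95958]


def polygon_bounds(geom: dict) -> list[float]:
    """Return [min_lon, min_lat, max_lon, max_lat] of the exterior ring."""
    exterior = geom["coordinates"][0]
    lons = [point[0] for point in exterior]
    lats = [point[1] for point in exterior]
    return [min(lons), min(lats), max(lons), max(lats)]


tight = polygon_bounds(geometry)
# The stored bbox should be at least as large as the tight bounds.
encloses = (
    stored_bbox[0] <= tight[0]
    and stored_bbox[1] <= tight[1]
    and stored_bbox[2] >= tight[2]
    and stored_bbox[3] >= tight[3]
)
print(tight)
print(encloses)
```

For this Item, the stored bbox is slightly larger than the tight bounds of the footprint polygon, so the check passes.
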
[10]:
item.datetime
[10]:
datetime.datetime(2018, 6, 15, 15, 39, 9, tzinfo=tzutc())
[11]:
item.collection_id
[11]:
'landsat-8-l1'

If we want the actual Collection instance instead of just the ID, we can use the Item.get_collection method.

[12]:
item.get_collection()
[12]:

Common Metadata

Certain fields that are commonly used in Items, but may also be found in other objects (e.g. Assets), are defined in the Common Metadata section of the spec. These include licensing and instrument information, descriptions of datetime ranges, and some other common fields. These properties can be found as attributes of the Item.common_metadata property, which is an instance of the CommonMetadata class.

[13]:
item.common_metadata.instruments
[13]:
['OLI_TIRS']
[14]:
item.common_metadata.platform
[14]:
'landsat-8'
[15]:
item.common_metadata.gsd
[15]:
30

STAC Extensions

STAC Extensions are a mechanism for providing additional metadata not covered by the core STAC Spec. We can see which STAC Extensions are implemented by this particular Item by examining the list of extension URIs in the stac_extensions field.

[16]:
item.stac_extensions
[16]:
['https://stac-extensions.github.io/eo/v1.1.0/schema.json',
 'https://stac-extensions.github.io/view/v1.0.0/schema.json',
 'https://stac-extensions.github.io/projection/v1.1.0/schema.json']

This Item implements the Electro-Optical, View Geometry, and Projection Extensions.
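
The extension name and version can be read straight from each schema URI. The sketch below does this with the standard library only. It assumes the URIs follow the `<host>/<name>/<version>/schema.json` pattern seen in the output above, which may not hold for every extension, and parse_extension_uri is a helper name invented for this illustration.

```python
from urllib.parse import urlparse

# URIs copied from the item.stac_extensions output above.
stac_extensions = [
    "https://stac-extensions.github.io/eo/v1.1.0/schema.json",
    "https://stac-extensions.github.io/view/v1.0.0/schema.json",
    "https://stac-extensions.github.io/projection/v1.1.0/schema.json",
]


def parse_extension_uri(uri: str) -> tuple[str, str]:
    """Split a '<host>/<name>/<version>/schema.json' URI into (name, version)."""
    parts = urlparse(uri).path.strip("/").split("/")
    return parts[0], parts[1]


parsed = [parse_extension_uri(uri) for uri in stac_extensions]
for name, version in parsed:
    print(f"{name}: {version}")
```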

We can also check if a specific extension is implemented using ext.has with the name of that extension.

[17]:
item.ext.has("eo")
[17]:
True
[18]:
item.ext.has("raster")
[18]:
False

We can access fields associated with the extension as attributes on the extension instance. For instance, the “eo:cloud_cover” field defined in the Electro-Optical Extension can be accessed using the item.ext.eo.cloud_cover attribute.

[19]:
item.ext.eo.cloud_cover
[19]:
22

We can also access the cloud cover field directly in the Item properties.

[20]:
item.properties["eo:cloud_cover"]
[20]:
22
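
Extension fields carry a prefix such as "eo:" in the properties dictionary, while fields like platform and gsd have none. As a rough standard-library sketch, we can group property keys by that prefix to see which fields come from which extension. The properties dictionary below contains only the values shown in this notebook's outputs (a real Item has more), and group_by_prefix is a name invented for this example.

```python
from collections import defaultdict

# Property values copied from outputs shown earlier in this notebook.
# A real Item carries more properties than these four.
properties = {
    "platform": "landsat-8",
    "instruments": ["OLI_TIRS"],
    "gsd": 30,
    "eo:cloud_cover": 22,
}


def group_by_prefix(props: dict) -> dict[str, list[str]]:
    """Group keys by the text before ':' ('unprefixed' if there is none)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key in props:
        prefix, separator, _ = key.partition(":")
        groups[prefix if separator else "unprefixed"].append(key)
    return dict(groups)


grouped = group_by_prefix(properties)
print(grouped)
```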

We can access the Item’s assets through the assets attribute, which is a dictionary:

[21]:
for asset_key, asset in item.assets.items():
    print(f"{asset_key}: {asset.href} ({asset.media_type})")
index: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/index.html (text/html)
thumbnail: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_thumb_large.jpg (image/jpeg)
B1: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B1.TIF (image/tiff)
B2: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B2.TIF (image/tiff)
B3: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B3.TIF (image/tiff)
B4: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B4.TIF (image/tiff)
B5: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B5.TIF (image/tiff)
B6: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B6.TIF (image/tiff)
B7: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B7.TIF (image/tiff)
B8: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B8.TIF (image/tiff)
B9: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B9.TIF (image/tiff)
B10: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B10.TIF (image/tiff)
B11: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B11.TIF (image/tiff)
ANG: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_ANG.txt (text/plain)
MTL: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_MTL.txt (text/plain)
BQA: https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_BQA.TIF (image/tiff)
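
A common next step is to pick out only the Assets of a given media type, for example the GeoTIFF bands. The sketch below shows the filtering logic with a plain dictionary that maps each asset key from the output above to its media type (hrefs are left out for brevity). With PySTAC objects, the same comparison could presumably be made against asset.media_type while looping over item.assets.items().

```python
# Asset keys and media types copied from the output above.
media_types = {
    "index": "text/html",
    "thumbnail": "image/jpeg",
    **{f"B{n}": "image/tiff" for n in range(1, 12)},  # B1 through B11
    "ANG": "text/plain",
    "MTL": "text/plain",
    "BQA": "image/tiff",
}

# Keep only the keys whose media type is image/tiff.
tiff_keys = [key for key, media_type in media_types.items() if media_type == "image/tiff"]
print(len(tiff_keys))
print(tiff_keys)
```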

We can use the to_dict() method to convert an Asset, or any PySTAC object, into a dictionary:

[22]:
asset = item.assets["B3"]
asset.to_dict()
[22]:
{'href': 'https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/014/033/LC08_L1TP_014033_20180615_20180703_01_T1/LC08_L1TP_014033_20180615_20180703_01_T1_B3.TIF',
 'type': 'image/tiff',
 'title': 'Band 3 (green)',
 'eo:bands': [{'name': 'B3',
   'full_width_half_max': 0.06,
   'center_wavelength': 0.56,
   'common_name': 'green'}],
 'roles': []}

Here we use the eo extension to get the band information for the asset:

[23]:
bands = asset.ext.eo.bands
bands
[23]:
[<Band name=B3>]
[24]:
bands[0].to_dict()
[24]:
{'name': 'B3',
 'full_width_half_max': 0.06,
 'center_wavelength': 0.56,
 'common_name': 'green'}
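
Because to_dict() returns plain dictionaries, band information can also be inspected without the extension classes. The following standard-library sketch looks up a band by its common name in the asset dictionary shown earlier. The helper find_band is a name invented for this illustration, not part of PySTAC.

```python
from typing import Optional

# Dictionary copied from the asset.to_dict() output above (href omitted).
asset_dict = {
    "type": "image/tiff",
    "title": "Band 3 (green)",
    "eo:bands": [
        {
            "name": "B3",
            "full_width_half_max": 0.06,
            "center_wavelength": 0.56,
            "common_name": "green",
        }
    ],
    "roles": [],
}


def find_band(asset: dict, common_name: str) -> Optional[dict]:
    """Return the first band with the given common name, or None."""
    for band in asset.get("eo:bands", []):
        if band.get("common_name") == common_name:
            return band
    return None


green = find_band(asset_dict, "green")
print(green["name"] if green else "no such band")
print(find_band(asset_dict, "red"))
```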

Writing STAC Objects

We can also use PySTAC to create and/or update STAC objects and write them to disk. This Quickstart Tutorial will introduce you to some very basic concepts in writing STAC objects; for a more thorough tutorial, please see the “How to create STAC Catalogs” tutorial.

Suppose there was a mistake in the cloud cover value that we looked at earlier and that we would also like to change the value of the instruments field. We can update these values using the same attributes and properties as before, then save the entire catalog to our local drive.

[25]:
new_catalog = root_catalog.clone()
[26]:
# Fetch the Item through the cloned catalog, since that is the one we will save
item_to_update = next(new_catalog.get_items("LC80140332018166LGN00", recursive=True))

# Update the cloud cover
item_to_update.ext.eo.cloud_cover = 30

# Update the instruments field
item_to_update.common_metadata.instruments = ["LANDSAT"]

Now we can examine the Item properties directly to verify that the changes have taken effect.

[27]:
print(f"New Cloud Cover: {item_to_update.properties['eo:cloud_cover']}")
print(f"New Instruments: {item_to_update.properties['instruments']}")
New Cloud Cover: 30
New Instruments: ['LANDSAT']

We will write this updated catalog to a temporary directory on our local drive using the Catalog.normalize_and_save method.

[28]:
# Create a temporary directory
tmp_dir = tempfile.mkdtemp()
[29]:
# Save the catalog and normalize all paths
new_catalog.normalize_and_save(tmp_dir)
print(f"Catalog saved to: {new_catalog.get_self_href()}")
Catalog saved to: /tmp/tmp9bmp70k9/catalog.json

We can open the Item that we just updated to verify that the new values were written to disk.

[30]:
item_path = Path(tmp_dir) / "landsat-8-l1" / "LC80140332018166LGN00" / ""
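
One way to carry out that check is to load the Item's JSON file and look at its properties. The sketch below uses only the standard library and writes its own stub file so that it runs standalone. The file name LC80140332018166LGN00.json is an assumption about the layout that normalize_and_save produces, the stub's contents are hypothetical, and read_item_properties is a name invented for this example.

```python
import json
import tempfile
from pathlib import Path


def read_item_properties(item_path: Path) -> dict:
    """Load an Item JSON file and return its 'properties' object."""
    with item_path.open(encoding="utf-8") as f:
        return json.load(f)["properties"]


with tempfile.TemporaryDirectory() as demo_dir:
    # Assumed layout: <collection id>/<item id>/<item id>.json
    item_dir = Path(demo_dir) / "landsat-8-l1" / "LC80140332018166LGN00"
    item_dir.mkdir(parents=True)
    stub_path = item_dir / "LC80140332018166LGN00.json"

    # Hypothetical stub standing in for the Item written by PySTAC.
    stub = {
        "type": "Feature",
        "id": "LC80140332018166LGN00",
        "properties": {"eo:cloud_cover": 30, "instruments": ["LANDSAT"]},
    }
    stub_path.write_text(json.dumps(stub), encoding="utf-8")

    props = read_item_properties(stub_path)

print(props["eo:cloud_cover"], props["instruments"])
```

In the notebook itself, the same helper could be pointed at the Item file under tmp_dir instead of the stub, provided the file name assumption holds.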

Finally, we clean up the temporary directory.

[31]:
shutil.rmtree(tmp_dir, ignore_errors=True)