PySTAC Introduction

This tutorial gives a basic introduction to reading, writing, and creating STAC objects with PySTAC.

It is adapted from the tutorials within the sat-stac repo.

It uses an example STAC catalog stored in the ../example-catalog directory alongside this notebook. The example catalog has the following layout:

../example-catalog
├── catalog.json
└── landsat-8-l1
    ├── 2018-05
    │   └── LC80150322018141LGN00.json
    ├── 2018-06
    │   ├── LC80140332018166LGN00.json
    │   └── LC80300332018166LGN00.json
    ├── 2018-07
    │   └── LC80150332018189LGN00.json
    └── collection.json
[1]:
import pystac

Working with existing catalogs

Open a root catalog from its JSON file

[2]:
cat = pystac.Catalog.from_file("../example-catalog/catalog.json")

We can see all elements of the STAC using the describe method

[3]:
cat.describe()
* <Catalog id=landsat-stac-collection-catalog>
    * <Collection id=landsat-8-l1>
      * <Item id=LC80140332018166LGN00>
      * <Item id=LC80150322018141LGN00>
      * <Item id=LC80150332018189LGN00>
      * <Item id=LC80300332018166LGN00>

Each STAC object has links that you can use to traverse the STAC tree

[4]:
cat.links
[4]:
[<Link rel=self target=/home/jsignell/pystac/docs/example-catalog/catalog.json>,
 <Link rel=root target=<Catalog id=landsat-stac-collection-catalog>>,
 <Link rel=child target=<Collection id=landsat-8-l1>>]

PySTAC has several methods that allow you to access links:

[5]:
# Get all child links
cat.get_child_links()
[5]:
[<Link rel=child target=<Collection id=landsat-8-l1>>]

or the children directly:

[6]:
list(cat.get_children())
[6]:
[<Collection id=landsat-8-l1>]
[7]:
# or a single child by id
cat.get_child("landsat-8-l1")
[7]:
[8]:
# Get a single link by 'rel'
cat.get_single_link("self")
[8]:
[9]:
# Get item links directly within this catalog (there are none for this catalog)
cat.get_item_links()
[9]:
[]

or the items directly:

[10]:
# get item objects
list(cat.get_items())
[10]:
[]
[11]:
# get all items anywhere below this catalog on the STAC tree
list(cat.get_items(recursive=True))
[11]:
[<Item id=LC80140332018166LGN00>,
 <Item id=LC80150322018141LGN00>,
 <Item id=LC80150332018189LGN00>,
 <Item id=LC80300332018166LGN00>]

You can access the STAC object that a link points to using the target property

[12]:
l = cat.get_single_link("child")
print(l)
<Link rel=child target=<Collection id=landsat-8-l1>>
[13]:
print(l.target)
<Collection id=landsat-8-l1>

You can convert any STAC object to a Python dict using the to_dict method.

[14]:
cat.to_dict(include_self_link=False)
[14]:
{'type': 'Catalog',
 'id': 'landsat-stac-collection-catalog',
 'stac_version': '1.0.0',
 'description': 'STAC for Landsat data',
 'links': [{'rel': 'root',
   'href': './catalog.json',
   'type': 'application/json',
   'title': 'STAC for Landsat data'},
  {'rel': 'child',
   'href': './landsat-8-l1/collection.json',
   'title': 'Landsat 8 L1'}],
 'title': 'STAC for Landsat data'}
[15]:
# get first (and only in this case) sub-catalog
subcat = next(cat.get_children())
[16]:
# print some IDs
print("Root Catalog: ", cat.id)
print("Sub Catalog: ", subcat.id)
print("Sub Catalog parent: ", subcat.get_parent().id)

# iterate through child catalogs of the sub-catalog
print("Sub Catalog children:")
for child in subcat.get_children():
    print("    ", child.id)
Root Catalog:  landsat-stac-collection-catalog
Sub Catalog:  landsat-8-l1
Sub Catalog parent:  landsat-stac-collection-catalog
Sub Catalog children:
[17]:
print("\n**Items**")
for i in cat.get_items(recursive=True):
    print(i.id)

**Items**
LC80140332018166LGN00
LC80150322018141LGN00
LC80150332018189LGN00
LC80300332018166LGN00

Creating new catalogs

You can initialize a new Catalog with an id and a description. Note that by default a new catalog is set as its own root.

[18]:
# create a Catalog object from an id and a description
mycat = pystac.Catalog(id="mycat", description="My shiny new STAC catalog")
[19]:
mycat.links
[19]:
[<Link rel=root target=<Catalog id=mycat>>]

Adding catalogs to catalogs

[20]:
# create a new catalog to add to the root catalog
kitten = pystac.Catalog(
    id="mykitten", description="A child catalog of my shiny new STAC catalog"
)

When you add a child catalog to a parent catalog, the child catalog takes on the root catalog of its parent. ‘Child’ and ‘parent’ links are also added to the parent and child catalogs, respectively.

[21]:
kitten.links
[21]:
[<Link rel=root target=<Catalog id=mykitten>>]
[22]:
mycat.add_child(kitten)
[22]:
[23]:
kitten.links
[23]:
[<Link rel=root target=<Catalog id=mycat>>,
 <Link rel=parent target=<Catalog id=mycat>>]
[24]:
mycat.links
[24]:
[<Link rel=root target=<Catalog id=mycat>>,
 <Link rel=child target=<Catalog id=mykitten>>]
[25]:
mycat.describe()
* <Catalog id=mycat>
    * <Catalog id=mykitten>

Adding collections to catalogs

In the next two steps we will work with PySTAC Collections and Items. We will pull them out of our example catalog and add them to the new STAC that we have created.

Collections are Catalogs but also include spatial and temporal extents as well as additional properties.

[26]:
# open the Landsat collection
collection = pystac.Collection.from_file(
    "../example-catalog/landsat-8-l1/collection.json"
)
collection
[26]:

See the spatial and temporal extent of this collection

[27]:
collection.extent.to_dict()
[27]:
{'spatial': {'bbox': [[-180.0, -90.0, 180.0, 90.0]]},
 'temporal': {'interval': [['2018-05-21T15:44:59Z', '2018-07-08T15:45:34Z']]}}
[28]:
collection.links
[28]:
[<Link rel=self target=/home/jsignell/pystac/docs/example-catalog/landsat-8-l1/collection.json>,
 <Link rel=root target=../catalog.json>,
 <Link rel=parent target=../catalog.json>,
 <Link rel=item target=./2018-06/LC80140332018166LGN00.json>,
 <Link rel=item target=./2018-05/LC80150322018141LGN00.json>,
 <Link rel=item target=./2018-07/LC80150332018189LGN00.json>,
 <Link rel=item target=./2018-06/LC80300332018166LGN00.json>]
[29]:
# add it to the child catalog created above
kitten.add_child(collection)
[29]:
[30]:
collection.links
[30]:
[<Link rel=self target=/home/jsignell/pystac/docs/example-catalog/landsat-8-l1/collection.json>,
 <Link rel=root target=<Catalog id=mycat>>,
 <Link rel=item target=./2018-06/LC80140332018166LGN00.json>,
 <Link rel=item target=./2018-05/LC80150322018141LGN00.json>,
 <Link rel=item target=./2018-07/LC80150332018189LGN00.json>,
 <Link rel=item target=./2018-06/LC80300332018166LGN00.json>,
 <Link rel=parent target=<Catalog id=mykitten>>]

Adding items to collections

Items are STAC objects whose parents can be either Catalogs or Collections. They also have spatio-temporal information and assets. Assets point directly to the data included in the STAC.

[31]:
# open a Landsat item
item = pystac.read_file(
    "../example-catalog/landsat-8-l1/2018-05/LC80150322018141LGN00.json"
)
item
[31]:
[32]:
item.links
[32]:
[<Link rel=self target=/home/jsignell/pystac/docs/example-catalog/landsat-8-l1/2018-05/LC80150322018141LGN00.json>,
 <Link rel=parent target=../collection.json>,
 <Link rel=collection target=../collection.json>,
 <Link rel=root target=../../catalog.json>]
[33]:
item.assets
[33]:
{'index': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/index.html>,
 'thumbnail': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_thumb_large.jpg>,
 'B1': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B1.TIF>,
 'B2': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B2.TIF>,
 'B3': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B3.TIF>,
 'B4': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B4.TIF>,
 'B5': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B5.TIF>,
 'B6': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B6.TIF>,
 'B7': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B7.TIF>,
 'B8': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B8.TIF>,
 'B9': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B9.TIF>,
 'B10': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B10.TIF>,
 'B11': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_B11.TIF>,
 'ANG': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_ANG.txt>,
 'MTL': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_MTL.txt>,
 'BQA': <Asset href=https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/015/032/LC08_L1TP_015032_20180521_20180605_01_T1/LC08_L1TP_015032_20180521_20180605_01_T1_BQA.TIF>}
[34]:
# add it to the collection opened above (the item already belongs to this
# collection, so it is listed twice by describe() below)
collection.add_item(item)
[34]:
[35]:
# now look at the catalog we've created
mycat.describe()
* <Catalog id=mycat>
    * <Catalog id=mykitten>
        * <Collection id=landsat-8-l1>
          * <Item id=LC80140332018166LGN00>
          * <Item id=LC80150322018141LGN00>
          * <Item id=LC80150332018189LGN00>
          * <Item id=LC80300332018166LGN00>
          * <Item id=LC80150322018141LGN00>

Currently, this STAC only exists in memory. We can use normalize_and_save to save the STAC in the canonical “absolute published” form:

[36]:
mycat.normalize_and_save(
    "pystac-example-absolute", catalog_type=pystac.CatalogType.ABSOLUTE_PUBLISHED
)

Notice now that the ‘parent’ link of an item is an absolute HREF:

[37]:
item = next(mycat.get_items(recursive=True))
item.get_single_link("parent").get_href()
[37]:
'/home/jsignell/pystac/docs/tutorials/pystac-example-absolute/mykitten/landsat-8-l1/collection.json'

We can also normalize and save the catalog as the other types described in the best practices documentation: “relative published” and “self-contained”. A self-contained catalog contains only relative links and no self links. Notice how saving a self-contained catalog produces relative links:

[38]:
mycat.normalize_and_save(
    "pystac-example-relative", catalog_type=pystac.CatalogType.SELF_CONTAINED
)
[39]:
item = next(mycat.get_items(recursive=True))
item.get_single_link("parent").get_href()
[39]:
'../collection.json'