PySTAC Introduction#

This tutorial gives a basic introduction to reading, writing, and creating STAC objects using PySTAC.

It is adapted from the tutorials within the sat-stac repo.

It uses an example STAC stored in the example-catalog directory alongside this notebook. The example STAC has the following layout:

example-catalog/
├── catalog.json
└── eo
    ├── catalog.json
    ├── landsat-8-l1
    │   ├── catalog.json
    │   └── item.json
    └── sentinel-2-l1c
        ├── catalog.json
        └── sentinel-2a
            ├── catalog.json
            └── item.json
[1]:
import sys
sys.path.append('..')
[2]:
import pystac

Working with existing catalogs#

Open a root catalog from its JSON file:

[3]:
cat = pystac.Catalog.from_file('example-catalog/catalog.json')

We can see all elements of the STAC using the describe method

[4]:
cat.describe()
* <Catalog id=stac>
    * <Catalog id=stac-eo>
        * <Collection id=sentinel-2-l1c>
            * <Catalog id=sentinel-2a>
              * <Item id=L1C_T53MNQ_A017245_20181011T011722>
        * <Collection id=landsat-8-l1>
          * <Item id=LC08_L1GT_120046_20181012_20181012_01_RT>

Each STAC object has links that you can use to traverse the STAC tree

[5]:
cat.links
[5]:
[<Link rel=child target=<Catalog id=stac-eo>>,
 <Link rel=self target=/Users/rob/proj/stac/pystac/docs/tutorials/example-catalog/catalog.json>,
 <Link rel=root target=<Catalog id=stac>>]

PySTAC has several methods that allow you to access links:

[6]:
# Get all child links
cat.get_child_links()
[6]:
[<Link rel=child target=<Catalog id=stac-eo>>]
[7]:
# Get a single link by 'rel'
cat.get_single_link('self')
[7]:
<Link rel=self target=/Users/rob/proj/stac/pystac/docs/tutorials/example-catalog/catalog.json>
[8]:
# Get item links directly within this catalog (there are none for this catalog)
cat.get_item_links()
[8]:
[]

You can also retrieve the linked objects directly, rather than the links that point to them:

[9]:
# get all child objects
list(cat.get_children())
[9]:
[<Catalog id=stac-eo>]
[10]:
# or a single child by id
cat.get_child('stac-eo')
[10]:
<Catalog id=stac-eo>
[11]:
# get all items anywhere below this catalog on the STAC tree
list(cat.get_all_items())
[11]:
[<Item id=L1C_T53MNQ_A017245_20181011T011722>,
 <Item id=LC08_L1GT_120046_20181012_20181012_01_RT>]

You can access the STAC object that a link points to through the link's target property:

[12]:
l = cat.get_single_link('child')
print(l)
<Link rel=child target=<Catalog id=stac-eo>>
[13]:
print(l.target)
<Catalog id=stac-eo>

You can convert any STAC object to a Python dict using the to_dict method.

[14]:
cat.to_dict(include_self_link=False)
[14]:
{'id': 'stac',
 'stac_version': '1.0.0-beta.2',
 'description': 'A STAC of public datasets',
 'links': [{'rel': 'child', 'href': './eo/catalog.json'},
  {'rel': 'root', 'href': './catalog.json', 'type': 'application/json'}]}
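
Because to_dict returns plain Python data, the link lookups shown earlier can be imitated on the dictionary itself. The following is a minimal sketch using only the standard library, not PySTAC's implementation; `links_by_rel` is a hypothetical helper, and the dictionary is copied from the output above.

```python
# Dictionary copied from the to_dict output above.
catalog_dict = {
    "id": "stac",
    "stac_version": "1.0.0-beta.2",
    "description": "A STAC of public datasets",
    "links": [
        {"rel": "child", "href": "./eo/catalog.json"},
        {"rel": "root", "href": "./catalog.json", "type": "application/json"},
    ],
}


def links_by_rel(stac_dict, rel):
    """Hypothetical helper: return every link dict whose 'rel' matches."""
    return [link for link in stac_dict.get("links", []) if link.get("rel") == rel]


# Collect the hrefs of all child links, and confirm there are no item links.
child_hrefs = [link["href"] for link in links_by_rel(catalog_dict, "child")]
item_links = links_by_rel(catalog_dict, "item")
print(child_hrefs)
print(item_links)
```

Working on the dictionary this way only gives you hrefs; resolving them into objects is what the PySTAC methods above do for you.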
[15]:
# get first (and only in this case) sub-catalog
subcat = next(cat.get_children())
[16]:
# print some IDs
print("Root Catalog: ", cat.id)
print("Sub Catalog: ", subcat.id)
print("Sub Catalog parent: ", subcat.get_parent().id)

# iterate through child catalogs of the sub-catalog
print("Sub Catalog children:")
for child in subcat.get_children():
    print('    ', child.id)
Root Catalog:  stac
Sub Catalog:  stac-eo
Sub Catalog parent:  stac
Sub Catalog children:
     sentinel-2-l1c
     landsat-8-l1
[17]:
print('\n**Items**')
for i in cat.get_all_items():
    print(i.id)

**Items**
L1C_T53MNQ_A017245_20181011T011722
LC08_L1GT_120046_20181012_20181012_01_RT
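
Collecting every item below a catalog amounts to a recursive walk over the tree. The sketch below illustrates that idea on a hand-written nested dictionary that mirrors the describe output of the example catalog. It is a toy model under that assumption, not how PySTAC stores or traverses catalogs, and `walk_items` is a hypothetical name.

```python
# Hand-written stand-in for the example catalog tree (ids taken from above).
tree = {
    "id": "stac",
    "items": [],
    "children": [
        {
            "id": "stac-eo",
            "items": [],
            "children": [
                {
                    "id": "sentinel-2-l1c",
                    "items": [],
                    "children": [
                        {
                            "id": "sentinel-2a",
                            "items": ["L1C_T53MNQ_A017245_20181011T011722"],
                            "children": [],
                        }
                    ],
                },
                {
                    "id": "landsat-8-l1",
                    "items": ["LC08_L1GT_120046_20181012_20181012_01_RT"],
                    "children": [],
                },
            ],
        }
    ],
}


def walk_items(node):
    """Yield the item ids of this node, then those of each child, depth first."""
    yield from node["items"]
    for child in node["children"]:
        yield from walk_items(child)


all_item_ids = list(walk_items(tree))
print(all_item_ids)
```

Using a generator keeps the walk lazy, which matters for large trees where you may only need the first few items.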

Creating new catalogs#

You can initialize a new Catalog with an id and a description. Note that by default a new catalog is set as its own root.

[18]:
# create a Catalog object from an id and a description
mycat = pystac.Catalog(id="mycat", description="My shiny new STAC catalog")
[19]:
mycat.links
[19]:
[<Link rel=root target=<Catalog id=mycat>>]

Adding catalogs to catalogs#

[20]:
# create a new catalog that will be added to the root catalog below
kitten = pystac.Catalog(id="mykitten", description="A child catalog of my shiny new STAC catalog")

When you add a child catalog to a parent catalog, the child catalog takes on the root catalog of its parent. ‘Child’ and ‘parent’ links are also added to the parent and child catalogs, respectively.

[21]:
kitten.links
[21]:
[<Link rel=root target=<Catalog id=mykitten>>]
[22]:
mycat.add_child(kitten)
[23]:
kitten.links
[23]:
[<Link rel=root target=<Catalog id=mycat>>,
 <Link rel=parent target=<Catalog id=mycat>>]
[24]:
mycat.links
[24]:
[<Link rel=root target=<Catalog id=mycat>>,
 <Link rel=child target=<Catalog id=mykitten>>]
[25]:
mycat.describe()
* <Catalog id=mycat>
    * <Catalog id=mykitten>
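
The link bookkeeping seen in the cells above can be summarized with a small toy model. `ToyCatalog` below is a hypothetical class written for illustration only. It is not pystac.Catalog and makes no claim about PySTAC's internals; it just reproduces the link changes observed when a child is added.

```python
class ToyCatalog:
    """Toy stand-in for a catalog; NOT the pystac.Catalog class."""

    def __init__(self, id):
        self.id = id
        # A new catalog starts out as its own root, as in the cells above.
        self.links = [("root", self)]

    def _set_link(self, rel, target):
        # Drop any existing link with this rel, then append the new one.
        self.links = [(r, t) for r, t in self.links if r != rel]
        self.links.append((rel, target))

    def get_root(self):
        return next(t for r, t in self.links if r == "root")

    def add_child(self, child):
        # The parent gains a 'child' link; the child inherits the parent's
        # root and gains a 'parent' link. This toy version does not update
        # descendants that the child may already have.
        self.links.append(("child", child))
        child._set_link("root", self.get_root())
        child._set_link("parent", self)

    def link_summary(self):
        return [(rel, target.id) for rel, target in self.links]


toy_root = ToyCatalog("mycat")
toy_child = ToyCatalog("mykitten")
toy_root.add_child(toy_child)
print(toy_child.link_summary())
print(toy_root.link_summary())
```

The two printed summaries correspond to the kitten.links and mycat.links outputs shown above, with each target reduced to its id.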

Adding collections to catalogs#

In the next two steps we will work with PySTAC Collections and Items. We will pull them out of our example catalog and add them to the new STAC that we have created.

Collections are Catalogs but also include spatial and temporal extents as well as additional properties.

[26]:
# open the Landsat collection
collection = pystac.Collection.from_file('example-catalog/eo/landsat-8-l1/catalog.json')
print('Collection name: ', collection.id)
Collection name:  landsat-8-l1

See the spatial and temporal extent of this collection

[27]:
collection.extent.to_dict()
[27]:
{'spatial': {'bbox': [[-180, -90, 180, 90]]},
 'temporal': {'interval': [['2013-06-01T00:00:00Z', None]]}}
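
In the extent above, the bbox is ordered west, south, east, north, and the None at the end of the interval means the collection is still open-ended. The following standard-library sketch shows one way to test a point and a timestamp against such a dictionary. The helper names are hypothetical, and it assumes 2D bounding boxes that do not cross the antimeridian.

```python
from datetime import datetime, timezone

# Dictionary copied from the extent output above.
extent = {
    "spatial": {"bbox": [[-180, -90, 180, 90]]},
    "temporal": {"interval": [["2013-06-01T00:00:00Z", None]]},
}


def parse_utc(timestamp):
    """Parse a 'Z'-suffixed timestamp into a timezone-aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def in_temporal_extent(extent_dict, when):
    """Return True if 'when' falls in any interval; None means unbounded."""
    for start, end in extent_dict["temporal"]["interval"]:
        after_start = start is None or parse_utc(start) <= when
        before_end = end is None or when <= parse_utc(end)
        if after_start and before_end:
            return True
    return False


def in_spatial_extent(extent_dict, lon, lat):
    """Return True if the point lies in any [west, south, east, north] bbox."""
    return any(
        west <= lon <= east and south <= lat <= north
        for west, south, east, north in extent_dict["spatial"]["bbox"]
    )


acquired = datetime(2018, 10, 12, tzinfo=timezone.utc)
too_early = datetime(2010, 1, 1, tzinfo=timezone.utc)
print(in_temporal_extent(extent, acquired))
print(in_temporal_extent(extent, too_early))
print(in_spatial_extent(extent, 139.7, 35.7))
```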
[28]:
collection.links
[28]:
[<Link rel=root target=../../catalog.json>,
 <Link rel=parent target=../catalog.json>,
 <Link rel=item target=item.json>,
 <Link rel=self target=/Users/rob/proj/stac/pystac/docs/tutorials/example-catalog/eo/landsat-8-l1/catalog.json>]
[29]:
# add it to the child catalog created above
kitten.add_child(collection)
[30]:
collection.links
[30]:
[<Link rel=item target=item.json>,
 <Link rel=self target=/Users/rob/proj/stac/pystac/docs/tutorials/example-catalog/eo/landsat-8-l1/catalog.json>,
 <Link rel=root target=<Catalog id=mycat>>,
 <Link rel=parent target=<Catalog id=mykitten>>]

Adding items to collection#

Items are STAC objects whose parents can be either Catalogs or Collections. They also have spatio-temporal information and assets. Assets point directly to the data included in the STAC.

[31]:
# open a Sentinel-2 item
item = pystac.read_file('example-catalog/eo/sentinel-2-l1c/sentinel-2a/item.json')
print('Item name: ', item.id)
Item name:  L1C_T53MNQ_A017245_20181011T011722
[32]:
item.links
[32]:
[<Link rel=root target=../../../catalog.json>,
 <Link rel=parent target=catalog.json>,
 <Link rel=collection target=../catalog.json>,
 <Link rel=self target=/Users/rob/proj/stac/pystac/docs/tutorials/example-catalog/eo/sentinel-2-l1c/sentinel-2a/item.json>]
[33]:
item.assets
[33]:
{'B01': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B01.jp2>,
 'B02': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B02.jp2>,
 'B03': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B03.jp2>,
 'B04': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B04.jp2>,
 'B05': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B05.jp2>,
 'B06': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B06.jp2>,
 'B07': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B07.jp2>,
 'B08': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B08.jp2>,
 'B09': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B09.jp2>,
 'B10': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B10.jp2>,
 'B11': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B11.jp2>,
 'B12': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B12.jp2>,
 'B8A': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/B8A.jp2>,
 'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/53/M/NQ/2018/10/11/0/preview.jpg>,
 'tki': <Asset href=https://sentinel-s2-l1c.s3.amazonaws.com/tiles/53/M/NQ/2018/10/11/0/TKI.jp2>,
 'metadata': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/53/M/NQ/2018/10/11/0/metadata.xml>}
[34]:
# add it to the collection opened above
collection.add_item(item)
[35]:
# now look at the catalog we've created
mycat.describe()
* <Catalog id=mycat>
    * <Catalog id=mykitten>
        * <Collection id=landsat-8-l1>
          * <Item id=LC08_L1GT_120046_20181012_20181012_01_RT>
          * <Item id=L1C_T53MNQ_A017245_20181011T011722>

Currently, this STAC only exists in memory. We can use normalize_and_save to save the STAC in the canonical “absolute published” form:

[36]:
mycat.normalize_and_save('pystac-example-absolute',
                         catalog_type=pystac.CatalogType.ABSOLUTE_PUBLISHED)

Notice now that the ‘parent’ link of an item is an absolute HREF:

[37]:
item = next(mycat.get_all_items())
item.get_single_link('parent').get_href()
[37]:
'/Users/rob/proj/stac/pystac/docs/tutorials/pystac-example-absolute/mykitten/landsat-8-l1/collection.json'

We can also normalize and save the catalog to the other types described in the best practices documentation: “relative published” and “self contained”. A self-contained catalog uses only relative links and has no self links. Notice how saving a self-contained catalog produces relative links:

[38]:
mycat.normalize_and_save('pystac-example-relative',
                         catalog_type=pystac.CatalogType.SELF_CONTAINED)
[39]:
item = next(mycat.get_all_items())
item.get_single_link('parent').get_href()
[39]:
'../collection.json'
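
The difference between the two outputs comes down to how an href is expressed relative to the file that contains it. The sketch below shows that conversion in both directions with the standard library. The two absolute paths are hypothetical (only the '../collection.json' result is taken from the output above), `make_relative` and `make_absolute` are illustrative names rather than PySTAC functions, and the approach is meant for file-system style paths, not URLs with a scheme.

```python
import posixpath

# Hypothetical locations of a collection and one of its items on disk.
collection_href = "/data/pystac-example-relative/mykitten/landsat-8-l1/collection.json"
item_href = "/data/pystac-example-relative/mykitten/landsat-8-l1/some-item/some-item.json"


def make_relative(target_href, source_href):
    """Express target_href relative to the directory holding source_href."""
    return posixpath.relpath(target_href, start=posixpath.dirname(source_href))


def make_absolute(relative_href, source_href):
    """Resolve relative_href against the directory holding source_href."""
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(source_href), relative_href)
    )


relative = make_relative(collection_href, item_href)
round_trip = make_absolute(relative, item_href)
print(relative)
print(round_trip == collection_href)
```

Relative hrefs like this are what make a self-contained catalog portable: the whole directory can be moved without rewriting any links.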