Quickstart

This notebook shows how to use PySTAC to read through the public Sentinel catalog and write a local version.

Reading STAC

First, we hook into PySTAC so that it can read STAC objects over HTTP, as described in the STAC_IO Concepts docs.

Note: this requires the requests library to be installed.

[1]:
from urllib.parse import urlparse
import requests
from pystac import STAC_IO

def requests_read_method(uri):
    parsed = urlparse(uri)
    if parsed.scheme.startswith('http'):
        return requests.get(uri).text
    else:
        return STAC_IO.default_read_text_method(uri)

STAC_IO.read_text_method = requests_read_method
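
The dispatch on the URI scheme can be tried out without a network connection. The following is a minimal sketch and not part of PySTAC: make_read_method and the two stand-in readers are hypothetical names, and the readers are passed in so that the HTTP branch could be swapped for requests.get(uri).text in real use.

```python
from urllib.parse import urlparse


def make_read_method(http_reader, fallback_reader):
    """Build a reader that routes http(s) URIs to http_reader and
    everything else to fallback_reader (hypothetical helper)."""
    def read_text(uri):
        scheme = urlparse(uri).scheme
        if scheme.startswith('http'):
            return http_reader(uri)
        return fallback_reader(uri)
    return read_text


# Stand-in readers that only record which branch was taken.
read_text = make_read_method(
    http_reader=lambda uri: 'http:' + uri,
    fallback_reader=lambda uri: 'local:' + uri,
)

print(read_text('https://example.com/catalog.json'))
print(read_text('./quickstart_stac/catalog.json'))
```

Passing the readers in as arguments keeps the routing logic separate from the I/O, which makes it easy to check that local paths still fall through to the default reader.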

We can then read the STAC catalog located at the publicly available endpoint hosted by AWS:

[3]:
from pystac import Catalog

cat = Catalog.from_file('https://sentinel-stac.s3.amazonaws.com/catalog.json')

This catalog contains a large number of items, so crawling all of it would take a long time. Instead, we rely on the fact that link resolution is lazy and descend through the first child at each level until we reach a catalog that contains items:

[4]:
while len(cat.get_item_links()) == 0:
    print('Crawling through {}'.format(cat))
    cat = next(cat.get_children())
Crawling through <Catalog id=sentinel-stac>
Crawling through <Collection id=sentinel-2-l1c>
Crawling through <Catalog id=9>
Crawling through <Catalog id=V>
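
To illustrate why lazy resolution keeps this loop cheap, here is a small sketch that uses plain dictionaries instead of PySTAC objects. The DOCUMENTS store, fetch, and get_children are hypothetical stand-ins; the point is only that a generator reads a child when it is requested, so siblings that are never visited are never fetched.

```python
# Hypothetical in-memory "remote" store: href -> document.
DOCUMENTS = {
    'root': {'id': 'root', 'children': ['a', 'b'], 'items': []},
    'a': {'id': 'a', 'children': ['a1', 'a2'], 'items': []},
    'b': {'id': 'b', 'children': [], 'items': ['b-item']},
    'a1': {'id': 'a1', 'children': [], 'items': ['i1', 'i2']},
    'a2': {'id': 'a2', 'children': [], 'items': ['i3']},
}

fetched = []  # records every document that was actually read


def fetch(href):
    fetched.append(href)
    return DOCUMENTS[href]


def get_children(doc):
    """Yield child documents one at a time, reading each only on demand."""
    for href in doc['children']:
        yield fetch(href)


doc = fetch('root')
while len(doc['items']) == 0:
    # next() pulls only the first child; the others stay unread.
    doc = next(get_children(doc))

print(doc['id'], fetched)
```

In this toy store only three of the five documents are read before a catalog with items is found.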

We can print some information about the catalog, including how many items it contains:

[8]:
print(cat.description)
print('Contains {} items.'.format(len(cat.get_item_links())))
XK catalog
Contains 388 items.

Let’s grab the first item, check out its cloud cover, and start exploring the assets.

[9]:
item = next(cat.get_items())
[10]:
item.cloud_cover
[10]:
41.52
[11]:
for asset_key in item.assets:
    asset = item.assets[asset_key]
    print('{}: {} ({})'.format(asset_key, asset.href, asset.media_type))
thumbnail: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/preview.jpg (None)
info: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/tileInfo.json (None)
metadata: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/metadata.xml (None)
tki: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/TKI.jp2 (image/jp2)
B01: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B01.jp2 (image/jp2)
B02: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B02.jp2 (image/jp2)
B03: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B03.jp2 (image/jp2)
B04: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B04.jp2 (image/jp2)
B05: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B05.jp2 (image/jp2)
B06: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B06.jp2 (image/jp2)
B07: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B07.jp2 (image/jp2)
B08: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B08.jp2 (image/jp2)
B8A: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B08.jp2 (image/jp2)
B09: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B09.jp2 (image/jp2)
B10: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B10.jp2 (image/jp2)
B11: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B11.jp2 (image/jp2)
B12: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B11.jp2 (image/jp2)
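
Since several assets report no media type, it can be handy to select assets by type. The sketch below works on a plain dictionary of (href, media type) pairs copied from a few lines of the listing above rather than on PySTAC Asset objects; keys_with_media_type is a hypothetical helper name.

```python
# A few (href, media_type) pairs copied from the listing above.
base = 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/'
assets = {
    'thumbnail': ('https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/preview.jpg', None),
    'tki': (base + 'TKI.jp2', 'image/jp2'),
    'B03': (base + 'B03.jp2', 'image/jp2'),
    'B04': (base + 'B04.jp2', 'image/jp2'),
}


def keys_with_media_type(assets, media_type):
    """Return the asset keys whose media type equals media_type."""
    return [key for key, (_href, mtype) in assets.items() if mtype == media_type]


jp2_keys = keys_with_media_type(assets, 'image/jp2')
print(jp2_keys)
```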

We can use the to_dict() method to convert an Asset, or any PySTAC object, into a dictionary:

[12]:
asset = item.assets['B03']
asset.to_dict()
[12]:
{'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B03.jp2',
 'type': 'image/jp2',
 'title': 'Band 3 (green)',
 'eo:bands': [2]}

Here the asset uses the band information associated with its item:

[13]:
asset.get_bands()[0].to_dict()
[13]:
{'name': 'B03',
 'common_name': 'green',
 'gsd': 10.0,
 'center_wavelength': 0.56,
 'full_width_half_max': 0.045}
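
The 'eo:bands': [2] entry in the asset dictionary appears to be a list of indices into the band list held by the item, which is how index 2 leads to band B03. A minimal sketch of that lookup with plain dictionaries follows; resolve_bands is a hypothetical helper, and only the B03 entry is taken from the output above, while the first two entries are name-only placeholders.

```python
# Band list as an item might carry it. Only the B03 entry comes from the
# output above; the first two entries are name-only placeholders.
item_bands = [
    {'name': 'B01'},
    {'name': 'B02'},
    {'name': 'B03', 'common_name': 'green', 'gsd': 10.0,
     'center_wavelength': 0.56, 'full_width_half_max': 0.045},
]

asset = {'title': 'Band 3 (green)', 'eo:bands': [2]}


def resolve_bands(asset, item_bands):
    """Look up each index in the asset's 'eo:bands' list in the item's band list."""
    return [item_bands[index] for index in asset.get('eo:bands', [])]


bands = resolve_bands(asset, item_bands)
print(bands[0]['name'])
```

Storing indices on the asset avoids repeating the full band description for every asset that refers to the same band.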

Writing a STAC

Let’s walk the catalog again, but this time create local clones of the STAC objects, so that we end up with a copy we can save to the local file system.

[52]:
import itertools

cat = Catalog.from_file('https://sentinel-stac.s3.amazonaws.com/catalog.json')

# Set up the root of our local STAC
local_root = cat.clone()
local_root.clear_children()

# Loop over catalogs and clone
curr_local_cat = local_root
while len(cat.get_item_links()) == 0:
    print('Crawling through {}'.format(cat))
    cat = next(cat.get_children())
    local_cat = cat.clone()
    local_cat.clear_children()
    curr_local_cat.add_child(local_cat)
    curr_local_cat = local_cat

# Clear the items from the last local catalog
curr_local_cat.clear_children()
curr_local_cat.clear_items()

# Take the first 5 items
items = itertools.islice(cat.get_items(), 5)

# Clone and add them to our local catalog
curr_local_cat.add_items([i.clone() for i in items])
Crawling through <Catalog id=sentinel-stac>
Crawling through <Collection id=sentinel-2-l1c>
Crawling through <Catalog id=9>
Crawling through <Catalog id=V>
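
Taking the first five items with itertools.islice matters because the item iterator is lazy: only the items that are actually requested get read. The sketch below shows this with a stand-in generator (get_items here is a hypothetical function, not the PySTAC method) that records how many items it was asked to produce.

```python
import itertools

produced = []  # indices that the generator was actually asked to produce


def get_items(count):
    """Stand-in for a lazy item iterator; each item would cost one read."""
    for n in range(count):
        produced.append(n)
        yield 'item-{}'.format(n)


# Ask for 5 of the 388 items; the remaining ones are never generated.
first_five = list(itertools.islice(get_items(388), 5))
print(first_five)
print(len(produced))
```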

Now that we have a smaller STAC, let’s map over the items to reduce it even further by only including the thumbnail assets in our items:

[53]:
def item_mapper(item):
    # Keep only the thumbnail asset and drop all other assets
    item.assets = {'thumbnail': item.assets['thumbnail']}
    return item

local_root_2 = local_root.map_items(item_mapper)
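
The result is assigned to a new name, which suggests that the mapping is applied to a copy rather than to local_root itself. The sketch below illustrates that copy-then-map pattern on plain dictionaries; it is an assumption about the general idea, not PySTAC's implementation, and map_items and thumbnail_only here are hypothetical stand-ins.

```python
import copy


def map_items(items, item_mapper):
    """Apply item_mapper to a deep copy of each item, leaving the input untouched."""
    return [item_mapper(copy.deepcopy(item)) for item in items]


def thumbnail_only(item):
    """Mapper that keeps only the thumbnail asset."""
    item['assets'] = {'thumbnail': item['assets']['thumbnail']}
    return item


original = [
    {'id': 'S2B_9VXK_20171013_0',
     'assets': {'thumbnail': 'preview.jpg', 'B03': 'B03.jp2', 'B04': 'B04.jp2'}},
]

mapped = map_items(original, thumbnail_only)
print(sorted(mapped[0]['assets']))
print(sorted(original[0]['assets']))
```

Because the mapper mutates the object it receives, copying first is what keeps the source data intact.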

We can now normalize our catalog and save it somewhere local:

[61]:
!mkdir -p ./quickstart_stac
[55]:
local_root_2.normalize_hrefs('./quickstart_stac')
[55]:
<Catalog id=sentinel-stac>
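
Normalizing assigns every object an HREF under the given root directory. The sketch below shows one plausible layout using plain dictionaries. The convention it encodes (one directory per catalog or collection, and one directory plus one file per item, both named after the item id) is an assumption made for illustration, and layout is a hypothetical helper, so check the files PySTAC actually writes before relying on these paths.

```python
import posixpath


def layout(node, parent_dir):
    """Yield (id, href) pairs for a nested catalog description.

    Assumed convention (for illustration only): each catalog or collection
    gets its own directory, and each item gets a directory and a file
    named after its id.
    """
    filename = 'collection.json' if node.get('collection') else 'catalog.json'
    yield node['id'], posixpath.join(parent_dir, filename)
    for item_id in node.get('items', []):
        yield item_id, posixpath.join(parent_dir, item_id, item_id + '.json')
    for child in node.get('children', []):
        yield from layout(child, posixpath.join(parent_dir, child['id']))


# Ids taken from the crawl above, with a single item for brevity.
tree = {
    'id': 'sentinel-stac',
    'children': [{
        'id': 'sentinel-2-l1c',
        'collection': True,
        'children': [{
            'id': '9',
            'children': [{
                'id': 'V',
                'children': [{'id': 'XK', 'items': ['S2B_9VXK_20171013_0']}],
            }],
        }],
    }],
}

hrefs = dict(layout(tree, './quickstart_stac'))
for key, href in hrefs.items():
    print(key, '->', href)
```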
[56]:
from pystac import CatalogType

local_root_2.save(catalog_type=CatalogType.SELF_CONTAINED)
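
A self-contained catalog is one that can be moved as a whole, which in practice means its links are expressed relative to the file that holds them. The sketch below shows how such a relative link can be computed with the standard library; relative_href is a hypothetical helper and the /data/... paths are made-up examples.

```python
import posixpath


def relative_href(source_href, target_href):
    """Express target_href relative to the directory that holds source_href."""
    return posixpath.relpath(target_href, start=posixpath.dirname(source_href))


# Made-up absolute locations of a root catalog and one of its children.
root = '/data/quickstart_stac/catalog.json'
child = '/data/quickstart_stac/sentinel-2-l1c/collection.json'

print(relative_href(root, child))
print(relative_href(child, root))
```

Links written this way stay valid if the quickstart_stac directory is copied elsewhere, since neither one mentions the /data prefix.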
[60]:
local_root_2.describe()
* <Catalog id=sentinel-stac>
    * <Collection id=sentinel-2-l1c>
        * <Catalog id=9>
            * <Catalog id=V>
                * <Catalog id=XK>
                  * <EOItem id=S2B_9VXK_20171013_0>
                  * <EOItem id=S2A_9VXK_20171015_0>
                  * <EOItem id=S2B_9VXK_20171016_0>
                  * <EOItem id=S2B_9VXK_20171017_0>
                  * <EOItem id=S2A_9VXK_20171002_0>
[64]:
for item in local_root_2.get_all_items():
    print('Item {}:'.format(item.id))
    print('  Assets: {}'.format(item.assets))
Item S2B_9VXK_20171013_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/preview.jpg>}
Item S2A_9VXK_20171015_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/15/0/preview.jpg>}
Item S2B_9VXK_20171016_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/16/0/preview.jpg>}
Item S2B_9VXK_20171017_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/17/0/preview.jpg>}
Item S2A_9VXK_20171002_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/2/0/preview.jpg>}
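
To confirm what ended up on disk, the JSON files under the output directory can be listed with the standard library. The sketch below is independent of PySTAC: list_stac_files is a hypothetical helper, and it is demonstrated on a throwaway directory so that it runs anywhere. After saving, it could be pointed at './quickstart_stac' instead.

```python
import json
import os
import tempfile


def list_stac_files(root_dir):
    """Return the sorted relative paths of all .json files under root_dir."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith('.json'):
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root_dir))
    return sorted(found)


# Build a tiny throwaway tree with two JSON files and list it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'child'))
    for rel in ('catalog.json', os.path.join('child', 'catalog.json')):
        with open(os.path.join(tmp, rel), 'w') as f:
            json.dump({'id': 'demo'}, f)
    files = list_stac_files(tmp)

print(files)
```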