Quickstart

This notebook shows how to use PySTAC to read through the public Sentinel catalog and write a local version.

Reading STAC

First, we want to hook into PySTAC to allow for reading of HTTP STAC items, as described in the STAC_IO Concepts docs.

Note: this requires the requests library to be installed.

[1]:
from urllib.parse import urlparse
import requests
from pystac import STAC_IO

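def requests_read_method(uri):
    # Route http(s) URIs through requests; everything else uses PySTAC's default reader
    parsed = urlparse(uri)
    if parsed.scheme.startswith('http'):
        response = requests.get(uri)
        response.raise_for_status()  # fail loudly instead of parsing an error page
        return response.text
    else:
        return STAC_IO.default_read_text_method(uri)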

STAC_IO.read_text_method = requests_read_method
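
The scheme-based routing in the hook above can be checked without PySTAC or a network connection. The following is a minimal sketch, not part of PySTAC: make_reader, http_reader and local_reader are hypothetical names, and the two stub readers stand in for requests and the default PySTAC reader.

```python
from urllib.parse import urlparse


def make_reader(http_reader, local_reader):
    """Build a read function that routes URIs by scheme.

    Both arguments are hypothetical callables that take a URI and return text.
    """
    def read_text(uri):
        scheme = urlparse(uri).scheme
        if scheme.startswith('http'):
            return http_reader(uri)
        return local_reader(uri)
    return read_text


# Stub readers so the routing can be exercised offline
read_text = make_reader(
    http_reader=lambda uri: 'remote:' + uri,
    local_reader=lambda uri: 'local:' + uri,
)

print(read_text('https://example.com/catalog.json'))
print(read_text('./quickstart_stac/catalog.json'))
```

Relative and absolute file paths have an empty scheme, so they fall through to the local reader.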

We can then read the STAC catalog located at the publicly available endpoint hosted by AWS:

[2]:
from pystac import Catalog

cat = Catalog.from_file('https://sentinel-stac.s3.amazonaws.com/catalog.json')

There are a lot of items in this catalog; crawling through it all would take a significant amount of time. Here, we lean on the fact that link resolution is lazy and get to a catalog that contains items:

[3]:
while len(cat.get_item_links()) == 0:
    print('Crawling through {}'.format(cat))
    cat = next(cat.get_children())
Crawling through <Catalog id=sentinel-stac>
Crawling through <Collection id=sentinel-2-l1c>
Crawling through <Catalog id=9>
Crawling through <Catalog id=V>
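
To see why lazy resolution keeps this loop cheap, here is a toy model in plain Python. It is a sketch under stated assumptions, not PySTAC's implementation: the TREE layout, the Node class and the resolved list are all hypothetical, and appending to resolved stands in for an HTTP read.

```python
# Hypothetical catalog layout: id -> (child ids, number of item links)
TREE = {
    'root': (['a', 'b'], 0),
    'a': (['a1', 'a2'], 0),
    'b': (['b1'], 0),
    'a1': ([], 3),
    'a2': ([], 2),
    'b1': ([], 4),
}

resolved = []  # records every node that was actually "fetched"


class Node:
    def __init__(self, node_id):
        resolved.append(node_id)  # stands in for a network read
        self.id = node_id
        self.child_ids, self.item_count = TREE[node_id]

    def get_children(self):
        # Generator: each child is resolved only when the caller asks for it
        for child_id in self.child_ids:
            yield Node(child_id)


node = Node('root')
while node.item_count == 0:
    node = next(node.get_children())

print(node.id, resolved)
```

Only the nodes on the path to the first catalog with items are resolved; siblings such as 'b' and 'a2' are never touched.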

We can print some information about the catalog, including how many items it contains:

[4]:
print(cat.description)
print('Contains {} items.'.format(len(cat.get_item_links())))
XK catalog
Contains 388 items.

Let’s grab the first item, check out its cloud cover, and start exploring the assets.

[5]:
item = next(cat.get_items())

You can access common metadata fields through the common_metadata property of the item:

[6]:
item.common_metadata.platform
[6]:
'sentinel-2b'

This particular STAC Item implements the eo extension. We can access the extension information through the “ext” property that’s part of every Catalog, Collection and Item. For instance, to get the cloud cover, we can use:

[7]:
item.ext.eo.cloud_cover
[7]:
41.52

We can see that the cloud cover is stored under its appropriate key in the Item’s properties:

[8]:
item.properties['eo:cloud_cover']
[8]:
41.52

If we want to set the cloud cover, we can do that through the extension as well:

[9]:
item.ext.eo.cloud_cover = 42.0
item.properties['eo:cloud_cover']
[9]:
42.0
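
The write-through behaviour seen here, where setting a value on the extension changes the Item’s properties, can be modelled with an ordinary Python property. This is only an illustration of the pattern and not how PySTAC implements it; EOView is a hypothetical class.

```python
class EOView:
    """Hypothetical accessor that exposes 'eo:' properties as attributes."""

    def __init__(self, properties):
        self._properties = properties  # shared with the item, not copied

    @property
    def cloud_cover(self):
        return self._properties.get('eo:cloud_cover')

    @cloud_cover.setter
    def cloud_cover(self, value):
        # Writing through the view updates the underlying properties dict
        self._properties['eo:cloud_cover'] = value


properties = {'eo:cloud_cover': 41.52}
eo = EOView(properties)
eo.cloud_cover = 42.0
print(properties['eo:cloud_cover'])
```

Because the view holds a reference to the same dictionary, there is a single source of truth for the value.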

We can access the item’s assets through the assets property, which is a dictionary:

[10]:
for asset_key in item.assets:
    asset = item.assets[asset_key]
    print('{}: {} ({})'.format(asset_key, asset.href, asset.media_type))
thumbnail: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/preview.jpg (None)
info: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/tileInfo.json (None)
metadata: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/metadata.xml (None)
tki: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/TKI.jp2 (image/jp2)
B01: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B01.jp2 (image/jp2)
B02: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B02.jp2 (image/jp2)
B03: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B03.jp2 (image/jp2)
B04: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B04.jp2 (image/jp2)
B05: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B05.jp2 (image/jp2)
B06: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B06.jp2 (image/jp2)
B07: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B07.jp2 (image/jp2)
B08: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B08.jp2 (image/jp2)
B8A: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B08.jp2 (image/jp2)
B09: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B09.jp2 (image/jp2)
B10: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B10.jp2 (image/jp2)
B11: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B11.jp2 (image/jp2)
B12: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B11.jp2 (image/jp2)
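
Since the assets are keyed in a dictionary, selecting a subset is a plain comprehension. The sketch below uses an ordinary dict of media types copied from part of the listing above instead of real Asset objects; keys_with_media_type is a hypothetical helper.

```python
# Asset key -> media type, copied from the first few lines of the listing above
media_types = {
    'thumbnail': None,
    'info': None,
    'metadata': None,
    'tki': 'image/jp2',
    'B01': 'image/jp2',
    'B02': 'image/jp2',
    'B03': 'image/jp2',
}


def keys_with_media_type(media_types, wanted):
    """Return the asset keys whose media type equals `wanted`."""
    return [key for key, media_type in media_types.items()
            if media_type == wanted]


jp2_keys = keys_with_media_type(media_types, 'image/jp2')
untyped_keys = keys_with_media_type(media_types, None)
print(jp2_keys)
print(untyped_keys)
```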

We can use the to_dict() method to convert an Asset, or any PySTAC object, into a dictionary:

[11]:
asset = item.assets['B03']
asset.to_dict()
[11]:
{'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B03.jp2',
 'type': 'image/jp2',
 'title': 'Band 3 (green)',
 'eo:bands': [{'name': 'B03',
   'common_name': 'green',
   'gsd': 10.0,
   'center_wavelength': 0.56,
   'full_width_half_max': 0.045}]}

Here we use the eo extension to get the band information for the asset:

[12]:
bands = item.ext.eo.get_bands(asset)
bands
[12]:
[<Band name=B03>]
[13]:
bands[0].to_dict()
[13]:
{'name': 'B03',
 'common_name': 'green',
 'gsd': 10.0,
 'center_wavelength': 0.56,
 'full_width_half_max': 0.045}
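
As a small worked example, the band dictionary can be turned into an approximate wavelength range. This sketch assumes the two wavelength fields are in micrometres and that the band is symmetric about its center, which is a simplification; the variable names are ours.

```python
# Band dictionary as printed above
band = {
    'name': 'B03',
    'common_name': 'green',
    'gsd': 10.0,
    'center_wavelength': 0.56,
    'full_width_half_max': 0.045,
}

# Half of the full width on either side of the center (symmetry is assumed)
half_width = band['full_width_half_max'] / 2
lower = band['center_wavelength'] - half_width
upper = band['center_wavelength'] + half_width

print('{}: {:.4f} to {:.4f}'.format(band['name'], lower, upper))
```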

Writing a STAC

Let’s walk the catalog again, but this time create local clones of the STAC objects, so we end up with a copy that we can save to the local file system.

[14]:
import itertools

cat = Catalog.from_file('https://sentinel-stac.s3.amazonaws.com/catalog.json')

# Set up the root of our local STAC
local_root = cat.clone()
local_root.clear_children()

# Loop over catalogs and clone
curr_local_cat = local_root
while len(cat.get_item_links()) == 0:
    print('Crawling through {}'.format(cat))
    cat = next(cat.get_children())
    local_cat = cat.clone()
    local_cat.clear_children()
    curr_local_cat.add_child(local_cat)
    curr_local_cat = local_cat

# Clear the items from the last local catalog
curr_local_cat.clear_children()
curr_local_cat.clear_items()

# Take the first 5 items
items = itertools.islice(cat.get_items(), 5)

# Clone and add them to our local catalog
curr_local_cat.add_items([i.clone() for i in items])
Crawling through <Catalog id=sentinel-stac>
Crawling through <Collection id=sentinel-2-l1c>
Crawling through <Catalog id=9>
Crawling through <Catalog id=V>

Now that we have a smaller STAC, let’s map over the items to reduce it even further by only including the thumbnail assets in our items:

[15]:
def item_mapper(item):
    # Keep only the thumbnail asset; every other asset is dropped
    item.assets = {
        k: v
        for k, v in item.assets.items()
        if k == 'thumbnail'
    }
    return item

local_root_2 = local_root.map_items(item_mapper)
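
The copy-then-map idea can be shown with plain dictionaries. This is a toy model that assumes the mapping works on a copy, as the separate local_root_2 name suggests; the map_items and keep_thumbnail functions below are our own stand-ins, not PySTAC code.

```python
import copy


def map_items(items, item_mapper):
    """Return mapped copies of the items, leaving the originals untouched."""
    return [item_mapper(copy.deepcopy(item)) for item in items]


def keep_thumbnail(item):
    # Same filter as item_mapper above, applied to a plain dict
    item['assets'] = {k: v for k, v in item['assets'].items()
                      if k == 'thumbnail'}
    return item


original = [{
    'id': 'S2B_9VXK_20171013_0',
    'assets': {'thumbnail': 'preview.jpg', 'B03': 'B03.jp2'},
}]
mapped = map_items(original, keep_thumbnail)

print(mapped[0]['assets'])
print(sorted(original[0]['assets']))
```

The mapper mutates and returns the item it is given, which is safe here only because it receives a copy.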

We can now normalize our catalog and save it somewhere local:

[16]:
!mkdir -p ./quickstart_stac
[17]:
local_root_2.normalize_hrefs('./quickstart_stac')
[17]:
<Catalog id=sentinel-stac>
[18]:
from pystac import CatalogType

local_root_2.save(catalog_type=CatalogType.SELF_CONTAINED)
[19]:
local_root_2.describe()
* <Catalog id=sentinel-stac>
    * <Collection id=sentinel-2-l1c>
        * <Catalog id=9>
            * <Catalog id=V>
                * <Catalog id=XK>
                  * <Item id=S2B_9VXK_20171013_0>
                  * <Item id=S2A_9VXK_20171015_0>
                  * <Item id=S2B_9VXK_20171016_0>
                  * <Item id=S2B_9VXK_20171017_0>
                  * <Item id=S2A_9VXK_20171002_0>
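
The tree above also hints at where files end up on disk. The sketch below assumes a layout with one directory per child catalog id and one per item id, with files named catalog.json, collection.json and <item id>.json; these naming details are assumptions, so check the contents of ./quickstart_stac for the actual result. expected_paths is a hypothetical helper.

```python
import posixpath

# (id, assumed file name) for each level shown by describe(), root first
chain = [
    ('sentinel-stac', 'catalog.json'),
    ('sentinel-2-l1c', 'collection.json'),
    ('9', 'catalog.json'),
    ('V', 'catalog.json'),
    ('XK', 'catalog.json'),
]


def expected_paths(root_dir, chain, item_ids):
    """List the file paths implied by the assumed directory-per-id layout."""
    paths = []
    current_dir = root_dir
    for depth, (object_id, file_name) in enumerate(chain):
        if depth > 0:  # the root catalog sits directly in root_dir
            current_dir = posixpath.join(current_dir, object_id)
        paths.append(posixpath.join(current_dir, file_name))
    for item_id in item_ids:
        paths.append(posixpath.join(current_dir, item_id, item_id + '.json'))
    return paths


paths = expected_paths('./quickstart_stac', chain, ['S2B_9VXK_20171013_0'])
for path in paths:
    print(path)
```
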
[20]:
for item in local_root_2.get_all_items():
    print('Item {}:'.format(item.id))
    print('  Assets: {}'.format(item.assets))
Item S2B_9VXK_20171013_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/preview.jpg>}
Item S2A_9VXK_20171015_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/15/0/preview.jpg>}
Item S2B_9VXK_20171016_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/16/0/preview.jpg>}
Item S2B_9VXK_20171017_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/17/0/preview.jpg>}
Item S2A_9VXK_20171002_0:
  Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/2/0/preview.jpg>}

Validating

If we have jsonschema installed, either manually or by installing PySTAC with its optional validation dependencies:

> pip install pystac[validation]

we can validate our STAC objects to ensure we didn’t set any properties that break the spec:

[21]:
for root, _, items in local_root_2.walk():
    root.validate()
    for item in items:
        item.validate()
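
The loop above unpacks three values per step, much like os.walk. A toy version over nested dictionaries shows the traversal order we rely on. This is a sketch only: the dictionary shape and this walk function are our own, not PySTAC’s.

```python
def walk(catalog):
    """Yield (catalog, child catalogs, items) for a nested dict, parents first."""
    yield catalog, catalog['children'], catalog['items']
    for child in catalog['children']:
        yield from walk(child)


# Hypothetical two-level tree with items only in the leaf catalog
tree = {
    'id': 'sentinel-stac',
    'items': [],
    'children': [
        {'id': 'XK', 'items': ['item-a', 'item-b'], 'children': []},
    ],
}

visited = [(catalog['id'], len(items)) for catalog, _, items in walk(tree)]
print(visited)
```

Every catalog is yielded exactly once together with its direct items, so validating the root and the items of each step covers the whole tree.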

If we have set something wrong, we’ll get a STACValidationError when we validate:

[22]:
item = next(local_root_2.get_all_items())
item.bbox = ['Not a valid bbox']
item.validate()
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/stac_validator.py in validate_core(self, stac_dict, stac_object_type, stac_version)
    131         try:
--> 132             self._validate_from_uri(stac_dict, schema_uri)
    133             return schema_uri

~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/stac_validator.py in _validate_from_uri(self, stac_dict, schema_uri)
    105         schema, resolver = self.get_schema_from_uri(schema_uri)
--> 106         jsonschema.validate(instance=stac_dict, schema=schema, resolver=resolver)
    107         for uri in resolver.store:

~/proj/stac/pystac/venv/lib/python3.6/site-packages/jsonschema/validators.py in validate(instance, schema, cls, *args, **kwargs)
    933     if error is not None:
--> 934         raise error
    935

ValidationError: 'Not a valid bbox' is not of type 'number'

Failed validating 'type' in schema[0]['properties']['bbox']['items']:
    {'type': 'number'}

On instance['bbox'][0]:
    'Not a valid bbox'

The above exception was the direct cause of the following exception:

STACValidationError                       Traceback (most recent call last)
<ipython-input-22-09a436f4856e> in <module>
      1 item = next(local_root_2.get_all_items())
      2 item.bbox = ['Not a valid bbox']
----> 3 item.validate()

~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/stac_object.py in validate(self)
    232                                           stac_object_type=self.STAC_OBJECT_TYPE,
    233                                           stac_version=pystac.get_stac_version(),
--> 234                                           extensions=self.stac_extensions)
    235
    236     def get_root(self):

~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/__init__.py in validate(stac_dict, stac_object_type, stac_version, extensions)
     71
     72     return RegisteredValidator.get_validator().validate(stac_dict, stac_object_type, stac_version,
---> 73                                                         extensions)
     74
     75

~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/stac_validator.py in validate(self, stac_dict, stac_object_type, stac_version, extensions)
     64         """
     65         results = []
---> 66         core_result = self.validate_core(stac_dict, stac_object_type, stac_version)
     67         if core_result is not None:
     68             results.append(core_result)

~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/stac_validator.py in validate_core(self, stac_dict, stac_object_type, stac_version)
    135             msg = 'Validation failed against schema at {} for STAC {}'.format(
    136                 schema_uri, stac_object_type)
--> 137             raise STACValidationError(msg, source=e) from e
    138
    139     def validate_extension(self, stac_dict, stac_object_type, stac_version, extension_id):

STACValidationError: Validation failed against schema at https://schemas.stacspec.org/v1.0.0-beta.2/item-spec/json-schema/item.json for STAC ITEM
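
The inner ValidationError comes down to a type check on each bbox element. A rough standard-library version of that check is sketched below. It is not the STAC JSON schema: bbox_problems is a hypothetical helper, and the 4-or-6 length rule is an assumption borrowed from GeoJSON bounding boxes.

```python
from numbers import Real


def bbox_problems(bbox):
    """Return a list of human-readable problems; empty means the bbox looks fine."""
    if not isinstance(bbox, (list, tuple)):
        return ['bbox must be an array']
    problems = []
    if len(bbox) not in (4, 6):  # assumed rule: 2D or 3D bounding box
        problems.append(
            'bbox must have 4 or 6 elements, got {}'.format(len(bbox)))
    for index, value in enumerate(bbox):
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, Real):
            problems.append(
                'bbox[{}] is not a number: {!r}'.format(index, value))
    return problems


print(bbox_problems(['Not a valid bbox']))
print(bbox_problems([-128.0, 54.0, -127.0, 55.0]))
```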