Quickstart
This notebook shows how to use PySTAC to read through the public Sentinel catalog and write a local version.
Reading STAC
First, we want to hook into PySTAC to allow for reading of HTTP STAC items, as described in the STAC_IO Concepts docs.
Note: this requires that the requests library be installed.
[1]:
from urllib.parse import urlparse
import requests
from pystac import STAC_IO
def requests_read_method(uri):
    parsed = urlparse(uri)
    if parsed.scheme.startswith('http'):
        return requests.get(uri).text
    else:
        return STAC_IO.default_read_text_method(uri)

STAC_IO.read_text_method = requests_read_method
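If requests is not available, the same scheme-based dispatch can be sketched with only the standard library. The function name stdlib_read_method and its timeout parameter are illustrative and not part of PySTAC, and treating every non-HTTP URI as a local file path is an assumption of this sketch.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def stdlib_read_method(uri, timeout=30):
    """Return the text at `uri`, fetched over HTTP(S) or read from a local path."""
    parsed = urlparse(uri)
    if parsed.scheme in ("http", "https"):
        # urlopen raises urllib.error.HTTPError / URLError on failure.
        with urlopen(uri, timeout=timeout) as response:
            return response.read().decode("utf-8")
    # Assumption: anything that is not HTTP(S) is a local file path.
    with open(uri, encoding="utf-8") as f:
        return f.read()
```

It would be registered in the same way as the requests-based function, by assigning it to STAC_IO.read_text_method.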
We can then read the STAC catalog located at the publicly available endpoint hosted by AWS:
[2]:
from pystac import Catalog
cat = Catalog.from_file('https://sentinel-stac.s3.amazonaws.com/catalog.json')
There are a lot of items in this catalog; crawling through it all would take a significant amount of time. Here, we lean on the fact that link resolution is lazy and get to a catalog that contains items:
[3]:
while len(cat.get_item_links()) == 0:
    print('Crawling through {}'.format(cat))
    cat = next(cat.get_children())
Crawling through <Catalog id=sentinel-stac>
Crawling through <Collection id=sentinel-2-l1c>
Crawling through <Catalog id=9>
Crawling through <Catalog id=V>
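The loop above assumes that every catalog on the path has at least one child; if one does not, next() raises StopIteration. A guarded version of the same descent might look like the sketch below. ToyCatalog is a hypothetical stand-in that only mimics the two methods the loop relies on (get_item_links and get_children), so the example runs without network access; first_catalog_with_items and max_depth are likewise illustrative names.

```python
class ToyCatalog:
    """Hypothetical stand-in exposing the two methods the crawl loop uses."""

    def __init__(self, id, children=(), item_links=()):
        self.id = id
        self._children = list(children)
        self._item_links = list(item_links)

    def get_item_links(self):
        return self._item_links

    def get_children(self):
        # A generator, so children are only produced when requested.
        yield from self._children


def first_catalog_with_items(cat, max_depth=10):
    """Follow the first child at each level until a catalog with item links is found."""
    for _ in range(max_depth):
        if cat.get_item_links():
            return cat
        child = next(cat.get_children(), None)
        if child is None:
            return None  # dead end: no items and no children
        cat = child
    return None  # gave up after max_depth levels


root = ToyCatalog("root", children=[
    ToyCatalog("9", children=[ToyCatalog("XK", item_links=["item-1", "item-2"])]),
])
found = first_catalog_with_items(root)
print(found.id, len(found.get_item_links()))  # prints: XK 2
```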
We can print some information about the catalog, including how many items it contains:
[4]:
print(cat.description)
print('Contains {} items.'.format(len(cat.get_item_links())))
XK catalog
Contains 388 items.
Let’s grab the first item, check out its cloud cover, and start exploring the assets.
[5]:
item = next(cat.get_items())
You can access common metadata fields through the common_metadata property of the item:
[6]:
item.common_metadata.platform
[6]:
'sentinel-2b'
This particular STAC Item implements the eo extension. We can access the extension information through the “ext” property that’s part of every Catalog, Collection and Item. For instance, to get the cloud cover, we can use:
[7]:
item.ext.eo.cloud_cover
[7]:
41.52
We can see that the cloud cover is stored under its own key in the Item’s properties:
[8]:
item.properties['eo:cloud_cover']
[8]:
41.52
If we want to set the cloud cover, we can do that through the extension as well:
[9]:
item.ext.eo.cloud_cover = 42.0
item.properties['eo:cloud_cover']
[9]:
42.0
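One way to picture why setting item.ext.eo.cloud_cover also changes item.properties is to treat the accessor as a thin view over the same dictionary. The class below is a hypothetical illustration of that pattern, not PySTAC's actual implementation.

```python
class ToyEOExtension:
    """Hypothetical accessor that stores its values in the owner's properties dict."""

    def __init__(self, properties):
        self._properties = properties

    @property
    def cloud_cover(self):
        return self._properties.get("eo:cloud_cover")

    @cloud_cover.setter
    def cloud_cover(self, value):
        self._properties["eo:cloud_cover"] = value


properties = {"eo:cloud_cover": 41.52}
eo = ToyEOExtension(properties)
eo.cloud_cover = 42.0
print(properties["eo:cloud_cover"])  # prints: 42.0
```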
We can access the item’s assets through the assets property, which is a dictionary:
[10]:
for asset_key in item.assets:
    asset = item.assets[asset_key]
    print('{}: {} ({})'.format(asset_key, asset.href, asset.media_type))
thumbnail: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/preview.jpg (None)
info: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/tileInfo.json (None)
metadata: https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/metadata.xml (None)
tki: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/TKI.jp2 (image/jp2)
B01: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B01.jp2 (image/jp2)
B02: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B02.jp2 (image/jp2)
B03: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B03.jp2 (image/jp2)
B04: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B04.jp2 (image/jp2)
B05: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B05.jp2 (image/jp2)
B06: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B06.jp2 (image/jp2)
B07: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B07.jp2 (image/jp2)
B08: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B08.jp2 (image/jp2)
B8A: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B08.jp2 (image/jp2)
B09: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B09.jp2 (image/jp2)
B10: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B10.jp2 (image/jp2)
B11: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B11.jp2 (image/jp2)
B12: https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B11.jp2 (image/jp2)
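Because assets is a plain dictionary, selecting a subset is an ordinary dictionary comprehension. The sketch below filters by media type; assets_with_media_type is an illustrative helper, and the SimpleNamespace objects are stand-ins carrying the href and media_type attributes seen in the listing above, so the example runs without PySTAC.

```python
from types import SimpleNamespace


def assets_with_media_type(assets, media_type):
    """Return the subset of an assets mapping whose media_type matches."""
    return {
        key: asset
        for key, asset in assets.items()
        if asset.media_type == media_type
    }


# Stand-in data copied from part of the listing; with PySTAC, pass item.assets instead.
base = 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/'
assets = {
    'thumbnail': SimpleNamespace(
        href='https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/preview.jpg',
        media_type=None),
    'B03': SimpleNamespace(href=base + 'B03.jp2', media_type='image/jp2'),
    'B04': SimpleNamespace(href=base + 'B04.jp2', media_type='image/jp2'),
}

jp2_assets = assets_with_media_type(assets, 'image/jp2')
print(sorted(jp2_assets))  # prints: ['B03', 'B04']
```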
We can use the to_dict() method to convert an Asset, or any PySTAC object, into a dictionary:
[11]:
asset = item.assets['B03']
asset.to_dict()
[11]:
{'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/9/V/XK/2017/10/13/0/B03.jp2',
'type': 'image/jp2',
'title': 'Band 3 (green)',
'eo:bands': [{'name': 'B03',
'common_name': 'green',
'gsd': 10.0,
'center_wavelength': 0.56,
'full_width_half_max': 0.045}]}
Here we use the eo extension to get the band information for the asset:
[12]:
bands = item.ext.eo.get_bands(asset)
bands
[12]:
[<Band name=B03>]
[13]:
bands[0].to_dict()
[13]:
{'name': 'B03',
'common_name': 'green',
'gsd': 10.0,
'center_wavelength': 0.56,
'full_width_half_max': 0.045}
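The center_wavelength and full_width_half_max fields (both given in micrometers by the eo extension) can be combined into an approximate wavelength range for the band. The helper below is an illustrative sketch that treats the full width at half maximum as the band's width; band_edges is not a PySTAC function.

```python
def band_edges(band):
    """Approximate (lower, upper) wavelength of a band from its center and FWHM."""
    half_width = band['full_width_half_max'] / 2
    center = band['center_wavelength']
    return center - half_width, center + half_width


# The dictionary printed above for band B03.
green = {
    'name': 'B03',
    'common_name': 'green',
    'gsd': 10.0,
    'center_wavelength': 0.56,
    'full_width_half_max': 0.045,
}

low, high = band_edges(green)
print('{:.4f} to {:.4f} micrometers'.format(low, high))  # prints: 0.5375 to 0.5825 micrometers
```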
Writing a STAC
Let’s walk the catalog again, but this time create local clones of the STAC objects, so that we end up with a copy we can save to the local file system.
[14]:
import itertools

cat = Catalog.from_file('https://sentinel-stac.s3.amazonaws.com/catalog.json')

# Set up the root of our local STAC
local_root = cat.clone()
local_root.clear_children()

# Loop over catalogs and clone
curr_local_cat = local_root
while len(cat.get_item_links()) == 0:
    print('Crawling through {}'.format(cat))
    cat = next(cat.get_children())
    local_cat = cat.clone()
    local_cat.clear_children()
    curr_local_cat.add_child(local_cat)
    curr_local_cat = local_cat

# Clear the items from the last local catalog
curr_local_cat.clear_children()
curr_local_cat.clear_items()

# Take the first 5 items
items = itertools.islice(cat.get_items(), 5)

# Clone and add them to our local catalog
curr_local_cat.add_items([i.clone() for i in items])
Crawling through <Catalog id=sentinel-stac>
Crawling through <Collection id=sentinel-2-l1c>
Crawling through <Catalog id=9>
Crawling through <Catalog id=V>
Now that we have a smaller STAC, let’s map over the items to reduce it even further by only including the thumbnail assets in our items:
[15]:
def item_mapper(item):
    # Keep only the thumbnail asset on each item
    item.assets = {
        k: v
        for k, v in item.assets.items()
        if k == 'thumbnail'
    }
    return item

local_root_2 = local_root.map_items(item_mapper)
We can now normalize our catalog and save it somewhere local:
[16]:
!mkdir -p ./quickstart_stac
[17]:
local_root_2.normalize_hrefs('./quickstart_stac')
[17]:
<Catalog id=sentinel-stac>
[18]:
from pystac import CatalogType
local_root_2.save(catalog_type=CatalogType.SELF_CONTAINED)
[19]:
local_root_2.describe()
* <Catalog id=sentinel-stac>
* <Collection id=sentinel-2-l1c>
* <Catalog id=9>
* <Catalog id=V>
* <Catalog id=XK>
* <Item id=S2B_9VXK_20171013_0>
* <Item id=S2A_9VXK_20171015_0>
* <Item id=S2B_9VXK_20171016_0>
* <Item id=S2B_9VXK_20171017_0>
* <Item id=S2A_9VXK_20171002_0>
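describe() shows the in-memory tree. To look at what was written to disk, a small standard-library sketch can list the JSON files under the output directory. list_json_files is an illustrative helper, and no particular file layout is assumed here.

```python
from pathlib import Path


def list_json_files(root):
    """Return sorted POSIX-style paths of all .json files under `root`, relative to it."""
    root = Path(root)
    if not root.is_dir():
        return []  # nothing has been saved there (yet)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob('*.json'))


for rel_path in list_json_files('./quickstart_stac'):
    print(rel_path)
```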
[20]:
for item in local_root_2.get_all_items():
    print('Item {}:'.format(item.id))
    print('    Assets: {}'.format(item.assets))
Item S2B_9VXK_20171013_0:
    Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/13/0/preview.jpg>}
Item S2A_9VXK_20171015_0:
    Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/15/0/preview.jpg>}
Item S2B_9VXK_20171016_0:
    Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/16/0/preview.jpg>}
Item S2B_9VXK_20171017_0:
    Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/17/0/preview.jpg>}
Item S2A_9VXK_20171002_0:
    Assets: {'thumbnail': <Asset href=https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/9/V/XK/2017/10/2/0/preview.jpg>}
Validating
If we have jsonschema installed, either manually or by installing PySTAC with its optional validation dependencies:
> pip install pystac[validation]
we can validate our STAC objects to ensure we didn’t set any properties that break the spec:
[21]:
for root, _, items in local_root_2.walk():
    root.validate()
    for item in items:
        item.validate()
If we have set something wrong, we’ll get a STACValidationError when we validate:
[22]:
item = next(local_root_2.get_all_items())
item.bbox = ['Not a valid bbox']
item.validate()
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/stac_validator.py in validate_core(self, stac_dict, stac_object_type, stac_version)
131 try:
--> 132 self._validate_from_uri(stac_dict, schema_uri)
133 return schema_uri
~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/stac_validator.py in _validate_from_uri(self, stac_dict, schema_uri)
105 schema, resolver = self.get_schema_from_uri(schema_uri)
--> 106 jsonschema.validate(instance=stac_dict, schema=schema, resolver=resolver)
107 for uri in resolver.store:
~/proj/stac/pystac/venv/lib/python3.6/site-packages/jsonschema/validators.py in validate(instance, schema, cls, *args, **kwargs)
933 if error is not None:
--> 934 raise error
935
ValidationError: 'Not a valid bbox' is not of type 'number'
Failed validating 'type' in schema[0]['properties']['bbox']['items']:
{'type': 'number'}
On instance['bbox'][0]:
'Not a valid bbox'
The above exception was the direct cause of the following exception:
STACValidationError Traceback (most recent call last)
<ipython-input-22-09a436f4856e> in <module>
1 item = next(local_root_2.get_all_items())
2 item.bbox = ['Not a valid bbox']
----> 3 item.validate()
~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/stac_object.py in validate(self)
232 stac_object_type=self.STAC_OBJECT_TYPE,
233 stac_version=pystac.get_stac_version(),
--> 234 extensions=self.stac_extensions)
235
236 def get_root(self):
~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/__init__.py in validate(stac_dict, stac_object_type, stac_version, extensions)
71
72 return RegisteredValidator.get_validator().validate(stac_dict, stac_object_type, stac_version,
---> 73 extensions)
74
75
~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/stac_validator.py in validate(self, stac_dict, stac_object_type, stac_version, extensions)
64 """
65 results = []
---> 66 core_result = self.validate_core(stac_dict, stac_object_type, stac_version)
67 if core_result is not None:
68 results.append(core_result)
~/proj/stac/pystac/venv/lib/python3.6/site-packages/pystac-0.5.0rc1-py3.6.egg/pystac/validation/stac_validator.py in validate_core(self, stac_dict, stac_object_type, stac_version)
135 msg = 'Validation failed against schema at {} for STAC {}'.format(
136 schema_uri, stac_object_type)
--> 137 raise STACValidationError(msg, source=e) from e
138
139 def validate_extension(self, stac_dict, stac_object_type, stac_version, extension_id):
STACValidationError: Validation failed against schema at https://schemas.stacspec.org/v1.0.0-beta.2/item-spec/json-schema/item.json for STAC ITEM
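When checking many objects, it can be more convenient to collect failures than to stop at the first exception. The helper below is a generic sketch: collect_validation_failures and its error_type parameter are illustrative names, and it only assumes that each object has a validate() method that raises on failure. With PySTAC one would presumably pass the STACValidationError class shown in the traceback as error_type; its import location is not spelled out here because it may depend on the installed version.

```python
def collect_validation_failures(objects, error_type=Exception):
    """Call validate() on each object and return (object, error) pairs for failures."""
    failures = []
    for obj in objects:
        try:
            obj.validate()
        except error_type as err:
            failures.append((obj, err))
    return failures


# Hypothetical stand-ins so the sketch runs without PySTAC installed.
class ValidThing:
    def validate(self):
        return None


class InvalidThing:
    def validate(self):
        raise ValueError("'Not a valid bbox' is not of type 'number'")


things = [ValidThing(), InvalidThing(), ValidThing()]
for obj, err in collect_validation_failures(things, error_type=ValueError):
    print('{} failed: {}'.format(type(obj).__name__, err))
```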