Create and manipulate SpaceNet Vegas STAC
This tutorial shows how to create and manipulate STACs using pystac. It contains two parts:
1. Create SpaceNet Vegas STAC
   - Create (in memory) a pystac catalog of SpaceNet 2 imagery from the Las Vegas AOI using data hosted in a public s3 bucket
   - Set relative paths for all STAC objects
   - Normalize links from a root directory and save the STAC there
2. Create a new STAC with COGs and labels
   - Create a STAC of a sample of SpaceNet Vegas images from s3
   - Save the STAC locally
   - Download labels for, and create COGs of, each image
   - Save the GeoJSON labels and COGs locally
   - Create an updated STAC that points to the new files and only includes labeled scenes
   - Set relative paths for all STAC objects
   - Normalize links from a root directory and save the STAC there
[1]:
import sys
sys.path.append("..")
You may need to install the following packages, which are not included in the Python 3 standard library. If any of them are missing, you can install them with pip:
pip install boto3
pip install botocore
pip install rasterio
pip install Shapely
pip install rio-cogeo
[2]:
import json
from datetime import datetime
from os.path import basename, dirname, join
from subprocess import call
import boto3
import rasterio
from botocore.exceptions import ClientError
import pystac
from pystac.extensions import label
from shapely.geometry import GeometryCollection, Polygon, box, shape, mapping
Create SpaceNet Vegas STAC
Initialize a STAC for the SpaceNet 2 dataset.
[3]:
spacenet = pystac.Catalog(id="spacenet", description="SpaceNet 2 STAC")
We do not yet know the spatial extent of the Vegas AOI. We will need to determine it when we download all of the images. As a placeholder we will create a spatial extent of null values.
[4]:
sp_extent = pystac.SpatialExtent([None, None, None, None])
The capture date for SpaceNet 2 Vegas imagery is October 22, 2015. Create a Python datetime object for that date and use it as the start of an open-ended temporal extent.
[5]:
capture_date = datetime.strptime("2015-10-22", "%Y-%m-%d")
tmp_extent = pystac.TemporalExtent([(capture_date, None)])
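STAC serializes datetimes as RFC 3339 strings, while `strptime` as used in the cell above returns a naive datetime with no timezone attached. The sketch below, which uses only the standard library, shows one way to make the timezone explicit. It assumes the capture date should be read as midnight UTC; `capture_date_utc` is a name introduced here for illustration and is not used elsewhere in this tutorial.

```python
from datetime import datetime, timezone

# Parse the capture date exactly as in the cell above (naive datetime).
capture_date = datetime.strptime("2015-10-22", "%Y-%m-%d")

# Attach UTC explicitly (assumption: the date is interpreted as midnight UTC).
capture_date_utc = capture_date.replace(tzinfo=timezone.utc)

print(capture_date.tzinfo)           # None
print(capture_date_utc.isoformat())  # 2015-10-22T00:00:00+00:00
```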
Create an Extent object that defines both the spatial and temporal extents of the Vegas collection.
[6]:
extent = pystac.Extent(sp_extent, tmp_extent)
Create a collection that will encompass the Vegas data and add it to the spacenet catalog.
[7]:
vegas = pystac.Collection(
    id="vegas", description="Vegas SpaceNet 2 dataset", extent=extent
)
spacenet.add_child(vegas)
[8]:
spacenet.describe()
* <Catalog id=spacenet>
    * <Collection id=vegas>
Find the locations of SpaceNet images. In order to make this example quicker, we will limit the number of scenes that we use to 10.
[9]:
client = boto3.client("s3")
scenes = client.list_objects(
    Bucket="spacenet-dataset",
    Prefix="spacenet/SN2_buildings/train/AOI_2_Vegas/PS-RGB/",
    MaxKeys=20,
)
scenes = [s["Key"] for s in scenes["Contents"] if s["Key"].endswith(".tif")][0:10]
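The cell above lists up to 20 keys and then keeps the first 10 that end in `.tif`, since not every key under a prefix is necessarily an image. The sketch below reproduces that filter-then-limit step on a hand-written response so it can be run without AWS access. `select_tif_keys` and `fake_response` are hypothetical names, and the keys in `fake_response` are made up for illustration.

```python
def select_tif_keys(response, limit):
    """Return up to `limit` keys ending in .tif from a list_objects-style response."""
    # "Contents" is absent from the response when no objects match the prefix.
    contents = response.get("Contents", [])
    tif_keys = [obj["Key"] for obj in contents if obj["Key"].endswith(".tif")]
    return tif_keys[:limit]


# Hypothetical response for illustration only; these keys are invented.
fake_response = {
    "Contents": [
        {"Key": "example/prefix/"},
        {"Key": "example/prefix/img1.tif"},
        {"Key": "example/prefix/img1.tif.aux.xml"},
        {"Key": "example/prefix/img2.tif"},
        {"Key": "example/prefix/img3.tif"},
    ]
}

print(select_tif_keys(fake_response, limit=2))
print(select_tif_keys({}, limit=2))
```

Using `.get("Contents", [])` avoids a `KeyError` when the prefix matches nothing, which the list comprehension in the cell above does not guard against.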
For each scene, create an item with a defined bounding box. Each item will include the GeoTIFF as an asset. We will add labels in the next section.
[10]:
for scene in scenes:
    uri = join("s3://spacenet-dataset/", scene)
    params = {}
    params["id"] = basename(uri).split(".")[0]
    with rasterio.open(uri) as src:
        params["bbox"] = list(src.bounds)
        params["geometry"] = mapping(box(*params["bbox"]))
    params["datetime"] = capture_date
    params["properties"] = {}
    i = pystac.Item(**params)
    i.add_asset(
        key="image",
        asset=pystac.Asset(
            href=uri, title="Geotiff", media_type=pystac.MediaType.GEOTIFF
        ),
    )
    vegas.add_item(i)
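In the loop above, `mapping(box(*bbox))` turns the raster bounds into a GeoJSON-style polygon for the item geometry. The sketch below builds an equivalent rectangle with plain Python to show the shape of that structure. `bbox_to_polygon` is a hypothetical helper, the coordinates are invented, and the vertex order may differ from what shapely emits. It also assumes the bounds are already longitude/latitude values, which is what STAC expects for `bbox` and `geometry`.

```python
def bbox_to_polygon(bbox):
    """Build a GeoJSON-style Polygon dict from [minx, miny, maxx, maxy]."""
    minx, miny, maxx, maxy = bbox
    # A polygon ring must be closed: the first and last positions are identical.
    ring = [
        [minx, miny],
        [maxx, miny],
        [maxx, maxy],
        [minx, maxy],
        [minx, miny],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical bounding box for illustration only.
geometry = bbox_to_polygon([-115.3, 36.1, -115.2, 36.2])
print(geometry["type"])
print(geometry["coordinates"][0])
```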
Now reset the spatial extent of the Vegas collection using the geometry objects from the items we just added.
[11]:
bounds = [
    list(
        GeometryCollection(
            [shape(s.geometry) for s in spacenet.get_items(recursive=True)]
        ).bounds
    )
]
vegas.extent.spatial = pystac.SpatialExtent(bounds)
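The cell above uses shapely to compute the bounds of all item geometries at once. Because every item here is a rectangle, the same overall extent can be derived from the item bounding boxes alone. The sketch below does this with plain Python; `union_bbox` is a hypothetical helper and the boxes are invented for illustration.

```python
def union_bbox(bboxes):
    """Return the smallest [minx, miny, maxx, maxy] box covering all input boxes."""
    minxs, minys, maxxs, maxys = zip(*bboxes)
    return [min(minxs), min(minys), max(maxxs), max(maxys)]


# Hypothetical item bounding boxes for illustration only.
item_bboxes = [
    [0.0, 0.0, 1.0, 1.0],
    [0.5, -1.0, 2.0, 0.5],
]

overall = union_bbox(item_bboxes)
print(overall)  # [0.0, -1.0, 2.0, 1.0]
```

A spatial extent holds a list of bounding boxes, which is why the cell above wraps the single overall box in an outer list.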
Currently, this STAC only exists in memory. We need to set all of the paths based on the root directory where we want to save the catalog, and then save a “self contained” catalog, in which all links are relative and there are no ‘self’ links. We can do this by using the normalize_hrefs method to set the HREFs of all of our STAC objects. We’ll then validate the catalog and save it:
[12]:
spacenet.normalize_hrefs("spacenet-stac")
[12]:
<Catalog id=spacenet>
[13]:
spacenet.validate_all()
[14]:
spacenet.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)
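To make "all links are relative" concrete, the sketch below computes the relative HREF from one STAC file to another using only the standard library. It assumes the layout places the root at `spacenet-stac/catalog.json` and the collection at `spacenet-stac/vegas/collection.json`; check the files written to disk to confirm the layout your pystac version produces. `relative_href` is a hypothetical helper, not a pystac function.

```python
import posixpath


def relative_href(target, source):
    """Return the href of `target` relative to the directory containing `source`."""
    rel = posixpath.relpath(target, start=posixpath.dirname(source))
    # Prefix with "./" so the result reads clearly as a relative link.
    return rel if rel.startswith(".") else "./" + rel


# Assumed file locations after normalizing to the "spacenet-stac" root.
root = "spacenet-stac/catalog.json"
child = "spacenet-stac/vegas/collection.json"

print(relative_href(child, root))  # ./vegas/collection.json
print(relative_href(root, child))  # ../catalog.json
```

Because links like these never mention an absolute location, the whole `spacenet-stac` directory can be moved or copied without rewriting any files.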
Create new STAC with COGs and labels
In this step, we will add Items with the Label extension for each scene Item. We will download the label GeoJSON files from the SpaceNet s3 bucket and save them locally. We will also create COGs of each GeoTIFF and save them to the same directory as the labels.
You can map over each item in a catalog using the map_items method. This method takes a user-specified function (item_mapper) and maps it over all items within a copy of the catalog. It returns the altered catalog. The item_mapper function must take an item and return either another item or a list of items.
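The "item or list of items" contract can be illustrated with a toy stand-in that works on plain strings instead of STAC Items. This is not pystac's implementation; `toy_map_items` and `add_label` are hypothetical names used only to show how a mapper that returns a list expands one input into several outputs.

```python
def toy_map_items(items, item_mapper):
    """Apply item_mapper to each item, flattening any list results."""
    mapped = []
    for item in items:
        result = item_mapper(item)
        if isinstance(result, list):
            mapped.extend(result)  # one input item became several
        else:
            mapped.append(result)  # one input item became one
    return mapped


def add_label(item_id):
    """Return the original id plus a companion '-labels' id."""
    return [item_id, item_id + "-labels"]


print(toy_map_items(["img1", "img10"], add_label))
# ['img1', 'img1-labels', 'img10', 'img10-labels']
```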
The item mapper defined below downloads the appropriate label GeoJSON and creates a label Item that points to the local file. It also creates a COG of the original image, saves it to the same directory as the labels, and updates the image asset to reference the COG rather than the original TIFF.
[15]:
def item_to_labels_url(item):
    image_uri = item.assets["image"].href
    d = dirname(image_uri).replace("PS-RGB", "geojson_buildings")
    b = (
        basename(image_uri)
        .replace("PS-RGB", "geojson_buildings")
        .replace(".tif", ".geojson")
    )
    return join(d, b)
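To see what this mapping produces, the sketch below applies the same string replacements to one image URI built from the bucket prefix and an item id that appear in this tutorial, then splits the result into the bucket and key that an S3 download needs. `labels_url_for` is a hypothetical variant that takes the href directly instead of an Item, and it uses `posixpath` so the result is the same on every platform.

```python
from posixpath import basename, dirname, join
from urllib.parse import urlparse


def labels_url_for(image_uri):
    """Derive the label GeoJSON URI from an image URI (same replacements as above)."""
    d = dirname(image_uri).replace("PS-RGB", "geojson_buildings")
    b = (
        basename(image_uri)
        .replace("PS-RGB", "geojson_buildings")
        .replace(".tif", ".geojson")
    )
    return join(d, b)


image_uri = (
    "s3://spacenet-dataset/spacenet/SN2_buildings/train/AOI_2_Vegas/PS-RGB/"
    "SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1.tif"
)
label_url = labels_url_for(image_uri)
print(label_url)

# Split the URI into the bucket name and object key.
parsed = urlparse(label_url)
bucket, key = parsed.netloc, parsed.path.lstrip("/")
print(bucket)
print(key)
```

Parsing the URI is an alternative to stripping a hard-coded `s3://spacenet-dataset/` prefix, and it keeps working if the bucket name changes.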
[16]:
def cogify_and_label(item, data_dir):
    label_url = item_to_labels_url(item)
    s3 = boto3.client("s3")

    try:
        # download the label geojson next to where the COG will be written
        label_uri = join(data_dir, basename(label_url))
        s3.download_file(
            "spacenet-dataset",
            label_url.replace("s3://spacenet-dataset/", ""),
            label_uri,
        )
        # construct label item
        label_item = pystac.Item(
            id="{}-labels".format(item.id),
            bbox=item.bbox,
            geometry=mapping(box(*item.bbox)),
            datetime=item.datetime,
            properties={},
            stac_extensions=[pystac.Extensions.LABEL],
        )
        label_item.ext.label.apply(
            label_description="Building labels for scene {}".format(item.id),
            label_type=label.LabelType.VECTOR,
            label_properties=["partialBuilding"],
            # Label classes is marked as required in 1.0.0-beta.2, so make it up.
            # Once this PR is released, this can be removed:
            # https://github.com/radiantearth/stac-spec/pull/905
            label_classes=[
                label.LabelClasses.create(classes=["building"], name="partialBuilding")
            ],
        )
        label_item.ext.label.add_geojson_labels(href=label_uri)

        # create a COG of the original image and point the image asset at it
        output_cog_uri = join(data_dir, "{}-cog.tif".format(item.id))
        call(
            " ".join(
                ["rio", "cogeo", "create", item.assets["image"].href, output_cog_uri]
            ),
            shell=True,
        )
        item.assets["image"].href = output_cog_uri
        print("Completed item: {}".format(item.id))
        return [item, label_item]
    except ClientError:
        print("Labels not available for item {}".format(item.id))
        # return an empty list so the unlabeled scene is left out of the new catalog
        return []
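The mapper above joins the `rio cogeo create` arguments into a single string and runs it through the shell. A possible alternative, sketched below, passes the arguments as a list so that paths containing spaces or shell metacharacters need no quoting. `cogeo_command` and `create_cog` are hypothetical helpers, the paths are invented, and `create_cog` is defined but not called here because it requires rio-cogeo to be installed.

```python
import shlex
import subprocess


def cogeo_command(src_href, dst_path):
    """Build the argument list for `rio cogeo create <input> <output>`."""
    return ["rio", "cogeo", "create", src_href, dst_path]


def create_cog(src_href, dst_path):
    """Run the command without a shell; raises CalledProcessError on failure."""
    subprocess.run(cogeo_command(src_href, dst_path), check=True)


# Hypothetical paths for illustration only.
cmd = cogeo_command("s3://example-bucket/scene.tif", "data/scene-cog.tif")
print(shlex.join(cmd))
# rio cogeo create s3://example-bucket/scene.tif data/scene-cog.tif
```

With `check=True`, a failed conversion raises an exception instead of passing silently, so an item would not be repointed at a COG that was never written.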
Create a ‘data’ directory to hold all of the output COGs and labels.
[17]:
!mkdir -p ./data
[18]:
mapper = lambda item: cogify_and_label(item, data_dir="data")
spacenet_cog = spacenet.map_items(mapper)
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img10
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1002
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1003
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1004
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1006
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1007
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1009
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img101
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1010
[19]:
spacenet_cog.describe()
* <Catalog id=spacenet>
    * <Collection id=vegas>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img10>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img10-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1002>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1002-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1003>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1003-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1004>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1004-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1006>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1006-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1007>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1007-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1009>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1009-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img101>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img101-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1010>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1010-labels>
Finally, we set a different root URI for this STAC and save it locally.
[20]:
spacenet_cog.normalize_hrefs("spacenet-cog-stac")
[20]:
<Catalog id=spacenet>
[21]:
spacenet_cog.validate_all()
[22]:
spacenet_cog.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)