Create and manipulate SpaceNet Vegas STAC#

This tutorial shows how to create and manipulate STACs using pystac. It contains two parts:

  1. Create SpaceNet Vegas STAC

    • Create (in memory) a pystac catalog of SpaceNet 2 imagery from the Las Vegas AOI using data hosted in a public s3 bucket

    • Set relative paths for all STAC objects

    • Normalize links from a root directory and save the STAC there

  2. Create a new STAC with COGs and labels

    • Create a STAC of a sample of SpaceNet Vegas images from s3

    • Save the STAC locally

    • Download labels for, and create COGs of, each image

    • Save the GeoJSON labels and COGs locally

    • Create an updated STAC that points to the new files and only includes labeled scenes

    • Set relative paths for all STAC objects

    • Normalize links from a root directory and save the STAC there

[1]:
import sys

sys.path.append("..")

You may need to install the following packages, which are not included in the Python 3 standard library. If any of them are missing, you can install them with pip:

boto3: pip install boto3
botocore: pip install botocore
rasterio: pip install rasterio
shapely: pip install Shapely
rio-cogeo: pip install rio-cogeo
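
Before running the imports, it can help to check which of the packages listed above are already importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the import name `rio_cogeo` for the rio-cogeo distribution is stated here as an assumption about how that package is imported.

```python
from importlib.util import find_spec

# Import names for the packages listed above. Assumption: the rio-cogeo
# distribution is imported as ``rio_cogeo``.
REQUIRED_MODULES = ["boto3", "botocore", "rasterio", "shapely", "rio_cogeo"]


def missing_modules(names):
    """Return the subset of ``names`` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```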
[2]:
import json
from datetime import datetime
from os.path import basename, dirname, join
from subprocess import call

import boto3
import rasterio
from botocore.exceptions import ClientError
import pystac
from pystac.extensions import label
from shapely.geometry import GeometryCollection, Polygon, box, shape, mapping

Create SpaceNet Vegas STAC#

Initialize a STAC for the SpaceNet 2 dataset

[3]:
spacenet = pystac.Catalog(id="spacenet", description="SpaceNet 2 STAC")

We do not yet know the spatial extent of the Vegas AOI. We will need to determine it when we download all of the images. As a placeholder we will create a spatial extent of null values.

[4]:
sp_extent = pystac.SpatialExtent([None, None, None, None])

The capture date for SpaceNet 2 Vegas imagery is October 22, 2015. Create a Python datetime object using that date.

[5]:
capture_date = datetime.strptime("2015-10-22", "%Y-%m-%d")
tmp_extent = pystac.TemporalExtent([(capture_date, None)])
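
Note that `strptime` returns a naive datetime (one without time zone information), while STAC datetimes are expressed in UTC. As a hedged sketch, the time zone can be attached explicitly as shown here; the name `capture_date_utc` is introduced only for illustration, and treating the capture date as midnight UTC is an assumption, not something the dataset states.

```python
from datetime import datetime, timezone

# Parse the capture date as in the cell above; the result is naive.
capture_date = datetime.strptime("2015-10-22", "%Y-%m-%d")

# Attach UTC explicitly (assumption: the date is meant as midnight UTC).
capture_date_utc = capture_date.replace(tzinfo=timezone.utc)

print(capture_date.tzinfo)           # None
print(capture_date_utc.isoformat())  # 2015-10-22T00:00:00+00:00
```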

Create an Extent object that will define both the spatial and temporal extents of the Vegas collection

[6]:
extent = pystac.Extent(sp_extent, tmp_extent)

Create a collection that will encompass the Vegas data and add it to the spacenet catalog.

[7]:
vegas = pystac.Collection(
    id="vegas", description="Vegas SpaceNet 2 dataset", extent=extent
)
spacenet.add_child(vegas)
[8]:
spacenet.describe()
* <Catalog id=spacenet>
    * <Collection id=vegas>

Find the locations of SpaceNet images. In order to make this example quicker, we will limit the number of scenes that we use to 10.

[9]:
client = boto3.client("s3")
scenes = client.list_objects(
    Bucket="spacenet-dataset",
    Prefix="spacenet/SN2_buildings/train/AOI_2_Vegas/PS-RGB/",
    MaxKeys=20,
)
scenes = [s["Key"] for s in scenes["Contents"] if s["Key"].endswith(".tif")][0:10]
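
The filtering step in the cell above can be tried without network access. This is a minimal sketch that runs the same list comprehension against a hand-written dictionary; the helper name `tif_keys` and the keys in `sample_response` are hypothetical. It also tolerates a response with no `"Contents"` entry, which is what an empty listing looks like.

```python
def tif_keys(response, limit=10):
    """Return up to ``limit`` keys ending in ``.tif`` from a listing response."""
    contents = response.get("Contents", [])
    return [obj["Key"] for obj in contents if obj["Key"].endswith(".tif")][:limit]


# Hypothetical response with the same shape as the listing result
sample_response = {
    "Contents": [
        {"Key": "example/PS-RGB/"},
        {"Key": "example/PS-RGB/img1.tif"},
        {"Key": "example/PS-RGB/img1.tif.aux.xml"},
        {"Key": "example/PS-RGB/img2.tif"},
    ]
}

print(tif_keys(sample_response))
# ['example/PS-RGB/img1.tif', 'example/PS-RGB/img2.tif']
```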

For each scene, create an item with a defined bounding box. Each item will include the GeoTIFF as an asset. We will add labels in the next section.

[10]:
for scene in scenes:
    uri = join("s3://spacenet-dataset/", scene)
    params = {}
    params["id"] = basename(uri).split(".")[0]
    with rasterio.open(uri) as src:
        params["bbox"] = list(src.bounds)
        params["geometry"] = mapping(box(*params["bbox"]))
    params["datetime"] = capture_date
    params["properties"] = {}
    i = pystac.Item(**params)
    i.add_asset(
        key="image",
        asset=pystac.Asset(
            href=uri, title="Geotiff", media_type=pystac.MediaType.GEOTIFF
        ),
    )
    vegas.add_item(i)
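
To make the `id` and `geometry` values in the loop above more concrete, here is a standard-library sketch of the same two derivations. The helpers `item_id_from_uri` and `bbox_to_polygon`, the example URI, and the example bounds are all hypothetical. `bbox_to_polygon` is only a stand-in for `mapping(box(*bbox))`; shapely may order the ring vertices differently and uses tuples rather than lists.

```python
from posixpath import basename


def item_id_from_uri(uri):
    """Use the file name up to the first dot as the item id."""
    return basename(uri).split(".")[0]


def bbox_to_polygon(bbox):
    """Build a GeoJSON-style Polygon dict covering [minx, miny, maxx, maxy]."""
    minx, miny, maxx, maxy = bbox
    ring = [
        [minx, miny],
        [maxx, miny],
        [maxx, maxy],
        [minx, maxy],
        [minx, miny],  # repeat the first vertex to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


print(item_id_from_uri("s3://example-bucket/path/scene_img1.tif"))  # scene_img1
print(bbox_to_polygon([0, 0, 2, 1]))
```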

Now reset the spatial extent of the Vegas collection using the geometry objects from the items we just added.

[11]:
bounds = [
    list(
        GeometryCollection(
            [shape(s.geometry) for s in spacenet.get_items(recursive=True)]
        ).bounds
    )
]
vegas.extent.spatial = pystac.SpatialExtent(bounds)
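
The cell above takes the bounds of a `GeometryCollection`, which amounts to the smallest box enclosing every item geometry. For axis-aligned item boxes, the same result can be sketched with plain Python as follows; the helper `union_bbox` and the sample coordinates are hypothetical.

```python
def union_bbox(bboxes):
    """Return the [minx, miny, maxx, maxy] box that encloses all ``bboxes``."""
    minxs, minys, maxxs, maxys = zip(*bboxes)
    return [min(minxs), min(minys), max(maxxs), max(maxys)]


# Hypothetical item bounding boxes
sample_bboxes = [
    [-115.3, 36.1, -115.2, 36.2],
    [-115.25, 36.15, -115.1, 36.3],
]

print(union_bbox(sample_bboxes))  # [-115.3, 36.1, -115.1, 36.3]
```

As in the cell above, the result is wrapped in an outer list before being passed to `pystac.SpatialExtent`, since a spatial extent holds a list of bounding boxes.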

Currently, this STAC exists only in memory. We need to set all of the paths based on the root directory where we want to save the catalog, and then save a “self contained” catalog, in which all links are relative and there are no ‘self’ links. We can do this by using the normalize_hrefs method to set the HREFs of all of our STAC objects. We then validate the catalog and save it:

[12]:
spacenet.normalize_hrefs("spacenet-stac")
[12]:
<Catalog id=spacenet>
[13]:
spacenet.validate_all()
[14]:
spacenet.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)
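
To illustrate what a relative link means here, the following sketch computes the href of one file relative to the directory of another using only the standard library. The helper `relative_href` and both file paths are hypothetical and chosen only for illustration; the exact layout and string form that pystac writes may differ.

```python
import posixpath


def relative_href(target_path, source_path):
    """Return ``target_path`` relative to the directory holding ``source_path``."""
    return posixpath.relpath(target_path, start=posixpath.dirname(source_path))


# Hypothetical file locations under the root directory
catalog_path = "spacenet-stac/catalog.json"
item_path = "spacenet-stac/vegas/img1/img1.json"

print(relative_href(item_path, catalog_path))  # vegas/img1/img1.json
print(relative_href(catalog_path, item_path))  # ../../catalog.json
```

Because every link is expressed this way, the saved directory can be moved or copied as a unit without rewriting any of the links.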

Create new STAC with COGs and labels#

In this step, we will create an Item with the Label extension for each scene Item. We will download the label GeoJSON files from the SpaceNet S3 bucket and save them locally. We will also create COGs of each GeoTIFF and save them to the same directory as the labels.

You can map over each item in a catalog using the map_items method. This method takes a user-specified function (item_mapper) and maps it over all items within a copy of the catalog. It returns the altered catalog. The item_mapper function must take an item and return either another item or a list of items.
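
The contract described above (a mapper that returns either one item or a list of items) can be sketched with plain Python. This is a simplified stand-in, not pystac's implementation: it works on a flat list, uses strings in place of items, and does not copy a catalog. The names `map_items_sketch` and `with_label` are hypothetical.

```python
def map_items_sketch(items, item_mapper):
    """Apply ``item_mapper`` to each item and flatten any list results."""
    result = []
    for item in items:
        mapped = item_mapper(item)
        if isinstance(mapped, list):
            result.extend(mapped)  # one input item may yield several outputs
        else:
            result.append(mapped)
    return result


def with_label(item_id):
    """Return the original id plus a companion label id."""
    return [item_id, f"{item_id}-labels"]


print(map_items_sketch(["img1", "img10"], with_label))
# ['img1', 'img1-labels', 'img10', 'img10-labels']
```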

The item mapper defined below downloads the appropriate label GeoJSON file and creates a label Item that points to the local file. It also creates a COG of the original image, saves it to the same directory as the labels, and then updates the image asset to reference the COG rather than the original TIFF.

[15]:
def item_to_labels_url(item):
    image_uri = item.assets["image"].href
    d = dirname(image_uri).replace("PS-RGB", "geojson_buildings")
    b = (
        basename(image_uri)
        .replace("PS-RGB", "geojson_buildings")
        .replace(".tif", ".geojson")
    )
    return join(d, b)
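
As a quick check of the path rewriting, the sketch below repeats the function so that it runs on its own and applies it to a stand-in object. `fake_item` is hypothetical and exposes only `assets["image"].href`; its URI is assembled from the bucket, prefix, and item id that appear elsewhere in this tutorial. `posixpath` is used instead of `os.path` so that the result does not depend on the operating system.

```python
from posixpath import basename, dirname, join
from types import SimpleNamespace


def item_to_labels_url(item):
    # Same logic as the cell above, repeated so this sketch is self-contained
    image_uri = item.assets["image"].href
    d = dirname(image_uri).replace("PS-RGB", "geojson_buildings")
    b = (
        basename(image_uri)
        .replace("PS-RGB", "geojson_buildings")
        .replace(".tif", ".geojson")
    )
    return join(d, b)


# Stand-in for a pystac Item with a single image asset
fake_item = SimpleNamespace(
    assets={
        "image": SimpleNamespace(
            href=(
                "s3://spacenet-dataset/spacenet/SN2_buildings/train/"
                "AOI_2_Vegas/PS-RGB/SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1.tif"
            )
        )
    }
)

labels_url = item_to_labels_url(fake_item)
print(labels_url)
```

Both the directory name and the file name have `PS-RGB` replaced by `geojson_buildings`, and the extension becomes `.geojson`.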
[16]:
def cogify_and_label(item, data_dir):
    label_url = item_to_labels_url(item)
    s3 = boto3.client("s3")

    try:
        label_uri = join(data_dir, basename(label_url))
        s3.download_file(
            "spacenet-dataset",
            label_url.replace("s3://spacenet-dataset/", ""),
            label_uri,
        )

        # construct label item
        label_item = pystac.Item(
            id="{}-labels".format(item.id),
            bbox=item.bbox,
            geometry=mapping(box(*item.bbox)),
            datetime=item.datetime,
            properties={},
            stac_extensions=[pystac.Extensions.LABEL],
        )

        label_item.ext.label.apply(
            label_description="Building labels for scene {}".format(item.id),
            label_type=label.LabelType.VECTOR,
            label_properties=["partialBuilding"],
            # Label classes is marked as required in 1.0.0-beta.2, so make it up.
            # Once this PR is released, this can be removed:
            # https://github.com/radiantearth/stac-spec/pull/905
            label_classes=[
                label.LabelClasses.create(classes=["building"], name="partialBuilding")
            ],
        )

        label_item.ext.label.add_geojson_labels(href=label_uri)

        output_cog_uri = join(data_dir, "{}-cog.tif".format(item.id))
        # Pass the arguments as a list so that no shell quoting is involved
        call(
            ["rio", "cogeo", "create", item.assets["image"].href, output_cog_uri]
        )
        item.assets["image"].href = output_cog_uri
        print("Completed item: {}".format(item.id))
        return [item, label_item]
    except ClientError:
        print("Labels not available for item {}".format(item.id))
        # Return an empty list so that map_items drops scenes without labels
        return []
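
The function above derives the object key by stripping the `s3://spacenet-dataset/` prefix from the label URL. A slightly more general way to separate the bucket from the key is sketched here with `urllib.parse`; the helper `split_s3_url` and the example file name are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an ``s3://bucket/key`` URL into ``(bucket, key)``."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"Not an s3:// URL: {url}")
    # The bucket is the network location; the key is the path without
    # its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url(
    "s3://spacenet-dataset/spacenet/SN2_buildings/train/AOI_2_Vegas/"
    "geojson_buildings/example.geojson"
)
print(bucket)  # spacenet-dataset
print(key)
```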

Create a ‘data’ directory to put all of the output COGs and labels into

[17]:
!mkdir -p ./data
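
The `!mkdir` line relies on a notebook shell escape. Outside a notebook, or on a system without `mkdir -p`, a standard-library equivalent could look like this sketch:

```python
import os

data_dir = "data"

# exist_ok=True mirrors ``mkdir -p``: no error if the directory already exists
os.makedirs(data_dir, exist_ok=True)

print(os.path.isdir(data_dir))  # True
```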
[18]:
mapper = lambda item: cogify_and_label(item, data_dir="data")
spacenet_cog = spacenet.map_items(mapper)
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img10
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1002
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1003
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1004
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1006
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1007
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1009
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img101
Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1010
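
The lambda used above only fixes the `data_dir` argument. `functools.partial` is a common alternative for the same purpose. The sketch below shows the pattern with a stub in place of the real function; `cogify_and_label_stub` is hypothetical and exists only so that the example runs without S3 access.

```python
from functools import partial


def cogify_and_label_stub(item, data_dir):
    """Stand-in with the same signature as the real mapper function."""
    return f"{item} -> {data_dir}"


# Bind data_dir once; the result takes a single item argument
mapper = partial(cogify_and_label_stub, data_dir="data")

print(mapper("img1"))  # img1 -> data
```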
[19]:
spacenet_cog.describe()
* <Catalog id=spacenet>
    * <Collection id=vegas>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img10>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img10-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1002>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1002-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1003>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1003-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1004>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1004-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1006>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1006-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1007>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1007-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1009>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1009-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img101>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img101-labels>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1010>
      * <Item id=SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1010-labels>

Finally, we set a different root URI for this STAC and save it locally.

[20]:
spacenet_cog.normalize_hrefs("spacenet-cog-stac")
[20]:
<Catalog id=spacenet>
[21]:
spacenet_cog.validate_all()
[22]:
spacenet_cog.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)