{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create and manipulate SpaceNet Vegas STAC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial shows how to create and manipulate STACs using pystac. It contains two parts:\n", "\n", "1. Create SpaceNet Vegas STAC\n", "    - Create (in memory) a pystac catalog of [SpaceNet 2 imagery from the Las Vegas AOI](https://spacenetchallenge.github.io/AOI_Lists/AOI_2_Vegas.html) using data hosted in a public s3 bucket\n", "    - Set relative paths for all STAC objects\n", "    - Normalize links from a root directory and save the STAC there\n", " \n", " \n", "2. Create a new STAC with COGs and labels\n", "    - Create a STAC of a sample of SpaceNet Vegas images from s3\n", "    - Save the STAC locally\n", "    - Download labels for, and create COGs of, each image\n", "    - Save the geojson labels and COGs locally\n", "    - Create an updated STAC that points to the new files and only includes labeled scenes\n", "    - Set relative paths for all STAC objects\n", "    - Normalize links from a root directory and save the STAC there" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import sys\n", "\n", "sys.path.append(\"..\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may need to install the following packages that are not included in the Python 3 standard library. 
If you do not have any of these installed, you can install them with pip:\n", "\n", "[boto3](https://pypi.org/project/boto3/): `pip install boto3`  \n", "[botocore](https://pypi.org/project/botocore/): `pip install botocore`  \n", "[rasterio](https://pypi.org/project/rasterio/): `pip install rasterio`  \n", "[shapely](https://pypi.org/project/Shapely/): `pip install Shapely`  \n", "[rio-cogeo](https://github.com/cogeotiff/rio-cogeo): `pip install rio-cogeo`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import json\n", "from datetime import datetime\n", "from os.path import basename, dirname, join\n", "from subprocess import call\n", "\n", "import boto3\n", "import rasterio\n", "from botocore.exceptions import ClientError\n", "import pystac\n", "from pystac.extensions import label\n", "from shapely.geometry import GeometryCollection, Polygon, box, shape, mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create SpaceNet Vegas STAC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize a STAC for the SpaceNet 2 dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "spacenet = pystac.Catalog(id=\"spacenet\", description=\"SpaceNet 2 STAC\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We do not yet know the spatial extent of the Vegas AOI. We will determine it once we have read the bounds of each image. As a placeholder, we will create a spatial extent of null values." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "sp_extent = pystac.SpatialExtent([None, None, None, None])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The capture date for SpaceNet 2 Vegas imagery is October 22, 2015. 
Create a Python datetime object using that date." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "capture_date = datetime.strptime(\"2015-10-22\", \"%Y-%m-%d\")\n", "tmp_extent = pystac.TemporalExtent([(capture_date, None)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create an Extent object that will define both the spatial and temporal extents of the Vegas collection" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "extent = pystac.Extent(sp_extent, tmp_extent)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a collection that will encompass the Vegas data and add it to the spacenet catalog" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "vegas = pystac.Collection(\n", "    id=\"vegas\", description=\"Vegas SpaceNet 2 dataset\", extent=extent\n", ")\n", "spacenet.add_child(vegas)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* \n", " * \n" ] } ], "source": [ "spacenet.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the locations of SpaceNet images. To keep this example quick, we will limit the number of scenes that we use to 10." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "client = boto3.client(\"s3\")\n", "scenes = client.list_objects(\n", "    Bucket=\"spacenet-dataset\",\n", "    Prefix=\"spacenet/SN2_buildings/train/AOI_2_Vegas/PS-RGB/\",\n", "    MaxKeys=20,\n", ")\n", "scenes = [s[\"Key\"] for s in scenes[\"Contents\"] if s[\"Key\"].endswith(\".tif\")][0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each scene, create an item with a defined bounding box. Each item will include the geotiff as an asset. We will add labels in the next section." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "for scene in scenes:\n", "    uri = join(\"s3://spacenet-dataset/\", scene)\n", "    params = {}\n", "    params[\"id\"] = basename(uri).split(\".\")[0]\n", "    with rasterio.open(uri) as src:\n", "        params[\"bbox\"] = list(src.bounds)\n", "        params[\"geometry\"] = mapping(box(*params[\"bbox\"]))\n", "    params[\"datetime\"] = capture_date\n", "    params[\"properties\"] = {}\n", "    i = pystac.Item(**params)\n", "    i.add_asset(\n", "        key=\"image\",\n", "        asset=pystac.Asset(\n", "            href=uri, title=\"Geotiff\", media_type=pystac.MediaType.GEOTIFF\n", "        ),\n", "    )\n", "    vegas.add_item(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now reset the spatial extent of the Vegas collection using the geometry objects from the items we just added." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "bounds = [\n", "    list(\n", "        GeometryCollection([shape(s.geometry) for s in spacenet.get_all_items()]).bounds\n", "    )\n", "]\n", "vegas.extent.spatial = pystac.SpatialExtent(bounds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently, this STAC only exists in memory. We need to set all of the paths based on the root directory we want to save the catalog to, and then save a \"self contained\" catalog, in which all links are relative and there are no 'self' links. We can do this by using the `normalize_hrefs` method to set the HREFs of all of our STAC objects. 
We'll then validate the catalog and save it:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spacenet.normalize_hrefs(\"spacenet-stac\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "spacenet.validate_all()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "spacenet.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create new STAC with COGs and labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this step, we will create an `Item` with the `Label` extension for each scene Item. We will download the labels from the SpaceNet s3 bucket and save them locally. We will also create COGs of each geotiff and save them to the same directory as the labels." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can map over each item in a catalog using the `map_items` method. This method takes a user-specified function (`item_mapper`) and maps it over all items within a copy of the catalog. It returns the altered catalog. The `item_mapper` function must take an item and return either another item or a list of items.\n", "\n", "The item mapper defined below downloads the appropriate label geojson and creates a label Item that points to the local file. It also creates a COG of the original image, saves it to the same directory as the labels, and updates the image asset to reference the COG rather than the original tiff." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def item_to_labels_url(item):\n", "    image_uri = item.assets[\"image\"].href\n", "    d = dirname(image_uri).replace(\"PS-RGB\", \"geojson_buildings\")\n", "    b = (\n", "        basename(image_uri)\n", "        .replace(\"PS-RGB\", \"geojson_buildings\")\n", "        .replace(\".tif\", \".geojson\")\n", "    )\n", "    return join(d, b)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def cogify_and_label(item, data_dir):\n", "    label_url = item_to_labels_url(item)\n", "    s3 = boto3.client(\"s3\")\n", "\n", "    try:\n", "        label_uri = join(data_dir, basename(label_url))\n", "        s3.download_file(\n", "            \"spacenet-dataset\",\n", "            label_url.replace(\"s3://spacenet-dataset/\", \"\"),\n", "            label_uri,\n", "        )\n", "\n", "        # construct label item\n", "        label_item = pystac.Item(\n", "            id=\"{}-labels\".format(item.id),\n", "            bbox=item.bbox,\n", "            geometry=mapping(box(*item.bbox)),\n", "            datetime=item.datetime,\n", "            properties={},\n", "            stac_extensions=[pystac.Extensions.LABEL],\n", "        )\n", "\n", "        label_item.ext.label.apply(\n", "            label_description=\"Building labels for scene {}\".format(item.id),\n", "            label_type=label.LabelType.VECTOR,\n", "            label_properties=[\"partialBuilding\"],\n", "            # Label classes is marked as required in 1.0.0-beta.2, so make it up.\n", "            # Once this PR is released, this can be removed:\n", "            # https://github.com/radiantearth/stac-spec/pull/905\n", "            label_classes=[\n", "                label.LabelClasses.create(classes=[\"building\"], name=\"partialBuilding\")\n", "            ],\n", "        )\n", "\n", "        label_item.ext.label.add_geojson_labels(href=label_uri)\n", "\n", "        output_cog_uri = join(data_dir, \"{}-cog.tif\".format(item.id))\n", "        call(\n", "            \" \".join(\n", "                [\"rio\", \"cogeo\", \"create\", item.assets[\"image\"].href, output_cog_uri]\n", "            ),\n", "            shell=True,\n", "        )\n", "        item.assets[\"image\"].href = output_cog_uri\n", "        print(\"Completed item: {}\".format(item.id))\n", "        return [item, label_item]\n", "    except ClientError:\n", "        print(\"Labels not available for item {}\".format(item.id))\n", "        # Return no items so that unlabeled scenes are left out of the new catalog\n", "        return []" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a 'data' directory to hold all of the output COGs and labels" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "!mkdir -p ./data" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img10\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1002\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1003\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1004\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1006\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1007\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1009\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img101\n", "Completed item: SN2_buildings_train_AOI_2_Vegas_PS-RGB_img1010\n" ] } ], "source": [ "mapper = lambda item: cogify_and_label(item, data_dir=\"data\")\n", "spacenet_cog = spacenet.map_items(mapper)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n", " * \n" ] } ], "source": [ "spacenet_cog.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we set a different root URI for this STAC and save it locally" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "spacenet_cog.normalize_hrefs(\"spacenet-cog-stac\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "spacenet_cog.validate_all()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "spacenet_cog.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 4 }