pystac.catalog

class pystac.catalog.Catalog(id: str, description: str, title: str | None = None, stac_extensions: list[str] | None = None, extra_fields: dict[str, Any] | None = None, href: str | None = None, catalog_type: CatalogType = CatalogType.ABSOLUTE_PUBLISHED, strategy: HrefLayoutStrategy | None = None)

A PySTAC Catalog represents a STAC catalog in memory.

A Catalog is a STACObject that may contain children, which are instances of Catalog or Collection, as well as Items.

Parameters:
  • id – Identifier for the catalog. Must be unique within the STAC.

  • description – Detailed multi-line description to fully explain the catalog. CommonMark 0.29 syntax MAY be used for rich text representation.

  • title – Optional short descriptive one-line title for the catalog.

  • stac_extensions – Optional list of extensions the Catalog implements.

  • extra_fields – Optional extra fields that are part of the top-level JSON properties of the Catalog.

  • href – Optional HREF for this catalog, which will be set as the HREF of the catalog’s self link.

  • catalog_type – Optional catalog type for this catalog. Must be one of the values in CatalogType.

  • strategy – The layout strategy to use for setting the HREFs of the catalog’s child objects and items. If not provided, it will default to the strategy of the root and fall back to BestPracticesLayoutStrategy.

DEFAULT_FILE_NAME = 'catalog.json'

Default file name that will be given to this STAC object in a canonical format.

STAC_OBJECT_TYPE: STACObjectType = 'Catalog'
add_child(child: Catalog | Collection, title: str | None = None, strategy: HrefLayoutStrategy | None = None, set_parent: bool = True) → Link

Adds a link to a child Catalog or Collection.

This method will set the child’s parent to this object and potentially override its self_link (unless set_parent is False).

It will always set its root to this Catalog’s root.

Parameters:
  • child – The child to add.

  • title – Optional title to give to the Link

  • strategy – The layout strategy to use for setting the self href of the child. If not provided, defaults to the layout strategy of the parent or root and falls back to BestPracticesLayoutStrategy.

  • set_parent – Whether to set the parent on the child as well. Defaults to True.

Returns:

The link created for the child

Return type:

Link

add_children(children: Iterable[Catalog | Collection], strategy: HrefLayoutStrategy | None = None) → list[Link]

Adds links to multiple Catalog or Collection objects. This method will set each child’s parent to this object, and their root to this Catalog’s root.

Parameters:
  • children – The children to add.

  • strategy – The layout strategy to use for setting the self href of the children. If not provided, defaults to the layout strategy of the parent or root and falls back to BestPracticesLayoutStrategy.

Returns:

A list of links created for the children.

Return type:

List[Link]

add_item(item: Item, title: str | None = None, strategy: HrefLayoutStrategy | None = None, set_parent: bool = True) → Link

Adds a link to an Item.

This method will set the item’s parent to this object and potentially override its self_link (unless set_parent is False).

It will always set its root to this Catalog’s root.

Parameters:
  • item – The item to add.

  • title – Optional title to give to the Link

  • strategy – The layout strategy to use for setting the self href of the item. If not provided, defaults to the layout strategy of the parent or root and falls back to BestPracticesLayoutStrategy.

  • set_parent – Whether to set the parent on the item as well. Defaults to True.

Returns:

The link created for the item

Return type:

Link

add_items(items: Iterable[Item], strategy: HrefLayoutStrategy | None = None) → list[Link]

Adds links to multiple Items.

This method will set each item’s parent to this object, and their root to this Catalog’s root.

Parameters:
  • items – The items to add.

  • strategy – The layout strategy to use for setting the self href of the items. If not provided, defaults to the layout strategy of the parent or root and falls back to BestPracticesLayoutStrategy.

Returns:

A list of links created for the items.

Return type:

List[Link]

catalog_type: CatalogType

The catalog type. Defaults to CatalogType.ABSOLUTE_PUBLISHED.

clear_children() → None

Removes all children from this catalog.

Returns:

None. The children are removed from this catalog in place.

Return type:

None

clear_items() → None

Removes all items from this catalog.

Returns:

None. The items are removed from this catalog in place.

Return type:

None

clone() → Catalog

Clones this object.

Cloning an object will make a copy of all properties and links of the object; however, it will not make copies of the targets of links (i.e. it is not a deep copy). To copy a STACObject fully, with all linked elements also copied, use STACObject.full_copy.

Returns:

The clone of this object.

Return type:

STACObject

describe(include_hrefs: bool = False, _indent: int = 0) → None

Prints out information about this Catalog and all contained STACObjects.

Parameters:

include_hrefs (bool) – If True, print each object’s self link HREF along with its ID. Defaults to False.

description: str

Detailed multi-line description to fully explain the catalog.

property ext: CatalogExt

Accessor for extension classes on this catalog

Example:

print(collection.ext.version)

extra_fields: dict[str, Any]

Extra fields that are part of the top-level JSON properties of the Catalog.

classmethod from_dict(d: dict[str, Any], href: str | None = None, root: Catalog | None = None, migrate: bool = False, preserve_dict: bool = True) → C

Parses this STACObject from the passed in dictionary.

Parameters:
  • d – The dict to parse.

  • href – Optional href that is the file location of the object being parsed.

  • root – Optional root catalog for this object. If provided, the root of the returned STACObject will be set to this parameter.

  • migrate – Use True if this dict represents JSON from an older STAC object, so that migrations are run against it.

  • preserve_dict – If False, the dict parameter d may be modified during this method call. Otherwise the dict is not mutated. Defaults to True, which results in a deepcopy of the parameter. Set to False when possible to avoid the performance hit of a deepcopy.

Returns:

The STACObject parsed from this dict.

Return type:

STACObject

classmethod from_file(href: HREF, stac_io: pystac.StacIO | None = None) → C

Reads a STACObject implementation from a file.

Parameters:
  • href – The HREF to read the object from.

  • stac_io – Optional instance of StacIO to use. If not provided, will use the default instance.

Returns:

The specific STACObject implementation class that is represented by the JSON read from the file located at HREF.

full_copy(root: Catalog | None = None, parent: Catalog | None = None) → Catalog

Create a full copy of this STAC object and any STAC objects linked to by this object.

Parameters:
  • root – Optional root to set as the root of the copied object, and any other copies that are contained by this object.

  • parent – Optional parent to set as the parent of the copy of this object.

Returns:

A full copy of this object, as well as any objects this object links to.

Return type:

STACObject

fully_resolve() → None

Resolves every link in this catalog.

Useful if, e.g., you’d like to read a catalog from a filesystem, upgrade every object in the catalog to the latest STAC version, and save it back to the filesystem. By default, save() skips unresolved links.

generate_subcatalogs(template: str, defaults: dict[str, Any] | None = None, parent_ids: list[str] | None = None) → list[Catalog]

Walks through the catalog and generates subcatalogs for items based on the template string.

See LayoutTemplate for details on the construction of template strings. This template string will be applied to the items, and subcatalogs will be created that separate and organize the items based on template values.

Parameters:
  • template – A template string that can be consumed by a LayoutTemplate

  • defaults – Default values for the template variables that will be used if the property cannot be found on the item.

  • parent_ids – Optional list of the parent catalogs’ identifiers. If the bottom-most subcatalogs already match the template, no subcatalog is added.

Returns:

List of new catalogs created

Return type:

list[Catalog]

get_all_collections() → Iterable[Collection]

Get all collections from this catalog and all subcatalogs. Will traverse any subcatalogs recursively.

get_all_items() → Iterator[Item]

DEPRECATED.

Deprecated since version 1.8: Use pystac.Catalog.get_items(recursive=True) instead.

Get all items from this catalog and all subcatalogs. Will traverse any subcatalogs recursively.

Returns:

All items that belong to this catalog, and to all catalogs or collections connected to this catalog through child links.

Return type:

Generator[Item]

get_child(id: str, recursive: bool = False, sort_links_by_id: bool = True) → Catalog | Collection | None

Gets the child of this catalog with the given ID, if it exists.

Parameters:
  • id – The ID of the child to find.

  • recursive – If True, search this catalog and all of its descendants for the child; otherwise, only search the direct children of this catalog. Defaults to False.

  • sort_links_by_id – If True, links containing the ID will be checked first. If links do not contain the ID then setting this to False will improve performance. Defaults to True.

Returns:

The child with the given ID, or None if not found.

Return type:

Optional Catalog or Collection

get_child_links() → list[pystac.link.Link]

Return all child links of this catalog.

Returns:

List of links of this catalog with rel == 'child'

Return type:

List[Link]

get_children() → Iterable[Catalog | Collection]

Return all children of this catalog.

Returns:

Iterable of children whose parent is this catalog.

Return type:

Iterable[Catalog or Collection]

get_collections() → Iterable[Collection]

Return all children of this catalog that are Collection instances.

get_item(id: str, recursive: bool = False) → Item | None

DEPRECATED.

Deprecated since version 1.8: Use next(pystac.Catalog.get_items(id), None) instead.

Returns an item with a given ID.

Parameters:
  • id – The ID of the item to find.

  • recursive – If True, search this catalog and all children for the item; otherwise, only search the items of this catalog. Defaults to False.

Returns:

The item with the given ID, or None if not found.

Return type:

Item or None

get_item_links() → list[pystac.link.Link]

Return all item links of this catalog.

Returns:

List of links of this catalog with rel == 'item'

Return type:

List[Link]

get_items(*ids: str, recursive: bool = False) → Iterator[Item]

Return all items or specific items of this catalog.

Parameters:
  • *ids – The IDs of the items to include.

  • recursive – If True, search this catalog and all children for the item; otherwise, only search the items of this catalog. Defaults to False.

Returns:

Generator of items whose parent is this catalog and, if recursive is True, of items in all catalogs or collections connected to this catalog through child links.

Return type:

Iterator[Item]

id: str

Identifier for the catalog.

is_relative() → bool

links: list[Link]

A list of Link objects representing all links associated with this Catalog.

make_all_asset_hrefs_absolute() → None

Recursively makes all the HREFs of assets in this catalog absolute.

make_all_asset_hrefs_relative() → None

Recursively makes all the HREFs of assets in this catalog relative.

map_assets(asset_mapper: Callable[[str, Asset], Asset | tuple[str, Asset] | dict[str, Asset]]) → Catalog

Creates a copy of a catalog, with each Asset for each Item passed through the asset_mapper function.

Parameters:

asset_mapper – A function that takes in a key and an Asset, and returns either an Asset, a (key, Asset) tuple, or a dictionary of Assets with unique keys. The Asset that is passed into the asset_mapper is a copy, so the function can mutate it safely.

Returns:

A full copy of this catalog, with assets manipulated according to the asset_mapper function.

Return type:

Catalog

map_items(item_mapper: Callable[[Item], Item | list[Item]]) → Catalog

Creates a copy of a catalog, with each item passed through the item_mapper function.

Parameters:

item_mapper – A function that takes in an item, and returns either an item or list of items. The item that is passed into the item_mapper is a copy, so the method can mutate it safely.

Returns:

A full copy of this catalog, with items manipulated according to the item_mapper function.

Return type:

Catalog

classmethod matches_object_type(d: dict[str, Any]) → bool

Returns a boolean indicating whether the given dictionary represents a valid instance of this STACObject sub-class.

Parameters:

d – A dictionary to identify

normalize_and_save(root_href: str, catalog_type: CatalogType | None = None, strategy: HrefLayoutStrategy | None = None, stac_io: pystac.StacIO | None = None, skip_unresolved: bool = False) → None

Normalizes link HREFs to the given root_href, and saves the catalog.

This is a convenience method that simply calls Catalog.normalize_hrefs and Catalog.save in sequence.

Parameters:
  • root_href – The absolute HREF that all links will be normalized against.

  • catalog_type – The catalog type that dictates the structure of the catalog to save. Use a member of CatalogType. Defaults to the catalog_type of the root catalog, or to the catalog_type of the current catalog if there is no root catalog.

  • strategy – The layout strategy to use in setting the HREFs for this catalog. If not provided, defaults to the layout strategy of the parent or root and falls back to BestPracticesLayoutStrategy.

  • stac_io – Optional instance of StacIO to use. If not provided, will use the instance set while reading in the catalog, or the default instance if this is not available.

  • skip_unresolved – Skip unresolved links when normalizing the tree. Defaults to False. Because unresolved links are not saved, this argument can be used to normalize and save only newly-added objects.

normalize_hrefs(root_href: str, strategy: HrefLayoutStrategy | None = None, skip_unresolved: bool = False) → None

Regenerates all link HREFs based on an absolute root_href and the canonical catalog layout as specified in the STAC specification’s best practices.

This method mutates the entire catalog tree, unless skip_unresolved is True, in which case only resolved links are modified. This is useful in the case when you have loaded a large catalog and you’ve added a few items/children, and you only want to update those newly-added objects, not the whole tree.

Parameters:
  • root_href – The absolute HREF that all links will be normalized against.

  • strategy – The layout strategy to use in setting the HREFs for this catalog. If not provided, defaults to the layout strategy of the parent or root and falls back to BestPracticesLayoutStrategy.

  • skip_unresolved – Skip unresolved links when normalizing the tree. Defaults to False.

See:

STAC best practices document for the canonical layout of a STAC.

remove_child(child_id: str) → None

Removes a child from this catalog.

Parameters:

child_id – The ID of the child to remove.

remove_item(item_id: str) → None

Removes an item from this catalog.

Parameters:

item_id – The ID of the item to remove.

save(catalog_type: CatalogType | None = None, dest_href: str | None = None, stac_io: pystac.StacIO | None = None) → None

Saves this catalog and all its children and items to files determined by each object’s self link HREF or by a specified path.

Parameters:
  • catalog_type – The catalog type that dictates the structure of the catalog to save. Use a member of CatalogType. If not supplied, the catalog_type of this catalog will be used. If that attribute is not set, an exception will be raised.

  • dest_href – The location where the catalog is to be saved. If not supplied, the catalog’s self link HREF is used to determine the location of the catalog file and children’s files.

  • stac_io – Optional instance of StacIO to use. If not provided, will use the instance set while reading in the catalog, or the default instance if this is not available.

Note

If the catalog type is CatalogType.ABSOLUTE_PUBLISHED, all self links will be included, and hierarchical links will be absolute URLs. If the catalog type is CatalogType.RELATIVE_PUBLISHED, this catalog’s self link will be included, but no child catalog will have self links, and hierarchical links will be relative URLs. If the catalog type is CatalogType.SELF_CONTAINED, no self links will be included and hierarchical links will be relative URLs.

set_root(root: Catalog | None) → None

Sets the root Catalog or Collection for this object.

Parameters:

root – The root object to set. Passing in None will clear the root.

stac_extensions: list[str]

List of extensions the Catalog implements.

title: str | None

Optional short descriptive one-line title for the catalog.

to_dict(include_self_link: bool = True, transform_hrefs: bool = True) → dict[str, Any]

Returns this object as a dictionary.

Parameters:
  • include_self_link – If True, the dict will contain a self link to this object. If False, the self link will be omitted.

  • transform_hrefs – If True, transform the HREF of hierarchical links based on the type of catalog this object belongs to (if any). I.e. if this object belongs to a root catalog that is RELATIVE_PUBLISHED or SELF_CONTAINED, hierarchical link HREFs will be transformed to be relative to the catalog root.

Returns:

A serialization of the object.

Return type:

dict

validate_all(max_items: int | None = None, recursive: bool = True) → int

Validates each catalog, collection, and item contained within this catalog.

Walks through the children and items of the catalog and validates each STAC object.

Parameters:
  • max_items – The maximum number of STAC items to validate. Defaults to None, which means all items are validated.

  • recursive – Whether to validate catalog, collections, and items contained within child objects.

Returns:

Number of STAC items validated.

Return type:

int

Raises:

STACValidationError – Raised when a STAC object is invalid. Validation stops at the first invalid STAC object encountered.

walk() → Iterable[tuple[Catalog, Iterable[Catalog], Iterable[Item]]]

Walks through children and items of catalogs.

For each catalog in the STAC’s tree rooted at this catalog (including this catalog itself), it yields a 3-tuple (root, subcatalogs, items). The root in that 3-tuple refers to the current catalog being walked, the subcatalogs are any catalogs or collections for which the root is a parent, and items represents any items that have the root as a parent.

This has similar functionality to Python’s os.walk().

Returns:

A generator that yields a 3-tuple (parent_catalog, children, items).

Return type:

Generator[(Catalog, Generator[Catalog], Generator[Item])]

class pystac.catalog.CatalogType(value)

An enumeration.

ABSOLUTE_PUBLISHED = 'ABSOLUTE_PUBLISHED'

Absolute Published Catalog is a catalog that uses absolute links for everything, both in the links objects and in the asset hrefs.

See:

The best practices documentation on published catalogs

RELATIVE_PUBLISHED = 'RELATIVE_PUBLISHED'

Relative Published Catalog is a catalog that uses relative links for everything, but includes an absolute self link at the root catalog, to identify its online location.

See:

The best practices documentation on published catalogs

SELF_CONTAINED = 'SELF_CONTAINED'

A ‘self-contained catalog’ is one that is designed for portability. Users may want to download an online catalog and be able to use it on their local computer, so all links need to be relative.

See:

The best practices documentation on self-contained catalogs

classmethod determine_type(stac_json: dict[str, Any]) → CatalogType | None

Determines the catalog type based on a STAC JSON dict.

Only applies to Catalogs or Collections.

Parameters:

stac_json – The STAC JSON dict for which to determine the catalog type.

Returns:

The catalog type of the catalog or collection. Will return None if it cannot be determined.

Return type:

Optional[CatalogType]