STAC catalogue
EOMatch persists matchup events and matchup items to a
STAC catalogue on disk. The catalogue is managed
through MatchupCatalogue and follows this
structure:
catalogue/
├── catalog.json
├── {collection-1}/
│   ├── collection.json
│   └── YYYY/MM/DD/{item-id}.json    # one per source product
├── {collection-2}/
│   └── …
├── matchup-events-{col-1}-{platform-1}-vs-{col-2}-{platform-2}/
│   ├── collection.json
│   └── YYYY/MM/DD/{item-id}.json    # one per MatchupEvent
└── {col-1}-{platform-1}-vs-{col-2}-{platform-2}/
    ├── collection.json
    └── YYYY/MM/DD/{item-id}.json    # one per Matchup
Items are organised by date rather than by per-item subdirectory, so a busy
collection stays flat and browsable. Collection IDs include the platform name
for each sensor so that matchups between different satellite platforms within
the same collection are stored separately (e.g. LANDSAT_C2L1-Landsat-8
vs LANDSAT_C2L1-Landsat-9). Pairs are sorted alphabetically by
collection name for stability, giving IDs such as
LANDSAT_C2L1-Landsat-9-vs-S2_MSI_L1C-S2A.
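The naming and layout rules above can be sketched in a few lines of Python. This is an illustration only: `matchup_collection_id` and `item_href` are hypothetical helpers, not part of the EOMatch API, and falling back to the platform name when both sides share a collection is an assumption.

```python
from datetime import datetime
from pathlib import PurePosixPath


def matchup_collection_id(a, b):
    """Build a pair ID from two (collection, platform) tuples.

    Hypothetical helper: the pair is sorted by collection name so the ID
    does not depend on argument order. Tuple ordering means the platform
    name breaks ties when both collections are equal (an assumption).
    """
    first, second = sorted([a, b])
    return f"{first[0]}-{first[1]}-vs-{second[0]}-{second[1]}"


def item_href(collection_id, item_id, when):
    """Return the date-based relative path {collection}/YYYY/MM/DD/{item-id}.json."""
    return PurePosixPath(collection_id, when.strftime("%Y/%m/%d"), f"{item_id}.json")


pair_id = matchup_collection_id(("S2_MSI_L1C", "S2A"), ("LANDSAT_C2L1", "Landsat-9"))
print(pair_id)
print(item_href(pair_id, "example-item", datetime(2023, 6, 14)))
```

Swapping the two arguments yields the same ID, which is the stability property the sorting is meant to provide.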
Each matchup Item links back to its source products via derived_from links
and to its parent event via a related link (matchup:role=event).
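As a rough picture of what those links might look like in the Item JSON, the sketch below builds plain STAC link objects as dictionaries. The helper name and the example hrefs are hypothetical, and storing matchup:role as an extra field on the link object is an assumption about how EOMatch serialises it.

```python
def matchup_links(source_hrefs, event_href):
    """Return illustrative STAC link objects for one matchup Item."""
    # One derived_from link per source product.
    links = [
        {"rel": "derived_from", "href": href, "type": "application/json"}
        for href in source_hrefs
    ]
    # One related link pointing at the parent event Item.
    links.append(
        {
            "rel": "related",
            "href": event_href,
            "type": "application/json",
            "matchup:role": "event",  # assumed placement of the role field
        }
    )
    return links


links = matchup_links(
    ["../products/a.json", "../products/b.json"],  # hypothetical hrefs
    "../events/e.json",
)
for link in links:
    print(link["rel"], link["href"])
```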
Running the find-and-catalogue pipeline
The recommended way to populate the catalogue is via
find_and_catalogue(), which runs the
Sat2SatMUFinder and saves everything in
one step:
from eomatch import EOMatchContext
from eomatch.find_and_catalogue import find_and_catalogue
ctx = EOMatchContext("my_config.yaml")
catalogue = find_and_catalogue(context=ctx, path="/data/my_catalogue")
Or from the command line:
eomatch-find --config my_config.yaml --path /data/my_catalogue
Add --verbose / -v for debug-level logging.
The catalogue path can also be set in the config file:
matchup_catalogue:
  path: /data/my_catalogue
  id: my-matchup-catalogue
  description: "Matchups for S2A vs S3A, June 2023"
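One way to picture how an explicit path and the config entry could be reconciled is sketched below. `resolve_catalogue_path` is a hypothetical helper, the config is modelled as an already-parsed dictionary, and giving the explicit path precedence over the config value is an assumption rather than documented EOMatch behaviour.

```python
def resolve_catalogue_path(explicit_path, config):
    """Pick the catalogue path from an explicit argument or the config.

    explicit_path: value passed by the caller (or None if omitted).
    config: parsed config as a dict, e.g. {"matchup_catalogue": {"path": ...}}.
    """
    if explicit_path:
        # Assumption: an explicit path overrides the config entry.
        return explicit_path
    path = config.get("matchup_catalogue", {}).get("path")
    if path is None:
        raise ValueError("no catalogue path given and matchup_catalogue.path is not set")
    return path


config = {"matchup_catalogue": {"path": "/data/my_catalogue"}}
print(resolve_catalogue_path(None, config))
print(resolve_catalogue_path("/tmp/other_catalogue", config))
```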
Opening an existing catalogue
from eomatch.mu_stac import MatchupCatalogue
cat = MatchupCatalogue.open("/data/my_catalogue/catalog.json")
Querying the catalogue
Use get_events() to retrieve
events and their matchups, with optional filtering:
import datetime as dt
events = cat.get_events(
    collections=["S2_MSI_L1C", "S3_EFR"],
    start_time=dt.datetime(2023, 6, 1),
    stop_time=dt.datetime(2023, 6, 30),
    bbox=[-10.0, 40.0, 30.0, 70.0],
)
for event in events:
    print(event)
    for mu in event.matchup_set:
        print("  ", mu)
To restrict to events whose source products have already been downloaded:
events = cat.get_events(products_downloaded=True)
Downloading products
download_products() downloads
all source products for a set of events and registers a "data" asset on
each product Item so that the download state is tracked in the catalogue:
cat.download_products(event_set=events)
Products that are already present on disk are registered without being re-downloaded. The updated Item JSON is written to disk after each product.
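The skip-if-present behaviour described above can be sketched with plain dictionaries standing in for STAC Items. Everything here is hypothetical (`ensure_product`, `is_downloaded`, the `fetch` callable, the minimal asset layout); it only illustrates the idea of registering a "data" asset and writing the Item JSON after each product, not how EOMatch implements it.

```python
import json
from pathlib import Path


def ensure_product(item, item_json_path, local_path, fetch):
    """Download a product only if it is missing, then record a "data" asset.

    item: STAC Item as a plain dict.
    fetch: callable that writes the product to local_path (stand-in for a
    real downloader). Returns True if a download was performed.
    """
    local_path = Path(local_path)
    downloaded = False
    if not local_path.exists():
        fetch(local_path)
        downloaded = True
    # Register the asset whether or not a download was needed.
    item.setdefault("assets", {})["data"] = {"href": str(local_path)}
    # Persist the updated Item straight away, one write per product.
    Path(item_json_path).write_text(json.dumps(item, indent=2))
    return downloaded


def is_downloaded(item):
    """Treat an Item as downloaded when it carries a "data" asset."""
    return "data" in item.get("assets", {})
```

Writing the JSON inside the per-product step means an interrupted run still leaves the catalogue consistent with whatever has reached disk so far.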
Managing products from the command line
The eomatch-download and eomatch-remove console scripts provide a
convenient way to bulk-download or remove source products without writing any
Python. Both commands accept the same filtering flags:
# Download all products for S2 vs Landsat matchups in June 2023
eomatch-download \
    --path /data/my_catalogue \
    --collections S2_MSI_L1C,LANDSAT_C2L1 \
    --start-time 2023-06-01 \
    --stop-time 2023-06-30
# Remove those products from disk (keeps catalogue metadata intact)
eomatch-remove \
    --path /data/my_catalogue \
    --collections S2_MSI_L1C,LANDSAT_C2L1 \
    --start-time 2023-06-01 \
    --stop-time 2023-06-30
# Remove asset references only, leave the files on disk
eomatch-remove --path /data/my_catalogue --keep-files
Pass --verbose / -v for debug-level logging. The catalogue path can be
omitted if matchup_catalogue.path is set in your config file; pass
--config to load a non-default config.
Available filter flags (shared by both commands):
| Flag | Description |
|---|---|
| --path | Catalogue root directory or … |
| --collections | Comma-separated collection names (e.g. …). |
| … | Comma-separated platform names. |
| --start-time | ISO 8601 start time; events ending before this are excluded. |
| --stop-time | ISO 8601 stop time; events starting after this are excluded. |
| … | Spatial bounding-box filter. |
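The two time filters amount to an interval-overlap test: an event is kept unless it ends before the start time or starts after the stop time. A minimal sketch of that rule follows; `in_time_window` is a hypothetical name, and treating an omitted bound as "no limit" is an assumption.

```python
from datetime import datetime


def in_time_window(event_start, event_end, start_time=None, stop_time=None):
    """Return True if the event overlaps the [start_time, stop_time] window."""
    if start_time is not None and event_end < start_time:
        return False  # event finished before the window opens
    if stop_time is not None and event_start > stop_time:
        return False  # event begins after the window closes
    return True


window = (datetime(2023, 6, 1), datetime(2023, 6, 30))
# An event straddling the start of the window is still included.
print(in_time_window(datetime(2023, 5, 31, 23), datetime(2023, 6, 1, 1), *window))
```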
eomatch-remove additionally accepts --keep-files to remove the
"data" asset from the catalogue without deleting the local files.
Managing products from Python
The same functionality is available as Python functions in
eomatch.manage_products:
import datetime as dt
from eomatch import EOMatchContext
from eomatch.manage_products import (
    download_catalogue_products,
    remove_catalogue_products,
)
ctx = EOMatchContext("my_config.yaml")
# Download source products for matching events
paths = download_catalogue_products(
    context=ctx,
    path="/data/my_catalogue",
    collections=["S2_MSI_L1C", "LANDSAT_C2L1"],
    start_time=dt.datetime(2023, 6, 1),
    stop_time=dt.datetime(2023, 6, 30),
)
print(f"Handled {len(paths)} product(s)")
# Remove downloaded products from disk (and deregister from catalogue)
n = remove_catalogue_products(
    context=ctx,
    path="/data/my_catalogue",
    collections=["S2_MSI_L1C", "LANDSAT_C2L1"],
    delete_files=True,  # set False to keep files, remove asset reference only
)
print(f"Removed {n} product asset(s)")
Both functions open the catalogue, apply the filters, and then act on every
matching event. Products that appear in multiple matchups are processed only
once. download_catalogue_products()
registers a "data" STAC asset on each product Item after downloading so
that the catalogue tracks which products are present on disk.
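The "processed only once" behaviour can be pictured as an order-preserving de-duplication over all products referenced by the selected events. The nested dictionary layout and the `unique_products` helper below are hypothetical stand-ins, not EOMatch data structures.

```python
def unique_products(events):
    """Collect product IDs across events, keeping first-seen order."""
    seen = set()
    ordered = []
    for event in events:
        for matchup in event["matchups"]:
            for product_id in matchup["products"]:
                if product_id not in seen:
                    seen.add(product_id)
                    ordered.append(product_id)
    return ordered


# Product "A" takes part in two matchups but is listed once.
events = [
    {"matchups": [{"products": ["A", "B"]}, {"products": ["A", "C"]}]},
    {"matchups": [{"products": ["C", "D"]}]},
]
print(unique_products(events))
```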
Attaching assets
You can attach arbitrary STAC assets to any Item or Collection in the catalogue, which is useful for storing processing outputs alongside the raw products:
import pystac
asset = pystac.Asset(href="/data/results/mu_stats.csv", media_type="text/csv")
# Attach to a single matchup Item
cat.add_matchup_asset(mu, asset_key="statistics", asset=asset)
# Attach to a matchup event Item
cat.add_event_asset(event, asset_key="thumbnail", asset=pystac.Asset(...))
# Attach to the matchup Collection (sensor-pair level)
cat.add_matchup_collection_asset(mu, asset_key="report", asset=pystac.Asset(...))
Corresponding remove_* methods are available for all three levels and will
optionally delete the local file from disk.
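The removal side can be sketched in the same spirit, again with a plain dictionary standing in for a STAC Item. `remove_asset` and its `delete_file` parameter are hypothetical and do not reflect the actual signatures of the remove_* methods.

```python
from pathlib import Path


def remove_asset(item, asset_key, delete_file=False):
    """Drop an asset from an Item dict, optionally deleting its local file.

    Returns True if an asset with that key existed.
    """
    asset = item.get("assets", {}).pop(asset_key, None)
    if asset is None:
        return False
    if delete_file:
        # missing_ok avoids an error if the file has already gone.
        Path(asset["href"]).unlink(missing_ok=True)
    return True
```

Keeping the file deletion optional mirrors the distinction between dropping the catalogue reference and removing the data itself.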