Central catalogue

EOMatch includes a server-side catalogue stack that lets the whole team share a single searchable database of matchup events, matchup items, and analysis results. Researchers can query the central catalogue to pull a subset of items into a local working catalogue, and the pipeline can push new results back in.

Two HTTP endpoints are served:

  • Internal (/api/) — full assets, including file:// paths to NFS products and analysis files. Write access is protected by an API key. Accessible over the VPN only.

  • External (/external/) — file:// assets are stripped before responses leave the server. Read-only. Suitable for sharing with collaborators outside NPL.
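To make the difference between the two endpoints concrete, the following minimal sketch shows what removing file:// assets from a STAC item could look like. It is an illustration of the documented behaviour only, not the server's actual code; the function name `strip_file_assets` and the example item are hypothetical.

```python
from copy import deepcopy


def strip_file_assets(item: dict) -> dict:
    """Return a copy of a STAC item dict without its file:// assets.

    Illustrative only: mimics the documented behaviour of the external
    endpoint, not the server's real implementation.
    """
    cleaned = deepcopy(item)
    cleaned["assets"] = {
        key: asset
        for key, asset in cleaned.get("assets", {}).items()
        if not asset.get("href", "").startswith("file://")
    }
    return cleaned


# Hypothetical item with one NFS asset and one HTTP asset.
item = {
    "id": "example-matchup",
    "assets": {
        "product": {"href": "file:///nfs/example/product.nc"},
        "thumbnail": {"href": "http://your-server:8000/catalogue/thumb.png"},
    },
}

external_view = strip_file_assets(item)
print(sorted(external_view["assets"]))  # ['thumbnail']
```

The original item is left untouched, so the same record can still be served in full by the internal endpoint.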

See EOMatch API & Database Architecture for the full stack design and Setting Up the Central Catalogue (Phase 1) for server setup instructions.

Querying the central catalogue

Install the query extra alongside eomatch:

pip install -e '.[query]'

This installs pystac_client, the library used to search the STAC API.

From the command line

eomatch-query \
    --api-url http://your-server:8000/external/ \
    --output ./my_matchups

Filter by collection, time range, or bounding box:

eomatch-query \
    --api-url http://your-server:8000/external/ \
    --output ./my_matchups \
    --collections LANDSAT_C2L1-Landsat-9-vs-S2_MSI_L1C-S2A \
    --start-time 2022-01-01 \
    --end-time 2022-12-31 \
    --bbox -10 40 30 70

Add -v / --verbose for debug-level logging.

Queries are idempotent: re-running the same query after the central catalogue has been updated adds any new items and replaces existing local items with the version from the API.
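The merge behaviour can be pictured as an upsert keyed on item ID. The sketch below is a simplified model using plain dicts, not the eomatch implementation; `upsert_items` and the `version` property are hypothetical names used for illustration.

```python
def upsert_items(local: dict, incoming: list) -> dict:
    """Merge incoming item dicts into a local {id: item} mapping.

    New IDs are added; existing IDs are overwritten by the incoming version.
    """
    merged = dict(local)
    for item in incoming:
        merged[item["id"]] = item
    return merged


local = {"a": {"id": "a", "properties": {"version": 1}}}
incoming = [
    {"id": "a", "properties": {"version": 2}},  # changed upstream
    {"id": "b", "properties": {"version": 1}},  # new upstream
]

once = upsert_items(local, incoming)
twice = upsert_items(once, incoming)

print(sorted(once))                        # ['a', 'b']
print(once["a"]["properties"]["version"])  # 2
print(once == twice)                       # True
```

Applying the same incoming set twice leaves the result unchanged, which is what makes repeated queries safe.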

From Python

import datetime as dt
from eomatch.query import query

query(
    api_url="http://your-server:8000/external/",
    output_path="./my_matchups",
    collections=["LANDSAT_C2L1-Landsat-9-vs-S2_MSI_L1C-S2A"],
    start_time=dt.datetime(2022, 1, 1),
    end_time=dt.datetime(2022, 12, 31),
)

query() returns the pystac.Catalog that was saved, which you can pass directly to MatchupCatalogue.

Items referenced via related or derived_from links (matchup items and source product items) are fetched automatically even if they are not in the requested collections, so that get_events() works on the result without any further network access.
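Conceptually, this is a traversal over item links. The following sketch shows one way such a traversal could work, under the assumption that each followed link can be resolved to an item ID (here the link href is simply the ID). The helper `collect_with_references` and the in-memory `store` are hypothetical and stand in for real API calls.

```python
from collections import deque

FOLLOWED_RELS = {"related", "derived_from"}


def collect_with_references(seed_ids, fetch_item):
    """Return {id: item} for the seeds plus all items reachable via followed links.

    fetch_item is a caller-supplied function mapping an item ID to an item dict.
    """
    collected = {}
    queue = deque(seed_ids)
    while queue:
        item_id = queue.popleft()
        if item_id in collected:
            continue  # already fetched; also guards against link cycles
        item = fetch_item(item_id)
        collected[item_id] = item
        for link in item.get("links", []):
            if link.get("rel") in FOLLOWED_RELS:
                queue.append(link["href"])
    return collected


# Hypothetical in-memory stand-in for the STAC API.
store = {
    "event-1": {"id": "event-1", "links": [
        {"rel": "related", "href": "mu-1"},
        {"rel": "self", "href": "event-1"},
    ]},
    "mu-1": {"id": "mu-1", "links": [
        {"rel": "derived_from", "href": "prod-1"},
        {"rel": "derived_from", "href": "prod-2"},
    ]},
    "prod-1": {"id": "prod-1", "links": []},
    "prod-2": {"id": "prod-2", "links": []},
    "unrelated": {"id": "unrelated", "links": []},
}

result = collect_with_references(["event-1"], store.__getitem__)
print(sorted(result))  # ['event-1', 'mu-1', 'prod-1', 'prod-2']
```

Only the related and derived_from links are followed, so items that are not referenced by the query results are not pulled in.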

Full reference:

eomatch.query.query(api_url: str, output_path: str, collections: List[str] | None = None, start_time: datetime | None = None, end_time: datetime | None = None, bbox: List[float] | None = None, filter_expr: str | None = None, filter_lang: str = 'cql2-text', config=None) -> Catalog

Pull items from a STAC API and write them to a local pystac catalogue.

Searches the STAC API at api_url using the supplied filters and writes matching items to output_path in the same on-disk layout that MatchupCatalogue produces, so the result can be opened immediately with MatchupCatalogue.open().

Items referenced via related or derived_from links (matchup items and source product items) are fetched automatically even if they are not in the requested collections, so that get_events() works on the result without requiring network access.

If a catalogue already exists at output_path (from a previous run or from find_and_catalogue()), new items are merged in and the catalogue is resaved. Existing items are replaced with the API version (upsert semantics).

Requires the pystac_client package (pip install eomatch[query]).

Example usage:

from datetime import datetime

from eomatch.query import query

query(
    api_url="http://my-server:8000/external/",
    output_path="/data/my_matchups",
    collections=["LANDSAT_C2L1-Landsat-9-vs-S2_MSI_L1C-S2A"],
    start_time=datetime(2022, 1, 1),
    end_time=datetime(2022, 12, 31),
    filter_expr="time_diff_s < 900 AND land_fraction < 0.2",
)

Parameters:
  • api_url – base URL of the STAC API to query (e.g. http://server:8000/external/).

  • output_path – directory to write the local catalogue into.

  • collections – restrict results to these collection IDs. Queries all collections if None.

  • start_time – include only items whose datetime is at or after this value.

  • end_time – include only items whose datetime is at or before this value.

  • bbox – spatial filter as [min_lon, min_lat, max_lon, max_lat].

  • filter_expr – CQL2 filter expression applied server-side, e.g. "time_diff_s < 900 AND land_fraction < 0.2". Requires the STAC API to support the filter extension (pgSTAC does).

  • filter_lang – filter language identifier; "cql2-text" (default) or "cql2-json".

  • config – path to an eomatch YAML config file, or a dict of overrides. It supplies default settings; any of the parameters above that are passed explicitly take precedence.

Returns:

the resulting pystac.Catalog (also saved to disk).
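The example filter above is written in cql2-text. For reference, the same condition in the CQL2 JSON encoding is sketched below. Because filter_expr is typed as a string, the sketch serialises the structure with json.dumps; whether query() expects a JSON string when filter_lang="cql2-json" is an assumption that should be checked against the implementation.

```python
import json

# CQL2 JSON equivalent of: time_diff_s < 900 AND land_fraction < 0.2
cql2_json_filter = {
    "op": "and",
    "args": [
        {"op": "<", "args": [{"property": "time_diff_s"}, 900]},
        {"op": "<", "args": [{"property": "land_fraction"}, 0.2]},
    ],
}

# Assumption: a JSON string is passed because filter_expr is typed as str.
filter_expr = json.dumps(cql2_json_filter)
print(filter_expr)

# Possible call, not executed here:
# query(api_url=..., output_path=..., filter_expr=filter_expr, filter_lang="cql2-json")
```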

Working with the result

The output directory is a valid local pystac catalogue in the same format that eomatch-find produces. Open it with MatchupCatalogue:

import datetime as dt

from eomatch.mu_stac import MatchupCatalogue

cat = MatchupCatalogue.open("./my_matchups/catalog.json")

events = cat.get_events(
    start_time=dt.datetime(2022, 6, 1),
    stop_time=dt.datetime(2022, 6, 30),
)

for event in events:
    for mu in event.matchup_set:
        ds = mu.return_matchup_dataset()   # reads products from NFS
        # analyse...
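Reading products this way only works if the file:// paths recorded in the catalogue are reachable from your machine. The sketch below is one way to check that before starting an analysis. It assumes the local catalogue stores items as JSON files with an "assets" mapping and that paths are POSIX-style; the helper name `missing_file_assets` is hypothetical and not part of eomatch.

```python
import json
from pathlib import Path
from urllib.parse import unquote, urlparse


def missing_file_assets(catalogue_dir):
    """List (item_id, asset_key, href) for file:// assets that do not exist locally.

    Assumes items are stored as JSON files with an "assets" mapping and that
    file:// hrefs use POSIX paths.
    """
    missing = []
    for json_path in sorted(Path(catalogue_dir).rglob("*.json")):
        try:
            doc = json.loads(json_path.read_text())
        except json.JSONDecodeError:
            continue  # skip files that are not valid JSON
        assets = doc.get("assets", {}) if isinstance(doc, dict) else {}
        if not isinstance(assets, dict):
            continue
        for key, asset in assets.items():
            href = asset.get("href", "") if isinstance(asset, dict) else ""
            if href.startswith("file://"):
                local_path = Path(unquote(urlparse(href).path))
                if not local_path.exists():
                    missing.append((doc.get("id"), key, href))
    return missing


# Example call (path taken from the query examples above):
# for item_id, key, href in missing_file_assets("./my_matchups"):
#     print(f"{item_id}: asset '{key}' not found at {href}")
```

An empty result means every file:// asset found in the catalogue resolves to an existing file.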

If you are working outside NPL without access to the NFS, download the source products first:

# Pull items from the external API (no file:// assets)
eomatch-query \
    --api-url http://your-server:8000/external/ \
    --output ./my_matchups

# Download the EO products from the public archives (CEDA, AWS, …)
eomatch-download --path ./my_matchups

Checking catalogue status

eomatch-status prints a summary of item counts and date ranges for every collection currently in the central catalogue:

eomatch-status --api-url http://your-server:8000/api/

The API URL can also be read from your user config:

# ~/.config/eomatch/user_config.yaml
query:
  api_url: http://your-server:8000/api/

Then simply run:

eomatch-status

Pushing to the central catalogue

After running eomatch-find locally, push the results to the central catalogue with eomatch-ingest:

pip install -e '.[ingest]'

eomatch-ingest \
    --catalogue /data/my_catalogue \
    --db-host your-server \
    --db-user postgres \
    --assets-base-url http://your-server:8000/catalogue

--assets-base-url rewrites relative asset hrefs (such as thumbnail file:// paths) to HTTP URLs served statically by the proxy, so the STAC Browser can load them. Omit it if your items have no local-file assets.
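As a rough illustration of this kind of rewriting, the sketch below maps local asset hrefs inside a catalogue directory to URLs under a base URL. The exact rule used by eomatch-ingest is not documented here, so the logic, the function name `rewrite_href`, and the `catalogue_root` parameter are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def rewrite_href(href: str, catalogue_root: str, base_url: str) -> str:
    """Map a local asset href under catalogue_root to a URL under base_url.

    Hypothetical rule for illustration. Hrefs with a scheme other than
    file:// (for example http://), and paths outside catalogue_root,
    are returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.scheme not in ("", "file"):
        return href
    path = PurePosixPath(parsed.path)
    if path.is_absolute():
        try:
            path = path.relative_to(catalogue_root)
        except ValueError:
            return href  # not inside the catalogue directory
    return base_url.rstrip("/") + "/" + path.as_posix()


base = "http://your-server:8000/catalogue"
root = "/data/my_catalogue"

print(rewrite_href("file:///data/my_catalogue/thumbs/a.png", root, base))
# http://your-server:8000/catalogue/thumbs/a.png
print(rewrite_href("./thumbs/a.png", root, base))
# http://your-server:8000/catalogue/thumbs/a.png
print(rewrite_href("file:///nfs/other/product.nc", root, base))
# file:///nfs/other/product.nc
```

Leaving out-of-catalogue paths untouched keeps NFS product paths intact for the internal endpoint.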

Connection parameters can also be set in your user config so you do not have to pass them every time:

# ~/.config/eomatch/user_config.yaml
ingest:
  db_host: your-server
  db_port: 5432
  db_name: eomatch
  db_user: postgres

Pass the password via the PGPASSWORD environment variable to avoid storing credentials in a config file:

export PGPASSWORD=your-postgres-password
eomatch-ingest --config my_run.yaml

Ingest uses upsert semantics, so re-running after a partial failure or an incremental update is always safe.
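Because repeated runs are safe, a scripted ingest can simply retry on failure. The sketch below shows a small retry wrapper. The helper `run_with_retries` is hypothetical, and it assumes the command reports failure through a non-zero exit code, which is the usual convention but is not stated in this guide.

```python
import time


def run_with_retries(run_ingest, attempts=3, delay_s=0.0):
    """Call run_ingest() until it returns exit code 0 or attempts run out.

    Retrying is only reasonable because the ingest is an upsert.
    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    for attempt in range(1, attempts + 1):
        if run_ingest() == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay_s)
    raise RuntimeError(f"ingest failed after {attempts} attempts")


# Stand-in runner that fails once and then succeeds.
outcomes = iter([1, 0])
attempts_used = run_with_retries(lambda: next(outcomes), attempts=3)
print(attempts_used)  # 2

# With the real command, the runner could be (not executed here):
# import subprocess
# run_with_retries(
#     lambda: subprocess.run(["eomatch-ingest", "--config", "my_run.yaml"]).returncode,
#     delay_s=30,
# )
```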

See Setting Up the Central Catalogue (Phase 1) for instructions on setting up the server-side stack.