Building collocated datasets
Once you have a Matchup, you can download the
underlying products and read them into a collocated dataset with a single
call:
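from eomatch import EOMatchContext
from eomatch.finder.sat2sat import Sat2SatMUFinder

# Load the configuration and run the satellite-to-satellite matchup finder.
ctx = EOMatchContext("my_config.yaml")
events = Sat2SatMUFinder(context=ctx).finder()

# Take the first matchup of the first event and build its collocated dataset.
mu = events[0].matchup_set[0]
ds = mu.return_matchup_dataset()
print(ds)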
DataTree('None', parent=None)
├── DataTree('sensor_1')
│   └── ... (variables for product 1)
└── DataTree('sensor_2')
    └── ... (variables for product 2)
The returned object is an xarray.DataTree with one node per
sensor (sensor_1, sensor_2, …). Each node contains the data read by
eoio for that product, clipped to the collocation region.
Products are downloaded automatically if they are not already present on disk.
The download destination and API credentials are read from the
EOMatchContext.
Controlling what is read
return_matchup_dataset() accepts an optional
collection_read_args argument that overrides what eoio reads on a
per-collection basis. Because each collection uses different variable names
(e.g. B02 in Sentinel-2, B2 in Landsat), overrides are always keyed
by STAC collection ID.
Per-collection defaults can be set in the config under the read key
(see Configuring read defaults), so you rarely need to pass collection_read_args
explicitly. When you do, each collection entry may contain any of:
| Key | Behaviour | Notes |
|---|---|---|
| vars_sel | Full replacement | Replaces the config-resolved value entirely for that collection. Use this to control exactly which variables are loaded into memory. |
| read_params | Sub-key merge | Merged on top of the config-resolved value. Only the keys you supply are overridden; others keep their config (or default) values. |
| processors | Full replacement | Replaces the config-resolved value entirely for that collection. Only the processors you name here will run. |
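The difference between full replacement and sub-key merge can be illustrated in plain Python. This is a minimal sketch of the semantics described in the table, not the eomatch implementation: apply_overrides is a hypothetical helper, the resolved values are made up for the example, and rejecting unknown keys with KeyError is an assumption.

```python
from copy import deepcopy

# Keys whose call-time value replaces the config-resolved value outright,
# and keys whose call-time value is merged into it, as listed in the table.
REPLACE_KEYS = {"vars_sel", "processors"}
MERGE_KEYS = {"read_params"}


def apply_overrides(resolved, overrides):
    """Return a new dict: `resolved` with call-time `overrides` applied.

    Hypothetical helper for illustration; not part of eomatch.
    """
    result = deepcopy(resolved)
    for key, value in overrides.items():
        if key in MERGE_KEYS:
            # Sub-key merge: only the supplied sub-keys change.
            result[key] = {**result.get(key, {}), **value}
        elif key in REPLACE_KEYS:
            # Full replacement: the configured value is discarded.
            result[key] = deepcopy(value)
        else:
            # Assumption: unknown keys are treated as an error.
            raise KeyError(f"unsupported override key: {key!r}")
    return result


# Example config-resolved values for one collection (made up for the sketch).
resolved = {
    "vars_sel": {"meas": ["B2", "B3", "B4", "B5"], "aux": []},
    "read_params": {"use_chunks": False, "save_extracted": False},
    "processors": {},
}
overrides = {
    "vars_sel": {"meas": ["B4"]},
    "read_params": {"use_chunks": True},
}
merged = apply_overrides(resolved, overrides)
print(merged["vars_sel"])     # {'meas': ['B4']}
print(merged["read_params"])  # {'use_chunks': True, 'save_extracted': False}
```

Note that the aux entry disappears from vars_sel because the whole value is replaced, while save_extracted survives in read_params because only use_chunks was supplied.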
Select specific bands per collection:
dt = mu.return_matchup_dataset(
    collection_read_args={
        "S2_MSI_L1C": {"vars_sel": {"meas": ["B02", "B03", "B04", "B08"]}},
        "LANDSAT_C2L1": {"vars_sel": {"meas": ["B2", "B3", "B4", "B5"]}},
    }
)
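The two band lists above name the same spectral bands under different variable names. If you select bands often, one option is to keep a small lookup table and generate collection_read_args from common band names. The sketch below is only an illustration: BAND_NAMES and band_selection are hypothetical names, not part of eomatch, and the Landsat entries assume Landsat 8/9 OLI band numbering.

```python
# Hypothetical lookup from common band names to per-collection variable names.
# The Landsat entries assume Landsat 8/9 OLI band numbering.
BAND_NAMES = {
    "S2_MSI_L1C": {"blue": "B02", "green": "B03", "red": "B04", "nir": "B08"},
    "LANDSAT_C2L1": {"blue": "B2", "green": "B3", "red": "B4", "nir": "B5"},
}


def band_selection(common_names, band_names=BAND_NAMES):
    """Build a collection_read_args dict selecting the same bands everywhere."""
    return {
        collection_id: {"vars_sel": {"meas": [names[n] for n in common_names]}}
        for collection_id, names in band_names.items()
    }


read_args = band_selection(["red", "nir"])
print(read_args["S2_MSI_L1C"])    # {'vars_sel': {'meas': ['B04', 'B08']}}
print(read_args["LANDSAT_C2L1"])  # {'vars_sel': {'meas': ['B4', 'B5']}}
```

The result can then be passed as mu.return_matchup_dataset(collection_read_args=read_args).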
Apply processors to one collection only:
dt = mu.return_matchup_dataset(
    collection_read_args={
        "S2_MSI_L1C": {"processors": {"toa_reflectance": {}}},
    }
)
Override a single read parameter for one collection. read_params within a
collection entry merges at the sub-key level, so parameters you do not name
keep their config (or default) values:
dt = mu.return_matchup_dataset(
    collection_read_args={
        "LANDSAT_C2L1": {"read_params": {"use_chunks": True}},
    }
)
Configuring read defaults
The read section of your config file sets per-collection defaults so you
do not have to repeat them at every call site. Global defaults apply to all
collections; per-collection entries are merged on top.
read:
  defaults:
    vars_sel:
      meas: []  # empty list = read all available measurement variables
      aux: []
    read_params:
      use_chunks: false
      metadata_level: true
      save_extracted: false
    processors: {}
  collections:
    LANDSAT_C2L1:
      vars_sel:
        meas: [B2, B3, B4, B5]
    S2_MSI_L1C:
      vars_sel:
        meas: [B02, B03, B04, B8A]
The merge order (lowest → highest priority) is:
1. Hardcoded fallbacks (meas: [], aux: [], etc.)
2. read.defaults from the config
3. read.collections.<collection_id> from the config
4. Call-time arguments passed to return_matchup_dataset()
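The config layers of this merge order (fallbacks, read.defaults, read.collections.<collection_id>) can be pictured as successive dictionary merges. The following is a minimal sketch, assuming nested dictionaries are merged recursively; the text above does not say whether the config layers merge at the sub-key level or replace whole values, so treat that as an assumption. deep_merge, resolve_config_defaults and the FALLBACKS values are hypothetical and not part of eomatch. Call-time arguments would be applied last, following the table in Controlling what is read.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`; `override` wins."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Layer 1: hardcoded fallbacks (values assumed for illustration).
FALLBACKS = {
    "vars_sel": {"meas": [], "aux": []},
    "read_params": {},
    "processors": {},
}


def resolve_config_defaults(read_cfg, collection_id):
    """Apply the three config layers of the merge order for one collection."""
    # Layer 2: read.defaults on top of the fallbacks.
    resolved = deep_merge(FALLBACKS, read_cfg.get("defaults", {}))
    # Layer 3: read.collections.<collection_id> on top of that.
    per_collection = read_cfg.get("collections", {}).get(collection_id, {})
    return deep_merge(resolved, per_collection)


# The parsed `read` section of a config file, reduced to a small example.
read_cfg = {
    "defaults": {"read_params": {"use_chunks": False}},
    "collections": {
        "LANDSAT_C2L1": {"vars_sel": {"meas": ["B2", "B3", "B4", "B5"]}},
    },
}
landsat = resolve_config_defaults(read_cfg, "LANDSAT_C2L1")
sentinel = resolve_config_defaults(read_cfg, "S2_MSI_L1C")
print(landsat["vars_sel"]["meas"])   # ['B2', 'B3', 'B4', 'B5']
print(sentinel["vars_sel"]["meas"])  # []
```

A collection without its own entry, such as S2_MSI_L1C in this example, simply ends up with the global defaults.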
Building datasets in bulk
To build datasets for all matchups found in a run, iterate over the events:
for event in events:
    for mu in event.matchup_set:
        ds = mu.return_matchup_dataset()
        # process or save ds ...
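In a long run you may not want a single failed download or read to abort the whole loop. One possible pattern is to collect failures and carry on. The sketch below is an assumption-laden illustration: build_all is a hypothetical helper, the exception types raised by return_matchup_dataset() are not documented here (hence the broad except), and the stub objects exist only so the example runs without eomatch installed.

```python
from types import SimpleNamespace


def build_all(events, collection_read_args=None):
    """Build a dataset for every matchup, recording failures instead of raising.

    Returns (datasets, failures): the built datasets and a list of
    (matchup, exception) pairs. Hypothetical helper, not part of eomatch.
    """
    # Only forward the argument when the caller actually supplied one.
    kwargs = {}
    if collection_read_args is not None:
        kwargs["collection_read_args"] = collection_read_args

    datasets, failures = [], []
    for event in events:
        for mu in event.matchup_set:
            try:
                ds = mu.return_matchup_dataset(**kwargs)
            except Exception as exc:  # assumption: error types are unknown
                failures.append((mu, exc))
            else:
                datasets.append(ds)
    return datasets, failures


# Stand-in matchups so the sketch runs without eomatch.
def _ok(**kwargs):
    return "dataset"


def _fail(**kwargs):
    raise OSError("download failed")


stub_events = [
    SimpleNamespace(matchup_set=[SimpleNamespace(return_matchup_dataset=_ok)]),
    SimpleNamespace(matchup_set=[
        SimpleNamespace(return_matchup_dataset=_fail),
        SimpleNamespace(return_matchup_dataset=_ok),
    ]),
]
datasets, failures = build_all(stub_events)
print(len(datasets), len(failures))  # 2 1
```

Keeping every dataset in a list may use a lot of memory on large runs; processing or saving each dataset inside the loop, as in the loop above, avoids that.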