Readers#
This guide describes how to add a new EO product reader to eoio.
Overview#
Readers in eoio follow a thin orchestrator design. Each reader is responsible for parsing user configuration and coordinating specialised helper modules — it does not contain heavy IO, geometry, or parsing logic itself. That logic lives in small, single-responsibility modules alongside the reader.
The reading pipeline is:
layout → subset resolution → image IO → optional enrichment → conventions
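The pipeline above can be sketched as a thin orchestrator that only wires stages together. This is a minimal illustration using plain dictionaries; every function and class name here (discover_layout, resolve_bands, read_images, enrich, apply_conventions, ThinReader) is a hypothetical stand-in, not part of eoio.

```python
from typing import Any, Dict, List, Optional

# All names below are hypothetical stand-ins for a reader's helper modules.


def discover_layout(path: str) -> Dict[str, Any]:
    """layout: describe the product structure (no data is read here)."""
    return {"path": path, "bands": ["band1", "band2", "band3"]}


def resolve_bands(layout: Dict[str, Any], subset: Optional[Dict[str, Any]]) -> List[str]:
    """subset resolution: turn the user's request into a concrete band list."""
    if not subset or "bands" not in subset:
        return list(layout["bands"])
    return [band for band in layout["bands"] if band in subset["bands"]]


def read_images(bands: List[str]) -> Dict[str, List[int]]:
    """image IO: placeholder lists stand in for real raster reads."""
    return {band: [0, 0, 0] for band in bands}


def enrich(variables: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """optional enrichment: attach an auxiliary variable."""
    enriched = dict(variables)
    enriched["sun_zenith"] = [30]
    return enriched


def apply_conventions(variables: Dict[str, List[int]], layout: Dict[str, Any]) -> Dict[str, Any]:
    """conventions: wrap the variables with standardised attributes."""
    return {"variables": variables, "attrs": {"source_path": layout["path"]}}


class ThinReader:
    """Orchestrator: coordinates the stages and holds no heavy logic itself."""

    def __init__(self, path: str, subset: Optional[Dict[str, Any]] = None, with_aux: bool = False) -> None:
        self.path = path
        self.subset = subset
        self.with_aux = with_aux

    def open_dataset(self) -> Dict[str, Any]:
        layout = discover_layout(self.path)
        bands = resolve_bands(layout, self.subset)
        variables = read_images(bands)
        if self.with_aux:
            variables = enrich(variables)
        return apply_conventions(variables, layout)
```

Because each stage is a separate function, it can be unit-tested on its own, which matches the single-responsibility layout described here.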
Typical package layout for a reader named myreader:
eoio/readers/myreader/
    reader.py        # orchestrator – thin, imports helper modules
    layout.py        # product structure and file discovery
    data_io.py       # raster/data reading and subsetting
    subset.py        # optional: ROI or other subset resolution
    aux_data.py      # auxiliary variable ingestion
    conventions.py   # standardise variable names, attrs, provenance
    metadata/        # product metadata parsing (XML, JSON, etc.)
    tests/           # unit tests for each module
Not all modules are required for every reader. Point/in-situ readers, for example,
typically omit layout.py and data_io.py and use a simpler metadata approach.
Note
Place the reader package in eoio/readers/. Place test files in a tests/
sub-package within the reader directory. See existing readers
(e.g. eoio/readers/sentinel2/, eoio/readers/hypernets/) as reference
implementations.
Useful API References#
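| Class | Description |
|---|---|
| eoio.readers.factory.ReaderFactory | Reader factory class |
|  | Base class for EOIO readers |
| eoio.readers.base.BaseRasterReader | Base reader class for raster imagery |
| eoio.readers.base.BaseInSituReader | Base reader class for in situ data |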
Base Classes#
All readers inherit from one of the base classes in eoio.readers.base. Choose the
most appropriate starting point:
| Class | When to use |
|---|---|
| BaseRasterReader | Gridded satellite imagery (Sentinel-2, Landsat, OLCI, SLSTR, PlanetScope, …) |
| BaseInSituReader | Point / time-series data (Hypernets, RadCalNet, …) |
|  | Any reader that does not fit the above (e.g. ERA5, generic NetCDF) |
The base classes are intentionally small. They own:
Path validation
Configuration merging and validation (via ReaderConfig)
The single public entrypoint, open()
Everything else is the responsibility of the concrete reader.
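The split of responsibilities between a base class and a concrete reader can be sketched with the template-method pattern. This is a minimal sketch, not the eoio implementation: SketchBaseReader and DemoReader are hypothetical names, and checking the file suffix is only an assumed form of path validation chosen so that the example runs without real files.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class SketchBaseReader(ABC):
    """Hypothetical base class: owns validation and the open() entrypoint."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        # Assumed validation rule: the suffix must match the reader's extension.
        if self.path.suffix != self.get_extension():
            raise ValueError(f"Expected a {self.get_extension()} product, got {self.path.name}")

    def open(self) -> Dict[str, Any]:
        """Public entrypoint: delegates the actual reading to the subclass."""
        return self.open_dataset()

    @abstractmethod
    def open_dataset(self) -> Dict[str, Any]:
        """Concrete readers implement the product-specific reading here."""

    @staticmethod
    @abstractmethod
    def get_extension() -> str:
        """Concrete readers return their file extension here."""


class DemoReader(SketchBaseReader):
    @staticmethod
    def get_extension() -> str:
        return ".demo"

    def open_dataset(self) -> Dict[str, Any]:
        # A real reader would return an xarray.Dataset; a dict keeps this runnable.
        return {"source": self.path.name}
```

Calling DemoReader("scene.demo").open() goes through the base-class entrypoint, while instantiating SketchBaseReader directly fails because its abstract methods are not implemented.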
Creating a Reader Class#
Required Contract#
A concrete reader must:
Inherit from an appropriate base class.
Override the class-level configuration dictionaries.
Implement open_dataset().
Implement the static method get_extension().
| Attribute / Method | Purpose |
|---|---|
|  | Default values for variable selection |
|  | Default subsetting parameters |
| default_read_params | Default read-time parameters (e.g. metadata_level) |
| meas_def | Mapping of preset names (e.g. "all", "rgb") to lists of measurement variable names |
| aux_def | Mapping of preset names to lists of auxiliary variable names |
| mask_def | Mapping of preset names to lists of mask variable names |
| open_dataset() | Read the product and return an xarray.Dataset |
| get_extension() | Return the file extension string for the reader (e.g. ".myformat") |
Public Entrypoint#
Users call reader.open() — not open_dataset() directly. open() is
provided by the base class and calls open_dataset() internally:
reader = MyReader(path, vars_sel={"meas": "all"}, subset={"roi": [...]})
ds = reader.open()
Configuration is passed via three optional dictionaries:
vars_sel: which measurement, auxiliary, and mask variables to include
subset: spatial, temporal, or spectral subsetting
read_params: read-time options such as metadata_level
Unknown keys raise a ValueError at initialisation time. Validated, merged
configuration is stored in reader.config (a ReaderConfig
dataclass) and resolved configuration in reader.resolved_config.
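The merge-and-validate behaviour described above can be illustrated with a small stand-alone sketch. The names SketchConfig and merge_section are hypothetical, and the real ReaderConfig may be structured differently; the sketch only shows class defaults being overlaid with user values and unknown keys being rejected with a ValueError.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def merge_section(name: str, defaults: Dict[str, Any], user: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Overlay user values on defaults; reject keys that have no default."""
    user = user or {}
    unknown = sorted(set(user) - set(defaults))
    if unknown:
        raise ValueError(f"Unknown {name} keys: {unknown}")
    return {**defaults, **user}


@dataclass
class SketchConfig:
    """Hypothetical container for the three merged configuration sections."""

    vars_sel: Dict[str, Any] = field(default_factory=dict)
    subset: Dict[str, Any] = field(default_factory=dict)
    read_params: Dict[str, Any] = field(default_factory=dict)


# Example defaults, loosely modelled on the keys mentioned in this guide.
DEFAULT_VARS_SEL = {"meas": None, "aux": None, "mask": None}
DEFAULT_SUBSET = {"roi": None, "roi_crs": 4326}
DEFAULT_READ_PARAMS = {"metadata_level": "all"}


def build_config(vars_sel=None, subset=None, read_params=None) -> SketchConfig:
    return SketchConfig(
        vars_sel=merge_section("vars_sel", DEFAULT_VARS_SEL, vars_sel),
        subset=merge_section("subset", DEFAULT_SUBSET, subset),
        read_params=merge_section("read_params", DEFAULT_READ_PARAMS, read_params),
    )
```

Validating at construction time means a typo such as subset={"rio": ...} fails immediately rather than silently producing an unsubsetted dataset.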
Minimal Example#
from pathlib import Path
from typing import Any, Dict, List, Optional

import xarray as xr

from eoio.readers.base import BaseRasterReader


class MyReader(BaseRasterReader):
    default_read_params = {
        "save_extracted": False,
        "metadata_level": "all",
        "include_uncertainties": False,
    }

    meas_def: Dict[str, List[str]] = {
        "all": ["band1", "band2", "band3"],
        "rgb": ["band1", "band2", "band3"],
    }
    aux_def: Dict[str, List[str]] = {"all": []}
    mask_def: Dict[str, List[str]] = {"all": []}

    @staticmethod
    def get_extension() -> str:
        return ".myformat"

    def open_dataset(self) -> xr.Dataset:
        ds = xr.Dataset()
        # ... read data using helper modules ...
        return ds
Variable Selection#
Variable selection is controlled by the vars_sel argument. The meas_def,
aux_def, and mask_def class attributes define which names are valid.
Keys in each *_def dict are preset names. The value under "all" must list
every available variable of that type. Additional keys (e.g. "rgb", "basic")
are optional subsets:
meas_def = {
    "all": ["B01", "B02", "B03", ..., "B12"],
    "rgb": ["B02", "B03", "B04"],
}
Users can then request a preset with vars_sel={"meas": "rgb"} or an explicit list with
vars_sel={"meas": ["B02", "B08"]}. Passing None (the default) selects no variables of that
type. The base class resolves these requests into concrete lists stored in
reader.resolved_config.vars_sel.
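The resolution rules above (preset name, explicit list, or None) can be sketched as a single function. resolve_selection is a hypothetical helper, not the base-class method, and raising ValueError for unknown names is an assumption about how invalid requests are handled.

```python
from typing import Dict, List, Optional, Union

Selection = Optional[Union[str, List[str]]]


def resolve_selection(selection: Selection, definitions: Dict[str, List[str]]) -> List[str]:
    """Turn a preset name, an explicit list, or None into a concrete list."""
    if selection is None:
        # None selects no variables of this type.
        return []
    if isinstance(selection, str):
        if selection not in definitions:
            raise ValueError(f"Unknown preset {selection!r}")
        return list(definitions[selection])
    # Explicit list: every name must appear under the "all" preset.
    valid = set(definitions["all"])
    unknown = [name for name in selection if name not in valid]
    if unknown:
        raise ValueError(f"Unknown variables: {unknown}")
    return list(selection)


# Shortened band list, for illustration only.
meas_def = {
    "all": ["B02", "B03", "B04", "B08"],
    "rgb": ["B02", "B03", "B04"],
}
```

For example, resolve_selection("rgb", meas_def) returns the three RGB bands, while resolve_selection(["B02", "B99"], meas_def) raises because "B99" is not listed under "all".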
Subsetting#
Subsetting parameters depend on the base class:
BaseRasterReader default subset keys:
| Key | Description |
|---|---|
| roi | Region of interest as bounding-box coordinates or polygon |
| roi_crs | CRS of the ROI (default 4326) |
|  | Angle filter (min/max/nearest/tolerance) |
|  | Wavelength filter (min/max/nearest/tolerance) |
BaseInSituReader default subset keys:
| Key | Description |
|---|---|
|  | Wavelength filter |
|  | Datetime filter (min/max/nearest/tolerance_days/hours/minutes) |
|  | Angle filter |
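The min/max/nearest/tolerance options listed for the angle and wavelength filters could behave along the following lines. This is only a sketch under assumed semantics: min and max are treated as inclusive bounds, nearest picks the single closest value, and tolerance is the largest allowed distance from nearest. The function name filter_values is hypothetical and the actual eoio behaviour may differ.

```python
from typing import Dict, List


def filter_values(values: List[float], spec: Dict[str, float]) -> List[float]:
    """Apply an assumed min/max/nearest/tolerance filter to a list of values."""
    selected = list(values)
    if "min" in spec:
        selected = [v for v in selected if v >= spec["min"]]
    if "max" in spec:
        selected = [v for v in selected if v <= spec["max"]]
    if "nearest" in spec and selected:
        target = spec["nearest"]
        best = min(selected, key=lambda v: abs(v - target))
        tolerance = spec.get("tolerance")
        # Without a tolerance the closest value is always accepted.
        if tolerance is not None and abs(best - target) > tolerance:
            return []
        return [best]
    return selected


# Example band centre wavelengths in nanometres (illustrative values).
wavelengths = [443, 490, 560, 665, 842]
```

With these inputs, {"min": 450, "max": 700} keeps the three middle bands, and {"nearest": 550, "tolerance": 20} keeps only 560.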
Override resolve_subset() to convert the raw subset dict into an internal
representation. For raster readers that accept an ROI, use
ROISubsetResolver:
from eoio.readers.subset.roi_subset import ROISubsetResolver, ResolvedROISubset


def resolve_subset(self, subset):
    if not subset or subset.get("roi") is None:
        return None
    return ROISubsetResolver(
        self.layout, subset["roi"], subset.get("roi_crs", 4326)
    ).resolve()
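To give a feel for what resolving a bounding-box ROI can involve, the sketch below converts a box into a pixel window. It does not describe how ROISubsetResolver works internally. It assumes a north-up grid with square pixels and an ROI already expressed in the grid's CRS, and bbox_to_window is a hypothetical helper.

```python
import math
from typing import Optional, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (row_start, row_stop, col_start, col_stop)


def bbox_to_window(
    bbox: Sequence[float],
    origin_x: float,
    origin_y: float,
    pixel_size: float,
    width: int,
    height: int,
) -> Optional[Window]:
    """Convert [xmin, ymin, xmax, ymax] into a pixel window clipped to the grid."""
    xmin, ymin, xmax, ymax = bbox
    # Columns increase with x; rows increase as y decreases (north-up grid).
    col_start = max(0, math.floor((xmin - origin_x) / pixel_size))
    col_stop = min(width, math.ceil((xmax - origin_x) / pixel_size))
    row_start = max(0, math.floor((origin_y - ymax) / pixel_size))
    row_stop = min(height, math.ceil((origin_y - ymin) / pixel_size))
    if col_start >= col_stop or row_start >= row_stop:
        # The ROI does not overlap the grid.
        return None
    return (row_start, row_stop, col_start, col_stop)
```

A real resolver would also need to reproject the ROI when roi_crs differs from the product CRS and handle polygon inputs, which this sketch leaves out.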
Helper Modules#
Move complex logic out of the reader class into focused modules. Common patterns:
layout.py: Product structure and file discovery. Knows how to find band images, metadata XMLs, and auxiliary files given a product path. Should not perform any IO beyond path resolution.
data_io.py: Raster reading and spatial subsetting. Uses lazy imports from eoio.deps for rasterio and rioxarray.
subset.py: Reader-specific ROI or temporal subset resolution. May use eoio.readers.subset utilities or implement its own logic.
aux_data.py: Auxiliary variable ingestion (e.g. meteorological grids, ECMWF/CAMS data).
angles.py: Angle grid parsing (e.g. solar/viewing angles from XML).
conventions.py: Apply controlled-vocabulary variable names, attributes, and provenance stamps to the assembled xarray.Dataset. See Controlled Vocabulary for the expected attribute values.
metadata/: XML or JSON metadata parsers. Keep each metadata document type in its own module (e.g. s2_prod_mtd.py, s2_tl_mtd.py).
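As an illustration of the layout.py role, the sketch below builds paths without opening any file, in line with the rule that layout code performs no IO beyond path resolution. The SketchLayout class and the directory and file names it uses (IMG_DATA, metadata.xml) are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass(frozen=True)
class SketchLayout:
    """Hypothetical layout helper: resolves paths, never reads file contents."""

    root: Path

    def band_path(self, band: str) -> Path:
        # Assumed convention: one GeoTIFF per band under IMG_DATA/.
        return self.root / "IMG_DATA" / f"{band}.tif"

    def band_paths(self, bands: List[str]) -> Dict[str, Path]:
        return {band: self.band_path(band) for band in bands}

    def metadata_path(self) -> Path:
        return self.root / "metadata.xml"


layout = SketchLayout(Path("products") / "MY_SENSOR_20240101")
```

Keeping the layout free of IO makes it cheap to construct and easy to test with paths that do not exist on disk.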
Lazy Optional Dependencies#
Heavy dependencies are imported lazily to keep eoio lightweight. Use the helpers
in eoio.deps instead of top-level imports:
from eoio.deps import lazy_rasterio, lazy_rioxarray, lazy_pyproj, lazy_shapely


def _read_band(path):
    rasterio = lazy_rasterio()
    with rasterio.open(path) as src:
        ...
Available helpers:
| Function | Install extra | Provides |
|---|---|---|
| lazy_rasterio |  | rasterio |
| lazy_rioxarray |  | rioxarray |
| lazy_pyproj |  | pyproj |
| lazy_shapely |  | shapely |
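The general idea behind a lazy-import helper can be sketched with the standard library alone. This is not the eoio.deps implementation: lazy_import, its extra parameter, and the error wording are hypothetical and only show the technique of deferring the import until the function is called.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, extra: str) -> ModuleType:
    """Import a module on first use and point to an install extra on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install the {extra!r} extra to enable it."
        ) from exc


def dump_as_json(values):
    # The import cost is paid here, not when the enclosing package is imported.
    json = lazy_import("json", extra="demo")
    return json.dumps(values)
```

Because importlib caches loaded modules in sys.modules, repeated calls to the helper are cheap after the first import.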
Output Dataset Requirements#
The xarray.Dataset returned by open_dataset() must follow the eoio
controlled vocabulary. See Controlled Vocabulary for the full specification.
Key requirements are summarised here.
Required Global Attributes#
| Attribute | Description | Example |
|---|---|---|
|  | CF version string |  |
|  | Human-readable dataset title |  |
|  | Producing institution |  |
|  | Upstream product name |  |
|  | Audit trail (append, do not overwrite) |  |
|  | Controlled platform token |  |
|  | Controlled instrument token |  |
|  | Controlled level token |  |
|  | Full upstream product identifier |  |
|  | Stable collection identifier |  |
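The "append, do not overwrite" rule for the audit trail can be sketched as follows. The attribute key "history" is an assumption borrowed from the CF convention of that name, append_history is a hypothetical helper, and the timestamp format is illustrative only.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def append_history(attrs: Dict[str, str], message: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Return a copy of attrs with a timestamped entry appended to the audit trail."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"{stamp}: {message}"
    previous = attrs.get("history", "")  # assumed attribute name
    updated = dict(attrs)
    # Keep earlier entries and add the new one on its own line.
    updated["history"] = f"{previous}\n{entry}" if previous else entry
    return updated


upstream = {"history": "2024-01-01T00:00:00Z: created upstream"}
stamped = append_history(
    upstream,
    "read with MyReader",
    now=datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc),
)
```

Returning a copy rather than mutating the input keeps the upstream attributes available for comparison or provenance checks.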
Required Variable Attributes#
Each measurement, auxiliary, and mask variable should carry:
| Attribute | Description |
|---|---|
| standard_name | CF standard name where available |
| long_name | Human-readable description |
| units | UDUNITS-compliant unit string |
Variable-specific metadata should be stored in the DataArray.attrs dict of
the variable, not in global dataset attributes.
Flag Variables (Masks)#
All mask variables must be stored as CF-convention flag variables using obsarray:
ds.flag["quality_flags"] = (["y", "x"], {"flag_meanings": ["cloud", "land"]})
ds.flag["quality_flags"]["cloud"][:, :] = cloud_mask
For products where masks arrive as packed bit fields, assign the raw array and
set flag_meanings and flag_masks attributes directly:
ds["quality_flags"] = (("y", "x"), packed_flags)
ds.quality_flags.attrs = {
    "flag_meanings": "cloud land shadow",
    "flag_masks": "1,2,4",
}
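To show how the flag_meanings and flag_masks attributes relate to a packed value, the sketch below decodes one integer into flag names. It uses the string layout from the packed-flags example above; decode_flags is a hypothetical helper and not part of eoio or obsarray.

```python
from typing import Dict, List


def decode_flags(value: int, attrs: Dict[str, str]) -> List[str]:
    """Return the names of all flags whose mask bit is set in value."""
    meanings = attrs["flag_meanings"].split()
    masks = [int(mask) for mask in attrs["flag_masks"].split(",")]
    if len(meanings) != len(masks):
        raise ValueError("flag_meanings and flag_masks must have the same length")
    return [name for name, mask in zip(meanings, masks) if value & mask]


flag_attrs = {
    "flag_meanings": "cloud land shadow",
    "flag_masks": "1,2,4",
}
```

A pixel value of 5 has bits 1 and 4 set, so it decodes to cloud and shadow; a value of 0 decodes to no flags.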
Dimension Naming#
Follow the dimension naming rules in Controlled Vocabulary. For multi-resolution
raster datasets use the x_<resolution> / y_<resolution> pattern
(e.g. x_10m, y_10m, x_60m, y_60m).
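The multi-resolution naming pattern can be generated rather than typed by hand. A minimal sketch, with dims_for_resolution as a hypothetical helper that assumes resolutions are whole metres and that dimensions are ordered (y, x):

```python
from typing import Tuple


def dims_for_resolution(resolution_m: int) -> Tuple[str, str]:
    """Build the (y, x) dimension names for a given resolution in metres."""
    if resolution_m <= 0:
        raise ValueError("resolution must be a positive number of metres")
    return (f"y_{resolution_m}m", f"x_{resolution_m}m")


# Dimension names for each resolution group of a multi-resolution product.
dims_by_resolution = {res: dims_for_resolution(res) for res in (10, 20, 60)}
```

Deriving the names from one function keeps every band at a given resolution on identical dimension names, so they share coordinates in the assembled dataset.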
Registering a Reader#
Once your reader class exists, register it in
eoio.readers.factory.ReaderFactory.get_reader. Add a regular expression that
uniquely matches your product path and a lazy import of your reader class:
my_pattern = re.compile(r"MY_SENSOR_.*\Z")
...
elif re.search(my_pattern, path):
    from eoio.readers.myreader.reader import MyReader
    return MyReader
Lazy imports inside the elif branches are intentional — they avoid importing
heavy optional dependencies at package import time.
The order of patterns matters: place more specific patterns before broad catch-all
patterns (e.g. the generic .nc pattern must come last).
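Why pattern order matters can be seen in a small stand-alone dispatcher. This sketch returns reader names as strings instead of importing classes, and the pattern list, match_reader, and the GenericNetCDFReader label are hypothetical; the actual ReaderFactory.get_reader uses an if/elif chain as shown above.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Specific patterns first, broad catch-all patterns last.
PATTERNS: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"MY_SENSOR_.*\.nc\Z"), "MyReader"),
    (re.compile(r".*\.nc\Z"), "GenericNetCDFReader"),
]


def match_reader(path: str, patterns: Optional[List[Tuple[Pattern[str], str]]] = None) -> Optional[str]:
    """Return the name of the first reader whose pattern matches the path."""
    for pattern, reader_name in (patterns if patterns is not None else PATTERNS):
        if pattern.search(path):
            return reader_name
    return None
```

If the two entries were swapped, the generic .nc pattern would match MY_SENSOR products first and the specific reader would never be selected.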