eoio.readers.xml module
Generic XML reader functionality.
This module provides a reusable XMLReader base class for extracting
metadata from XML files using namespace-aware XPath expressions. It supports
lightweight type inference, nested text extraction, and construction of
mappings from repeated XML elements.
The class is intended to be subclassed by product- or mission-specific readers (e.g. Sentinel-2, Landsat), which define concrete metadata paths and add higher-level semantic accessors.
- class eoio.readers.xml.XMLReader(path: Path)
Bases: object
Generic XML metadata reader with namespace support and heuristic type casting.
This class provides low-level utilities for reading structured values from XML documents. It performs namespace-aware XPath lookups using metadata_paths and applies simple heuristics to convert XML text content into appropriate Python scalar or sequence types.
Subclasses are expected to define metadata_paths and implement higher-level domain-specific accessors.
- static extract_root_namespaces(xml_path: str | Path) -> dict[str, str]
Extract namespace declarations from the root element of an XML file.
Only namespaces declared directly on the root element are captured.
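A minimal sketch of how root-level namespace declarations could be collected with the standard library's xml.etree.ElementTree.iterparse. The helper name root_namespaces and the sample document are hypothetical, and this is not necessarily how extract_root_namespaces is implemented.

```python
import io
import xml.etree.ElementTree as ET


def root_namespaces(source) -> dict[str, str]:
    """Collect namespaces declared on the root element only (illustrative)."""
    namespaces: dict[str, str] = {}
    # "start-ns" events are emitted before the "start" event of the element
    # that declares them, so everything seen before the first "start" event
    # belongs to the root element.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload  # prefix is "" for the default namespace
            namespaces[prefix] = uri
        else:
            break  # root element reached; ignore declarations further down
    return namespaces


# Hypothetical document: "b" is declared on a child, so it is not captured.
sample = (
    b'<root xmlns="urn:default" xmlns:a="urn:a">'
    b'<child xmlns:b="urn:b"/>'
    b"</root>"
)
root_ns = root_namespaces(io.BytesIO(sample))
```

Here root_ns contains the prefixes "" and "a" but not "b", which matches the root-only behaviour described above.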
- Parameters:
xml_path – Path to the XML file.
- Returns:
Mapping from namespace prefix to namespace URI. The default namespace (if present) is stored under "".
- find_mapping(name: str, key: str, *, key_attr: str, key_cast: Callable[[str], Any] = str, value_xpath: str | None = None, value_cast: Callable[[str], Any] | None = None, default: Any = <object object>, deep_text: bool = False) -> dict[Any, Any] | None
Build a dictionary from a parent element containing repeated child elements.
This method supports tri-state behaviour:
- Returns None if the parent element does not exist.
- Returns {} if the parent exists but no child entries are found.
- Returns a populated dictionary if child entries exist.
- Namespace robustness:
key is treated as a local element name (prefix-agnostic).
value_xpath (if provided) is treated as a simple slash-separated local-name path (e.g. "Noise_Model/ALPHA"). Do not include namespace prefixes or a leading ./.
- Parameters:
name – Metadata field name mapped to a container XPath in self.metadata_paths. The XPath should select the container element (e.g. .../Radiometric_Offset_List).
key – Local tag name of the repeated entry elements beneath the container (e.g. RADIO_ADD_OFFSET).
key_attr – XML attribute name to use as the dictionary key (e.g. bandId).
key_cast – Function to cast the attribute value into the desired key type.
value_xpath – Optional local-name path (relative to each entry element) selecting the element whose text should be used as the dictionary value. If omitted, uses the entry element's own text.
value_cast – Function to cast value text. If None, uses _cast_scalar().
default – Value to return if the parent element does not exist. If not provided, the method returns None in that case.
deep_text – If True, uses descendant text for the selected value element.
- Returns:
None if the container is missing (unless default is provided), otherwise a dictionary (possibly empty).
- Raises:
KeyError – If name is not present in self.metadata_paths.
ValueError – If required attributes or value elements are missing, or if values are empty.
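The tri-state behaviour and local-name matching described for find_mapping can be sketched with the standard library alone. The helpers local_name, find_by_local_path and mapping_from_children, the urn:example namespace and the sample values are all hypothetical stand-ins, and the float default for value_cast is an assumption of this sketch rather than the behaviour of _cast_scalar().

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{uri}' part of an ElementTree tag, keeping the local name."""
    return tag.rsplit("}", 1)[-1]


def find_by_local_path(elem, path):
    """Follow a slash-separated local-name path such as 'Noise_Model/ALPHA'."""
    for step in path.split("/"):
        elem = next((c for c in elem if local_name(c.tag) == step), None)
        if elem is None:
            return None
    return elem


def mapping_from_children(container, key, key_attr, *, key_cast=str,
                          value_xpath=None, value_cast=float):
    """Return None (no container), {} (no entries) or a populated dict."""
    if container is None:
        return None
    result = {}
    for entry in container:
        if local_name(entry.tag) != key:  # prefix-agnostic comparison
            continue
        raw_key = entry.get(key_attr)
        target = entry if value_xpath is None else find_by_local_path(entry, value_xpath)
        text = (target.text or "").strip() if target is not None else ""
        if raw_key is None or not text:
            raise ValueError(f"incomplete <{key}> entry")
        result[key_cast(raw_key)] = value_cast(text)
    return result


ns = {"n": "urn:example"}
doc = ET.fromstring(
    '<n:Info xmlns:n="urn:example">'
    "<n:Radiometric_Offset_List>"
    '<n:RADIO_ADD_OFFSET bandId="0">-1000</n:RADIO_ADD_OFFSET>'
    '<n:RADIO_ADD_OFFSET bandId="1">-1000</n:RADIO_ADD_OFFSET>'
    "</n:Radiometric_Offset_List>"
    "<n:Empty_List/>"
    "</n:Info>"
)
offsets = mapping_from_children(
    doc.find("n:Radiometric_Offset_List", ns), "RADIO_ADD_OFFSET", "bandId", key_cast=int
)
empty = mapping_from_children(doc.find("n:Empty_List", ns), "RADIO_ADD_OFFSET", "bandId")
missing = mapping_from_children(doc.find("n:Missing_List", ns), "RADIO_ADD_OFFSET", "bandId")
```

In this sketch offsets is a populated dictionary keyed by integer band id, empty is {} and missing is None, mirroring the three cases listed above.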
- find_value(name: str, *, default: Any = None, deep_text: bool = False, as_array: bool | None = None, split: str | None = 'auto') -> Any
Find and return a metadata value from the XML document.
The value is located using the XPath associated with name in metadata_paths and converted to an appropriate Python type using heuristic rules.
- Parameters:
name – Metadata key used to look up an XPath in metadata_paths.
default – Value to return if the element is missing or empty.
deep_text – If True, all descendant text nodes are used instead of only the element's direct text.
as_array – Controls scalar vs sequence return: None = auto-detect, True = force list, False = force scalar.
split – How to split the raw text into tokens:
- "auto" (default): comma or whitespace (existing behaviour)
- None: do not split; treat the entire raw text as a single token
- any other string: split on that delimiter (e.g. "," or " ")
- Returns:
Parsed metadata value.
- Raises:
KeyError – If name is not defined in metadata_paths.
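The interaction of split and as_array can be illustrated with a small text parser. This is a sketch under stated assumptions, not the library's code: the names cast_scalar and parse_text are hypothetical, the cast order (int, then float, then str) is assumed, and returning the first token when as_array=False meets several tokens is a choice made only for this example.

```python
import re


def cast_scalar(token: str):
    """Try int, then float, else keep the string (assumed heuristic)."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_text(raw, *, default=None, as_array=None, split="auto"):
    """Turn raw element text into a scalar or a list of scalars."""
    text = (raw or "").strip()
    if not text:
        return default  # missing or empty element
    if split is None:
        tokens = [text]  # keep the whole text as one token
    elif split == "auto":
        tokens = [t for t in re.split(r"[,\s]+", text) if t]
    else:
        tokens = [t.strip() for t in text.split(split) if t.strip()]
    values = [cast_scalar(t) for t in tokens]
    if as_array is True or (as_array is None and len(values) > 1):
        return values
    return values[0]  # assumption: a forced scalar takes the first token


single = parse_text("10000")
several = parse_text("0.5, 1.5 2")
unsplit = parse_text("2023-01-01 10:00", split=None)
forced = parse_text("7", as_array=True)
```

With these inputs, single is an int, several is a three-element list, unsplit stays one string because splitting is disabled, and forced is a one-element list.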
- metadata_paths: dict[str, str] = {}
Mapping from metadata keys to XPath expressions. Intended to be overridden by subclasses.
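The override pattern can be sketched with a self-contained stand-in. MiniReader below is a hypothetical simplification written for this example and is not XMLReader itself; the element names, the urn:example namespace and the find_text method are likewise illustrative. A real subclass would derive from XMLReader and call find_value or find_mapping.

```python
import xml.etree.ElementTree as ET


class MiniReader:
    """Hypothetical stand-in showing the metadata_paths lookup pattern."""

    metadata_paths: dict[str, str] = {}

    def __init__(self, root, namespaces):
        self.root = root
        self.namespaces = namespaces

    def find_text(self, name, *, default=None, deep_text=False):
        xpath = self.metadata_paths[name]  # unknown names raise KeyError
        elem = self.root.find(xpath, self.namespaces)
        if elem is None:
            return default
        # deep_text gathers text from all descendants, not just the element.
        raw = " ".join(elem.itertext()) if deep_text else (elem.text or "")
        text = " ".join(raw.split())  # normalise whitespace
        return text or default


class ExampleProductReader(MiniReader):
    """Product-specific reader: concrete paths plus a semantic accessor."""

    metadata_paths = {
        "product_type": "./n:General_Info/n:PRODUCT_TYPE",
        "footprint": "./n:Geometric_Info/n:Footprint",
    }

    @property
    def product_type(self):
        return self.find_text("product_type")


root = ET.fromstring(
    '<n:Product xmlns:n="urn:example">'
    "<n:General_Info><n:PRODUCT_TYPE>EXAMPLE_L1</n:PRODUCT_TYPE></n:General_Info>"
    "<n:Geometric_Info><n:Footprint>"
    "<n:Pos>1 2</n:Pos><n:Pos>3 4</n:Pos>"
    "</n:Footprint></n:Geometric_Info>"
    "</n:Product>"
)
reader = ExampleProductReader(root, {"n": "urn:example"})
shallow = reader.find_text("footprint")
deep = reader.find_text("footprint", deep_text=True)
```

Because the Footprint element has no direct text, shallow falls back to the default, while deep collects the text of its child elements.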