eoio.readers.xml module

eoio.readers.xml

Generic XML reader functionality.

This module provides a reusable XMLReader base class for extracting metadata from XML files using namespace-aware XPath expressions. It supports lightweight type inference, nested text extraction, and construction of mappings from repeated XML elements.

The class is intended to be subclassed by product- or mission-specific readers (e.g. Sentinel-2, Landsat), which define concrete metadata paths and add higher-level semantic accessors.

class eoio.readers.xml.XMLReader(path: Path)

Bases: object

Generic XML metadata reader with namespace support and heuristic type casting.

This class provides low-level utilities for reading structured values from XML documents. It performs namespace-aware XPath lookups using metadata_paths and applies simple heuristics to convert XML text content into appropriate Python scalar or sequence types.

Subclasses are expected to define metadata_paths and implement higher-level domain-specific accessors.

static extract_root_namespaces(xml_path: str | Path) → dict[str, str]

Extract namespace declarations from the root element of an XML file.

Only namespaces declared directly on the root element are captured.

Parameters:

xml_path – Path to the XML file.

Returns:

Mapping from namespace prefix to namespace URI. The default namespace (if present) is stored under "".
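One way to collect root-only namespace declarations with the standard library is sketched below. It is not the eoio implementation: the function name root_namespaces and the sample document are hypothetical. It relies on xml.etree.ElementTree.iterparse reporting an element's "start-ns" events before that element's "start" event.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def root_namespaces(xml_path) -> dict:
    """Collect the namespace declarations made on the root element only."""
    namespaces = {}
    with open(xml_path, "rb") as handle:
        # "start-ns" events for an element arrive before its "start" event,
        # so everything seen before the first "start" belongs to the root.
        for event, payload in ET.iterparse(handle, events=("start-ns", "start")):
            if event == "start":
                break
            prefix, uri = payload
            namespaces[prefix] = uri  # the default namespace has prefix ""
    return namespaces


# Hypothetical sample: two declarations on the root, one on a child element.
SAMPLE = (
    '<n1:Product xmlns:n1="https://example.com/product" '
    'xmlns="https://example.com/default">'
    '<Child xmlns:extra="https://example.com/extra"/>'
    "</n1:Product>"
)

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.xml"
    sample_path.write_text(SAMPLE, encoding="utf-8")
    found = root_namespaces(sample_path)

print(found)
```

The extra prefix declared on the child element does not appear in the result, which matches the root-only behaviour described above.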

find_mapping(name: str, key: str, *, key_attr: str, key_cast: Callable[[str], Any] = str, value_xpath: str | None = None, value_cast: Callable[[str], Any] | None = None, default: Any = <object object>, deep_text: bool = False) → dict[Any, Any] | None

Build a dictionary from a parent element containing repeated child elements.

This method supports tri-state behaviour:

  • Returns None (or default, if one is provided) if the parent element does not exist.

  • Returns {} if the parent exists but no child entries are found.

  • Returns a populated dictionary if child entries exist.

Namespace robustness:
  • key is treated as a local element name (prefix-agnostic).

  • value_xpath (if provided) is treated as a simple slash-separated local-name path (e.g. "Noise_Model/ALPHA"). Do not include namespace prefixes or a leading ./.

Parameters:
  • name – Metadata field name mapped to a container XPath in self.metadata_paths. The XPath should select the container element (e.g. .../Radiometric_Offset_List).

  • key – Local tag name of the repeated entry elements beneath the container (e.g. RADIO_ADD_OFFSET).

  • key_attr – XML attribute name to use as the dictionary key (e.g. bandId).

  • key_cast – Function to cast the attribute value into the desired key type.

  • value_xpath – Optional local-name path (relative to each entry element) selecting the element whose text should be used as the dictionary value. If omitted, uses the entry element’s own text.

  • value_cast – Function to cast value text. If None, uses _cast_scalar().

  • default – Value to return if the parent element does not exist. If not provided, the method returns None in that case.

  • deep_text – If True, uses descendant text for the selected value element.

Returns:

None if the container is missing (unless default is provided), otherwise a dictionary (possibly empty).

Raises:
  • KeyError – If name is not present in self.metadata_paths.

  • ValueError – If required attributes or value elements are missing, or if values are empty.
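The tri-state behaviour and prefix-agnostic matching can be illustrated with a small standalone sketch. It does not call eoio: build_mapping and local_name are hypothetical helpers, the XML snippet and namespace URI are invented for the example, and value_xpath, default and deep_text are left out for brevity.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{uri}' namespace qualifier from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def build_mapping(container, key, *, key_attr, key_cast=str, value_cast=float):
    """Map each matching child's attribute value to its cast text content."""
    if container is None:
        return None  # state 1: the container element does not exist
    mapping = {}
    for child in container:
        if local_name(child.tag) != key:
            continue  # compare local names only, ignoring the namespace
        raw_key = child.get(key_attr)
        text = (child.text or "").strip()
        if raw_key is None or not text:
            raise ValueError(f"<{key}> needs a {key_attr!r} attribute and text")
        mapping[key_cast(raw_key)] = value_cast(text)
    return mapping  # state 2: {} if nothing matched; state 3: populated


# Hypothetical document; element names follow the examples in the text above.
DOC = ET.fromstring(
    '<n1:Root xmlns:n1="https://example.com/demo">'
    "<n1:Radiometric_Offset_List>"
    '<n1:RADIO_ADD_OFFSET bandId="0">-1000</n1:RADIO_ADD_OFFSET>'
    '<n1:RADIO_ADD_OFFSET bandId="1">-500</n1:RADIO_ADD_OFFSET>'
    "</n1:Radiometric_Offset_List>"
    "<n1:Empty_List/>"
    "</n1:Root>"
)
NS = {"n1": "https://example.com/demo"}

populated = build_mapping(
    DOC.find("n1:Radiometric_Offset_List", NS),
    "RADIO_ADD_OFFSET",
    key_attr="bandId",
    key_cast=int,
    value_cast=int,
)
empty = build_mapping(DOC.find("n1:Empty_List", NS), "RADIO_ADD_OFFSET", key_attr="bandId")
missing = build_mapping(DOC.find("n1:No_Such_List", NS), "RADIO_ADD_OFFSET", key_attr="bandId")

print(populated, empty, missing)
```

Keeping None and {} distinct lets a caller tell "this product has no such list" apart from "the list is present but empty".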

find_value(name: str, *, default: Any = None, deep_text: bool = False, as_array: bool | None = None, split: str | None = 'auto') → Any

Find and return a metadata value from the XML document.

The value is located using the XPath associated with name in metadata_paths and converted to an appropriate Python type using heuristic rules.

Parameters:
  • name – Metadata key used to look up an XPath in metadata_paths.

  • default – Value to return if the element is missing or empty.

  • deep_text – If True, all descendant text nodes are used instead of only the element’s direct text.

  • as_array – Controls scalar vs sequence return: None = auto-detect, True = force list, False = force scalar.

  • split

    How to split the raw text into tokens:
    • "auto" (default): comma or whitespace (existing behaviour)

    • None: do not split; treat the entire raw text as a single token

    • any other string: split on that delimiter (e.g. "," or " ")

Returns:

Parsed metadata value.

Raises:

KeyError – If name is not defined in metadata_paths.
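The interplay of deep_text, split and as_array can be sketched as follows. The exact heuristics of the real method are not documented here, so this is only a plausible reading: element_text, cast_scalar and parse_text are hypothetical names, the int-then-float casting order is an assumption, and returning the first token when as_array=False is a simplification that the real method may not share.

```python
import re
import xml.etree.ElementTree as ET


def element_text(element, deep_text=False):
    """Return direct text only, or all descendant text when deep_text is True."""
    if deep_text:
        return "".join(element.itertext())
    return element.text or ""


def cast_scalar(token):
    """Assumed heuristic: try int, then float, else keep the string."""
    for caster in (int, float):
        try:
            return caster(token)
        except ValueError:
            continue
    return token


def parse_text(raw, *, default=None, as_array=None, split="auto"):
    raw = raw.strip()
    if not raw:
        return default  # missing or empty content falls back to the default
    if split is None:
        tokens = [raw]  # treat the whole text as a single token
    elif split == "auto":
        tokens = [t for t in re.split(r"[,\s]+", raw) if t]
    else:
        tokens = [t.strip() for t in raw.split(split) if t.strip()]
    values = [cast_scalar(t) for t in tokens]
    if as_array is True or (as_array is None and len(values) > 1):
        return values
    return values[0]  # assumption: a forced scalar is the first token


# Hypothetical element with text both directly and inside a child element.
elem = ET.fromstring("<Values>1 <More>2 3</More></Values>")

shallow = parse_text(element_text(elem))                # direct text only
deep = parse_text(element_text(elem, deep_text=True))   # descendant text
unsplit = parse_text("GEOGCS WGS 84", split=None)       # no tokenising
forced = parse_text("7", as_array=True)                 # always a list
custom = parse_text("1;2.5", split=";")                 # explicit delimiter

print(shallow, deep, unsplit, forced, custom)
```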

metadata_paths: dict[str, str] = {}

Mapping from metadata keys to XPath expressions. Intended to be overridden by subclasses.
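The override pattern can be illustrated with a self-contained stand-in. MiniReader below is a hypothetical toy class, not XMLReader itself: it takes XML text rather than a path, keeps its own namespaces attribute, and returns raw text without type casting. It only shows how a subclass might supply metadata_paths and how an unknown key surfaces as KeyError.

```python
import xml.etree.ElementTree as ET


class MiniReader:
    """Toy stand-in for a reader driven by a metadata_paths mapping."""

    metadata_paths: dict = {}  # overridden by subclasses
    namespaces: dict = {}      # prefix -> URI, used to resolve the XPaths

    def __init__(self, xml_text: str):
        self.root = ET.fromstring(xml_text)

    def find_text(self, name, default=None):
        xpath = self.metadata_paths[name]  # unknown names raise KeyError
        element = self.root.find(xpath, self.namespaces)
        if element is None or not (element.text or "").strip():
            return default
        return element.text.strip()


class DemoProductReader(MiniReader):
    """Hypothetical product reader: only the paths are product-specific."""

    namespaces = {"n1": "https://example.com/demo"}
    metadata_paths = {
        "platform": "./n1:General/n1:PLATFORM",
        "cloud_cover": "./n1:Quality/n1:CLOUD_COVER",
        "orbit": "./n1:General/n1:ORBIT",
    }


reader = DemoProductReader(
    '<n1:Product xmlns:n1="https://example.com/demo">'
    "<n1:General><n1:PLATFORM>DEMO-1</n1:PLATFORM></n1:General>"
    "<n1:Quality><n1:CLOUD_COVER>12.5</n1:CLOUD_COVER></n1:Quality>"
    "</n1:Product>"
)

platform = reader.find_text("platform")
cloud_cover = reader.find_text("cloud_cover")
orbit = reader.find_text("orbit", default="unknown")  # path defined, element absent

print(platform, cloud_cover, orbit)
```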