Reading in Satellite Data
+++++++++++++++++++++++++

Using the read function
=============================

**eoio** reads in satellite data from a satellite product into an :py:class:`xarray.Dataset` via the use of the ``read()`` function which is also capable of applying processors to these datasets.

The read function can be used as in the following::

    satellite_ds = read(path, vars_sel: Optional[dict] = None, subset: Optional[dict] = None, read_params: Optional[dict] = None, processors: Optional[dict] = None)

where all inputs (apart from product path) are optionally provided.

Input Parameters
-----------------
Product Path
^^^^^^^^^^^^^
The read function takes in the satellite product filepath as the input to be read. The format of the filepath differs for each satellite product and is listed in :ref:`product_paths`. An example Sentinel-2 product path looks something like::

    path = T:\ECO\EOServer\data\satellite\S2A_MSI\L1\PICS\Libya4\S2A_MSIL1C_20151223T091412_N0201_R050_T34RGS_20151223T092043.SAFE

Subsetting and Reading Parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Various options can be specified in order to control the exact usage of the read function.
The read function uses separate parameters for variable selection, subsetting, and reading options:

* ``vars_sel`` - a dictionary specifying which variables to read, with keys like ``"meas"`` (measurement variables) and ``"aux"`` (auxiliary variables)
* ``subset`` - a dictionary containing spatial subsetting parameters such as region of interest (ROI) and its coordinate reference system
* ``read_params`` - a dictionary containing reading parameters such as chunking options


For a Sentinel-2 L1C product a ``vars_sel`` and ``subset`` dictionary may look like::

    vars_sel = {
        "meas": ["B02", "B03", "B04"],
        "aux": ["viewing_zenith_angle_B03"],
    }

    subset = {
        "roi": geom,  # shapely geometry, GeoJSON dict, or polygon coordinates
        "roi_crs": "EPSG:4326",
    }

All available variable names and supported combinations for a given product can be inspected via the documentation api or using the ``eoio.product_options`` function::

    from eoio.interface import product_options

    product_path = "path/to/product"
    options = product_options(product_path)
    print(options["vars_sel"])

A more complete page with all available parameters and example usage of the options for the read function will be added in the future.

Processors (and their parameters)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A number of processors are also available which can be run to add additional information to the returned dataset. 
Processors to apply to your dataset can be selected and passed through the read function using the ``processors`` input:

* ``processors`` - a dictionary with the names of the post processors to apply to the dataset along with any required inputs for those processors.

The format of these processing parameters looks something like this::
    
    processors = {
        "add_lat_lon": {},
        "interpolate": {
            "coords": ["x_5000m", "y_5000m"],
            "target_grid": ["x_10m", "y_10m"],
            "method": "linear",
            "data_vars": ['solar_zenith_angle'],
            "inplace": False,
            },
        "s2_rut": {
            "data_vars": ["B01", "B02", "B03"],
            "group_unc": True,  # set to True to compute group uncertainties (systematic and random)
            },
        }
    }

Here the first processor adds latitude and longitude coordinates to the dataset; the second processor interpolates the solar zenith angle from a 5000m grid to a 10m grid; and the third processor applies the Sentinel-2 Radiometric Uncertainty Tool (RUT) to compute systematic and random uncertainties for the selected bands (this does need the sza to be interpoalted to 10m).
All available processors and their inputs can be discovered through the documentation api or using the ``eoio.product_processors`` function::

    from eoio.interface import product_processors

    product_path = "path/to/product"
    processors = product_processors(product_path)
    print(processors)

A more complete page with all available parameters and example usage of the procesors will be added in the future.

Simple Example for Sentinel-2 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is a simple example of reading a Sentinel-2 L2A product (including all bands and no further processing)::

    SAFE_PATH = "path/to/S2A_MSIL2A_...SAFE"

    # Read product
    ds = read(
        SAFE_PATH,
    )

Complete Example for Sentinel-2 with RUT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is a complete example of reading a Sentinel-2 L1C product with subsetting, variable selection, and processors (the processing is significantly faster for smaller ROI)::

    from shapely import wkt
    from eoio import read

    SAFE_PATH = "path/to/S2A_MSIL1C_...SAFE"

    # Define region of interest
    roi_string = "POLYGON ((lon1 lat1, lon2 lat2, lon3 lat3, ...))"
    roi = wkt.loads(roi_string)

    # Read with all parameters
    ds = read(
        SAFE_PATH,
        vars_sel={
            "meas": ["B02", "B03", "B04"],
            "aux": ["solar_zenith_angle"],
        },
        subset={"roi": roi, "roi_crs": "EPSG:4326"},
        read_params={"use_chunks": True},
        processors={"add_lat_lon": {},
                    "interpolate": {
                        "coords": ["y_5000m", "x_5000m"],
                        "data_vars": ['solar_zenith_angle'],
                        "target_grid": ["y_10m", "x_10m"],
                        "method": "linear",
                        "inplace": False,
                    },
                    "s2_rut": {
                        "data_vars": ["B01", "B02", "B03"],
                        "group_unc": True,  # set to True to compute group uncertainties (systematic and random)
                    },
        }
    )

There are also example scripts available in the ``examples`` directory of the repository which illustrate the use of the read function for different satellite products.

Output Dataset
--------------
Example Sentinel-2 output dataset (with selecting bands, processors, uncertainties, ... - see the example above)::

    <xarray.Dataset> Size: 594MB
    Dimensions:                 (x_60m: 583, y_60m: 579, x_10m: 3502, y_10m: 3475,
                                x_5000m: 23, y_5000m: 23, y_5000m_interp_0: 23,
                                y_5000m_interp_1: 23, x_5000m_interp_0: 23,
                                x_5000m_interp_1: 23)
    Coordinates:
    * x_60m                   (x_60m) float64 5kB 6.375e+05 ... 6.724e+05
    * y_60m                   (y_60m) float64 5kB 5.962e+06 ... 5.927e+06
    * x_10m                   (x_10m) float64 28kB 6.374e+05 ... 6.724e+05
    * y_10m                   (y_10m) float64 28kB 5.962e+06 ... 5.927e+06
    * x_5000m                 (x_5000m) int64 184B 600000 605000 ... 705000 710000
    * y_5000m                 (y_5000m) int64 184B 6000000 5995000 ... 5890000
    * y_5000m_interp_0        (y_5000m_interp_0) int64 184B 6000000 ... 5890000
    * y_5000m_interp_1        (y_5000m_interp_1) int64 184B 6000000 ... 5890000
    * x_5000m_interp_0        (x_5000m_interp_0) int64 184B 600000 ... 710000
    * x_5000m_interp_1        (x_5000m_interp_1) int64 184B 600000 ... 710000
        spatial_ref             int64 8B 0
    Data variables:
        B01                     (y_60m, x_60m) float32 1MB dask.array<chunksize=(579, 583), meta=np.ndarray>
        B02                     (y_10m, x_10m) float32 49MB dask.array<chunksize=(1674, 1747), meta=np.ndarray>
        B03                     (y_10m, x_10m) float32 49MB dask.array<chunksize=(1674, 1747), meta=np.ndarray>
        solar_zenith_angle      (y_5000m, x_5000m) float64 4kB 75.96 75.95 ... 74.81
        solar_zenith_angle_60m  (y_60m, x_60m) float64 3MB 75.56 75.56 ... 75.2 75.2
        solar_zenith_angle_10m  (y_10m, x_10m) float64 97MB 75.56 75.56 ... 75.2
        u_systematic_B01        (y_60m, x_60m) float64 3MB 0.003413 0.003421 ... nan
        u_random_B01            (y_60m, x_60m) float64 3MB 0.000946 ... nan
        u_systematic_B02        (y_10m, x_10m) float64 97MB 0.002606 ... nan
        u_random_B02            (y_10m, x_10m) float64 97MB 0.00181 0.001425 ... nan
        u_systematic_B03        (y_10m, x_10m) float64 97MB 0.001929 ... nan
        u_random_B03            (y_10m, x_10m) float64 97MB 0.001194 0.00131 ... nan
    Attributes: (12/28)
        title:                     S2MSI1C
        summary:                   TBC
        institution:               TBC
        source:                    TBC
        references:                TBD
        date_created:              2026-04-11T14:47:26.164923+00:00
        ...                        ...
        eoio:version:              v0.1+625.g85cdac4.dirty
        eoio:path:                 T:\ECO\EOServer\data\unittest_datasets\S2MSIL1...
        history:                   2026-04-11T14:47:26.164923+00:00: S2A_MSIL1C_2...
        eoio:reader:               sentinel2
        eoio:subset:               (637430.357340333, 5927091.05886575, 672449.28...
        eoio:processing_steps:     [{'processor': 'interpolate', 'interpolated_co...

where band values, solar and viewing angles and auxiliary data are stored as variables; band specific metadata is
stored within its corresponding variables' attributes; general metadata is stored within the datasets attributes; and
masks are stored as obsarray flags.

Accessing Variables
-------------------
Variables and metadata can be accessed in the same way as you would access any xarray variable or attribute.

Selecting a specific band
^^^^^^^^^^^^^^^^^^^^^^^^^
::

    print(ds["B02"])

    <xarray.DataArray 'B02' (y_10m: 3375, x_10m: 2577)>
    array([[nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           ...,
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan],
           [nan, nan, nan, ..., nan, nan, nan]])
    Coordinates:
        band           int32 1
      * x_10m          (x_10m) float64 3.57e+05 3.57e+05 ... 3.828e+05 3.828e+05
      * y_10m           (y_10m) float64 4.573e+06 4.573e+06 ... 4.54e+06 4.54e+06
        spatial_ref    int32 0
        longitude_10m         (y_10m, x_10m) float64 109.3 109.3 109.3 ... 109.6 109.6 109.6
        latitude_10m         (y_10m, x_10m) float64 41.3 41.3 41.3 41.3 ... 41.0 41.0 41.0
        number         int32 0
        time           datetime64[ns] 2022-03-18T12:00:00
        step           timedelta64[ns] 15:36:00
        surface        float64 0.0
        valid_time     datetime64[ns] 2022-03-19T03:36:00
        isobaricInhPa  float64 0.0
    Attributes:
        units:
        standard_name:                        toa_bidirectional_reflectance_B02
        long_name:                            Reflectance in band B02
        ABSOLUTE_CALIBRATION_ACCURACY:        5
        ALPHA:                                0.567
        BETA:                                 0.04696
        CROSS_BAND_CALIBRATION_ACCURACY:      3
        MULTI_TEMPORAL_CALIBRATION_ACCURACY:  1
        PHYSICAL_GAINS:                       FULL_RESOLUTION
        RADIO_ADD_OFFSET:                     -1000
        RESOLUTION:                           10
        SOLAR_IRRADIANCE:                     {'@unit': 'W/m²/µm', '#text': 1959.66}

