Building a Processor Object
The core functionality of processor_tools is building Processor classes that subclass BaseProcessor. This section of the user guide explains how to build and use your own Processor classes.
Creating a Processor Class
Processor classes are defined by subclassing the BaseProcessor class. The processor’s processing algorithm should be defined by overriding the BaseProcessor.run method.
In this example, which assumes processor_tools has already been imported, we define a processor class for multiplying input values together.
In [1]: class Multiplication(processor_tools.BaseProcessor):
   ...:     def run(self, val1, val2):
   ...:         return val1 * val2
   ...:
In [2]: mult_proc = Multiplication()
In [3]: print(mult_proc.run(2,3))
6
Defining Configuration Values
Configuration values can be provided by passing a context object when the processor class is initialised. The context object should be a container with the necessary configuration values defined - this may be as simple as a dict.
processor_tools also provides the Context object, which has useful extra functionality for handling Processor state/configuration. For more information on using the Context object for storing processor state, see the relevant section of the user guide.
Within an initialised processor object, the context object can be accessed as an instance attribute.
In [4]: class Exponentiate(processor_tools.BaseProcessor):
   ...:     def run(self, val1):
   ...:         return val1 ** self.context["exponent"]
   ...:
In [5]: context = {"exponent": 2}
In [6]: exp_proc = Exponentiate(context=context)
In [7]: print(exp_proc)
<Processor: Exponentiate>
In [8]: print(exp_proc.run(3))
9
Setting Processor Names
By default the processor name is defined as the class name (e.g., "Multiplication" and "Exponentiate"). This may be updated by setting the cls_processor_name class attribute when defining the class. The processor name may be accessed from processor objects via the processor_name attribute.
In [9]: class FirstProcessor(processor_tools.BaseProcessor):
   ...:     pass
   ...:
In [10]: class SecondProcessor(processor_tools.BaseProcessor):
   ....:     cls_processor_name = "my_new_name"
   ....:
In [11]: proc1 = FirstProcessor()
In [12]: proc2 = SecondProcessor()
In [13]: print(proc1.processor_name, proc2.processor_name)
FirstProcessor my_new_name
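The fallback from class name to custom name can be pictured with a small stand-in class. This is only a sketch: SketchProcessor, FirstSketch and SecondSketch are hypothetical names, and the property below is an assumption about how such a fallback could work, not the processor_tools implementation. Only cls_processor_name and processor_name are taken from the text above.

```python
class SketchProcessor:
    """Hypothetical stand-in for BaseProcessor (not the library's code)."""

    # Subclasses may override this to set a custom name.
    cls_processor_name = None

    @property
    def processor_name(self):
        # Use the custom name if one is set, otherwise the class name.
        if self.cls_processor_name is not None:
            return self.cls_processor_name
        return type(self).__name__


class FirstSketch(SketchProcessor):
    pass


class SecondSketch(SketchProcessor):
    cls_processor_name = "my_new_name"


print(FirstSketch().processor_name, SecondSketch().processor_name)
```

Because the name is a class attribute, every object built from the same class reports the same name unless the class itself is changed.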
Using Subprocessors
Processor classes may be related to other processor classes as “subprocessors”. Subprocessors are effectively plugins that can represent modular parts of a processing chain. A defined subprocessor may be switched out for an alternative implementation by replacing it with a different processor class. This allows for user-configurable processing chains in cases where a variety of processing options are available.
Appending subprocessors to a processor object
A processor may be added to another processor object’s subprocessors using the append_subprocessor method. Subprocessors may be added as instantiated processor objects, processor classes, or processor factories (more on which below). append_subprocessor stores an instantiated processor object for any of these options.
A processor object’s subprocessors are stored in a dictionary that is accessible via the subprocessors attribute.
In [14]: class MyProcessor(processor_tools.BaseProcessor):
   ....:     pass
   ....:
In [15]: proc = MyProcessor()
In [16]: proc.append_subprocessor("subprocessor1", MyProcessor)
In [17]: print(proc.subprocessors)
{'subprocessor1': <Processor: MyProcessor>}
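Storing an instantiated object regardless of whether a class or an object was supplied can be sketched as follows. This is an illustration under stated assumptions, not the library's code: Worker and as_instance are hypothetical names, and the factory case is left out here.

```python
import inspect


class Worker:
    """Hypothetical processor-like class used only for this sketch."""


def as_instance(candidate):
    """Return an instance whether given a class or an existing object."""
    if inspect.isclass(candidate):
        # A class was supplied, so build an object from it.
        return candidate()
    # Already an object: keep it unchanged.
    return candidate


subprocessors = {}
subprocessors["from_class"] = as_instance(Worker)

existing = Worker()
subprocessors["from_object"] = as_instance(existing)
```

Both entries end up holding Worker objects, and the second entry is the very object that was passed in.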
This can continue recursively, where a processor’s subprocessor may itself have its own subprocessors, and so on.
In [18]: subproc2 = MyProcessor()
In [19]: subproc2.append_subprocessor("subprocessor2a", MyProcessor)
In [20]: proc.append_subprocessor("subprocessor2", subproc2)
In [21]: print(proc.subprocessors)
{'subprocessor1': <Processor: MyProcessor>, 'subprocessor2': <Processor: MyProcessor>}
In [22]: print(proc.subprocessors["subprocessor2"].subprocessors)
{'subprocessor2a': <Processor: MyProcessor>}
Subprocessor paths
The relative paths for these processors within the subprocessor structure are accessible via the processor_path attribute.
In [23]: print(proc.subprocessors["subprocessor2"].processor_path)
subprocessor2
In [24]: print(proc.subprocessors["subprocessor2"].subprocessors["subprocessor2a"].processor_path)
subprocessor2.subprocessor2a
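One way to picture how such dotted paths could be derived is to have each node remember its parent and its name, then join the names on request. The sketch below is an assumption made for illustration: PathNode is a hypothetical class, and returning None for the root is a choice made here, not documented behaviour of processor_tools.

```python
class PathNode:
    """Hypothetical stand-in for a processor that knows its place in a tree."""

    def __init__(self):
        self.subprocessors = {}
        self._parent = None
        self._name = None

    def append_subprocessor(self, name, node):
        # Record the link in both directions.
        node._parent = self
        node._name = name
        self.subprocessors[name] = node

    @property
    def processor_path(self):
        # The root has no path in this sketch; children join names with dots.
        if self._parent is None:
            return None
        parent_path = self._parent.processor_path
        if parent_path is None:
            return self._name
        return f"{parent_path}.{self._name}"


root, child, grandchild = PathNode(), PathNode(), PathNode()
child.append_subprocessor("subprocessor2a", grandchild)
root.append_subprocessor("subprocessor2", child)

print(child.processor_path)
print(grandchild.processor_path)
```

Because the path is computed on access rather than stored, it stays correct even when a subprocessor is given its own children before being attached to its parent, as in the session above.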
Configuring subprocessor options with processor factories
In many cases, a subprocessor step within a processing chain could be carried out by several different processor implementations, the choice of which may depend on the circumstances.
processor_tools handles this with a ProcessorFactory. Processor factories are effectively containers that store a set of processors.
In [25]: class Algo1(processor_tools.BaseProcessor):
   ....:     cls_processor_name = "algorithm1"
   ....:
In [26]: class Algo2(processor_tools.BaseProcessor):
   ....:     cls_processor_name = "algorithm2"
   ....:
In [27]: algo_factory = processor_tools.ProcessorFactory()
In [28]: algo_factory.add_processor(Algo1)
In [29]: algo_factory.add_processor(Algo2)
In [30]: print(algo_factory["algorithm2"])
<class '__main__.Algo2'>
These factories can be used to define the set of optional implementations for a subprocessor. As before, they can be appended to a processor’s subprocessors using the append_subprocessor method.
The choice of processor implementation is set by the user in the processor context. The context object should define a top level entry called "processor", the value for which defines a set of entries - one for each subprocessor.
Each subprocessor entry is named by the subprocessor path for the factory, with a value of the processor name of choice.
In [31]: context = {"processor": {"opt_algo": "algorithm1"}}
In [32]: proc_with_opts = MyProcessor(context=context)
In [33]: proc_with_opts.append_subprocessor("opt_algo", algo_factory)
In [34]: print(proc_with_opts.subprocessors)
{'opt_algo': <Processor: algorithm1>}
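The lookup described above, from subprocessor path to chosen processor name to instantiated processor, can be sketched with a plain name-to-class registry. SketchFactory and resolve_choice are hypothetical names introduced for illustration, and the Algo1 and Algo2 classes here are plain stand-ins; none of this is the ProcessorFactory implementation.

```python
class SketchFactory:
    """Hypothetical name-to-class registry, loosely modelled on the text above."""

    def __init__(self):
        self._registry = {}

    def add_processor(self, cls):
        # Register under the custom name if set, otherwise the class name.
        name = getattr(cls, "cls_processor_name", None) or cls.__name__
        self._registry[name] = cls

    def __getitem__(self, name):
        return self._registry[name]


def resolve_choice(factory, context, path):
    """Instantiate the implementation named in context["processor"][path]."""
    chosen_name = context["processor"][path]
    return factory[chosen_name]()


class Algo1:
    cls_processor_name = "algorithm1"


class Algo2:
    cls_processor_name = "algorithm2"


factory = SketchFactory()
factory.add_processor(Algo1)
factory.add_processor(Algo2)

context = {"processor": {"opt_algo": "algorithm1"}}
chosen = resolve_choice(factory, context, "opt_algo")
print(type(chosen).__name__)  # Algo1
```

Changing the value under "opt_algo" to "algorithm2" would select the other implementation without touching the code that uses the subprocessor.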
If all the processors required for a factory are defined in one or more package modules, you can point the factory to the module(s) when creating it.
mod_algo_factory = processor_tools.ProcessorFactory("package.subpackage.module")
mod_algo_factory would now contain all BaseProcessor subclasses in the module package.subpackage.module.
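Collecting every subclass of a base class from a named module can be done with the standard library alone. The sketch below shows one possible approach using importlib and inspect; it is not claimed to be how ProcessorFactory does it. Base, find_subclasses and the in-memory demo_algorithms module are all hypothetical and exist only so the example runs without any files on disk.

```python
import importlib
import inspect
import sys
import types


class Base:
    """Hypothetical base class standing in for BaseProcessor."""


def find_subclasses(module_name, base):
    """Return {class name: class} for subclasses of `base` found in a module."""
    module = importlib.import_module(module_name)
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base
    }


# Build a throwaway module in memory and register it so it can be imported.
demo = types.ModuleType("demo_algorithms")
demo.AlgoA = type("AlgoA", (Base,), {})
demo.AlgoB = type("AlgoB", (Base,), {})
demo.NotAProcessor = type("NotAProcessor", (), {})
sys.modules["demo_algorithms"] = demo

found = find_subclasses("demo_algorithms", Base)
print(sorted(found))
```

Classes that do not inherit from the base, such as NotAProcessor here, are skipped, which is what keeps helper classes in the same module out of the result.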
Defining processor class default subprocessors
When defining a processor class, it is usually clear what subprocessor steps and options are required. To simplify the definition of such processors, the subprocessors can be set with the cls_subprocessors class attribute in the class definition.
In [35]: class ProcessingChain(processor_tools.BaseProcessor):
   ....:     cls_subprocessors = {"sub1": MyProcessor, "sub2": algo_factory}
   ....:
In [36]: context = {"processor": {"sub2": "algorithm2"}}
In [37]: proc_cls_sps = ProcessingChain(context=context)
In [38]: print(proc_cls_sps.subprocessors)
{'sub1': <Processor: MyProcessor>, 'sub2': <Processor: algorithm2>}
Running processors with subprocessors
A processor with defined subprocessors can make use of them when defining its run() method.
In [39]: class Exponentiate(processor_tools.BaseProcessor):
   ....:     def run(self, val1):
   ....:         return val1 ** self.context["exponent"]
   ....:
In [40]: class Pythagoras(processor_tools.BaseProcessor):
   ....:     cls_subprocessors = {"exp": Exponentiate}
   ....:     def run(self, a, b):
   ....:         exp_proc = self.subprocessors["exp"]
   ....:         return (exp_proc.run(a) + exp_proc.run(b))**0.5
   ....:
In [41]: context = {"exponent": 2}
In [42]: pyth_proc = Pythagoras(context=context)
In [43]: print(pyth_proc.run(3,4))
5.0
By default, however, the run() method of a processor with defined subprocessors runs each subprocessor sequentially, in order, with the output of each feeding into the next. This may be of use when the subprocessors define each step of a processing chain.
In [44]: import numpy as np
In [45]: class Square(processor_tools.BaseProcessor):
   ....:     def run(self, val):
   ....:         return val ** 2
   ....:
In [46]: class Ave(processor_tools.BaseProcessor):
   ....:     def run(self, val):
   ....:         return np.mean(val)
   ....:
In [47]: class Sqrt(processor_tools.BaseProcessor):
   ....:     def run(self, val):
   ....:         return val ** 0.5
   ....:
In [48]: class RMS(processor_tools.BaseProcessor):
   ....:     cls_subprocessors = {"sq": Square, "mean": Ave, "root": Sqrt}
   ....:
In [49]: rms_proc = RMS()
In [50]: print(rms_proc.run(np.array([4,3,2,5,6])))
4.242640687119285
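The sequential behaviour can be mimicked in plain Python by folding a value through an ordered collection of steps. This is a sketch of the idea only: run_in_sequence and the lambda steps are hypothetical, and it makes no claim about how BaseProcessor.run is written. It relies on dicts preserving insertion order, which Python guarantees from version 3.7.

```python
from functools import reduce


def run_in_sequence(steps, value):
    """Feed `value` through each callable in `steps`, in insertion order."""
    return reduce(lambda acc, step: step(acc), steps.values(), value)


# The same three stages as the RMS example, written as plain functions.
steps = {
    "sq": lambda vals: [v ** 2 for v in vals],
    "mean": lambda vals: sum(vals) / len(vals),
    "root": lambda val: val ** 0.5,
}

result = run_in_sequence(steps, [4, 3, 2, 5, 6])
print(result)
```

The squares sum to 90, their mean is 18, and the final step takes the square root of 18, matching the value printed by the RMS processor above. Reordering the dictionary entries would change the result, so the order in which subprocessors are defined matters.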