API Reference¶
This is the API Reference of the Python library Contextual Encoders.
Note
When the dependencies are updated, export them from poetry to refresh the requirements.txt within the doc directory: poetry export --dev -f requirements.txt --output requirements.txt.
Aggregator¶
Aggregators combine multiple matrices into a single matrix, for example the similarity or dissimilarity matrices of multiple attributes. Thus, an aggregator \(\mathcal{A}\) is a mapping of the form \(\mathcal{A} : \mathbb{R}^{n \times n \times m} \rightarrow \mathbb{R}^{n \times n}\), with \(n\) being the number of features and \(m\) being the number of similarity or dissimilarity matrices of type \(D \in \mathbb{R}^{n \times n}\).
Currently, the following aggregators are implemented:
Name | Formula |
Mean | \(\mathcal{A} (D^1, D^2, ..., D^m) = \frac{1}{m} \sum_{i=1}^{m} D^i\) |
Median | \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \left\{ \begin{array}{ll} D_{i,j}^{\left(\frac{m+1}{2}\right)} & \mbox{, if } m \mbox{ is odd} \\ \frac{1}{2} \left( D_{i,j}^{\left(\frac{m}{2}\right)} + D_{i,j}^{\left(\frac{m}{2}+1\right)} \right) & \mbox{, if } m \mbox{ is even} \end{array} \right.\), where \(D_{i,j}^{(k)}\) is the \(k\)-th smallest of the values \(D_{i,j}^1, ..., D_{i,j}^m\) |
Max | \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \max_{k} \; D_{i,j}^k\) |
Min | \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \min_{k} \; D_{i,j}^k\) |
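The formulas above all reduce the \(m\) values found at the same position \((i, j)\) to a single value. The following is a minimal sketch of that idea using only the Python standard library and nested lists; the helper name `aggregate` and the sample matrices are hypothetical and are not part of the library, which operates on numpy arrays.

```python
from statistics import mean, median


def aggregate(matrices, reduce_values):
    """Combine equally shaped matrices entry by entry with reduce_values."""
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [reduce_values([m[i][j] for m in matrices]) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Three hypothetical 2x2 dissimilarity matrices (m = 3, n = 2).
d1 = [[0.0, 0.25], [0.25, 0.0]]
d2 = [[0.0, 0.5], [0.5, 0.0]]
d3 = [[0.0, 1.5], [1.5, 0.0]]

mean_matrix = aggregate([d1, d2, d3], mean)
median_matrix = aggregate([d1, d2, d3], median)
max_matrix = aggregate([d1, d2, d3], max)
min_matrix = aggregate([d1, d2, d3], min)
```

For the off-diagonal entries the four aggregations give 0.75, 0.5, 1.5 and 0.25 respectively, which shows how the choice of aggregator changes the combined matrix.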
- class contextual_encoders.aggregator.Aggregator¶
Bases: abc.ABC
An abstract base class for aggregators.
- abstract aggregate(matrices)¶
The abstract method that is implemented by the concrete aggregators.
- class contextual_encoders.aggregator.AggregatorFactory¶
Bases: object
The factory class for creating concrete instances of aggregators.
- static create(aggregator)¶
Creates an instance of the given aggregator name.
- Parameters
aggregator – The name of the aggregator, which can be mean, median, max or min.
- Returns
An instance of the aggregator.
- Raises
ValueError – The given aggregator does not exist.
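The name-to-instance lookup described above can be pictured as a small registry. This is only a sketch of the pattern, assuming a plain dictionary; `create_reducer` and `_REDUCERS` are hypothetical names, and the actual factory returns aggregator objects rather than bare functions.

```python
from statistics import mean, median

# Hypothetical registry mapping the documented names to reduction functions.
_REDUCERS = {"mean": mean, "median": median, "max": max, "min": min}


def create_reducer(name):
    """Return the reduction function for name, or raise ValueError."""
    try:
        return _REDUCERS[name]
    except KeyError:
        raise ValueError(f"Unknown aggregator: {name!r}") from None


largest = create_reducer("max")([1, 3, 2])
```

An unknown name such as `"sum"` raises a `ValueError`, mirroring the documented behaviour of `create`.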
- class contextual_encoders.aggregator.MaxAggregator¶
Bases: contextual_encoders.aggregator.Aggregator
This class aggregates similarity or dissimilarity matrices using the max. Given \(m\) similarity or dissimilarity matrices \(D^i \in \mathbb{R}^{n \times n}\), the MaxAggregator calculates \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \max_{k} \; D_{i,j}^k\).
- aggregate(matrices)¶
Calculates the max of all given matrices along the zero axis.
- Parameters
matrices – A list of 2D numpy arrays.
- Returns
A 2D numpy array.
- class contextual_encoders.aggregator.MeanAggregator¶
Bases: contextual_encoders.aggregator.Aggregator
This class aggregates similarity or dissimilarity matrices using the mean. Given \(m\) similarity or dissimilarity matrices \(D^i \in \mathbb{R}^{n \times n}\), the MeanAggregator calculates \(\mathcal{A} (D^1, D^2, ..., D^m) = \frac{1}{m} \sum_{i=1}^{m} D^i\).
- aggregate(matrices)¶
Calculates the mean of all given matrices along the zero axis.
- Parameters
matrices – A list of 2D numpy arrays.
- Returns
A 2D numpy array.
- class contextual_encoders.aggregator.MedianAggregator¶
Bases: contextual_encoders.aggregator.Aggregator
This class aggregates similarity or dissimilarity matrices using the median. Given \(m\) similarity or dissimilarity matrices \(D^i \in \mathbb{R}^{n \times n}\), the MedianAggregator calculates \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \left\{ \begin{array}{ll} D_{i,j}^{\left(\frac{m+1}{2}\right)} & \mbox{, if } m \mbox{ is odd} \\ \frac{1}{2} \left( D_{i,j}^{\left(\frac{m}{2}\right)} + D_{i,j}^{\left(\frac{m}{2}+1\right)} \right) & \mbox{, if } m \mbox{ is even} \end{array} \right.\), where \(D_{i,j}^{(k)}\) is the \(k\)-th smallest of the values \(D_{i,j}^1, ..., D_{i,j}^m\).
- aggregate(matrices)¶
Calculates the median of all given matrices along the zero axis.
- Parameters
matrices – A list of 2D numpy arrays.
- Returns
A 2D numpy array.
- class contextual_encoders.aggregator.MinAggregator¶
Bases: contextual_encoders.aggregator.Aggregator
This class aggregates similarity or dissimilarity matrices using the min. Given \(m\) similarity or dissimilarity matrices \(D^i \in \mathbb{R}^{n \times n}\), the MinAggregator calculates \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \min_{k} \; D_{i,j}^k\).
- aggregate(matrices)¶
Calculates the min of all given matrices along the zero axis.
- Parameters
matrices – A list of 2D numpy arrays.
- Returns
A 2D numpy array.
MatrixComputer¶
The MatrixComputer combines the Measure with the Gatherer and calculates the similarity or dissimilarity matrix for one attribute. Thus, the MatrixComputer can be seen as a mapping \(\mathcal{M}: F \rightarrow \mathbb{R}^{n \times n}\), with \(F\) being the feature space and \(n\) the number of selected features.
- class contextual_encoders.computer.MatrixComputer(measure, gatherer, separator_token=',')¶
Bases: object
The service class to compute similarity or dissimilarity matrices.
- __init__(measure, gatherer, separator_token=',')¶
Initializes the MatrixComputer.
- Parameters
measure – The instance of the similarity or dissimilarity measure.
gatherer – The name of the gatherer. If the measure can handle multiple values, the Identity gatherer is used regardless of this setting.
separator_token – A string for separating categorical variables into multiple values.
- compute(data)¶
Computes the similarity or dissimilarity matrix based on the given data.
- Parameters
data – A single pandas series containing the data. Note that each entry can have multiple values, which are separated by the separator_token.
- Returns
A 2D numpy array representing the similarity or dissimilarity matrix.
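As a rough illustration of the mapping \(\mathcal{M}\), the sketch below splits each entry at the separator token and fills an \(n \times n\) matrix with pairwise measure values. Everything here is an assumption made for illustration: `compute_matrix`, `split_values` and the Jaccard toy measure are hypothetical, and the sketch uses plain lists instead of a pandas series and numpy arrays.

```python
def split_values(entry, separator_token=","):
    """Split one entry into its individual values."""
    return [value.strip() for value in entry.split(separator_token)]


def jaccard(first, second):
    """Toy similarity measure that handles multiple values per entry."""
    a, b = set(first), set(second)
    return len(a & b) / len(a | b)


def compute_matrix(data, measure, separator_token=","):
    """Build the pairwise matrix of measure values for all entries."""
    values = [split_values(entry, separator_token) for entry in data]
    n = len(values)
    return [[measure(values[i], values[j]) for j in range(n)] for i in range(n)]


matrix = compute_matrix(["red,blue", "blue", "green"], jaccard)
```

Because the toy measure accepts lists of values directly, no separate gathering step is needed in this sketch.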
Context¶
The Context is the core part of the Contextual Encoders library. It is used to measure the similarity or dissimilarity of attributes. So far, two Context types are implemented: GraphContext and TreeContext. Because custom contexts will likely be needed, the base classes Context and GraphBasedContext are provided; they come with optimized import and export functions as well as caching.
- class contextual_encoders.context.Context(name)¶
Bases: abc.ABC
The abstract base class for all Contexts.
- __init__(name)¶
Initializes the Context.
- Parameters
name – The name of the Context.
- abstract export_to_file(path)¶
Exports the Context to the given file path.
- Parameters
path – The path to export the Context to.
- abstract import_from_file(path)¶
Imports the Context from the given file path.
- Parameters
path – The path to import the Context from.
- class contextual_encoders.context.GraphBasedContext(name)¶
Bases: contextual_encoders.context.Context
A base class for all graph based Contexts.
- __init__(name)¶
Initializes the GraphBasedContext.
- Parameters
name – The name of the Context.
- draw()¶
Draws the graph using matplotlib.
- export_to_file(path)¶
Exports the graph to the given file path.
- Parameters
path – The path to export the graph to.
- get_graph()¶
Returns the networkx DiGraph instance.
- Returns
A networkx DiGraph instance.
- import_from_file(path)¶
Imports the graph from the given file path.
- Parameters
path – The path to import the graph from.
- class contextual_encoders.context.GraphContext(name)¶
Bases: contextual_encoders.context.GraphBasedContext
A graph based Context that can be used for graph based measures.
- add_concept(node, neighbor=None, weight=1.0)¶
Adds a new node to the graph. If the neighbor does not exist, it will be added as a new node. If the node already exists, the weight will be overwritten.
- Parameters
node – The name of the node.
neighbor – The name of the neighbor node.
weight – The weight of the edge between the node and the neighbor.
- class contextual_encoders.context.TreeContext(name)¶
Bases: contextual_encoders.context.GraphBasedContext
A graph based Context that can be used for tree based measures.
- add_concept(child, parent=None, weight=1.0)¶
Adds a new node to the tree, where the name of the context serves as the root node. If the parent does not exist, it will be added as a new node. If the parent is None, the root node will serve as the parent. If the node already exists, the weight will be overwritten.
- Parameters
child – The name of the child node.
parent – The name of the parent node.
weight – The weight of the edge between the child and the parent.
- get_root()¶
Gets the name of the root, i.e. the name of the context.
- Returns
The name of the root.
- get_tree()¶
Returns the networkx DiGraph instance.
- Returns
A networkx DiGraph representing the tree.
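The add_concept rules of the TreeContext can be mimicked with a plain dictionary. The class below is a hypothetical stand-in written for illustration only: it does not use networkx, and attaching a previously unknown parent directly to the root with weight 1.0 is an assumption, since the reference does not say where such a parent is placed.

```python
class TinyTreeContext:
    """Dictionary-based stand-in for a tree context (illustration only)."""

    def __init__(self, name):
        self.name = name      # the context name doubles as the root node
        self.parents = {}     # child -> (parent, weight)

    def add_concept(self, child, parent=None, weight=1.0):
        if parent is None:
            parent = self.name
        elif parent != self.name and parent not in self.parents:
            # Assumption: an unknown parent is attached to the root.
            self.parents[parent] = (self.name, 1.0)
        # Re-adding an existing child overwrites its parent and weight.
        self.parents[child] = (parent, weight)

    def get_root(self):
        return self.name

    def path_to_root(self, node):
        path = [node]
        while path[-1] != self.name:
            path.append(self.parents[path[-1]][0])
        return path


ctx = TinyTreeContext("animal")
ctx.add_concept("mammal")                      # parent defaults to the root
ctx.add_concept("dog", "mammal")
ctx.add_concept("sparrow", "bird", weight=0.5) # "bird" is created on the fly
ctx.add_concept("dog", "mammal", weight=2.0)   # overwrites the weight
```

After these calls, `ctx.path_to_root("dog")` walks from `dog` over `mammal` to the root `animal`.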
ContextualEncoder¶
The ContextualEncoder is the main interface of the Contextual Encoders library. It performs the contextual encoding of a given dataset. Moreover, it inherits from the scikit-learn BaseEstimator and TransformerMixin types and can therefore be used in scikit-learn Pipelines.
- class contextual_encoders.encoder.ContextualEncoder(measures, cols=None, inverter='lin', gatherer='smm', aggregator='mean', reducer='mds', **kwargs)¶
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
- __init__(measures, cols=None, inverter='lin', gatherer='smm', aggregator='mean', reducer='mds', **kwargs)¶
Initializes the ContextualEncoder.
- Parameters
measures – A measure.
cols – Pandas columns.
inverter – The inverter.
gatherer – The gatherer.
aggregator – The aggregator.
reducer – The reducer.
kwargs – Additional keywords.
- fit_transform(x, y=None, **fit_params)¶
Encodes the given data.
- Parameters
x – The data as numpy array, pandas dataframe or python list format.
y – TBA.
fit_params – TBA.
- Returns
The transformed data.
- get_dissimilarity_matrix()¶
Gets the dissimilarity matrix.
- Returns
The dissimilarity matrix as 2D numpy array.
- get_similarity_matrix()¶
Gets the similarity matrix.
- Returns
The similarity matrix as 2D numpy array.
Gatherer¶
A Gatherer is used to combine a set of pairwise attribute measures into a single measure.
Note
If a measure can handle multiple values, a Gatherer is not needed.
- class contextual_encoders.gatherer.FirstValueGatherer¶
Bases: contextual_encoders.gatherer.Gatherer
A Gatherer only measuring the first values of the attributes.
- class contextual_encoders.gatherer.Gatherer¶
Bases: abc.ABC
The abstract base class of all Gatherers.
- __init__()¶
Initializes the Gatherer.
- gather(first, second)¶
Combines the given attributes.
Note
Each attribute may contain multiple values, e.g. comma separated.
- Parameters
first – The value of the first attribute.
second – The value of the second attribute.
- Returns
The aggregated value.
- set_measure(measure)¶
Sets the measure for the Gatherer.
- Parameters
measure – The measure.
- class contextual_encoders.gatherer.GathererFactory¶
Bases: object
The factory class for creating Gatherers.
- static create(gatherer_type)¶
Creates a Gatherer given the type.
- Parameters
gatherer_type – The type of the Gatherer, which can be id, first or smm.
- Returns
The concrete instance of the Gatherer.
- class contextual_encoders.gatherer.IdentityGatherer¶
Bases: contextual_encoders.gatherer.Gatherer
A Gatherer that lets the measure decide how to handle multiple values.
- class contextual_encoders.gatherer.SymMaxMeanGatherer¶
Bases: contextual_encoders.gatherer.Gatherer
A symmetrical maximum gatherer implementation.
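To make the gathering step more concrete, the sketch below contrasts a first-value strategy with one common reading of "symmetrical maximum mean": for each value on one side take the best match on the other side, average those maxima, and then average both directions. That formula is an assumption inferred from the class name, not taken from the library, and all function names are hypothetical.

```python
def exact_match(a, b):
    """Toy single-value similarity measure."""
    return 1.0 if a == b else 0.0


def first_value_gather(measure, first, second):
    """Compare only the first value of each attribute."""
    return measure(first[0], second[0])


def sym_max_mean_gather(measure, first, second):
    """Average of the best matches, taken in both directions."""
    def directed(source, target):
        best = [max(measure(s, t) for t in target) for s in source]
        return sum(best) / len(best)

    return 0.5 * (directed(first, second) + directed(second, first))


first, second = ["red", "blue"], ["blue"]
first_only = first_value_gather(exact_match, first, second)
symmetric = sym_max_mean_gather(exact_match, first, second)
```

Averaging both directions makes the result independent of the argument order, which is the property that keeps the resulting matrix symmetric.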
Inverter¶
- class contextual_encoders.inverter.CosineInverter¶
- class contextual_encoders.inverter.ExponentialInverter¶
- class contextual_encoders.inverter.LinearInverter¶
- class contextual_encoders.inverter.SqrtInverter¶
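The reference lists the inverter classes without formulas. The sketch below shows transformations that the class names suggest for turning a similarity \(s \in [0, 1]\) into a dissimilarity. All four formulas, as well as the function names, are assumptions made for illustration and may differ from what the library computes.

```python
import math


# Assumed formulas, each mapping a similarity s in [0, 1] to a dissimilarity.
def linear_invert(s):
    return 1.0 - s


def sqrt_invert(s):
    return math.sqrt(1.0 - s)


def exponential_invert(s):
    return math.exp(-s)


def cosine_invert(s):
    return math.acos(s)


def invert_matrix(matrix, invert):
    """Apply an inverter function to every entry of a matrix."""
    return [[invert(value) for value in row] for row in matrix]


similarity = [[1.0, 0.75], [0.75, 1.0]]
dissimilarity = invert_matrix(similarity, linear_invert)
```

Note that under these assumed formulas the linear, square-root and cosine variants map a similarity of 1 to a dissimilarity of 0, while the exponential variant does not.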
Measure¶
- class contextual_encoders.measure.DissimilarityMeasure(symmetric, multiple_values, verbose=False)¶
Bases: contextual_encoders.measure.Measure, abc.ABC
- __init__(symmetric, multiple_values, verbose=False)¶
Initialize self. See help(type(self)) for accurate signature.
- class contextual_encoders.measure.Measure(symmetric, multiple_values, verbose=False)¶
Bases: abc.ABC
- __init__(symmetric, multiple_values, verbose=False)¶
Initialize self. See help(type(self)) for accurate signature.
- class contextual_encoders.measure.PathLengthMeasure(context, verbose=False)¶
Bases: contextual_encoders.measure.SimilarityMeasure
- __init__(context, verbose=False)¶
Initialize self. See help(type(self)) for accurate signature.
- class contextual_encoders.measure.SimilarityMeasure(symmetric, multiple_values, verbose=False)¶
Bases: contextual_encoders.measure.Measure, abc.ABC
- __init__(symmetric, multiple_values, verbose=False)¶
Initialize self. See help(type(self)) for accurate signature.
- class contextual_encoders.measure.WuPalmer(context, offset=0.0, verbose=False)¶
Bases: contextual_encoders.measure.SimilarityMeasure
- __init__(context, offset=0.0, verbose=False)¶
Initialize self. See help(type(self)) for accurate signature.
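For orientation, the sketch below computes the textbook Wu-Palmer similarity, \(2 \cdot depth(lcs) / (depth(a) + depth(b))\), and the number of edges between two nodes on a small tree stored as a child-to-parent dictionary. It counts the root as depth 1, which is one common convention; how the library counts depth, what its offset parameter does, and how PathLengthMeasure turns a path length into a similarity are not stated in this reference. All names and the sample tree are hypothetical.

```python
# Hypothetical tree: child -> parent, with "animal" as the root.
PARENT = {
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}


def path_to_root(node, parent):
    """Return the nodes from node up to and including the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def lowest_common_subsumer(a, b, parent):
    """Return the deepest node that is an ancestor of both a and b."""
    ancestors_b = set(path_to_root(b, parent))
    return next(n for n in path_to_root(a, parent) if n in ancestors_b)


def wu_palmer(a, b, parent):
    """Textbook Wu-Palmer similarity with the root at depth 1."""
    def depth(node):
        return len(path_to_root(node, parent))

    lcs = lowest_common_subsumer(a, b, parent)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def path_length(a, b, parent):
    """Number of edges on the path between a and b."""
    lcs = lowest_common_subsumer(a, b, parent)
    return (path_to_root(a, parent).index(lcs)
            + path_to_root(b, parent).index(lcs))


siblings = wu_palmer("dog", "cat", PARENT)
distant = wu_palmer("dog", "sparrow", PARENT)
```

Here `dog` and `cat` share the parent `mammal` and therefore score higher than `dog` and `sparrow`, whose only common ancestor is the root.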
Reducer¶
- class contextual_encoders.reducer.MultidimensionalScalingReducer(n_components, metric)¶
Bases: contextual_encoders.reducer.Reducer
- __init__(n_components, metric)¶
Initialize self. See help(type(self)) for accurate signature.