API Reference

This is the API Reference of the Python library Contextual Encoders.

Note

When the dependencies are updated, export them with Poetry to update the requirements.txt in the doc directory: poetry export --dev -f requirements.txt --output requirements.txt.

Aggregator

Aggregators combine multiple matrices into a single matrix. They are used to combine the similarity or dissimilarity matrices of multiple attributes into one. Thus, an aggregator \(\mathcal{A}\) is a mapping \(\mathcal{A} : \mathbb{R}^{n \times n \times m} \rightarrow \mathbb{R}^{n \times n}\), where \(n\) is the number of features and \(m\) is the number of similarity or dissimilarity matrices \(D \in \mathbb{R}^{n \times n}\).

Currently, the following aggregators are implemented:

Name | Formula
Mean | \(\mathcal{A} (D^1, D^2, ..., D^m) = \frac{1}{m} \sum_{k=1}^{m} D^k\)
Median | \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \left\{ \begin{array}{ll} D_{i,j}^{(\frac{m+1}{2})} & \mbox{, if } m \mbox{ is odd} \\ \frac{1}{2} \left( D_{i,j}^{(\frac{m}{2})} + D_{i,j}^{(\frac{m}{2}+1)} \right) & \mbox{, if } m \mbox{ is even} \end{array} \right.\)
Max | \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \max_{k} \; D_{i,j}^k\)
Min | \(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \min_{k} \; D_{i,j}^k\)

Here, \(D_{i,j}^{(k)}\) denotes the \(k\)-th smallest of the values \(D_{i,j}^1, ..., D_{i,j}^m\), i.e. the median is taken element-wise.
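All four aggregators act element-wise on the stacked matrices; for the median this means sorting each entry over \(k\) and taking the middle value (odd \(m\)) or the mean of the two middle values (even \(m\)). A minimal NumPy sketch, assuming the matrices are passed as a list of equally sized 2D arrays; median_aggregate is a hypothetical name, not part of the library, and the result is checked against np.median:

```python
import numpy as np


def median_aggregate(matrices):
    """Element-wise median of m (n x n) matrices via explicit sorting."""
    # Stack to shape (m, n, n) and sort every entry (i, j) over k.
    stacked = np.sort(np.stack(matrices, axis=0), axis=0)
    m = stacked.shape[0]
    if m % 2 == 1:
        # Odd m: the ((m + 1) / 2)-th smallest value, i.e. 0-based index (m - 1) // 2.
        return stacked[(m - 1) // 2]
    # Even m: mean of the (m / 2)-th and (m / 2 + 1)-th smallest values.
    return 0.5 * (stacked[m // 2 - 1] + stacked[m // 2])


matrices = [np.full((2, 2), value) for value in (1.0, 6.0, 2.0, 4.0)]
stacked = np.stack(matrices, axis=0)

# The other three aggregators reduce the stacked matrices along axis 0.
mean = stacked.mean(axis=0)          # every entry: (1 + 6 + 2 + 4) / 4 = 3.25
maximum = stacked.max(axis=0)        # every entry: 6.0
minimum = stacked.min(axis=0)        # every entry: 1.0
median = median_aggregate(matrices)  # every entry: (2 + 4) / 2 = 3.0

assert np.allclose(median, np.median(stacked, axis=0))
```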

class contextual_encoders.aggregator.Aggregator

Bases: abc.ABC

An abstract base class for aggregators.

abstract aggregate(matrices)

The abstract method that is implemented by the concrete aggregators.

class contextual_encoders.aggregator.AggregatorFactory

Bases: object

The factory class for creating concrete instances of aggregators.

static create(aggregator)

Creates an instance of the aggregator with the given name.

Parameters

aggregator – The name of the aggregator, which can be mean, median, max or min.

Returns

An instance of the aggregator.

Raises

ValueError – The given aggregator does not exist.
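The factory maps a name to a concrete aggregator and rejects unknown names. A minimal sketch of that pattern, assuming each aggregator only needs to reduce the stacked matrices along axis 0; AggregatorSketch, ReductionAggregator and AggregatorFactorySketch are hypothetical names, not the library's classes:

```python
from abc import ABC, abstractmethod

import numpy as np


class AggregatorSketch(ABC):
    """Hypothetical counterpart of the abstract Aggregator."""

    @abstractmethod
    def aggregate(self, matrices):
        ...


class ReductionAggregator(AggregatorSketch):
    """Applies a NumPy reduction (mean, median, max, min) along axis 0."""

    def __init__(self, reduction):
        self._reduction = reduction

    def aggregate(self, matrices):
        return self._reduction(np.stack(matrices, axis=0), axis=0)


class AggregatorFactorySketch:
    _REDUCTIONS = {"mean": np.mean, "median": np.median, "max": np.max, "min": np.min}

    @staticmethod
    def create(aggregator):
        """Return an aggregator for the given name or raise ValueError."""
        if aggregator not in AggregatorFactorySketch._REDUCTIONS:
            raise ValueError(f"Unknown aggregator: {aggregator!r}")
        return ReductionAggregator(AggregatorFactorySketch._REDUCTIONS[aggregator])


max_aggregator = AggregatorFactorySketch.create("max")
combined = max_aggregator.aggregate([np.eye(2), 2 * np.eye(2)])
```

Keeping the name-to-reduction table in one place means adding an aggregator only requires one new dictionary entry.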

class contextual_encoders.aggregator.MaxAggregator

Bases: contextual_encoders.aggregator.Aggregator

This class aggregates similarity or dissimilarity matrices using the element-wise maximum. Given \(m\) similarity or dissimilarity matrices \(D^k \in \mathbb{R}^{n \times n}\), the MaxAggregator calculates

\(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \max_{k} \; D_{i,j}^k\).

aggregate(matrices)

Calculates the element-wise maximum of all given matrices, i.e. the maximum along axis 0 of the stacked matrices.

Parameters

matrices – A list of 2D numpy arrays.

Returns

A 2D numpy array.

class contextual_encoders.aggregator.MeanAggregator

Bases: contextual_encoders.aggregator.Aggregator

This class aggregates similarity or dissimilarity matrices using the mean. Given \(m\) similarity or dissimilarity matrices \(D^i \in \mathbb{R}^{n \times n}\), the MeanAggregator calculates

\(\mathcal{A} (D^1, D^2, ..., D^m) = \frac{1}{m} \sum_{i=1}^{m} D^i\).

aggregate(matrices)

Calculates the element-wise mean of all given matrices, i.e. the mean along axis 0 of the stacked matrices.

Parameters

matrices – A list of 2D numpy arrays.

Returns

A 2D numpy array.

class contextual_encoders.aggregator.MedianAggregator

Bases: contextual_encoders.aggregator.Aggregator

This class aggregates similarity or dissimilarity matrices using the element-wise median. Given \(m\) similarity or dissimilarity matrices \(D^k \in \mathbb{R}^{n \times n}\), the MedianAggregator calculates

\(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \left\{ \begin{array}{ll} D_{i,j}^{(\frac{m+1}{2})} & \mbox{, if } m \mbox{ is odd} \\ \frac{1}{2} \left( D_{i,j}^{(\frac{m}{2})} + D_{i,j}^{(\frac{m}{2}+1)} \right) & \mbox{, if } m \mbox{ is even} \end{array} \right.\), where \(D_{i,j}^{(k)}\) denotes the \(k\)-th smallest of the values \(D_{i,j}^1, ..., D_{i,j}^m\).

aggregate(matrices)

Calculates the element-wise median of all given matrices, i.e. the median along axis 0 of the stacked matrices.

Parameters

matrices – A list of 2D numpy arrays.

Returns

A 2D numpy array.

class contextual_encoders.aggregator.MinAggregator

Bases: contextual_encoders.aggregator.Aggregator

This class aggregates similarity or dissimilarity matrices using the element-wise minimum. Given \(m\) similarity or dissimilarity matrices \(D^k \in \mathbb{R}^{n \times n}\), the MinAggregator calculates

\(\mathcal{A} (D^1, D^2, ..., D^m)_{i,j} = \min_{k} \; D_{i,j}^k\).

aggregate(matrices)

Calculates the element-wise minimum of all given matrices, i.e. the minimum along axis 0 of the stacked matrices.

Parameters

matrices – A list of 2D numpy arrays.

Returns

A 2D numpy array.

MatrixComputer

The MatrixComputer combines the Measure with the Gatherer and calculates the similarity or dissimilarity matrix for one attribute. Thus, the MatrixComputer can be seen as a mapping \(\mathcal{M}: F \rightarrow \mathbb{R}^{n \times n}\), where \(F\) is the feature space and \(n\) is the number of selected features.

class contextual_encoders.computer.MatrixComputer(measure, gatherer, separator_token=',')

Bases: object

The service class to compute similarity or dissimilarity matrices.

__init__(measure, gatherer, separator_token=',')

Initializes the MatrixComputer.

Parameters
  • measure – The instance of the similarity or dissimilarity measure.

  • gatherer – The name of the gatherer. If the measure can handle multiple values, the IdentityGatherer is used regardless of this setting.

  • separator_token – The string used to split a categorical value into multiple values.

compute(data)

Computes the similarity or dissimilarity matrix based on the given data.

Parameters

data – A single pandas Series containing the data. Note that each entry can contain multiple values separated by the separator_token.

Returns

A 2D numpy array representing the similarity or dissimilarity matrix.
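A simplified sketch of how such a matrix can be computed for one attribute, assuming a symmetric measure given as a plain function of two single values and a gatherer given as a function of two value lists; compute_matrix, exact_match and first_value are hypothetical helpers, not the library's API. A plain list is used for the data, but a pandas Series iterates the same way:

```python
import numpy as np


def compute_matrix(data, measure, gather, separator_token=","):
    """Hypothetical sketch: pairwise (n x n) matrix for one attribute."""
    # Split every entry into its individual values.
    values = [[v.strip() for v in str(entry).split(separator_token)] for entry in data]
    n = len(values)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # Assumes a symmetric measure, so only the upper triangle is computed.
            matrix[i, j] = matrix[j, i] = gather(values[i], values[j], measure)
    return matrix


def exact_match(a, b):
    """Toy similarity measure: 1.0 for equal values, else 0.0."""
    return 1.0 if a == b else 0.0


def first_value(values_a, values_b, measure):
    """Toy gatherer: compare only the first value of each entry."""
    return measure(values_a[0], values_b[0])


matrix = compute_matrix(["red", "red,blue", "blue"], exact_match, first_value)
```

Computing only the upper triangle halves the number of measure calls, which matters when the measure walks a large context graph.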

Context

The Context is the core part of the Contextual Encoders library. It is used to measure the similarity or dissimilarity of attributes. Two context types are currently implemented: GraphContext and TreeContext. Since custom contexts will often be needed, the base classes Context and GraphBasedContext are provided; they come with optimized import and export functions as well as caching.

class contextual_encoders.context.Context(name)

Bases: abc.ABC

The abstract base class for all contexts.

__init__(name)

Initializes the Context.

Parameters

name – The name of the Context.

abstract export_to_file(path)

Exports the Context to the given file path.

Parameters

path – The path to export the Context to.

abstract import_from_file(path)

Imports the Context from the given file path.

Parameters

path – The path to import the Context from.
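Because custom contexts are expected, here is a minimal sketch of what a subclass has to provide: a name plus file export and import. ContextSketch only mirrors the documented abstract methods and is not the library's Context class; SynonymContext and its JSON file format are hypothetical illustrations:

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class ContextSketch(ABC):
    """Hypothetical stand-in that mirrors the documented Context interface."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def export_to_file(self, path):
        ...

    @abstractmethod
    def import_from_file(self, path):
        ...


class SynonymContext(ContextSketch):
    """Hypothetical custom context: groups of synonymous values stored as JSON."""

    def __init__(self, name):
        super().__init__(name)
        self.groups = []

    def add_group(self, *values):
        self.groups.append(sorted(values))

    def export_to_file(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"name": self.name, "groups": self.groups}, handle)

    def import_from_file(self, path):
        with open(path, encoding="utf-8") as handle:
            payload = json.load(handle)
        self.name = payload["name"]
        self.groups = payload["groups"]


# Round trip: export one context and import it into another instance.
colors = SynonymContext("colors")
colors.add_group("red", "crimson")
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "colors.json")
    colors.export_to_file(path)
    restored = SynonymContext("empty")
    restored.import_from_file(path)
```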

class contextual_encoders.context.GraphBasedContext(name)

Bases: contextual_encoders.context.Context

A base class for all graph-based contexts.

__init__(name)

Initializes the GraphBasedContext.

Parameters

name – The name of the Context.

draw()

Draws the graph using matplotlib.

export_to_file(path)

Exports the graph to the given file path.

Parameters

path – The path to export the graph to.

get_graph()

Returns the networkx DiGraph instance.

Returns

A networkx DiGraph instance.

import_from_file(path)

Imports the graph from the given file path.

Parameters

path – The path to import the graph from.

class contextual_encoders.context.GraphContext(name)

Bases: contextual_encoders.context.GraphBasedContext

A graph-based Context that can be used for graph-based measures.

add_concept(node, neighbor=None, weight=1.0)

Adds a new node to the graph. If the neighbor does not exist, it is added as a new node. If the node already exists, the weight is overwritten.

Parameters
  • node – The name of the node.

  • neighbor – The name of the neighbor node.

  • weight – The weight of the edge between the node and the neighbor.
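The add_concept semantics above can be sketched with a plain dictionary instead of networkx. GraphContextSketch is hypothetical, and reading "the weight will be overwritten" as overwriting the weight of the edge from node to neighbor is an assumption:

```python
class GraphContextSketch:
    """Hypothetical dict-based stand-in for GraphContext (the library uses a networkx DiGraph)."""

    def __init__(self, name):
        self.name = name
        self.edges = {}  # node -> {neighbor: weight}

    def add_concept(self, node, neighbor=None, weight=1.0):
        self.edges.setdefault(node, {})
        if neighbor is not None:
            # An unknown neighbor is added as a new node.
            self.edges.setdefault(neighbor, {})
            # Assumption: re-adding an existing edge overwrites its weight.
            self.edges[node][neighbor] = weight


context = GraphContextSketch("colors")
context.add_concept("red", "orange")
context.add_concept("red", "orange", weight=0.5)
context.add_concept("blue")
```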

class contextual_encoders.context.TreeContext(name)

Bases: contextual_encoders.context.GraphBasedContext

A graph-based Context that can be used for tree-based measures.

add_concept(child, parent=None, weight=1.0)

Adds a new node to the tree, where the name of the context serves as the root node. If the parent does not exist, it is added as a new node. If the parent is None, the root node serves as the parent. If the node already exists, the weight is overwritten.

Parameters
  • child – The name of the child node.

  • parent – The name of the parent node.

  • weight – The weight of the edge between the child and the parent.

get_root()

Gets the name of the root, i.e. the name of the context.

Returns

The name of the root.

get_tree()

Returns the networkx DiGraph instance.

Returns

A networkx DiGraph representing the tree.
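A minimal sketch of the documented TreeContext behavior using a child-to-parent dictionary instead of networkx. TreeContextSketch is hypothetical, and attaching an unknown parent directly below the root is an assumption, since the reference does not say where such a parent is placed:

```python
class TreeContextSketch:
    """Hypothetical parent-pointer stand-in for TreeContext."""

    def __init__(self, name):
        self.name = name
        self.parent = {name: None}  # child -> parent; the context name is the root
        self.weight = {}            # child -> weight of the edge to its parent

    def get_root(self):
        return self.name

    def add_concept(self, child, parent=None, weight=1.0):
        if parent is None:
            parent = self.name  # no parent given: attach below the root
        elif parent not in self.parent:
            # Assumption: an unknown parent is itself attached below the root.
            self.parent[parent] = self.name
            self.weight[parent] = 1.0
        self.parent[child] = parent
        self.weight[child] = weight  # overwritten if the child already exists


tree = TreeContextSketch("color")
tree.add_concept("warm")
tree.add_concept("red", "warm")
tree.add_concept("blue", "cold")
tree.add_concept("red", "warm", weight=0.5)
```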

ContextualEncoder

The ContextualEncoder is the main interface of the Contextual Encoders library. It performs the contextual encoding of a given dataset. Moreover, it inherits from the scikit-learn BaseEstimator and TransformerMixin classes and can thus be used in scikit-learn pipelines.

class contextual_encoders.encoder.ContextualEncoder(measures, cols=None, inverter='lin', gatherer='smm', aggregator='mean', reducer='mds', **kwargs)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

__init__(measures, cols=None, inverter='lin', gatherer='smm', aggregator='mean', reducer='mds', **kwargs)

Initializes the ContextualEncoder.

Parameters
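  • measures – The measure or measures used to compare attribute values.

  • cols – The pandas columns to encode.

  • inverter – The name of the inverter (default: lin).

  • gatherer – The name of the gatherer, e.g. id, first or smm (default: smm).

  • aggregator – The name of the aggregator: mean, median, max or min (default: mean).

  • reducer – The name of the reducer (default: mds).

  • kwargs – Additional keyword arguments.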

fit_transform(x, y=None, **fit_params)

Encodes the given data.

Parameters
  • x – The data as a NumPy array, pandas DataFrame or Python list.

  • y – TBA.

  • fit_params – TBA.

Returns

The transformed data.

get_dissimilarity_matrix()

Gets the dissimilarity matrix.

Returns

The dissimilarity matrix as 2D numpy array.

get_similarity_matrix()

Gets the similarity matrix.

Returns

The similarity matrix as 2D numpy array.

Gatherer

A Gatherer combines a set of pairwise attribute measures into a single measure.

Note

If a measure can handle multiple values, a Gatherer is not needed.

class contextual_encoders.gatherer.FirstValueGatherer

Bases: contextual_encoders.gatherer.Gatherer

A Gatherer that only measures the first value of each attribute.

class contextual_encoders.gatherer.Gatherer

Bases: abc.ABC

The abstract base class of all gatherers.

__init__()

Initializes the Gatherer.

gather(first, second)

Combines the given attributes.

Note

Each attribute may contain multiple values, e.g. comma-separated ones.

Parameters
  • first – The value of the first attribute.

  • second – The value of the second attribute.

Returns

The aggregated value.

set_measure(measure)

Sets the measure for the Gatherer.

Parameters

measure – The measure.

class contextual_encoders.gatherer.GathererFactory

Bases: object

The factory class for creating gatherers.

static create(gatherer_type)

Creates a Gatherer given the type.

Parameters

gatherer_type – The type of the Gatherer which can be id, first or smm.

Returns

The concrete instance of the Gatherer.

class contextual_encoders.gatherer.IdentityGatherer

Bases: contextual_encoders.gatherer.Gatherer

A Gatherer that lets the measure decide how to handle multiple values.

class contextual_encoders.gatherer.SymMaxMeanGatherer

Bases: contextual_encoders.gatherer.Gatherer

A symmetric max-mean gatherer implementation.
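A sketch of two gathering strategies, assuming a similarity measure given as a plain function of two single values. first_value_gather compares only the first values; sym_max_mean_gather uses the common symmetric max-mean definition (the average best-match similarity, taken in both directions), which is an assumption about what SymMaxMeanGatherer computes. All function names are hypothetical:

```python
def exact_match(a, b):
    """Toy similarity measure: 1.0 for equal values, else 0.0."""
    return 1.0 if a == b else 0.0


def first_value_gather(values_a, values_b, measure):
    """Compare only the first value of each attribute."""
    return measure(values_a[0], values_b[0])


def sym_max_mean_gather(values_a, values_b, measure):
    """Average, in both directions, the best similarity each value finds on the other side."""
    a_to_b = sum(max(measure(a, b) for b in values_b) for a in values_a) / len(values_a)
    b_to_a = sum(max(measure(a, b) for a in values_a) for b in values_b) / len(values_b)
    return 0.5 * (a_to_b + b_to_a)


first = first_value_gather(["blue", "red"], ["red"], exact_match)  # 0.0
smm = sym_max_mean_gather(["blue", "red"], ["red"], exact_match)   # 0.5 * (0.5 + 1.0) = 0.75
```

Averaging both directions makes the result symmetric in its two arguments; for a dissimilarity measure, max would be replaced by min.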

Inverter

An Inverter converts between similarity and dissimilarity matrices.

class contextual_encoders.inverter.CosineInverter

Bases: contextual_encoders.inverter.Inverter

class contextual_encoders.inverter.ExponentialInverter

Bases: contextual_encoders.inverter.Inverter

class contextual_encoders.inverter.Inverter

Bases: abc.ABC

class contextual_encoders.inverter.LinearInverter

Bases: contextual_encoders.inverter.Inverter

class contextual_encoders.inverter.SqrtInverter

Bases: contextual_encoders.inverter.Inverter
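This reference does not document the inverter formulas. As a hedged illustration only, a linear inverter could map similarities in [0, 1] to dissimilarities by \(1 - s\); both the [0, 1] range and the formula are assumptions, and linear_invert is a hypothetical name. The exponential, square-root and cosine inverters presumably apply other transforms that are not specified here:

```python
import numpy as np


def linear_invert(matrix):
    """Hypothetical linear inverter: assumes entries in [0, 1] and returns 1 - entry."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.min() < 0.0 or matrix.max() > 1.0:
        raise ValueError("expected values in [0, 1]")
    return 1.0 - matrix


similarity = np.array([[1.0, 0.25], [0.25, 1.0]])
dissimilarity = linear_invert(similarity)  # [[0.0, 0.75], [0.75, 0.0]]
round_trip = linear_invert(dissimilarity)  # applying it twice restores the input
```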

Measure

A Measure quantifies how similar (SimilarityMeasure) or dissimilar (DissimilarityMeasure) two attribute values are.

Note

If a measure can handle multiple values, a Gatherer is not needed.

class contextual_encoders.measure.DissimilarityMeasure(symmetric, multiple_values, verbose=False)

Bases: contextual_encoders.measure.Measure, abc.ABC

__init__(symmetric, multiple_values, verbose=False)

Initializes the DissimilarityMeasure.

class contextual_encoders.measure.Measure(symmetric, multiple_values, verbose=False)

Bases: abc.ABC

__init__(symmetric, multiple_values, verbose=False)

Initializes the Measure.

class contextual_encoders.measure.PathLengthMeasure(context, verbose=False)

Bases: contextual_encoders.measure.SimilarityMeasure

__init__(context, verbose=False)

Initializes the PathLengthMeasure.
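A sketch of the path length underlying such a measure, assuming weighted edges that can be traversed in both directions and a hypothetical conversion \(1 / (1 + \text{length})\) into a similarity; this reference documents neither choice, and path_length and path_similarity are hypothetical names. Dijkstra's algorithm is used because GraphContext edges carry weights:

```python
import heapq


def path_length(edges, source, target):
    """Weighted shortest-path length (Dijkstra); edges are (u, v, weight), treated as undirected."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    distances = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > distances.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in adjacency.get(node, ()):
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")  # target unreachable


def path_similarity(edges, a, b):
    """Hypothetical conversion of a path length into a similarity in [0, 1]."""
    return 1.0 / (1.0 + path_length(edges, a, b))


edges = [
    ("color", "warm", 1.0), ("color", "cold", 1.0),
    ("warm", "red", 1.0), ("warm", "orange", 1.0), ("cold", "blue", 1.0),
]
red_orange = path_length(edges, "red", "orange")  # red - warm - orange
red_blue = path_length(edges, "red", "blue")      # red - warm - color - cold - blue
```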

class contextual_encoders.measure.SimilarityMeasure(symmetric, multiple_values, verbose=False)

Bases: contextual_encoders.measure.Measure, abc.ABC

__init__(symmetric, multiple_values, verbose=False)

Initializes the SimilarityMeasure.

class contextual_encoders.measure.WuPalmer(context, offset=0.0, verbose=False)

Bases: contextual_encoders.measure.SimilarityMeasure

__init__(context, offset=0.0, verbose=False)

Initializes the WuPalmer measure.
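The Wu-Palmer similarity is commonly defined on a tree as \(2 N_3 / (N_1 + N_2 + 2 N_3)\), where \(N_1\) and \(N_2\) are the numbers of edges from the two concepts to their lowest common subsumer and \(N_3\) is the number of edges from that subsumer to the root. A sketch on a child-to-parent dictionary; the offset parameter is not documented here, so it is left out, and ancestors and wu_palmer are hypothetical names:

```python
def ancestors(parent, node):
    """Path from node up to the root (inclusive), using a child -> parent map."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def wu_palmer(parent, a, b):
    """Wu-Palmer similarity 2*N3 / (N1 + N2 + 2*N3)."""
    path_a = ancestors(parent, a)
    path_b = ancestors(parent, b)
    on_path_b = set(path_b)
    lcs = next(node for node in path_a if node in on_path_b)  # lowest common subsumer
    n1 = path_a.index(lcs)                     # edges from a to the subsumer
    n2 = path_b.index(lcs)                     # edges from b to the subsumer
    n3 = len(ancestors(parent, lcs)) - 1       # edges from the subsumer to the root
    denominator = n1 + n2 + 2 * n3
    # Both concepts are the root itself: treat them as identical.
    return 1.0 if denominator == 0 else 2 * n3 / denominator


parent = {
    "color": None,
    "warm": "color", "cold": "color",
    "red": "warm", "orange": "warm", "blue": "cold",
}
red_orange = wu_palmer(parent, "red", "orange")  # subsumer "warm": 2 / (1 + 1 + 2) = 0.5
red_blue = wu_palmer(parent, "red", "blue")      # subsumer is the root: 0.0
```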

Reducer

A Reducer computes the final encoding with n_components dimensions from the aggregated matrix, for example via multidimensional scaling (mds, the default reducer of the ContextualEncoder).

class contextual_encoders.reducer.MultidimensionalScalingReducer(n_components, metric)

Bases: contextual_encoders.reducer.Reducer

__init__(n_components, metric)

Initializes the MultidimensionalScalingReducer.

class contextual_encoders.reducer.Reducer(n_components)

Bases: abc.ABC

__init__(n_components)

Initializes the Reducer.
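To illustrate what an mds reducer does, here is classical (Torgerson) multidimensional scaling in NumPy. This is a swapped-in textbook technique: MultidimensionalScalingReducer may use a different MDS variant, its metric parameter is not modeled here, and classical_mds is a hypothetical name:

```python
import numpy as np


def classical_mds(dissimilarity, n_components=2):
    """Classical (Torgerson) MDS: embed an (n x n) dissimilarity matrix in n_components dims."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n            # J = I - (1/n) 1 1^T
    gram = -0.5 * centering @ (d ** 2) @ centering         # double-centred Gram matrix
    eigenvalues, eigenvectors = np.linalg.eigh(gram)       # ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1][:n_components]   # keep the largest ones
    # Negative eigenvalues (non-Euclidean parts) are clipped to zero.
    scale = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return eigenvectors[:, order] * scale


# Three points on a line at 0, 1 and 3: a 1D embedding recovers their distances.
points = np.array([[0.0], [1.0], [3.0]])
distances = np.abs(points - points.T)
embedding = classical_mds(distances, n_components=1)
```

The embedding is only determined up to rotation, reflection and translation, so compare pairwise distances rather than coordinates.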