opfython.models

Each OPF-based machine learning technique is defined in this package. From supervised OPF to unsupervised OPF, you can use whichever suits your needs.

A modeling package for all common opfython modules.

class opfython.models.KNNSupervisedOPF(max_k: Optional[int] = 1, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Bases: opfython.core.OPF

A KNNSupervisedOPF, which implements the supervised version of the OPF classifier with a KNN subgraph.

References

J. P. Papa and A. X. Falcão. A Learning Algorithm for the Optimum-Path Forest Classifier. Graph-Based Representations in Pattern Recognition (2009).

__init__(self, max_k: Optional[int] = 1, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Initialization method.

Parameters

max_k – Maximum k value for cutting the subgraph.
distance – An indicator of the distance metric to be used.
pre_computed_distance – A pre-computed distance file for feeding into OPF.
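For intuition about the default `distance='log_squared_euclidean'`, here is a minimal sketch that assumes the metric is a log-compressed squared Euclidean distance, log(1 + ||x - y||²). The function name and the `scale` factor are hypothetical; opfython's own implementation may multiply the result by a constant of its own.

```python
import math
from typing import Sequence


def log_squared_euclidean(x: Sequence[float], y: Sequence[float], scale: float = 1.0) -> float:
    """Hypothetical sketch: scale * log(1 + squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    # log1p(s) computes log(1 + s): 0 for identical points, slow growth for distant ones
    return scale * math.log1p(squared)


print(log_squared_euclidean([0.0, 0.0], [3.0, 4.0]))  # log(26), about 3.258
```

The logarithm compresses large distances, so a few far-away samples do not dominate the arc weights.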

_clustering(self, force_prototype: Optional[bool] = False)

Clusters the subgraph.

Parameters: force_prototype – Whether clustering should force each class to have at least one prototype.

_learn(self, X_train: numpy.array, Y_train: numpy.array, I_train: numpy.array, X_val: numpy.array, Y_val: numpy.array, I_val: numpy.array)

Learns the best k value over the validation set.

Parameters

X_train – Array of training features.
Y_train – Array of training labels.
I_train – Array of training indexes.
X_val – Array of validation features.
Y_val – Array of validation labels.
I_val – Array of validation indexes.
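`_learn` picks the k that performs best on the validation set. A minimal sketch of that search, assuming a hypothetical `evaluate(k)` callback that fits the classifier with k neighbours and returns a validation score (the real method computes its score internally, and its metric and tie-breaking may differ):

```python
from typing import Callable, Tuple


def select_best_k(max_k: int, evaluate: Callable[[int], float]) -> Tuple[int, float]:
    """Try k = 1..max_k and keep the k with the highest validation score."""
    if max_k < 1:
        raise ValueError("max_k must be at least 1")
    best_k, best_score = 1, float("-inf")
    for k in range(1, max_k + 1):
        score = evaluate(k)
        # Strict '>' keeps the smallest k among ties
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy score curve that peaks at k = 3
scores = {1: 0.80, 2: 0.85, 3: 0.91, 4: 0.88}
print(select_best_k(4, lambda k: scores[k]))  # (3, 0.91)
```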

fit(self, X_train: numpy.array, Y_train: numpy.array, X_val: numpy.array, Y_val: numpy.array, I_train: Optional[numpy.array] = None, I_val: Optional[numpy.array] = None)

Fits the classifier to the data.

Parameters

X_train – Array of training features.
Y_train – Array of training labels.
X_val – Array of validation features.
Y_val – Array of validation labels.
I_train – Array of training indexes.
I_val – Array of validation indexes.

property max_k(self): Maximum k value for cutting the subgraph.

predict(self, X_test: numpy.array, I_test: Optional[numpy.array] = None)

Predicts new data using the pre-trained classifier.

Parameters

X_test – Array of features.
I_test – Array of indexes.

Returns

A list of predictions for each record of the data.

Return type

(List[int])

class opfython.models.SupervisedOPF(distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Bases: opfython.core.OPF

A SupervisedOPF, which implements the supervised version of the OPF classifier.

References

J. P. Papa, A. X. Falcão and C. T. N. Suzuki. Supervised Pattern Classification based on Optimum-Path Forest. International Journal of Imaging Systems and Technology (2009).

__init__(self, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Initialization method.

Parameters

distance – An indicator of the distance metric to be used.
pre_computed_distance – A pre-computed distance file for feeding into OPF.

_find_prototypes(self): Finds prototype nodes using the Minimum Spanning Tree (MST) approach.
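In the MST approach described by Papa et al., a minimum spanning tree is built over the complete training graph, and the endpoints of every tree edge that connects samples with different labels become prototypes. The sketch below illustrates this in plain Python with Prim's algorithm; the function name and the use of Euclidean arc weights are assumptions (opfython uses the configured `distance`).

```python
import math
from typing import List, Sequence, Set


def mst_prototypes(X: Sequence[Sequence[float]], y: Sequence[int]) -> Set[int]:
    """Return indexes of samples on MST edges that join different labels."""
    n = len(X)
    if n == 0:
        return set()
    in_tree = [False] * n
    best: List[float] = [math.inf] * n  # cheapest known edge from each node to the tree
    parent = [-1] * n                   # tree endpoint of that cheapest edge
    best[0] = 0.0
    prototypes: Set[int] = set()
    for _ in range(n):
        # O(n^2) Prim's algorithm: add the cheapest node not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        p = parent[u]
        if p != -1 and y[p] != y[u]:
            prototypes.update((p, u))  # both ends of a class-boundary edge
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(X[u], X[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return prototypes


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [0, 0, 1, 1]
print(mst_prototypes(X, y))  # {1, 2}
```

Only samples on the class boundary of the tree are chosen, which is why OPF prototypes tend to sit where classes meet.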

fit(self, X_train: numpy.array, Y_train: numpy.array, I_train: Optional[numpy.array] = None)

Fits the classifier to the data.

Parameters

X_train – Array of training features.
Y_train – Array of training labels.
I_train – Array of training indexes.
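Training propagates labels from the prototypes: each sample is conquered by the root offering the lowest path cost, where a path costs as much as its heaviest arc (the f_max function of Papa et al.). A minimal sketch, assuming a complete graph, Euclidean arc weights, and prototypes supplied by the caller; all names are hypothetical.

```python
import heapq
import math
from typing import Iterable, List, Sequence, Tuple


def opf_train(
    X: Sequence[Sequence[float]], y: Sequence[int], prototypes: Iterable[int]
) -> Tuple[List[float], List[int], List[int]]:
    """Conquer every sample from the prototypes using the f_max path cost."""
    n = len(X)
    cost = [math.inf] * n
    label = [-1] * n
    pred = [-1] * n  # -1 marks a root (prototype)
    heap: List[Tuple[float, int]] = []
    for p in prototypes:
        cost[p], label[p] = 0.0, y[p]
        heapq.heappush(heap, (0.0, p))
    done = [False] * n
    while heap:
        c, s = heapq.heappop(heap)
        if done[s] or c > cost[s]:
            continue  # stale heap entry
        done[s] = True
        for t in range(n):
            if done[t]:
                continue
            # f_max: a path costs as much as its heaviest arc
            new_cost = max(cost[s], math.dist(X[s], X[t]))
            if new_cost < cost[t]:
                cost[t], label[t], pred[t] = new_cost, label[s], s
                heapq.heappush(heap, (new_cost, t))
    return cost, label, pred


X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [0, 0, 1, 1]
cost, label, pred = opf_train(X, y, prototypes={1, 2})
print(label, pred)  # [0, 0, 1, 1] [1, -1, -1, 2]
```

A conquered sample takes its root's label rather than its own, so the forest can be checked for training errors afterwards.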

learn(self, X_train: numpy.array, Y_train: numpy.array, X_val: numpy.array, Y_val: numpy.array, n_iterations: Optional[int] = 10)

Learns the best classifier over a validation set.

Parameters

X_train – Array of training features.
Y_train – Array of training labels.
X_val – Array of validation features.
Y_val – Array of validation labels.
n_iterations – Number of iterations.
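Learning over a validation set is commonly described as a loop: fit on the training set, score the validation set, keep the best-scoring classifier, and swap misclassified validation samples into the training set in exchange for non-prototype training samples. The sketch below covers only that swap step; its name, arguments, and the uniform random choice of candidates are assumptions, not opfython's API.

```python
import random
from typing import List, Sequence, Set


def swap_errors(
    X_train: List[List[float]], Y_train: List[int],
    X_val: List[List[float]], Y_val: List[int],
    preds: Sequence[int], prototypes: Set[int], rng: random.Random,
) -> int:
    """Swap each misclassified validation sample with a random non-prototype training sample."""
    candidates = [j for j in range(len(X_train)) if j not in prototypes]
    if not candidates:
        return 0
    errors = [e for e, (truth, guess) in enumerate(zip(Y_val, preds)) if truth != guess]
    for e in errors:
        j = rng.choice(candidates)
        # Exchange features and labels in place; prototypes are never moved out
        X_train[j], X_val[e] = X_val[e], X_train[j]
        Y_train[j], Y_val[e] = Y_val[e], Y_train[j]
    return len(errors)


X_train, Y_train = [[0.0], [1.0], [2.0]], [0, 0, 1]
X_val, Y_val = [[3.0], [4.0]], [1, 0]
swaps = swap_errors(X_train, Y_train, X_val, Y_val, preds=[1, 1],
                    prototypes={0, 2}, rng=random.Random(0))
print(swaps, X_train, X_val)  # 1 [[0.0], [4.0], [2.0]] [[3.0], [1.0]]
```

Keeping prototypes in place preserves the forest's roots while hard validation samples move into the training set.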

predict(self, X_val: numpy.array, I_val: Optional[numpy.array] = None)

Predicts new data using the pre-trained classifier.

Parameters

X_val – Array of validation or test features.
I_val – Array of validation or test indexes.

Returns

A list of predictions for each record of the data.

Return type

(List[int])
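Prediction reuses the trained forest: a test sample t receives the label of the training node s that minimises max(cost[s], d(s, t)). A brute-force sketch, assuming Euclidean distance and `cost`/`label` lists produced by training (all names hypothetical):

```python
import math
from typing import List, Sequence


def opf_predict(
    X_train: Sequence[Sequence[float]], cost: Sequence[float],
    label: Sequence[int], X_test: Sequence[Sequence[float]],
) -> List[int]:
    """Label each test sample by the training node offering the cheapest f_max path."""
    preds = []
    for x in X_test:
        best_cost, best_label = math.inf, -1
        for s, xs in enumerate(X_train):
            # Extending s's optimum path to x costs the larger of the two values
            c = max(cost[s], math.dist(xs, x))
            if c < best_cost:
                best_cost, best_label = c, label[s]
        preds.append(best_label)
    return preds


X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
cost = [1.0, 0.0, 0.0, 1.0]
label = [0, 0, 1, 1]
print(opf_predict(X_train, cost, label, [[0.5, 0.5], [4.0, 5.0]]))  # [0, 1]
```

LibOPF-style implementations typically visit training nodes in increasing cost order and stop once cost[s] exceeds the best cost found so far; the sketch omits that shortcut for clarity.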

prune(self, X_train: numpy.array, Y_train: numpy.array, X_val: numpy.array, Y_val: numpy.array, n_iterations: Optional[int] = 10)

Prunes a classifier over a validation set.

Parameters

X_train – Array of training features.
Y_train – Array of training labels.
X_val – Array of validation features.
Y_val – Array of validation labels.
n_iterations – Maximum number of iterations.
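Pruning in the OPF literature keeps only the training nodes that help classify the validation set: a node is relevant if it lies on an optimum path that conquered some validation sample. Assuming that criterion, the relevant set can be collected by walking predecessor links from each conquering node back to its root; every other training node is a pruning candidate. The names and the -1 root marker are hypothetical.

```python
from typing import Sequence, Set


def relevant_nodes(pred: Sequence[int], conquerors: Sequence[int]) -> Set[int]:
    """Collect every training node on a path used to classify a validation sample."""
    relevant: Set[int] = set()
    for s in conquerors:
        # Walk up the optimum-path tree until a root or an already-visited node
        while s != -1 and s not in relevant:
            relevant.add(s)
            s = pred[s]
    return relevant


pred = [1, -1, -1, 2, 3]  # two trees: 0 -> 1 and 4 -> 3 -> 2
keep = relevant_nodes(pred, conquerors=[0])
print(keep, sorted(set(range(len(pred))) - keep))  # {0, 1} [2, 3, 4]
```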

class opfython.models.UnsupervisedOPF(min_k: Optional[int] = 1, max_k: Optional[int] = 1, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Bases: opfython.core.OPF

An UnsupervisedOPF, which implements the unsupervised version of the OPF classifier.

References

L. M. Rocha, F. A. M. Cappabianco, A. X. Falcão. Data clustering as an optimum-path forest problem with applications in image analysis. International Journal of Imaging Systems and Technology (2009).

__init__(self, min_k: Optional[int] = 1, max_k: Optional[int] = 1, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Initialization method.

Parameters

min_k – Minimum k value for cutting the subgraph.
max_k – Maximum k value for cutting the subgraph.
distance – An indicator of the distance metric to be used.
pre_computed_distance – A pre-computed distance file for feeding into OPF.

_best_minimum_cut(self, min_k: int, max_k: int)

Performs a minimum cut on the subgraph using the best k value.

Parameters

min_k – Minimum value of k.
max_k – Maximum value of k.
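Here "best" means the k in [min_k, max_k] whose clustering yields the smallest normalized cut, so this search minimises rather than maximises. A sketch, assuming a hypothetical `cut_for_k(k)` callback that clusters with k neighbours and returns the cut:

```python
import math
from typing import Callable, Tuple


def best_k_by_cut(min_k: int, max_k: int, cut_for_k: Callable[[int], float]) -> Tuple[int, float]:
    """Return the k in [min_k, max_k] with the smallest normalized cut."""
    if not 1 <= min_k <= max_k:
        raise ValueError("expected 1 <= min_k <= max_k")
    best_k, best_cut = min_k, math.inf
    for k in range(min_k, max_k + 1):
        cut = cut_for_k(k)
        if cut < best_cut:  # strict '<' keeps the smallest k among ties
            best_k, best_cut = k, cut
    return best_k, best_cut


cuts = {2: 1.4, 3: 0.6, 4: 0.9}
print(best_k_by_cut(2, 4, lambda k: cuts[k]))  # (3, 0.6)
```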

_clustering(self, n_neighbours: int)

Clusters the subgraph using a k value (number of neighbours).

Parameters: n_neighbours – Number of neighbours to be used.
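In Rocha et al., clustering starts from a probability density estimated on the k-nearest-neighbour graph, ρ(s) = 1/(√(2πσ²)·k) · Σ exp(-d²(s,t)/(2σ²)) over the k neighbours t of s, with σ = d_f/3 and d_f the largest arc weight; density maxima then become cluster roots. The sketch covers only the density step, assuming Euclidean distance and brute-force neighbour search (function name hypothetical).

```python
import math
from typing import List, Sequence


def knn_density(X: Sequence[Sequence[float]], k: int) -> List[float]:
    """Gaussian-kernel density over each sample's k nearest neighbours."""
    n = len(X)
    if not 1 <= k < n:
        raise ValueError("expected 1 <= k < number of samples")
    # Brute force: distances from each sample to its k nearest neighbours
    knn = [
        sorted(math.dist(X[s], X[t]) for t in range(n) if t != s)[:k]
        for s in range(n)
    ]
    d_f = max(d for dists in knn for d in dists)  # largest arc weight in the graph
    sigma = d_f / 3.0 if d_f > 0.0 else 1.0       # guard against all-duplicate data
    norm = math.sqrt(2.0 * math.pi * sigma ** 2) * k
    return [sum(math.exp(-(d * d) / (2.0 * sigma ** 2)) for d in dists) / norm for dists in knn]


rho = knn_density([[0.0], [1.0], [2.0], [10.0]], k=1)
print(rho)  # the isolated sample at 10.0 gets the lowest density
```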

_normalized_cut(self, n_neighbours: int)

Performs a normalized cut over the subgraph using a k value (number of neighbours).

Parameters: n_neighbours – Number of neighbours to be used.
Returns: The value of the normalized cut.
Return type: (int)
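A sketch of a normalized cut in the style of LibOPF-derived code, stated as an assumption: each arc of the k-NN graph weighs 1/d, and the cut sums, over clusters, the external arc weight divided by the cluster's total (internal plus external) arc weight. Lower values mean better-separated clusters. This sketch returns a float, and its names are hypothetical.

```python
import math
from typing import Dict, Sequence


def normalized_cut(
    X: Sequence[Sequence[float]], cluster: Sequence[int], adjacency: Sequence[Sequence[int]]
) -> float:
    """Sum over clusters of external / (internal + external) arc weight, weights 1/d."""
    internal: Dict[int, float] = {}
    external: Dict[int, float] = {}
    for p, neighbours in enumerate(adjacency):
        c = cluster[p]
        for q in neighbours:
            d = math.dist(X[p], X[q])
            if d == 0.0:
                continue  # duplicate points: 1/d is undefined, so skip the arc
            w = 1.0 / d
            if cluster[q] == c:
                internal[c] = internal.get(c, 0.0) + w
            else:
                external[c] = external.get(c, 0.0) + w
    cut = 0.0
    for c in set(cluster):
        total = internal.get(c, 0.0) + external.get(c, 0.0)
        if total > 0.0:
            cut += external.get(c, 0.0) / total
    return cut


X = [[0.0], [1.0], [10.0], [11.0]]
print(normalized_cut(X, [0, 0, 1, 1], [[1, 2], [0], [3], [2]]))  # 0.1 / 2.1
```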

fit(self, X_train: numpy.array, Y_train: Optional[numpy.array] = None, I_train: Optional[numpy.array] = None)

Fits the classifier to the data.

Parameters

X_train – Array of training features.
Y_train – Array of training labels.
I_train – Array of training indexes.

property max_k(self): Maximum k value for cutting the subgraph.

property min_k(self): Minimum k value for cutting the subgraph.

predict(self, X_val: numpy.array, I_val: Optional[numpy.array] = None)

Predicts new data using the pre-trained classifier.

Parameters

X_val – Array of validation features.
I_val – Array of validation indexes.

Returns

A list of predictions for each record of the data.

Return type

(List[int])

propagate_labels(self): Runs through the clusters and propagates each cluster root's label to the samples in that cluster.
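Assuming each cluster root already carries a label, propagation amounts to giving every sample the label of the root at the end of its predecessor chain. A sketch with a hypothetical `pred` list (-1 marks a root) and a root-to-label mapping:

```python
from typing import Dict, List, Sequence


def propagate_labels(pred: Sequence[int], root_label: Dict[int, int]) -> List[int]:
    """Give every sample the label of the root of its optimum-path tree."""
    labels = []
    for s in range(len(pred)):
        r = s
        # Follow predecessors until reaching a root
        while pred[r] != -1:
            r = pred[r]
        labels.append(root_label[r])
    return labels


pred = [-1, 0, 1, -1, 3]  # roots 0 and 3
print(propagate_labels(pred, {0: 7, 3: 9}))  # [7, 7, 7, 9, 9]
```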