opfython.models

Each machine learning OPF-based technique is defined in this package. From Supervised OPF to Unsupervised OPF, you can use whatever suits your needs.

A modeling package for all common opfython modules.

class opfython.models.KNNSupervisedOPF(max_k: Optional[int] = 1, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Bases: opfython.core.OPF

A KNNSupervisedOPF class, which implements the supervised version of the OPF classifier with a KNN subgraph.

References

J. P. Papa and A. X. Falcão. A Learning Algorithm for the Optimum-Path Forest Classifier. Graph-Based Representations in Pattern Recognition (2009).

__init__(self, max_k: Optional[int] = 1, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Initialization method.

Parameters
  • max_k – Maximum k value for cutting the subgraph.

  • distance – An indicator of the distance metric to be used.

  • pre_computed_distance – A pre-computed distance file for feeding into OPF.

_clustering(self, force_prototype: Optional[bool] = False)

Clusters the subgraph.

Parameters

force_prototype – Whether clustering should force each class to have at least one prototype.

_learn(self, X_train: numpy.array, Y_train: numpy.array, I_train: numpy.array, X_val: numpy.array, Y_val: numpy.array, I_val: numpy.array)

Learns the best k value over the validation set.

Parameters
  • X_train – Array of training features.

  • Y_train – Array of training labels.

  • I_train – Array of training indexes.

  • X_val – Array of validation features.

  • Y_val – Array of validation labels.

  • I_val – Array of validation indexes.
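The search described for `_learn` can be pictured as a loop over candidate k values that keeps the one scoring best on the validation set. The sketch below only assumes that general shape and is not opfython's code; `select_best_k` and `score_for_k` are hypothetical names introduced for illustration.

```python
from typing import Callable, Tuple


def select_best_k(max_k: int, score_for_k: Callable[[int], float]) -> Tuple[int, float]:
    """Return the k in [1, max_k] with the highest validation score.

    `score_for_k` is a hypothetical callback: given k, it should train on the
    training split and return the accuracy measured on the validation split.
    """
    if max_k < 1:
        raise ValueError("max_k must be at least 1")

    best_k, best_score = 1, float("-inf")
    for k in range(1, max_k + 1):
        score = score_for_k(k)
        # Strict comparison keeps the smallest k when scores tie.
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy scores standing in for real validation accuracies.
toy_scores = {1: 0.80, 2: 0.90, 3: 0.90, 4: 0.85}
best_k, best_score = select_best_k(4, lambda k: toy_scores[k])
```

Preferring the smallest k on ties is a design choice of this sketch, made because a smaller neighbourhood gives a sparser subgraph.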

fit(self, X_train: numpy.array, Y_train: numpy.array, X_val: numpy.array, Y_val: numpy.array, I_train: Optional[numpy.array] = None, I_val: Optional[numpy.array] = None)

Fits data in the classifier.

Parameters
  • X_train – Array of training features.

  • Y_train – Array of training labels.

  • X_val – Array of validation features.

  • Y_val – Array of validation labels.

  • I_train – Array of training indexes.

  • I_val – Array of validation indexes.

property max_k(self)

Maximum k value for cutting the subgraph.

predict(self, X_test: numpy.array, I_test: Optional[numpy.array] = None)

Predicts new data using the pre-trained classifier.

Parameters
  • X_test – Array of features.

  • I_test – Array of indexes.

Returns

A list of predictions for each record of the data.

Return type

(List[int])

class opfython.models.SupervisedOPF(distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Bases: opfython.core.OPF

A SupervisedOPF class, which implements the supervised version of the OPF classifier.

References

J. P. Papa, A. X. Falcão and C. T. N. Suzuki. Supervised Pattern Classification based on Optimum-Path Forest. International Journal of Imaging Systems and Technology (2009).

__init__(self, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Initialization method.

Parameters
  • distance – An indicator of the distance metric to be used.

  • pre_computed_distance – A pre-computed distance file for feeding into OPF.

_find_prototypes(self)

Finds prototype nodes using the Minimum Spanning Tree (MST) approach.
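As a rough illustration of the MST approach, the sketch below builds a minimum spanning tree over the training samples and marks as prototypes the endpoints of every tree edge that connects two different labels. It is a minimal, self-contained sketch and not opfython's implementation: the function name `find_prototypes`, the use of plain lists, and the Euclidean edge weight are all assumptions.

```python
import math
from typing import Sequence, Set


def find_prototypes(X: Sequence[Sequence[float]], Y: Sequence[int]) -> Set[int]:
    """Mark as prototypes the endpoints of MST edges that join different labels.

    Uses Prim's algorithm on the complete graph with Euclidean edge weights.
    """
    n = len(X)
    if n == 0:
        return set()

    in_tree = [False] * n
    cost = [math.inf] * n      # cheapest known edge linking each node to the tree
    parent = [-1] * n          # other endpoint of that cheapest edge
    cost[0] = 0.0
    prototypes: Set[int] = set()

    for _ in range(n):
        # Pick the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: cost[i])
        in_tree[u] = True

        # The edge (parent[u], u) is now part of the MST.
        if parent[u] != -1 and Y[u] != Y[parent[u]]:
            prototypes.update((u, parent[u]))

        # Relax the edges from u to every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(X[u], X[v])
                if d < cost[v]:
                    cost[v], parent[v] = d, u
    return prototypes


# Two well-separated classes on a line: the boundary lies between samples 1 and 2.
X = [[0.0], [1.0], [3.0], [4.0]]
Y = [0, 0, 1, 1]
prototypes = find_prototypes(X, Y)
```

In the toy data, the only tree edge crossing the class boundary links samples 1 and 2, so those two become the prototypes.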

fit(self, X_train: numpy.array, Y_train: numpy.array, I_train: Optional[numpy.array] = None)

Fits data in the classifier.

Parameters
  • X_train – Array of training features.

  • Y_train – Array of training labels.

  • I_train – Array of training indexes.

learn(self, X_train: numpy.array, Y_train: numpy.array, X_val: numpy.array, Y_val: numpy.array, n_iterations: Optional[int] = 10)

Learns the best classifier over a validation set.

Parameters
  • X_train – Array of training features.

  • Y_train – Array of training labels.

  • X_val – Array of validation features.

  • Y_val – Array of validation labels.

  • n_iterations – Number of iterations.

predict(self, X_val: numpy.array, I_val: Optional[numpy.array] = None)

Predicts new data using the pre-trained classifier.

Parameters
  • X_val – Array of validation or test features.

  • I_val – Array of validation or test indexes.

Returns

A list of predictions for each record of the data.

Return type

(List[int])
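The `fit` and `predict` signatures listed for SupervisedOPF suggest a train-then-score workflow. The sketch below shows that workflow against any object exposing those two methods; `accuracy`, `fit_and_score`, and `MajorityStub` are hypothetical helpers, and the stub exists only so the example runs without opfython installed.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def fit_and_score(clf, X_train, Y_train, X_val, Y_val) -> float:
    """Fit `clf` on the training split and score it on the validation split.

    `clf` is any object exposing fit(X_train, Y_train) and predict(X_val).
    """
    clf.fit(X_train, Y_train)
    preds = clf.predict(X_val)
    return accuracy(Y_val, preds)


class MajorityStub:
    """Hypothetical stand-in classifier that always predicts the majority label."""

    def fit(self, X_train, Y_train):
        labels = list(Y_train)
        self.label_ = max(set(labels), key=labels.count)

    def predict(self, X_val) -> List[int]:
        return [self.label_ for _ in X_val]


score = fit_and_score(MajorityStub(), [[0], [1], [2]], [1, 1, 2], [[3], [4]], [1, 2])
```

With opfython installed, passing an `opfython.models.SupervisedOPF()` instance and NumPy arrays in place of the stub and lists should follow the same pattern, given the signatures documented here.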

prune(self, X_train: numpy.array, Y_train: numpy.array, X_val: numpy.array, Y_val: numpy.array, n_iterations: Optional[int] = 10)

Prunes a classifier over a validation set.

Parameters
  • X_train – Array of training features.

  • Y_train – Array of training labels.

  • X_val – Array of validation features.

  • Y_val – Array of validation labels.

  • n_iterations – Maximum number of iterations.

class opfython.models.UnsupervisedOPF(min_k: Optional[int] = 1, max_k: Optional[int] = 1, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Bases: opfython.core.OPF

An UnsupervisedOPF class, which implements the unsupervised version of the OPF classifier.

References

L. M. Rocha, F. A. M. Cappabianco, A. X. Falcão. Data clustering as an optimum-path forest problem with applications in image analysis. International Journal of Imaging Systems and Technology (2009).

__init__(self, min_k: Optional[int] = 1, max_k: Optional[int] = 1, distance: Optional[str] = 'log_squared_euclidean', pre_computed_distance: Optional[str] = None)

Initialization method.

Parameters
  • min_k – Minimum k value for cutting the subgraph.

  • max_k – Maximum k value for cutting the subgraph.

  • distance – An indicator of the distance metric to be used.

  • pre_computed_distance – A pre-computed distance file for feeding into OPF.

_best_minimum_cut(self, min_k: int, max_k: int)

Performs a minimum cut on the subgraph using the best k value.

Parameters
  • min_k – Minimum value of k.

  • max_k – Maximum value of k.

_clustering(self, n_neighbours: int)

Clusters the subgraph using a k value (number of neighbours).

Parameters

n_neighbours – Number of neighbours to be used.

_normalized_cut(self, n_neighbours: int)

Performs a normalized cut over the subgraph using a k value (number of neighbours).

Parameters

n_neighbours – Number of neighbours to be used.

Returns

The value of the normalized cut.

Return type

(float)
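One common formulation of a normalized cut over a k-nearest-neighbour graph weights each arc by the inverse of its distance, then sums, per cluster, the share of weight that leaves the cluster. The sketch below follows that formulation under stated assumptions: the function name `normalized_cut` and the list-based inputs are hypothetical, and nothing here is taken from opfython's source.

```python
from typing import Dict, Sequence


def normalized_cut(
    labels: Sequence[int],
    neighbours: Sequence[Sequence[int]],
    distances: Sequence[Sequence[float]],
) -> float:
    """Sum, over clusters, of boundary weight / (internal + boundary weight).

    neighbours[i] lists the k nearest neighbours of sample i, and
    distances[i][j] is the distance from i to neighbours[i][j].
    Each arc is weighted by the inverse of its distance.
    """
    internal: Dict[int, float] = {}
    boundary: Dict[int, float] = {}

    for i, (adj, dists) in enumerate(zip(neighbours, distances)):
        for j, d in zip(adj, dists):
            if d <= 0.0:
                continue  # skip zero-distance arcs to avoid dividing by zero
            bucket = internal if labels[i] == labels[j] else boundary
            bucket[labels[i]] = bucket.get(labels[i], 0.0) + 1.0 / d

    cut = 0.0
    for label in set(labels):
        inside = internal.get(label, 0.0)
        outside = boundary.get(label, 0.0)
        if inside + outside > 0.0:
            cut += outside / (inside + outside)
    return cut


# Samples at positions 0, 1, 3, 4 on a line, with k = 2 neighbours each.
labels = [0, 0, 1, 1]
neighbours = [[1, 2], [0, 2], [3, 1], [2, 1]]
distances = [[1.0, 3.0], [1.0, 2.0], [1.0, 2.0], [1.0, 3.0]]
cut = normalized_cut(labels, neighbours, distances)
```

A lower value means less arc weight crosses cluster boundaries, so a clustering with every sample in one cluster scores 0.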

fit(self, X_train: numpy.array, Y_train: Optional[numpy.array] = None, I_train: Optional[numpy.array] = None)

Fits data in the classifier.

Parameters
  • X_train – Array of training features.

  • Y_train – Array of training labels.

  • I_train – Array of training indexes.

property max_k(self)

Maximum k value for cutting the subgraph.

property min_k(self)

Minimum k value for cutting the subgraph.

predict(self, X_val: numpy.array, I_val: Optional[numpy.array] = None)

Predicts new data using the pre-trained classifier.

Parameters
  • X_val – Array of validation features.

  • I_val – Array of validation indexes.

Returns

A list of predictions for each record of the data.

Return type

(List[int])

propagate_labels(self)

Runs through the clusters and propagates the labels of the cluster roots to the samples.
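Label propagation of this kind amounts to a lookup: each sample takes the label of the root of its cluster. The sketch below illustrates the idea with plain lists; `propagate_root_labels` and its arguments are hypothetical and do not mirror opfython's internal data structures.

```python
from typing import List, Sequence


def propagate_root_labels(roots: Sequence[int], labels: Sequence[int]) -> List[int]:
    """Give every sample the label held by the root of its cluster.

    roots[i] is the index of the root sample of the cluster containing i
    (a root points to itself), and labels[i] is the current label of sample i.
    """
    return [labels[root] for root in roots]


# Samples 0-2 belong to the cluster rooted at 0; samples 3-4 to the one rooted at 3.
roots = [0, 0, 0, 3, 3]
labels = [7, 1, 2, 9, 4]
propagated = propagate_root_labels(roots, labels)
```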