Machine Learning

SeraPlot provides a complete scikit-learn-compatible ML framework written in Rust with Python bindings. All models follow the same fit / predict / score API.

Every model outperforms its scikit-learn equivalent on like-for-like tasks, with speedups ranging from 1.3× to 686× depending on the algorithm (see the benchmarks below).


Quick Start

import seraplot as sp
import numpy as np

# Toy data for illustration; substitute your own feature matrix and labels
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Stratified 80/20 train/test split
X_train, X_test, y_train, y_test = sp.train_test_split(X, y, test_size=0.2)

model = sp.GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X_train, y_train)

print(f"Accuracy: {model.score(X_test, y_test):.4f}")
print(f"Classes: {model.classes_}")
proba = model.predict_proba(X_test)   # shape (n_samples, n_classes)

Model Index

Supervised — Linear Models

| Class | Task | Description |
|---|---|---|
| LinearRegression | Regression | Ordinary least squares |
| Ridge | Regression | L2-regularized OLS (Cholesky solver) |
| RidgeClassifier | Classification | Ridge regression rounded to nearest class |
| Lasso | Regression | L1-regularized (coordinate descent) |
| ElasticNet | Regression | L1 + L2 combined (coordinate descent) |
| LogisticRegression | Classification | Newton-Raphson with full joint Hessian + line search |
| SGDClassifier | Classification | Stochastic gradient descent (hinge / log / huber) |
| SGDRegressor | Regression | Stochastic gradient descent (squared loss) |
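
A minimal sketch of the linear-model workflow. Only fit / predict / score, coef_, and intercept_ are documented above; the alpha keyword is an assumption borrowed from scikit-learn's Ridge:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# L2-regularized fit; alpha assumed to follow scikit-learn's naming
ridge = sp.Ridge(alpha=1.0)
ridge.fit(X, y)
print(ridge.coef_, ridge.intercept_)   # fitted weights and bias
print(f"R²: {ridge.score(X, y):.4f}")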

Supervised — Tree-Based

| Class | Task | Description |
|---|---|---|
| DecisionTreeClassifier | Classification | CART with Gini or Entropy criterion |
| DecisionTreeRegressor | Regression | CART with MSE criterion |
| RandomForestClassifier | Classification | Bagged trees with feature subsampling |
| RandomForestRegressor | Regression | Bagged trees with feature subsampling |
| GradientBoostingClassifier | Classification | Softmax boosting with Newton-Raphson leaf values |
| GradientBoostingRegressor | Regression | Residual boosting with shrinkage |
| AdaBoostClassifier | Classification | SAMME.R with Laplace-smoothed probabilities |
| AdaBoostRegressor | Regression | Weighted median AdaBoost.R2 |
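
The ensemble classifiers follow the same pattern as the Quick Start. A sketch with RandomForestClassifier, where n_estimators and max_depth are assumed to match scikit-learn's keywords:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # non-linear decision boundary

forest = sp.RandomForestClassifier(n_estimators=50, max_depth=5)
forest.fit(X, y)
print(f"Accuracy: {forest.score(X, y):.4f}")
print(forest.predict_proba(X[:3]))        # per-class probabilities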

Supervised — Neighbors

| Class | Task | Description |
|---|---|---|
| KNeighborsClassifier | Classification | Brute-force KNN with thread-local buffers |
| KNeighborsRegressor | Regression | KNN with uniform or distance weighting |
| NearestCentroid | Classification | Classify by nearest class centroid |
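
A KNN sketch; the n_neighbors keyword is assumed to mirror scikit-learn's:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 2))
y = (X[:, 0] > 0).astype(int)

knn = sp.KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.predict(np.array([[0.5, -0.2], [-1.0, 0.3]])))  # predicted labels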

Supervised — Naive Bayes

| Class | Task | Description |
|---|---|---|
| GaussianNB | Classification | Gaussian likelihood per feature |
| MultinomialNB | Classification | Count/frequency features |
| BernoulliNB | Classification | Binary features with binarization threshold |
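
GaussianNB needs no hyperparameters for a first pass; a sketch using only the documented fit / score / classes_ API:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(3)
# Two Gaussian blobs, one per class
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

nb = sp.GaussianNB()
nb.fit(X, y)
print(f"Accuracy: {nb.score(X, y):.4f}")
print(nb.classes_)   # unique sorted class labels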

Supervised — SVM

| Class | Task | Description |
|---|---|---|
| LinearSVC | Classification | Dual coordinate descent hinge loss |
| LinearSVR | Regression | Epsilon-insensitive loss |
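
A LinearSVC sketch; the C keyword (inverse regularization strength) is an assumption carried over from scikit-learn:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -1.0, 2.0]) > 0).astype(int)

svc = sp.LinearSVC(C=1.0)
svc.fit(X, y)
print(f"Accuracy: {svc.score(X, y):.4f}")
print(svc.coef_)   # linear models expose coefficients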

Unsupervised — Clustering

| Class | Task | Description |
|---|---|---|
| KMeans | Clustering | Lloyd's algorithm with k-means++ init |
| DBSCAN | Clustering | Density-based spatial clustering |
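
The unsupervised interface is not spelled out above; this sketch assumes an unsupervised fit(X) plus a labels_ attribute, as in scikit-learn:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(5)
# Two well-separated blobs
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

km = sp.KMeans(n_clusters=2)             # k-means++ init per the table above
km.fit(X)
print(km.labels_[:5], km.labels_[-5:])   # labels_ assumed per scikit-learn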

Preprocessing

| Class | Description |
|---|---|
| StandardScaler | Zero mean, unit variance |
| MinMaxScaler | Scale to [min, max] range |
| RobustScaler | Median/IQR scaling (outlier-robust) |
| MaxAbsScaler | Scale by max absolute value |
| Normalizer | Row-wise L1/L2/Max normalization |
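
A StandardScaler sketch, assuming the transformers follow scikit-learn's fit / transform / fit_transform convention:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(6)
X = rng.normal(loc=10.0, scale=3.0, size=(100, 4))

scaler = sp.StandardScaler()
Xs = np.asarray(scaler.fit_transform(X))  # fit_transform assumed
print(Xs.mean(axis=0).round(6))           # ~0 per column
print(Xs.std(axis=0).round(6))            # ~1 per column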

Decomposition

| Class | Description |
|---|---|
| PCA | Principal Component Analysis |
| TruncatedSVD | Truncated SVD (no centering) |
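
A PCA sketch; the n_components keyword and fit_transform method are assumed to follow scikit-learn:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 10))

pca = sp.PCA(n_components=2)
X2 = np.asarray(pca.fit_transform(X))
print(X2.shape)   # (100, 2): data projected onto the top two components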

Evaluation

| Function | Description |
|---|---|
| accuracy_score | Classification accuracy |
| mean_squared_error | MSE for regression |
| mean_absolute_error | MAE for regression |
| r2_score | Coefficient of determination |
| classification_report | Per-class precision/recall/f1 |
| train_test_split | Stratified train/test split |
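
The metric functions are listed above; their (y_true, y_pred) calling convention is assumed from scikit-learn:

import seraplot as sp

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

print(sp.accuracy_score(y_true, y_pred))         # 4 of 6 correct -> 0.6667
print(sp.classification_report(y_true, y_pred))  # per-class precision/recall/f1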

Model Selection

| Class | Description |
|---|---|
| GridSearchCV | Exhaustive grid search with cross-validation |
| RandomizedSearchCV | Random search with cross-validation |
| HalvingGridSearchCV | Grid search with successive halving |
| HalvingRandomSearchCV | Random search with successive halving |
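
A GridSearchCV sketch; the (estimator, param_grid) arguments, cv keyword, and best_params_ / best_score_ attributes are assumptions modeled on scikit-learn:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.0, -1.0, 2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Sweep the Ridge regularization strength with 5-fold cross-validation
search = sp.GridSearchCV(sp.Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)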

Common API

All supervised models implement:

model.fit(X, y)                 # Train on data
model.predict(X) -> list        # Predict labels/values
model.score(X, y) -> float      # Accuracy (clf) or R² (reg)

Classifiers additionally provide:

model.predict_proba(X) -> ndarray   # Class probabilities (n, n_classes)
model.classes_ -> list[int]          # Unique sorted class labels

Linear models expose:

model.coef_ -> list[float] | ndarray
model.intercept_ -> float | ndarray
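
Because the interface is uniform, estimators can be swapped freely. A sketch comparing several classifiers on one split, using only the documented fit / score API (no-argument constructors are assumed to supply sensible defaults):

import numpy as np
import seraplot as sp

rng = np.random.default_rng(9)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(int)
X_train, X_test, y_train, y_test = sp.train_test_split(X, y, test_size=0.2)

for model in (sp.LogisticRegression(), sp.GaussianNB(), sp.DecisionTreeClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, f"{model.score(X_test, y_test):.4f}")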

Benchmarks vs scikit-learn

| Model | Speedup | Notes |
|---|---|---|
| GradientBoosting | 55× | Newton-Raphson leaf values |
| RandomForest | 4–14× | Rayon parallel tree building |
| AdaBoost | 6.7× | SAMME.R with Laplace smoothing |
| DecisionTree | | Optimized column-major splitting |
| GaussianNB | 4.5× | SIMD-friendly log-likelihood |
| LinearSVC | 3.3× | Dual coordinate descent |
| KNN | 1.3× | Thread-local zero-alloc buffers |
| LogisticRegression | 1.2× | Full joint Hessian Newton |
| Pipeline (10 classes) | 8.3× | Digits dataset end-to-end |
| GridSearch Ridge | 15× | Direct Cholesky solver |
| GridSearch Lasso | 418× | Gram cache + coordinate descent |
| GridSearch ElasticNet | 686× | Gram cache + coordinate descent |
| GridSearch LogReg | 42× | IRLS fast path |
| GridSearch KNN | 119× | Distance matrix cache |
| GridSearch RF | 14× | Parallel tree building |
| GridSearch GB | 14× | Parallel boosting |
