Machine Learning

SeraPlot provides a complete scikit-learn-compatible ML framework written in Rust with Python bindings. All models follow the same fit / predict / score API.

Every model outperforms its scikit-learn equivalent on like-for-like tasks, with speedups ranging from 1.3× to 686× depending on the algorithm (see the benchmarks below).


Quick Start

import seraplot as sp
import numpy as np

# Toy data for illustration; substitute your own feature matrix and labels
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Stratified 80/20 train/test split
X_train, X_test, y_train, y_test = sp.train_test_split(X, y, test_size=0.2)

model = sp.GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X_train, y_train)

print(f"Accuracy: {model.score(X_test, y_test):.4f}")
print(f"Classes: {model.classes_}")
proba = model.predict_proba(X_test)   # shape (n_samples, n_classes)

Model Index

Supervised — Linear Models

| Class | Task | Description |
|---|---|---|
| LinearRegression | Regression | Ordinary least squares |
| Ridge | Regression | L2-regularized OLS (Cholesky solver) |
| RidgeClassifier | Classification | Ridge regression rounded to nearest class |
| Lasso | Regression | L1-regularized (coordinate descent) |
| ElasticNet | Regression | L1 + L2 combined (coordinate descent) |
| LogisticRegression | Classification | Newton-Raphson with full joint Hessian + line search |
| SGDClassifier | Classification | Stochastic gradient descent (hinge / log / huber) |
| SGDRegressor | Regression | Stochastic gradient descent (squared loss) |
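
A minimal sketch of the linear-model workflow. Only fit / predict / score, coef_, and intercept_ are documented above; the alpha keyword is an assumption borrowed from scikit-learn's Ridge:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# L2-regularized fit; alpha assumed to follow scikit-learn's naming
ridge = sp.Ridge(alpha=1.0)
ridge.fit(X, y)
print(ridge.coef_, ridge.intercept_)   # fitted weights and bias
print(f"R²: {ridge.score(X, y):.4f}")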

Supervised — Tree-Based

| Class | Task | Description |
|---|---|---|
| DecisionTreeClassifier | Classification | CART with Gini or Entropy criterion |
| DecisionTreeRegressor | Regression | CART with MSE criterion |
| RandomForestClassifier | Classification | Bagged trees with feature subsampling |
| RandomForestRegressor | Regression | Bagged trees with feature subsampling |
| GradientBoostingClassifier | Classification | Softmax boosting with Newton-Raphson leaf values |
| GradientBoostingRegressor | Regression | Residual boosting with shrinkage |
| AdaBoostClassifier | Classification | SAMME.R with Laplace-smoothed probabilities |
| AdaBoostRegressor | Regression | Weighted median AdaBoost.R2 |
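
The ensemble classifiers follow the same pattern as the Quick Start. A sketch with RandomForestClassifier, where n_estimators and max_depth are assumed to match scikit-learn's keywords:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # non-linear decision boundary

forest = sp.RandomForestClassifier(n_estimators=50, max_depth=5)
forest.fit(X, y)
print(f"Accuracy: {forest.score(X, y):.4f}")
print(forest.predict_proba(X[:3]))        # per-class probabilities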

Supervised — Neighbors

| Class | Task | Description |
|---|---|---|
| KNeighborsClassifier | Classification | Brute-force KNN with thread-local buffers |
| KNeighborsRegressor | Regression | KNN with uniform or distance weighting |
| NearestCentroid | Classification | Classify by nearest class centroid |
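
A KNN sketch; the n_neighbors keyword is assumed to mirror scikit-learn's:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 2))
y = (X[:, 0] > 0).astype(int)

knn = sp.KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.predict(np.array([[0.5, -0.2], [-1.0, 0.3]])))  # predicted labels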

Supervised — Naive Bayes

| Class | Task | Description |
|---|---|---|
| GaussianNB | Classification | Gaussian likelihood per feature |
| MultinomialNB | Classification | Count/frequency features |
| BernoulliNB | Classification | Binary features with binarization threshold |
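
GaussianNB needs no hyperparameters for a first pass; a sketch using only the documented fit / score / classes_ API:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(3)
# Two Gaussian blobs, one per class
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

nb = sp.GaussianNB()
nb.fit(X, y)
print(f"Accuracy: {nb.score(X, y):.4f}")
print(nb.classes_)   # unique sorted class labels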

Supervised — SVM

| Class | Task | Description |
|---|---|---|
| LinearSVC | Classification | Dual coordinate descent hinge loss |
| LinearSVR | Regression | Epsilon-insensitive loss |
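
A LinearSVC sketch; the C keyword (inverse regularization strength) is an assumption carried over from scikit-learn:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -1.0, 2.0]) > 0).astype(int)

svc = sp.LinearSVC(C=1.0)
svc.fit(X, y)
print(f"Accuracy: {svc.score(X, y):.4f}")
print(svc.coef_)   # linear models expose coefficients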

Unsupervised — Clustering

| Class | Task | Description |
|---|---|---|
| KMeans | Clustering | Lloyd's algorithm with k-means++ init |
| DBSCAN | Clustering | Density-based spatial clustering |
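
The unsupervised interface is not spelled out above; this sketch assumes an unsupervised fit(X) plus a labels_ attribute, as in scikit-learn:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(5)
# Two well-separated blobs
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

km = sp.KMeans(n_clusters=2)             # k-means++ init per the table above
km.fit(X)
print(km.labels_[:5], km.labels_[-5:])   # labels_ assumed per scikit-learn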

Preprocessing

| Class | Description |
|---|---|
| StandardScaler | Zero mean, unit variance |
| MinMaxScaler | Scale to [min, max] range |
| RobustScaler | Median/IQR scaling (outlier-robust) |
| MaxAbsScaler | Scale by max absolute value |
| Normalizer | Row-wise L1/L2/Max normalization |
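
A StandardScaler sketch, assuming the transformers follow scikit-learn's fit / transform / fit_transform convention:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(6)
X = rng.normal(loc=10.0, scale=3.0, size=(100, 4))

scaler = sp.StandardScaler()
Xs = np.asarray(scaler.fit_transform(X))  # fit_transform assumed
print(Xs.mean(axis=0).round(6))           # ~0 per column
print(Xs.std(axis=0).round(6))            # ~1 per column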

Decomposition

| Class | Description |
|---|---|
| PCA | Principal Component Analysis |
| TruncatedSVD | Truncated SVD (no centering) |
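
A PCA sketch; the n_components keyword and fit_transform method are assumed to follow scikit-learn:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 10))

pca = sp.PCA(n_components=2)
X2 = np.asarray(pca.fit_transform(X))
print(X2.shape)   # (100, 2): data projected onto the top two components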

Evaluation

| Function | Description |
|---|---|
| accuracy_score | Classification accuracy |
| mean_squared_error | MSE for regression |
| mean_absolute_error | MAE for regression |
| r2_score | Coefficient of determination |
| classification_report | Per-class precision/recall/f1 |
| train_test_split | Stratified train/test split |
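
The metric functions are listed above; their (y_true, y_pred) calling convention is assumed from scikit-learn:

import seraplot as sp

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

print(sp.accuracy_score(y_true, y_pred))         # 4 of 6 correct -> 0.6667
print(sp.classification_report(y_true, y_pred))  # per-class precision/recall/f1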

Model Selection

| Class | Description |
|---|---|
| GridSearchCV | Exhaustive grid search with cross-validation |
| RandomizedSearchCV | Random search with cross-validation |
| HalvingGridSearchCV | Grid search with successive halving |
| HalvingRandomSearchCV | Random search with successive halving |
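
A GridSearchCV sketch; the (estimator, param_grid) arguments, cv keyword, and best_params_ / best_score_ attributes are assumptions modeled on scikit-learn:

import numpy as np
import seraplot as sp

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.0, -1.0, 2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Sweep the Ridge regularization strength with 5-fold cross-validation
search = sp.GridSearchCV(sp.Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)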

Common API

All supervised models implement:

model.fit(X, y)                 # Train on data
model.predict(X) -> list        # Predict labels/values
model.score(X, y) -> float      # Accuracy (clf) or R² (reg)

Classifiers additionally provide:

model.predict_proba(X) -> ndarray   # Class probabilities (n, n_classes)
model.classes_ -> list[int]          # Unique sorted class labels

Linear models expose:

model.coef_ -> list[float] | ndarray
model.intercept_ -> float | ndarray
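
Because the interface is uniform, estimators can be swapped freely. A sketch comparing several classifiers on one split, using only the documented fit / score API (no-argument constructors are assumed to supply sensible defaults):

import numpy as np
import seraplot as sp

rng = np.random.default_rng(9)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(int)
X_train, X_test, y_train, y_test = sp.train_test_split(X, y, test_size=0.2)

for model in (sp.LogisticRegression(), sp.GaussianNB(), sp.DecisionTreeClassifier()):
    model.fit(X_train, y_train)
    print(type(model).__name__, f"{model.score(X_test, y_test):.4f}")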

Benchmarks vs scikit-learn

| Model | Speedup | Notes |
|---|---|---|
| GradientBoosting | 55× | Newton-Raphson leaf values |
| RandomForest | 4–14× | Rayon parallel tree building |
| AdaBoost | 6.7× | SAMME.R with Laplace smoothing |
| DecisionTree | | Optimized column-major splitting |
| GaussianNB | 4.5× | SIMD-friendly log-likelihood |
| LinearSVC | 3.3× | Dual coordinate descent |
| KNN | 1.3× | Thread-local zero-alloc buffers |
| LogisticRegression | 1.2× | Full joint Hessian Newton |
| Pipeline (10 classes) | 8.3× | Digits dataset end-to-end |
| GridSearch Ridge | 15× | Direct Cholesky solver |
| GridSearch Lasso | 418× | Gram cache + coordinate descent |
| GridSearch ElasticNet | 686× | Gram cache + coordinate descent |
| GridSearch LogReg | 42× | IRLS fast path |
| GridSearch KNN | 119× | Distance matrix cache |
| GridSearch RF | 14× | Parallel tree building |
| GridSearch GB | 14× | Parallel boosting |
