DBSCAN Class

Signature

model = sp.DBSCAN(eps: float = 0.5, min_samples: int = 5)

model.fit(x: list[float], y: list[float]) -> None
model.fit_predict(x: list[float], y: list[float]) -> list[int]

model.labels_      -> list[int]
model.n_clusters_  -> int
model.n_noise_     -> int

Description

Low-level DBSCAN class for programmatic access to cluster labels. A label of -1 marks a noise point, i.e. a point that belongs to no cluster.


Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| eps | float | 0.5 | Neighborhood distance threshold |
| min_samples | int | 5 | Minimum points to form a cluster core |

Methods

fit(x, y)

Runs DBSCAN on the 2D points (x[i], y[i]) and populates labels_, n_clusters_, and n_noise_.

| Argument | Type | Description |
|---|---|---|
| x | list[float] | X coordinates |
| y | list[float] | Y coordinates |

fit_predict(x, y) -> list[int]

Equivalent to calling fit(x, y) then returning labels_.


Attributes

| Attribute | Type | Description |
|---|---|---|
| labels_ | list[int] | Cluster label per point (-1 = noise) |
| n_clusters_ | int | Number of identified clusters |
| n_noise_ | int | Number of noise points |
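
The relationship between the three attributes can be checked from a plain label list. The sketch below is not part of SeraPlot: `summarize_labels` is a hypothetical helper, and it assumes labels follow the convention above (one integer per cluster, -1 for noise).

```python
def summarize_labels(labels: list[int]) -> tuple[int, int]:
    """Return (n_clusters, n_noise) for a DBSCAN-style label list.

    Hypothetical helper, not a SeraPlot function. Assumes -1 marks noise
    and every other distinct value identifies one cluster.
    """
    n_noise = labels.count(-1)            # points labelled as noise
    n_clusters = len(set(labels) - {-1})  # distinct labels, noise excluded
    return n_clusters, n_noise


labels = [0, 0, 0, 1, 1, -1]
n_clusters, n_noise = summarize_labels(labels)
print(n_clusters, n_noise)  # 2 1
```

Under these assumptions, the two returned values should match n_clusters_ and n_noise_ for the same labels_.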

Examples

Accessing labels

# Python
import seraplot as sp

x = [1.0, 1.1, 1.2, 10.0, 10.1, 99.0]
y = [1.0, 0.9, 1.1, 10.2, 10.0, 99.0]
model = sp.DBSCAN(eps=0.5, min_samples=2)
# fit_predict takes the x and y coordinate lists, as in the signature above
labels = model.fit_predict(x, y)  # one label per point, -1 = noise
chart = sp.build_scatter_chart(
    f"DBSCAN ({model.n_clusters_} clusters)",
    x_values=x,
    y_values=y,
    color_groups=[str(lbl) for lbl in labels],
)
// JavaScript (assumes the binding mirrors the Python API shown above)
const sp = require('seraplot');
const x = [1.0, 1.1, 1.2, 10.0, 10.1, 99.0];
const y = [1.0, 0.9, 1.1, 10.2, 10.0, 99.0];
const model = sp.DBSCAN({eps: 0.5, min_samples: 2});
const labels = model.fit_predict(x, y);  // one label per point, -1 = noise
const chart = sp.build_scatter_chart(`DBSCAN (${model.n_clusters_} clusters)`,
x,
{
    y_values: y,
    color_groups: labels.map((lbl) => String(lbl)),
});
// TypeScript (assumes the binding mirrors the Python API shown above)
import * as sp from 'seraplot';
const x: number[] = [1.0, 1.1, 1.2, 10.0, 10.1, 99.0];
const y: number[] = [1.0, 0.9, 1.1, 10.2, 10.0, 99.0];
const model = sp.DBSCAN({eps: 0.5, min_samples: 2});
const labels: number[] = model.fit_predict(x, y);  // one label per point, -1 = noise
const chart = sp.build_scatter_chart(`DBSCAN (${model.n_clusters_} clusters)`,
x,
{
    y_values: y,
    color_groups: labels.map((lbl) => String(lbl)),
});

Pipeline: cluster then visualize

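import seraplot as sp

# Illustrative sample data; replace with your own equal-length coordinate lists
x_data = [0.0, 0.2, 0.4, 0.1, 0.3, 5.0, 5.2, 5.4, 5.1, 5.3, 20.0]
y_data = [0.0, 0.1, 0.2, 0.3, 0.1, 5.0, 5.1, 5.2, 5.3, 5.1, 20.0]

# Step 1: cluster the points
model = sp.DBSCAN(eps=1.0, min_samples=5)
model.fit(x_data, y_data)

# Step 2: turn the integer labels into string group names for colouring
color_groups = [str(lbl) for lbl in model.labels_]

# Step 3: visualize, one colour per cluster (noise appears as group "-1")
chart = sp.build_scatter_chart(
    f"DBSCAN ({model.n_clusters_} clusters)",
    x_values=x_data,
    y_values=y_data,
    color_groups=color_groups,
)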

Algorithmic Functioning

The DBSCAN class exposes the same Rust-backed algorithm as the chart variant.

For a point $p$, its $\epsilon$-neighbourhood is:

$$N_\epsilon(p) = \{q \in D : \|p - q\| \leq \epsilon\}$$
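  • Core point: $|N_\epsilon(p)| \geq \text{min_samples}$; the count includes $p$ itself, since $p \in N_\epsilon(p)$
  • Border point: not a core point, but lies within $\epsilon$ of at least one core point
  • Noise point: neither core nor border, so it is not reachable from any core point; it receives label $-1$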
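
The three definitions above can be sketched in plain Python. This is an illustration only, not SeraPlot's implementation: `neighbourhood` and `classify_points` are hypothetical names, and the neighbourhood search is brute force ($O(n^2)$) for clarity.

```python
import math


def neighbourhood(points, i, eps):
    """Indices j with ||p_i - p_j|| <= eps. Always contains i itself."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def classify_points(points, eps, min_samples):
    """Tag each point as 'core', 'border', or 'noise' (hypothetical helper)."""
    hoods = [neighbourhood(points, i, eps) for i in range(len(points))]
    is_core = [len(hood) >= min_samples for hood in hoods]
    kinds = []
    for i, hood in enumerate(hoods):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in hood):
            kinds.append("border")  # within eps of at least one core point
        else:
            kinds.append("noise")
    return kinds


# Four evenly spaced points on a line plus one far-away outlier.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(classify_points(points, eps=1.0, min_samples=3))
# ['border', 'core', 'core', 'border', 'noise']
```

The two end points of the line have only two points in their neighbourhood (themselves and one neighbour), so they fall short of min_samples but still attach to the cluster as border points.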

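SeraPlot builds a KD-tree so that each radius query costs $O(\log n)$ on average (plus the number of neighbours returned), and expands clusters via parallel BFS with SIMD-accelerated distance computation. n_clusters_ counts only actual clusters: the noise label $-1$ is not counted as a cluster.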
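
To make the cluster expansion concrete, here is a minimal reference sketch of DBSCAN with BFS expansion in pure Python. It is not SeraPlot's code: `dbscan_reference` is a hypothetical name, the KD-tree is replaced by a brute-force radius query, and the BFS is sequential. Cluster numbering here simply follows input order, which may differ from the numbering SeraPlot produces.

```python
import math
from collections import deque

NOISE = -1
UNVISITED = -2  # internal marker, never returned


def dbscan_reference(x, y, eps=0.5, min_samples=5):
    """Sequential, brute-force DBSCAN on 2D points (illustrative sketch)."""
    points = list(zip(x, y))
    n = len(points)

    def region_query(i):
        # Brute-force stand-in for a KD-tree radius query.
        px, py = points[i]
        return [j for j in range(n)
                if math.hypot(px - points[j][0], py - points[j][1]) <= eps]

    labels = [UNVISITED] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # i is a core point: start a new cluster and expand it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] != UNVISITED:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # only core points keep expanding
        cluster_id += 1
    return labels


x = [1.0, 1.1, 1.2, 10.0, 10.1, 99.0]
y = [1.0, 0.9, 1.1, 10.2, 10.0, 99.0]
print(dbscan_reference(x, y, eps=0.5, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```

On this data the sketch finds two clusters and one noise point. Swapping `region_query` for a spatial index is what brings the per-query cost down from $O(n)$.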

