Clustering in Machine Learning
Introduction
Clustering is an unsupervised learning method. It groups similar data points without labels. The sections below explain how it works and how each algorithm forms clusters.
Core Concepts Explained
Clustering searches for structure in data. The algorithm measures similarity between points and forms groups, so each group contains points that lie close to one another.
How Clustering Works
- You provide unlabeled data
- The algorithm measures similarity
- It forms clusters based on distance or density
- Points in the same cluster stay close
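The steps above can be sketched with a toy example. This is a minimal illustration, assuming two fixed cluster centers and Euclidean distance; the points and centers are made-up values.

```python
import math

# Hypothetical 2-D points and two fixed cluster centers (illustrative values)
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.5, 8.2)]
centers = [(1.0, 1.0), (8.0, 8.0)]

def euclidean(a, b):
    # Straight-line distance between two 2-D points
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

# Assign each point to its closest center
labels = [min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
          for p in points]
print(labels)  # [0, 0, 1, 1]: points near (1, 1) share one cluster, points near (8, 8) the other
```

Real algorithms repeat this assignment step while also updating the centers, but the core idea is the same: similarity decides group membership.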
Popular Clustering Algorithms
1. K Means
K Means splits data into K clusters. You choose K. The algorithm places centers and assigns points to the closest center, then recomputes each center from its assigned points, repeating until the centers barely move.
Best For
- Large datasets
- Simple cluster shapes
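A minimal sketch of the loop described above, using scikit-learn. The data here is synthetic, generated with make_blobs purely for illustration; the values of n_samples, centers, and random_state are arbitrary choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 300 2-D points scattered around 3 centers (illustrative setup)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# You choose K; the algorithm iterates center placement and point assignment
model = KMeans(n_clusters=3, n_init=10, random_state=42)
model.fit(X)

print(model.cluster_centers_.shape)  # (3, 2): one 2-D center per cluster
print(len(set(model.labels_)))       # 3 distinct cluster labels
```

Setting random_state fixes the center initialization, which makes runs reproducible.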
2. Hierarchical Clustering
This algorithm builds a hierarchy of clusters. It merges or splits clusters step by step. You cut the tree at the level you want.
Best For
- Small or medium datasets
- Flexible clusters
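"Cutting the tree" at a chosen level can be sketched with scikit-learn's AgglomerativeClustering, where n_clusters picks the cut. The small synthetic dataset below is illustrative only.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Small synthetic dataset (hierarchical clustering suits small or medium data)
X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# Merging clusters bottom-up; cutting the tree at 3 clusters = n_clusters=3
agg = AgglomerativeClustering(n_clusters=3)
labels = agg.fit_predict(X)
print(len(set(labels)))  # 3
```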
3. DBSCAN
DBSCAN groups points based on density. It detects dense regions and marks low-density points as noise.
Best For
- Data with noise
- Irregular cluster shapes
Distance Measures
- Euclidean distance
- Manhattan distance
- Cosine similarity
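The three measures above can be computed by hand for a pair of small vectors; the vectors here are arbitrary example values.

```python
import math

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

# Euclidean distance: straight-line length between the vectors
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Manhattan distance: sum of absolute coordinate differences
manhattan = sum(abs(x - y) for x, y in zip(a, b))

# Cosine similarity: based on the angle between vectors, ignores their length
dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))
cosine = dot / (norm_a * norm_b)

print(euclidean)          # 5.0
print(manhattan)          # 7.0
print(round(cosine, 3))   # 0.855
```

Euclidean distance is the usual default; cosine similarity is popular for text, where vector length matters less than direction.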
Challenges in Clustering
- Selecting the number of clusters
- Handling noisy data
- Scaling features
Improving Clustering Results
- Normalize features
- Apply dimensionality reduction
- Test different K or density parameters
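The three tips above can be combined in one short pipeline. This is a sketch on synthetic data: the make_blobs settings and the range of K values tried are arbitrary illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic 5-feature dataset (illustrative)
X, _ = make_blobs(n_samples=200, centers=4, n_features=5, random_state=1)

# Normalize features so no single feature dominates the distance
X_scaled = StandardScaler().fit_transform(X)

# Reduce to 2 dimensions before clustering
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Try different K values and compare inertia (within-cluster spread)
inertias = []
for k in (2, 3, 4, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X_2d)
    inertias.append(km.inertia_)
    print(k, round(km.inertia_, 1))  # inertia shrinks as K grows; look for the "elbow"
```

Inertia always decreases as K increases, so the point where the decrease levels off (the elbow) is a common heuristic for choosing K.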
Where Clustering Is Used
- Customer segmentation
- Anomaly detection
- Document grouping
- Image grouping
Syntax or Model Structure Example
Below is a Python example for K Means.
from sklearn.cluster import KMeans
import pandas as pd

# Load the dataset and select the two feature columns
data = pd.read_csv("data.csv")
X = data[["f1", "f2"]]

# Fit K Means with 3 clusters; random_state makes the run reproducible
model = KMeans(n_clusters=3, n_init=10, random_state=42)
model.fit(X)

print(model.labels_)           # cluster index assigned to each row
print(model.cluster_centers_)  # coordinates of the 3 centers
This is a simple example showing how to use K Means in sklearn.
Quick Recap
Clustering groups data into clusters without labels. The algorithm computes similarity and places each point in the group it is closest to.
K Means
K Means fixes K clusters. It computes centers and reassigns points repeatedly.
Hierarchical
It builds a tree of clusters. You can cut the tree at any level.
DBSCAN
It finds regions of high density and identifies noise easily.
Key Points
- Scaling is essential
- Choosing K requires experimentation
- Dimensionality reduction helps a lot
Multiple Practical Examples
1. K Means with 3 Clusters
# Partition the data into 3 clusters and inspect the first assignments
model = KMeans(n_clusters=3, n_init=10, random_state=42)
model.fit(X)
print(model.labels_[:10])
2. DBSCAN Example
from sklearn.cluster import DBSCAN

# eps sets the neighborhood radius; min_samples sets the density threshold
db = DBSCAN(eps=0.5, min_samples=5)
labels = db.fit_predict(X)
print(labels[:10])  # -1 marks noise points
Explanation of Each Example
The first example partitions the data into a fixed number of groups that you choose up front. The second detects dense regions on its own, relying on density rather than a preset cluster count, and labels low-density points as noise.
Exercises
- Explain clustering in one sentence.
- Train a K Means model with three clusters.
- Plot cluster centers on a scatter plot.
- Train a DBSCAN model and detect noise.
- Use MinMaxScaler before clustering.
- Try different K values and compare results.
- Use PCA before clustering and check improvements.
- List two strengths of clustering.
- List two challenges in clustering.
- Create clusters from synthetic data using sklearn.
Conclusion
Clustering groups data by similarity and reveals hidden structure that supports analytics and AI workflows. With feature scaling and careful parameter selection, it produces clear, reliable results.