Data Exploration and Data Cleaning

Data exploration and data cleaning form the base of every AI workflow. Models trained on clean data tend to be more accurate and more reliable. Exploration helps you understand the dataset's structure, errors, and patterns before you change anything.

What is data exploration

Data exploration inspects a dataset's shape, column types, missing values, duplicates, and summary statistics. It gives a first view of the dataset.

Key steps

  • Check dataset size.
  • Check column types.
  • Check missing values.
  • Check unique values for each column.
  • Check simple statistics like mean and median.
  • Look for outliers.

Basic Python example

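import pandas as pd

# Load the dataset ("data.csv" is a placeholder path)
df = pd.read_csv("data.csv")

print(df.shape)               # dataset size: (rows, columns)
print(df.head())              # first five rows
df.info()                     # column types and non-null counts (prints directly, returns None)
print(df.describe())          # summary statistics for numeric columns
print(df.isnull().sum())      # missing values per column
print(df.duplicated().sum())  # rows that repeat an earlier row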
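
The key steps also mention unique values and outliers, which the calls above do not cover directly. The following is a minimal sketch of one way to check them, assuming a small made-up DataFrame in place of data.csv. The column names, the helper `iqr_outliers`, and the 1.5 × IQR rule are illustrative choices, not something the original example prescribes.

```python
import pandas as pd

# Hypothetical toy data standing in for data.csv
df = pd.DataFrame({
    "age": [25, 31, 29, 40, 33, 27, 35, 120],
    "city": ["Rabat", "Fes", "Rabat", "Casablanca", "Fes", "Rabat", "Fes", "Rabat"],
})

# Number of distinct values in each column
print(df.nunique())

# How often each category appears in one column
print(df["city"].value_counts())


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


# Flag suspicious ages; here only the value 120 falls outside the fences
outliers = iqr_outliers(df["age"])
print(outliers)
```

A flagged value is only a candidate for review. Whether it is an error or a real extreme depends on the dataset.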

What is data cleaning

Data cleaning fixes errors and prepares the dataset. It includes removing duplicates, filling missing values, converting types, and handling outliers.

Common cleaning actions

  • Remove duplicates.
  • Drop irrelevant columns.
  • Fill missing values.
  • Convert string numbers to numeric types.
  • Standardize text values.
  • Handle outliers.

Cleaning example in Python

# Remove duplicate rows
df = df.drop_duplicates()

# Fill missing ages with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Convert salary to numeric; values that cannot be parsed become NaN
df["salary"] = pd.to_numeric(df["salary"], errors="coerce")

# Drop rows that still have a missing value in any column
df = df.dropna()
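
Three of the common cleaning actions (dropping irrelevant columns, standardizing text, handling outliers) do not appear in the example above. The sketch below shows one possible approach on made-up data. The column names `row_id` and `city` are hypothetical, and clipping to the 1.5 × IQR fences is just one of several reasonable outlier strategies.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration
df = pd.DataFrame({
    "row_id": [1, 2, 3, 4, 5],
    "city": [" Rabat", "rabat ", "FES", "Fes", "Casablanca"],
    "salary": [4000.0, 4500.0, 5000.0, 5500.0, 90000.0],
})

# Drop a column that carries no information for modeling
df = df.drop(columns=["row_id"])

# Standardize text: trim whitespace and use one consistent casing
df["city"] = df["city"].str.strip().str.title()

# Handle outliers by clipping salaries to the 1.5 * IQR fences
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
df["salary"] = df["salary"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

print(df)
```

Clipping keeps the row and caps the extreme value. Dropping the row or leaving it untouched can be better choices when the extreme value is either clearly wrong or clearly genuine.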

Tips for strong preprocessing

  • Keep transformations simple.
  • Keep logs of steps.
  • Use clear column names.
  • Test the dataset after each change.
  • Store a clean version for modeling.
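
As a rough illustration of these tips, the sketch below logs each step, runs a few checks after the changes, and saves the cleaned result. The `check` function, its rules (such as ages between 0 and 120), and the file name `data_clean.csv` are all hypothetical and should be adapted to your own dataset.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("cleaning")


def check(df: pd.DataFrame) -> None:
    """Hypothetical sanity checks; adapt the rules to your own dataset."""
    assert not df.duplicated().any(), "duplicate rows remain"
    assert df["age"].notna().all(), "age still has missing values"
    assert df["age"].between(0, 120).all(), "age out of expected range"


# Made-up data with one duplicate row and one missing age
df = pd.DataFrame({"age": [25.0, 25.0, None, 40.0]})
log.info("start: %d rows", len(df))

df = df.drop_duplicates()
log.info("after drop_duplicates: %d rows", len(df))

df["age"] = df["age"].fillna(df["age"].median())
log.info("after fillna: %d missing ages", df["age"].isna().sum())

# Test the dataset after the changes
check(df)

# Store the cleaned version separately from the raw file
df.to_csv("data_clean.csv", index=False)
```

Keeping the checks in one function makes it easy to rerun them after every new transformation.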

Conclusion

Data exploration shows the structure and the problems of a dataset. Data cleaning fixes those problems. Together, these steps improve AI models and reduce errors. Every AI student and practitioner needs strong preprocessing skills.


Data Exploration and Data Cleaning (translated from Darija)

Data exploration and data cleaning are the foundation of every AI workflow. Clean data produces stronger models. Exploration is the first phase, where you come to understand the dataset.

What is data exploration

You look at the shape, types, missing values, duplicates, and simple statistics. This gives you a general view of the dataset.

Important steps

  • Check the size of the dataset.
  • Check the column types.
  • Check missing values.
  • Check unique values.
  • Check simple statistics like mean and median.
  • Look for outliers.

Example in Python

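import pandas as pd

# Load the dataset ("data.csv" is a placeholder path)
df = pd.read_csv("data.csv")

print(df.head())              # first five rows
df.info()                     # column types and non-null counts (prints directly, returns None)
print(df.describe())          # summary statistics for numeric columns
print(df.isnull().sum())      # missing values per column
print(df.duplicated().sum())  # rows that repeat an earlier row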

What is data cleaning

Data cleaning fixes errors and prepares the dataset. It includes deleting duplicates, filling missing values, converting types, and handling outliers.

Cleaning steps

  • Remove duplicates.
  • Remove columns that are not useful.
  • Fill missing values.
  • Convert types.
  • Standardize text.
  • Handle outliers.

Cleaning example in Python

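# Remove duplicate rows
df = df.drop_duplicates()

# Fill missing ages with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Convert salary to numeric; values that cannot be parsed become NaN
df["salary"] = pd.to_numeric(df["salary"], errors="coerce")

# Drop rows that still have a missing value in any column
df = df.dropna()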

Tips

  • Work in simple steps.
  • Record every step.
  • Give columns clear names.
  • Test the dataset after every change.
  • Keep a clean version for building models.

Conclusion

Data exploration helps you understand the dataset. Data cleaning repairs the dataset. These steps raise the quality of AI models. Students and practitioners need them in every project.
