Machine Learning Data Preparation


What is Machine Learning Data Preparation?

Before a model can learn anything, the data it receives must be cleaned, transformed, and structured. The quality of this phase largely determines how well the model performs.


Collecting Information

Sourcing entries from spreadsheets, web forms, sensors, or public records.

Example: Extracting weather updates from an online climate API.
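
As a minimal sketch of this step, the snippet below parses a JSON response into flat rows. The response body, the city, and the field names (`city`, `readings`, `time`, `temp_c`) are all made up for illustration; a real climate API defines its own schema, and fetching the body over the network is left out.

```python
import json

# Hypothetical response body; a real API would return its own field names.
raw_response = """
{"city": "Pune", "readings": [
  {"time": "2024-01-01T09:00", "temp_c": 21.5},
  {"time": "2024-01-01T12:00", "temp_c": 27.0}
]}
"""

def parse_readings(body):
    """Turn the JSON text into a flat list of (city, time, temp_c) rows."""
    payload = json.loads(body)
    return [(payload["city"], r["time"], r["temp_c"]) for r in payload["readings"]]

rows = parse_readings(raw_response)
for row in rows:
    print(row)
```

Flattening nested records into uniform rows at collection time tends to make the later steps simpler, since they can all assume one row per observation.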


Cleaning Up

Removing incorrect entries, fixing typos, and handling empty fields.

Example: Deleting rows with null transaction IDs.
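
The example above could look roughly like this in plain Python. The records and the field names `transaction_id` and `amount` are invented for the sketch, and `None` is assumed to be how a missing ID is represented.

```python
# Made-up transaction records; None marks a missing ID.
transactions = [
    {"transaction_id": "T001", "amount": 250.0},
    {"transaction_id": None, "amount": 99.0},
    {"transaction_id": "T003", "amount": 40.5},
]

def drop_missing_ids(rows):
    """Keep only the rows whose transaction_id is present."""
    return [row for row in rows if row["transaction_id"] is not None]

cleaned = drop_missing_ids(transactions)
print(len(transactions), "rows before,", len(cleaned), "rows after")
```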


Rescaling Values

Adjusting numeric features to a common scale so that features with large ranges do not dominate those with small ranges.

Example: Transforming income from ₹100–₹10,000 to a 0–1 range.
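
One common way to reach a 0–1 range is min-max scaling, where each value x becomes (x - min) / (max - min). The sketch below assumes that technique; the source does not say which rescaling method is meant, and the sample incomes are made up to fit the ₹100–₹10,000 range.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to spread over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

incomes = [100, 2575, 5050, 10000]  # illustrative values in rupees
scaled = min_max_scale(incomes)
print(scaled)
```

In practice the minimum and maximum are usually taken from the training portion only and then reused for the test portion, so that no information leaks from the test data.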


Rewriting Categories

Converting text labels into numeric codes that a model can process.

Example: “Low”, “Medium”, “High” → 0, 1, 2 for risk levels.
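
The mapping in the example can be written as a small lookup table. The name `RISK_ORDER` and the sample labels are assumptions made for this sketch.

```python
# Explicit mapping so the numeric codes follow the natural order of the labels.
RISK_ORDER = {"Low": 0, "Medium": 1, "High": 2}

def encode_risk(labels):
    """Replace each risk label with its integer code; unknown labels raise KeyError."""
    return [RISK_ORDER[label] for label in labels]

encoded = encode_risk(["High", "Low", "Medium", "Low"])
print(encoded)
```

Integer codes suit labels like these because they have a real order. For categories with no order (colours, city names), one-hot encoding is the usual alternative, since integer codes would suggest a ranking that does not exist.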


Dividing the Dataset

Splitting the full dataset into separate portions for training and evaluation.

Example: Use 70% for training and the remaining 30% for testing.
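
A 70/30 split can be sketched as follows. The helper name `split_rows`, the seed, and the ten-item dataset are illustrative choices, not part of any library.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows, then cut it into training and test portions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))  # stand-in for ten data rows
train, test = split_rows(data)
print(len(train), "training rows,", len(test), "test rows")
```

Shuffling before cutting matters when the rows are stored in some meaningful order, such as by date or by label; otherwise the test portion may not resemble the training portion.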


Fixing Imbalance

Adjusting uneven label distributions to avoid bias in learning.

Example: Oversampling “yes” responses in a feedback dataset.
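
A minimal sketch of random oversampling, assuming a two-class dataset in which "yes" is the rare label. The function name, the `response` field, and the 8-to-2 class counts are made up for illustration.

```python
import random

def oversample_minority(rows, label_key, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority = [r for r in rows if r[label_key] == minority_label]
    majority = [r for r in rows if r[label_key] != minority_label]
    if not minority:
        return rows[:]  # nothing to duplicate
    extra_needed = max(0, len(majority) - len(minority))
    extras = [rng.choice(minority) for _ in range(extra_needed)]
    return rows + extras

feedback = [{"response": "no"}] * 8 + [{"response": "yes"}] * 2
balanced = oversample_minority(feedback, "response", "yes")
yes_count = sum(r["response"] == "yes" for r in balanced)
print(yes_count, "yes rows out of", len(balanced))
```

Oversampling is usually applied to the training portion only. Duplicating rows before the split can place copies of the same row in both portions and make test results look better than they are.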


Randomizing Order

Shuffling data points so the model cannot pick up patterns from the order in which the rows were stored.

Example: Mixing rows of survey results before feeding into the model.
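
When answers and labels live in separate lists, they need to be shuffled together so each row keeps its own label. The sketch below shows one way to do that; the helper name and the sample survey data are assumptions.

```python
import random

def shuffle_together(features, labels, seed=7):
    """Shuffle two parallel lists with the same permutation."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    paired = list(zip(features, labels))
    random.Random(seed).shuffle(paired)
    shuffled_features = [f for f, _ in paired]
    shuffled_labels = [label for _, label in paired]
    return shuffled_features, shuffled_labels

answers = ["a", "b", "c", "d"]  # stand-ins for survey rows
labels = [1, 2, 3, 4]           # the label that belongs to each row
mixed_answers, mixed_labels = shuffle_together(answers, labels)
print(list(zip(mixed_answers, mixed_labels)))
```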


Creating More Variations

Enlarging a dataset with artificial but realistic variations of existing samples.

Example: Rotating images of faces to increase variety.
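
To illustrate the idea without an image library, the sketch below treats an image as a grid of pixel values and rotates it by 90 degrees. This is a simplification: face images are normally rotated by small angles, which requires interpolation and is better left to an imaging library. The tiny 2x3 "image" is made up.

```python
def rotate_90_clockwise(image):
    """Rotate a 2D grid of pixel values a quarter turn clockwise."""
    # Reversing the rows and then transposing gives a clockwise rotation.
    return [list(row) for row in zip(*image[::-1])]

image = [
    [1, 2, 3],
    [4, 5, 6],
]
rotated = rotate_90_clockwise(image)
augmented = [image, rotated]  # original plus one new variation
for row in rotated:
    print(row)
```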


Filling Gaps

Estimating and inserting likely values in missing spaces.

Example: Using the average age when someone's age is missing.
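
Mean imputation, as in the example, can be sketched like this. Missing ages are assumed to be stored as `None`, and the sample ages are invented.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot compute a mean when every value is missing")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

ages = [20, None, 30, 40, None]
filled = fill_missing_with_mean(ages)
print(filled)
```

The mean is sensitive to extreme values, so the median is a common substitute when the column contains outliers.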


Trimming Extras

Eliminating columns or values that add no real insight.

Example: Removing a serial number column from training data.
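
Dropping an uninformative column might look like this when rows are stored as dictionaries. The column names `serial_no`, `age`, and `income` are hypothetical.

```python
def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted column names."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

records = [
    {"serial_no": 1, "age": 34, "income": 5200},
    {"serial_no": 2, "age": 51, "income": 8100},
]
trimmed = drop_columns(records, {"serial_no"})
print(trimmed)
```

A serial number only identifies the row. A model that sees it may memorise individual rows instead of learning patterns that carry over to new data.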

