Machine Learning Data Preparation
What is Machine Learning Data Preparation?
Before a machine learning model can learn from data, that data must be cleaned, transformed, and structured. This phase lays the foundation for every successful model.
Collecting Information
Sourcing entries from spreadsheets, web forms, sensors, or public records.
Example: Extracting weather updates from an online climate API.
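A minimal sketch of this step in Python, assuming a hypothetical climate API that returns JSON shaped like `{"readings": [{"city": ..., "temp_c": ..., "humidity": ...}]}`. The URL, field names, and payload layout are illustrative assumptions, not a real service's schema. Parsing is kept separate from downloading so it can be checked on a sample string.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode a JSON document; needs network access to a real endpoint."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def to_rows(payload):
    """Flatten the hypothetical 'readings' list into rows with a fixed set of fields.

    Fields missing from a reading become None, so every row has the same keys.
    """
    fields = ("city", "temp_c", "humidity")
    return [
        {field: reading.get(field) for field in fields}
        for reading in payload.get("readings", [])
    ]


# Sample response in the assumed format. A live call might look like:
# rows = to_rows(fetch_json("https://example.com/weather"))  # hypothetical URL
sample = '{"readings": [{"city": "Pune", "temp_c": 31.5, "humidity": 40}, {"city": "Delhi", "temp_c": 38.0}]}'
rows = to_rows(json.loads(sample))
```

Giving every row the same keys, with `None` for anything missing, keeps the gaps visible for the cleaning and gap-filling steps.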
Cleaning Up
Removing incorrect entries, fixing typos, and handling empty fields.
Example: Deleting rows with null transaction IDs.
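The deletion in this example can be sketched in plain Python as a filter. It assumes each record is a dictionary with a hypothetical `transaction_id` key, and it treats both `None` and blank strings as missing.

```python
def drop_missing_ids(rows, key="transaction_id"):
    """Keep only rows whose ID field is present and non-blank."""
    cleaned = []
    for row in rows:
        value = row.get(key)
        # Treat None and whitespace-only strings as missing IDs
        if value is None or (isinstance(value, str) and not value.strip()):
            continue
        cleaned.append(row)
    return cleaned


transactions = [
    {"transaction_id": "T001", "amount": 250.0},
    {"transaction_id": None, "amount": 99.0},
    {"transaction_id": "  ", "amount": 10.0},
    {"transaction_id": "T004", "amount": 75.5},
]
clean = drop_missing_ids(transactions)
# IDs kept: ["T001", "T004"]
```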
Rescaling Values
Adjusting numeric features to a common scale so that features with large values do not outweigh those with small values.
Example: Transforming income from ₹100–₹10,000 to a 0–1 range.
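This example describes min-max scaling, which maps each value x to (x - min) / (max - min). A minimal Python sketch, using the ₹100 and ₹10,000 bounds from the example as the minimum and maximum; the function name is illustrative:

```python
def min_max_scale(values, lo=None, hi=None):
    """Map values linearly so that lo becomes 0.0 and hi becomes 1.0.

    If lo or hi is not given, it is taken from the values themselves.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # Every value is identical, so there is no range to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [100, 5050, 10000]
scaled = min_max_scale(incomes, lo=100, hi=10000)
# scaled == [0.0, 0.5, 1.0]
```

The bounds can be passed in explicitly because, in practice, the minimum and maximum are usually computed on the training portion only and then reused when scaling the test portion.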
Rewriting Categories
Converting text labels into numeric values that a model can process.
Example: “Low”, “Medium”, “High” → 0, 1, 2 for risk levels.
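Because “Low”, “Medium”, and “High” have a natural order, a fixed lookup table is enough to encode them. A minimal sketch; the mapping mirrors the example, and the names are illustrative:

```python
RISK_LEVELS = ["Low", "Medium", "High"]  # ordered from least to most risky
RISK_CODES = {label: code for code, label in enumerate(RISK_LEVELS)}


def encode_risk(labels):
    """Replace each risk label with its integer code; unknown labels raise KeyError."""
    return [RISK_CODES[label] for label in labels]


encoded = encode_risk(["High", "Low", "Medium", "Low"])
# encoded == [2, 0, 1, 0]
```

For categories with no inherent order (city names, for instance), one-hot encoding is the more common choice, since integer codes would imply a ranking that does not exist.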
Dividing the Dataset
Splitting the full dataset into separate portions for training the model and evaluating it.
Example: Using 70% of the rows for training and the remaining 30% for testing.
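A 70/30 split can be sketched with Python's standard `random` module: shuffle a copy of the rows, then cut at 70%. The function name and the fixed seed are assumptions, the seed being there so the same split comes back on every run.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test) portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


data = list(range(10))
train, test = split_train_test(data)
# len(train) == 7, len(test) == 3
```

Shuffling before cutting matters when rows are stored in some order (by date or by label, for example); otherwise the test portion would not resemble the training portion. scikit-learn provides a ready-made `train_test_split` in `sklearn.model_selection`.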
Fixing Imbalance
Adjusting uneven label distributions so the model does not simply favor the most common class.
Example: Oversampling “yes” responses in a feedback dataset.
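Random oversampling, the simplest form of the technique in this example, duplicates randomly chosen rows of the smaller classes until every class is as large as the biggest one. A sketch assuming rows are dictionaries with a hypothetical `label` key:

```python
import random
from collections import Counter


def oversample(rows, label_key="label", seed=0):
    """Duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw extra copies with replacement to bring this class up to the target size
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


feedback = [{"text": f"n{i}", "label": "no"} for i in range(8)] + [
    {"text": "y1", "label": "yes"},
    {"text": "y2", "label": "yes"},
]
balanced = oversample(feedback)
counts = Counter(row["label"] for row in balanced)
# counts["no"] == 8 and counts["yes"] == 8
```

Oversampling is normally applied to the training portion only, after splitting, so that copies of the same row cannot end up in both training and test data.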
Randomizing Order
Shuffling data points so the model does not learn patterns that come only from the order in which rows were stored.
Example: Mixing the rows of survey results before feeding them into the model.
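Shuffling needs only the standard library. The sketch below works on a copy so the original order is preserved, and it accepts an optional seed, an assumption added for reproducibility. The survey fields are made up for illustration.

```python
import random


def shuffled_copy(rows, seed=None):
    """Return a new list with the rows in random order; the input is left untouched."""
    rng = random.Random(seed)
    result = list(rows)
    rng.shuffle(result)
    return result


# Hypothetical survey rows stored grouped by region: an ordering pattern worth removing
survey = [{"respondent": i, "region": "North" if i < 5 else "South"} for i in range(10)]
mixed = shuffled_copy(survey, seed=7)
```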
Creating More Variations
Expanding a dataset with artificial but realistic variations of existing samples (data augmentation).
Example: Rotating images of faces to increase variety.
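A minimal sketch of rotation-based augmentation, assuming a grayscale image stored as a list of pixel rows. It uses 90° rotations only, because those just rearrange pixels; the small-angle rotations usually applied to face photos need pixel interpolation and are typically left to an image library.

```python
def rotate_90_clockwise(image):
    """Rotate a 2-D pixel grid 90 degrees clockwise.

    Reversing the row order and then transposing makes row r of the result
    equal to column r of the input read from bottom to top.
    """
    return [list(row) for row in zip(*image[::-1])]


def augment_with_rotations(image):
    """Return the image plus its 90, 180, and 270 degree rotations."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate_90_clockwise(variants[-1]))
    return variants


face = [
    [1, 2, 3],
    [4, 5, 6],
]
rotated = rotate_90_clockwise(face)
# rotated == [[4, 1], [5, 2], [6, 3]]
variants = augment_with_rotations(face)
```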
Filling Gaps
Estimating plausible values for missing entries and filling them in (imputation).
Example: Using the average age when someone's age is missing.
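Mean imputation, as in this example, can be sketched in a few lines. The field names are hypothetical, and both `None` and an absent key count as missing.

```python
from statistics import mean


def impute_mean(rows, key):
    """Return new rows where missing values of `key` are replaced by the mean of the observed ones.

    Raises statistics.StatisticsError if no row has a value for `key`.
    """
    fill = mean(row[key] for row in rows if row.get(key) is not None)
    result = []
    for row in rows:
        value = row.get(key)
        result.append({**row, key: fill if value is None else value})
    return result


people = [
    {"name": "Asha", "age": 30},
    {"name": "Ravi", "age": None},
    {"name": "Meena", "age": 40},
    {"name": "Kiran"},  # age field absent entirely
]
filled = impute_mean(people, "age")
# ages after filling: [30, 35, 40, 35]
```

As with scaling, the average is usually computed on the training rows only; the median is a common alternative when a column has outliers.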
Trimming Extras
Removing columns or values that carry no useful information for the prediction task.
Example: Removing a serial number column from training data.
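Dropping a known-useless column, and spotting columns whose value never changes, can both be sketched in plain Python. The `serial_no` column name is hypothetical, and the constant-column check is an added illustration of values that offer no insight.

```python
def drop_columns(rows, columns):
    """Return new rows without the given columns."""
    to_drop = set(columns)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


def constant_columns(rows):
    """List columns that hold the same value in every row (values must be hashable)."""
    if not rows:
        return []
    return [k for k in rows[0] if len({row.get(k) for row in rows}) == 1]


records = [
    {"serial_no": 1, "age": 25, "income": 3200, "country": "IN"},
    {"serial_no": 2, "age": 41, "income": 7800, "country": "IN"},
    {"serial_no": 3, "age": 33, "income": 5100, "country": "IN"},
]
# The serial number is unique per row and the country never varies, so neither helps prediction
trimmed = drop_columns(records, ["serial_no"] + constant_columns(records))
```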