
Data Quality
Through Building

AI Trade School - Session 1B

Why Garbage In = Garbage Out

🔄 Welcome Back

Last Week You Built

✍️

5 AI Prompts

Working prompts you can use immediately

📁

Portfolio Piece 1

Your AI Prompt Portfolio

🧠

Understanding

What AI is: patterns from data

🎤

Confidence

You can work with AI TODAY

Why Do Some AI Systems
Work Great?

And Others Fail?

The Answer: Data Quality

🚨 You've Experienced This

📱

Autocorrect Fails

Changes your meaning completely

🛍️

Bad Recommendations

Amazon suggests stuff you don't want

🤖

Wrong Information

Chatbot gives completely wrong answers

System Failures

AI produces nonsensical output

The most common root cause? Bad data.

⚠️ Real Story: Healthcare AI

2019: Hospital AI Misdiagnosis

A hospital deployed an AI system to predict which patients needed intensive care. The AI failed dangerously.

Why? The training data had a critical flaw: it recorded which patients had been admitted to the ICU, not which patients actually needed intensive care.

The AI learned to predict "who gets admitted" instead of "who NEEDS admission."

Result: The AI missed high-risk patients because the data quality was wrong.

Lesson: Garbage in = garbage out, no matter how sophisticated the AI.

⚠️ Real Story: Amazon Hiring AI

Amazon's Biased Hiring AI

Amazon built an experimental AI tool to screen job applicants. It eventually scrapped the project.

Why? The training data was 10 years of resumes, mostly from men, because tech was male-dominated.

The AI learned to penalize resumes that included the word "women's" (as in "women's chess club captain").

Result: The AI was biased because the training data reflected past bias.

Lesson: AI doesn't fix human problems. It amplifies the patterns in the data.

⚠️ Real Story: Retail Inventory

Retail Chain Inventory Disaster

A retail chain used AI to predict inventory needs. First month? Disaster. Stores ran out of popular items and over-ordered unpopular ones.

Why? The data had:

• Duplicate orders (the same order entered multiple times)

• Missing values (no dates on 20% of transactions)

• Inconsistent product names ('iPhone 13' vs 'iphone 13' vs 'Apple iPhone 13')

Result: The AI couldn't predict accurately because it was learning from chaos.

Lesson: Clean your data BEFORE you train AI.

You Can Have The Most
Sophisticated AI In The World

But If Your Data Is Messy
Your AI Will Fail

This Is The Foundational Skill

🎯 Five Data Quality Issues

You'll See These Everywhere

Issue #1: Missing Data

What Is It?

Empty cells, null values, blanks where data should be.

Examples: Empty email, missing phone number, blank purchase date

Why It Matters

AI can't learn from nothing. Missing data creates blind spots in the patterns the AI learns.

How to Fix

Delete the row, fill in a placeholder, or estimate the value from similar records.

Impact: Prevents incomplete pattern learning
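The three fixes above can be sketched in a few lines of Python. This is a minimal sketch, assuming the rows are loaded as a list of dictionaries (as `csv.DictReader` produces) with blanks arriving as empty strings; the field names and sample rows are hypothetical. The "estimate" option uses the simplest version, the median of the whole column, rather than truly similar records.

```python
from statistics import median

# Hypothetical sample records; "" marks a blank cell, as csv.DictReader returns it.
records = [
    {"customer": "Ana", "email": "ana@example.com", "amount": "40"},
    {"customer": "Ben", "email": "", "amount": "60"},
    {"customer": "Cy", "email": "cy@example.com", "amount": ""},
]

def is_missing(value):
    """Treat None, empty, and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""

# Option 1: delete any row that has a missing field.
complete_rows = [r for r in records if not any(is_missing(v) for v in r.values())]

# Option 2: fill a text field with an explicit placeholder.
with_placeholder = [
    {**r, "email": "UNKNOWN" if is_missing(r["email"]) else r["email"]}
    for r in records
]

# Option 3: estimate a numeric field from the other rows (here, the median).
known_amounts = [float(r["amount"]) for r in records if not is_missing(r["amount"])]
fill_value = median(known_amounts)
estimated = [
    {**r, "amount": fill_value if is_missing(r["amount"]) else float(r["amount"])}
    for r in records
]
```

Which option is right depends on the data: deleting rows loses information, while placeholders and estimates keep the row but should be noted in your documentation.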

Issue #2: Duplicates

What Is It?

Same record appears multiple times in the dataset.

Examples: Order 1001 listed twice, John Smith with identical data in two rows

Why It Matters

AI thinks one customer = two customers. Predictions get skewed and inaccurate.

How to Fix

Remove duplicate rows with Excel's Remove Duplicates feature (Data tab) or the equivalent in your tool.

Impact: Prevents double-counting and biased predictions
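Outside Excel, the same idea is a few lines of Python. This sketch assumes "duplicate" means every column is identical and keeps the first occurrence of each row; the order data is hypothetical.

```python
# Hypothetical order rows; order 1001 was entered twice.
orders = [
    {"order_id": "1001", "customer": "John Smith", "total": "25.00"},
    {"order_id": "1002", "customer": "Maria Lopez", "total": "40.00"},
    {"order_id": "1001", "customer": "John Smith", "total": "25.00"},
]

def remove_exact_duplicates(rows):
    """Keep the first occurrence of each fully identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

cleaned = remove_exact_duplicates(orders)
print(len(orders), "->", len(cleaned))  # prints: 3 -> 2
```

Comparing whole rows is the cautious choice: two rows that share an order ID but differ in another field are kept, because they may be a data-entry conflict that needs investigating rather than a true duplicate.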

Issue #3: Inconsistency

What Is It?

Same thing recorded in different ways.

Examples: "iPhone 13" vs "iphone 13" vs "Apple iPhone 13" vs "iPhone-13"

Why It Matters

AI treats these as four different products when they're the same. This destroys pattern recognition.

How to Fix

Standardize the format using Find & Replace or text-cleaning functions such as Excel's TRIM, LOWER, and PROPER.

Impact: Allows AI to properly group and analyze related data
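Here is a minimal Python sketch of standardizing the four iPhone variants above. The cleaning rules (lowercase, hyphens to spaces, dropping an "apple " prefix) and the `CANONICAL` lookup table are hypothetical and would need adjusting for real product lists.

```python
import re

def normalize_product(name):
    """Reduce spelling variants of a product name to one comparison key."""
    key = name.strip().lower()
    key = re.sub(r"[-_]+", " ", key)   # "iphone-13" -> "iphone 13"
    key = re.sub(r"\s+", " ", key)     # collapse repeated spaces
    key = re.sub(r"^apple ", "", key)  # drop the brand prefix (hypothetical rule)
    return key

# Map each comparison key to the display name you want in the cleaned data.
CANONICAL = {"iphone 13": "iPhone 13"}  # hypothetical lookup table

variants = ["iPhone 13", "iphone 13", "Apple iPhone 13", "iPhone-13"]
cleaned = [CANONICAL.get(normalize_product(v), v) for v in variants]
print(cleaned)  # ['iPhone 13', 'iPhone 13', 'iPhone 13', 'iPhone 13']
```

Names with no entry in the lookup table are left unchanged, so nothing is silently renamed to a guess.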

Issue #4: Outliers

What Is It?

Values that don't make sense or are extreme errors.

Examples: Age = 450 (impossible), Purchase = $9,999,999 (unrealistic)

Why It Matters

AI learns from every value, including the extreme ones. One impossible value can distort averages and skew predictions across the entire dataset.

How to Fix

Investigate why it's there. If it's an error, correct it or remove it.

Impact: Prevents skewed AI learning from impossible values
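"Investigate first" can be built into the tooling: flag suspicious values, but don't change them automatically. This sketch assumes simple plausibility ranges per column. The limits in `RULES` are hypothetical, because what counts as impossible depends on the business (a large purchase may be real).

```python
# Hypothetical plausibility rules: (min, max) allowed for each numeric field.
RULES = {"age": (0, 120), "purchase": (0, 50_000)}

rows = [
    {"id": 1, "age": 34, "purchase": 120.0},
    {"id": 2, "age": 450, "purchase": 80.0},
    {"id": 3, "age": 29, "purchase": 9_999_999.0},
]

def flag_outliers(rows, rules):
    """Return (row id, field, value) for every value outside its allowed range."""
    flags = []
    for row in rows:
        for field, (low, high) in rules.items():
            value = row[field]
            if not (low <= value <= high):
                flags.append((row["id"], field, value))
    return flags

for row_id, field, value in flag_outliers(rows, RULES):
    print(f"Row {row_id}: {field}={value} is outside the expected range; investigate")
```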

Issue #5: Wrong Format

What Is It?

Inconsistent formats: dates, currency, and numbers written in several different styles.

Examples: "01/15/2024" vs "15-Jan-24" vs "Jan 15" or "$50.00" vs "50" vs "$50"

Why It Matters

AI needs consistent formats to calculate and compare values accurately.

How to Fix

Convert all data to one consistent format (same date style, same currency format).

Impact: Enables accurate calculations and comparisons
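A minimal Python sketch of converting the example dates and prices to one format each. It assumes the date styles listed in `DATE_FORMATS` are the only ones in the data (a hypothetical list to extend) and uses ISO dates (YYYY-MM-DD) as the single target format. Note that "Jan 15" has no year, so the sketch flags it instead of guessing.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Date styles expected in this data; extend for your own file (assumption).
DATE_FORMATS = ("%m/%d/%Y", "%d-%b-%y", "%Y-%m-%d")

def to_iso_date(text):
    """Convert a date string to YYYY-MM-DD, or return None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # e.g. "Jan 15" has no year, so flag it for review

def to_amount(text):
    """Convert "$50.00", "50", or "$50" to a Decimal with two decimal places."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned).quantize(Decimal("0.01"))
    except InvalidOperation:
        return None  # not a number at all; flag it for review

for raw in ["01/15/2024", "15-Jan-24", "Jan 15"]:
    print(raw, "->", to_iso_date(raw))
# 01/15/2024 -> 2024-01-15
# 15-Jan-24 -> 2024-01-15
# Jan 15 -> None
```

`Decimal` is used for money instead of `float` so that amounts like 0.10 are stored exactly.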

Three Phases. 90 Minutes.

1. EXPLORE: 20 Minutes
2. CLEAN: 40 Minutes
3. ANALYZE: 30 Minutes

📊 Phase 1: Explore & Identify

20 Minutes

Your Job

FIND problems. Don't fix yet. Just identify.

What to Do

Scan every column. Mark every issue. Categorize each one.

Pro Tips

Sort each column. Use a filter or Go To Special > Blanks to spot empty cells. Look for values that don't make sense.

Target

Find 5-10 issues. More is better!
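The Phase 1 scan can also be scripted as a read-only inventory. This sketch only counts and lists, and it never edits data, which matches "don't fix yet." The CSV content is hypothetical, and the checks (exact duplicate rows, blanks per column, distinct spellings per column) are one reasonable starting set, not a complete audit.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export; for a real file use open("your_file.csv", newline="").
RAW = """order_id,product,date,amount
1001,iPhone 13,01/15/2024,$50.00
1002,iphone 13,,50
1001,iPhone 13,01/15/2024,$50.00
1003,Apple iPhone 13,Jan 15,$9999999
"""

def profile(rows):
    """Count likely issues per column without changing any data."""
    counts = Counter(tuple(r.values()) for r in rows)
    summary = {
        "rows": len(rows),
        "duplicate_rows": sum(n - 1 for n in counts.values() if n > 1),
        "blanks": {},
        "distinct": {},
    }
    for col in rows[0]:
        filled = [r[col].strip() for r in rows if r[col].strip()]
        summary["blanks"][col] = len(rows) - len(filled)
        summary["distinct"][col] = sorted(set(filled))
    return summary

rows = list(csv.DictReader(io.StringIO(RAW)))
report = profile(rows)
print(report["rows"], "rows,", report["duplicate_rows"], "duplicate row(s)")
for col in report["blanks"]:
    print(f"{col}: {report['blanks'][col]} blank; values seen: {report['distinct'][col]}")
```

Reading the "values seen" lists by eye is how inconsistencies like three spellings of one product, or outliers like "$9999999", tend to surface.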

🔧 Phase 2: Clean & Document

40 Minutes

The Process

Identify → Fix → Document

Key Step

Screenshot BEFORE and AFTER every fix

Minimum

Fix at least 5 issues. More is better!

Document

What you fixed, how, and why it matters
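The before/after screenshots remain the evidence the report asks for. A change log like the one sketched below can make the "what, how, why" part easier to write. The function name `apply_fix`, the log columns, and the sample rows are all hypothetical choices for illustration.

```python
import csv
import io

fix_log = []  # one entry per change: the "what, how, why" documentation

def apply_fix(rows, row_index, column, new_value, reason):
    """Change one cell and record its before/after values and the reason."""
    old_value = rows[row_index][column]
    rows[row_index][column] = new_value
    fix_log.append({
        "row": row_index + 2,  # spreadsheet row number (row 1 is the header)
        "column": column,
        "before": old_value,
        "after": new_value,
        "reason": reason,
    })

rows = [{"product": "iphone 13"}, {"product": "iPhone-13"}]  # hypothetical data
apply_fix(rows, 0, "product", "iPhone 13", "Inconsistency: standardize capitalization")
apply_fix(rows, 1, "product", "iPhone 13", "Inconsistency: remove hyphen")

# Write the log as CSV so it can be pasted straight into the report.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["row", "column", "before", "after", "reason"])
writer.writeheader()
writer.writerows(fix_log)
print(out.getvalue())
```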

💭 Phase 3: Analyze & Reflect

30 Minutes

Count Issues

How many did you find? How many did you fix?

Calculate

What % of rows had problems?

Summarize

Write 1 paragraph reflecting on what you learned

Reflect

Why would messy data break an AI system?
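For the "Calculate" step, count rows, not issues: a row with two problems still counts once. The formula is % rows with problems = (rows with at least one issue ÷ total data rows) × 100. A minimal Python sketch, with a hypothetical issue list and row counts:

```python
# Hypothetical issue list from Phases 1-2: (spreadsheet row, issue type).
issues = [
    (2, "missing"), (3, "duplicate"), (3, "inconsistency"),
    (5, "outlier"), (7, "format"), (7, "missing"),
]
total_rows = 20  # data rows, not counting the header (hypothetical)
fixed = 5        # hypothetical number of issues actually fixed

rows_with_problems = {row for row, _ in issues}  # a set counts each row once
pct = 100 * len(rows_with_problems) / total_rows

print(f"Issues found: {len(issues)}, fixed: {fixed}")
print(f"Rows with at least one problem: {len(rows_with_problems)} of {total_rows} ({pct:.0f}%)")
# Issues found: 6, fixed: 5
# Rows with at least one problem: 4 of 20 (20%)
```

Six issues spread over four rows gives 4 ÷ 20 × 100 = 20%, not 6 ÷ 20 × 100 = 30%.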

🏆 Portfolio Piece #2

Data Quality Analysis Report

Before/After

Screenshots showing 5+ fixes

Documentation

What you fixed and why it matters

Analysis

Summary of findings and insights

Value

A core skill for data analyst roles starting at $60K-$75K

👉 This Is Professional Work

Data Analysts Do This Every Day

You're Not Learning

You're DOING

You're Not Practicing

You're Demonstrating Real Skill

Employers See This

"I can work with messy real-world data"

Salary Implication

A core skill for $60K+ Data Analyst roles

🎯 What You'll Do Today

1️⃣

Learn

Why data quality makes or breaks AI

2️⃣

Analyze

Real, messy data like you'll see at work

3️⃣

Fix

At least 5 data quality issues

4️⃣

Document

Your work professionally

That's not classroom work. That's professional Data Analyst work.

Session 1A: How to USE AI

Session 1B: How to PREPARE DATA for AI

Together: The Complete Picture

You understand AI from both sides

📚 Next Week: We Go Deeper

Build Your First Machine Learning Model

Not just USE AI. BUILD it.

Keep That Data Analysis Document
Show It To Anyone Who Says "AI is Magic"

Many AI Failures
You Hear About?

Someone Didn't Clean
Their Data

You Now Know How to Prevent That

Real Data Is
Always Messy

You Just Learned How to Fix It

That's a Professional Skill

Questions?