Ch 1 — What AI Actually Is

How the systems work under the hood — explained through operations and process analogies, not code
Under the Hood
Data In → Training → Patterns → Predictions → Evaluation → Failure Modes
It All Starts With Data
What AI actually learns from and why data quality is everything
The Process Analogy
Imagine you’re onboarding a new HR analyst. You give them a binder of the last 3 years of hiring decisions: every resume received, who got interviews, who got offers, who accepted, and how they performed after 1 year. You say “study these and figure out what predicts a good hire.” That binder is training data. The analyst studying it is training a model. The patterns they find are the model’s learned weights.
What Makes Training Data Good or Bad
Volume: More examples = more reliable patterns. 50 resumes isn’t enough. 50,000 might be.
Representativeness: If your data only includes hires from one region, the model won’t generalize to others.
Label quality: “Good hire” means what? Stayed 2 years? Got promoted? Your definition shapes what the model optimizes for.
Freshness: 2019 hiring patterns may not predict 2026 success.
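These four checks can be sketched as simple code. Everything below is illustrative: the records, the field layout, and the thresholds (`MIN_EXAMPLES`, `MAX_AGE_YEARS`) are invented, not recommendations.

```python
from collections import Counter

# Hypothetical hiring records: (region, hire_year, outcome_label)
records = [
    ("west", 2024, "stayed_2_years"),
    ("west", 2023, "stayed_2_years"),
    ("west", 2019, "promoted"),
    ("south", 2024, "stayed_2_years"),
]

MIN_EXAMPLES = 10_000   # volume: below this, patterns are unreliable
MAX_AGE_YEARS = 3       # freshness: older examples may mislead

def data_quality_flags(records, current_year=2026):
    flags = []
    # Volume: enough examples to learn from?
    if len(records) < MIN_EXAMPLES:
        flags.append("volume: too few examples")
    # Representativeness: does one region dominate?
    regions = Counter(r[0] for r in records)
    if max(regions.values()) / len(records) > 0.5:
        flags.append("representativeness: one region dominates")
    # Label quality: is "good hire" defined one way or several?
    labels = {r[2] for r in records}
    if len(labels) > 1:
        flags.append("label quality: mixed definitions of 'good hire'")
    # Freshness: any examples older than the cutoff?
    if any(current_year - r[1] > MAX_AGE_YEARS for r in records):
        flags.append("freshness: stale examples present")
    return flags

print(data_quality_flags(records))
```

This toy dataset trips all four flags, which is the point: a real audit runs the same questions against the full export from your HRIS.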
Data Quality Red Flags for HR
RED FLAG: Training on historical hiring decisions
  // If you historically favored certain schools,
  // the model learns that bias as "truth"
RED FLAG: Small dataset with big claims
  // "Our AI analyzed 200 resumes" is not
  // enough data to learn reliable patterns
RED FLAG: No disclosure of training data
  // If a vendor won't tell you what data
  // their model learned from, walk away
GREEN FLAG: Regular retraining
  // Models retrained on recent data stay
  // aligned with current reality
GREEN FLAG: Diverse, representative data
  // Vendor can show data demographics
  // match or exceed your workforce diversity
HRIS connection: Your data quality in Workday, BambooHR, or whatever system you run directly affects what any AI built on it can do. Garbage in, garbage out isn’t just a cliché — it’s a compliance risk.
How a Model Actually Learns
The training loop explained as a process, not math
The Feedback Loop
Training an AI model is a continuous improvement process — something you already understand from ops. Here’s how it works:

1. Make a guess: The model looks at a data point and makes a prediction.
2. Check the answer: Compare the prediction to the known correct answer.
3. Measure the error: How far off was the prediction?
4. Adjust: Tweak the model slightly to be less wrong next time.
5. Repeat: Do this millions of times across all the training data.

That’s literally it. The model gets a little better with each cycle, eventually finding patterns that reliably predict outcomes.
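The five steps above can be shown as a toy numeric loop. Here the "model" is a single weight `w`, and the hidden true relationship (which the model must discover) is `y = 2 * x`; all numbers are made up for illustration.

```python
# Toy training loop: the "model" is one weight w in y = w * x.
# True relationship (unknown to the model): y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0                  # start with a bad guess
learning_rate = 0.05

for epoch in range(200):                 # 5. repeat many times
    for x, y_true in data:
        y_pred = w * x                   # 1. make a guess
        error = y_pred - y_true          # 2-3. check and measure the error
        w -= learning_rate * error * x   # 4. adjust to be less wrong

print(round(w, 2))  # converges near 2.0
```

The weight drifts toward 2.0 because every cycle nudges it in the direction that reduces the error. Real models do the same thing with millions of weights instead of one.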
Ops Analogy: Quality Control
Think of it like calibrating a new employee’s judgment. They review a benefits dispute, make a call, you tell them whether they got it right, they adjust their approach. After reviewing 500 disputes, they’ve developed reliable instincts. That’s training. After that, when they see a new dispute they’ve never seen before, they apply those learned patterns. That’s inference.
Key Concepts
Training = Learning phase (expensive, done once or periodically)
Inference = Prediction phase (cheap, done every time you use it)
Overfitting = Model memorized the training data
  // Like an employee who can only handle
  // cases identical to ones they've seen
Underfitting = Model didn't learn enough
  // Like an employee who guesses randomly
Vendor question: “How often do you retrain your model?” If the answer is “never” or “we trained it once in 2023,” the model is making predictions based on outdated patterns. Your workforce isn’t static — the model shouldn’t be either.
How LLMs Work Differently
Why ChatGPT feels like magic and where the magic breaks down
Next-Word Prediction at Scale
LLMs were trained on essentially the entire public internet — books, articles, Wikipedia, forums, code. The training task was deceptively simple: given some text, predict the next word. But doing this well at scale requires the model to develop an internal understanding of grammar, facts, logic, tone, and context. The result is a system that can generate coherent, contextual text on virtually any topic.
Why They Sound So Confident
An LLM doesn’t “know” things the way you do. It has learned statistical patterns about what text tends to follow what other text. When it writes something factually wrong, it’s not “lying” — it’s generating the most statistically likely continuation. Confident tone is a feature of the training, not evidence of accuracy.
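A toy illustration of "most statistically likely continuation": a bigram model built from a tiny invented corpus picks the most frequent next word, whether or not the resulting sentence is true for your organization. Real LLMs are vastly more sophisticated, but the core move is the same.

```python
from collections import Counter, defaultdict

# Tiny invented corpus -- the "training data".
corpus = (
    "the policy covers dental the policy covers vision "
    "the policy covers dental the handbook covers leave"
).split()

# Count which word follows which.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def most_likely_next(word):
    # Return the most frequent continuation seen in training.
    return next_counts[word].most_common(1)[0][0]

# The model "confidently" continues with the most common pattern --
# regardless of whether your actual policy covers dental.
print(most_likely_next("covers"))  # -> "dental" (seen 2 of 4 times)
```

Nothing here checks whether dental is actually covered; the word wins purely because it appeared most often after "covers" in the training text.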
The Process Flow
Your prompt:
  "Draft an offer letter for a Senior HRBP in California at $145K base"

What the LLM does:
  1. Breaks your text into tokens (word pieces)
  2. Processes all tokens simultaneously
     // This is the "transformer" architecture
  3. For each position, predicts the most likely next token given all context
  4. Generates text one token at a time
  5. Applies safety filters and formatting

What it does NOT do:
  × Look up California employment law
  × Verify $145K is market-competitive
  × Check your company's offer template
  × Know your benefits package
Critical for HR: An LLM generating offer letters or policy language can produce text that sounds legally correct but contains errors. Always have a human (ideally legal) review AI-generated employment documents. The confident tone makes errors harder to catch, not easier.
How Predictions Become Decisions
The gap between "the model says" and "we decided"
Scores, Not Decisions
Most HR AI models output a score or probability, not a decision. A resume screening model might say “78% match” for one candidate and “42% match” for another. But someone has to decide what to do with that score. Where do you set the cutoff? 70%? 80%? That’s a human decision with legal implications — and it’s the decision that determines adverse impact, not the model itself.
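A sketch of why the cutoff, not the model, determines who advances. The scores below are invented; the model's output never changes, only the human-chosen threshold does.

```python
# Match scores from a hypothetical screening model, one per candidate.
scores = [78, 42, 81, 69, 74, 55, 90, 71]

def selection_rate(scores, cutoff):
    # Fraction of candidates at or above the cutoff.
    selected = [s for s in scores if s >= cutoff]
    return len(selected) / len(scores)

# Same model, same scores -- different human decisions:
print(selection_rate(scores, 70))  # 0.625 -> 5 of 8 advance
print(selection_rate(scores, 80))  # 0.25  -> 2 of 8 advance
```

Moving the cutoff from 70 to 80 cuts the pass rate by more than half. Run that same comparison per demographic group and you are looking at exactly the numbers an adverse-impact analysis examines.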
The Decision Architecture
Human-in-the-loop: AI recommends, human decides. Safest for regulated decisions.
Human-on-the-loop: AI decides, human can override. Faster but riskier.
Fully automated: AI decides with no human review. Only appropriate for low-stakes, non-regulated actions.
Mapping Risk to Oversight Level
LOW RISK — Can automate
  Scheduling interview slots
  Sending reminder emails
  Categorizing help desk tickets

MEDIUM RISK — Human-on-the-loop
  Resume ranking (human reviews top tier)
  Survey sentiment categorization
  Benefits recommendation engine

HIGH RISK — Human-in-the-loop required
  Hiring/rejection decisions
  Performance rating inputs
  Compensation recommendations
  Termination risk flagging
  Promotion candidate ranking
Your call: This risk mapping is an operations decision, not a technology decision. You define the policy, the approval chains, and the audit requirements. The AI team provides the tool; you provide the governance.
How to Evaluate Whether It’s Working
Metrics that matter for HR, explained without statistics jargon
Accuracy Isn’t Enough
A vendor says their resume screener is “95% accurate.” Sounds great, right? But imagine 95% of applicants are unqualified. A model that rejects everyone would be 95% accurate. What you actually need to know:

False positives: How often does it flag unqualified candidates as qualified? (Costs you interview time)
False negatives: How often does it reject qualified candidates? (Costs you talent — and may create adverse impact)

In HR, false negatives are usually the bigger legal risk because they’re where disparate impact hides.
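The "95% accurate" trap is easy to verify with invented numbers: give 100 applicants, 5 of them qualified, to a "model" that rejects everyone.

```python
# 1 = qualified, 0 = unqualified; 5 of 100 applicants are qualified.
truth = [1] * 5 + [0] * 95

# A "model" that simply rejects everyone.
predictions = [0] * 100

# Accuracy: fraction of predictions that match the truth.
accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)

# False negatives: qualified candidates the model rejected.
false_negatives = sum(
    1 for p, t in zip(predictions, truth) if t == 1 and p == 0
)

print(accuracy)          # 0.95 -- sounds great
print(false_negatives)   # 5 -- every qualified candidate rejected
```

The headline metric is 95% while the model delivers zero value and rejects every qualified candidate, which is why you ask for error rates, not accuracy.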
Questions to Ask Vendors
Ask: "What's your false negative rate broken down by demographic group?"
  // If they can't answer this, their model
  // hasn't been audited for bias
Ask: "How do you define 'successful' in your training data?"
  // Their definition shapes everything
  // the model optimizes for
Ask: "What's the four-fifths rate across protected classes?"
  // EEOC's four-fifths rule: selection rate
  // for any group should be at least 80%
  // of the rate for the highest group
Ask: "Can we see the model card?"
  // A document describing the model's
  // training, capabilities, and limitations
  // Good vendors publish these
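One of these questions, the four-fifths rate, is simple enough to compute yourself from selection rates. The group names and counts below are illustrative only.

```python
# (selected, applied) per group -- illustrative counts.
outcomes = {"group_a": (30, 100), "group_b": (18, 100)}

rates = {g: sel / applied for g, (sel, applied) in outcomes.items()}
highest = max(rates.values())

# Four-fifths rule: each group's selection rate should be
# at least 80% of the highest group's rate.
ratios = {g: round(r / highest, 2) for g, r in rates.items()}
flagged = [g for g, ratio in ratios.items() if ratio < 0.8]

print(ratios)   # group_b selected at 0.6 of group_a's rate
print(flagged)  # group_b fails the 80% threshold
```

Here group_b's 18% selection rate is only 60% of group_a's 30%, well under the four-fifths line, so this result would trigger an adverse-impact review.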
Power move: Asking for a model card separates serious vendors from marketers. If they have one, they’ve done the work. If they don’t know what you’re talking about, their “AI” may not be what they claim.
The Hallucination Problem in Detail
Why AI makes things up and what that means for HR documents
Why Hallucinations Happen
LLMs generate text by predicting the most likely next word. When they encounter a topic where their training data is thin, ambiguous, or contradictory, they fill in the gap with plausible-sounding text. They don’t have an internal “I don’t know” signal. This is a fundamental architectural feature, not a bug that will be patched — though newer models hallucinate less often.
HR-Specific Hallucination Risks
Legal citations: An LLM might cite a law or regulation that doesn’t exist, or state an incorrect threshold for a real law.
Policy specifics: It might generate leave policies with details that sound right but don’t match your state’s requirements.
Benefits details: It might state coverage amounts, deductibles, or eligibility rules that are plausible but wrong.
Precedent: It might reference a court case that never happened.
Mitigation Strategies
Risky
Using an LLM to draft FMLA guidance without human legal review. Using AI-generated policy language in an employee handbook without verification. Letting a chatbot answer benefits questions without a knowledge base constraint.
Safer
Using an LLM as a first draft tool with mandatory human review. Using RAG (retrieval-augmented generation) so the AI consults your actual documents. Setting up guardrails that prevent the AI from answering outside its verified knowledge.
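The knowledge-base constraint in the safer column can be sketched minimally. Real RAG systems use embeddings and an LLM; this toy version uses keyword overlap purely to show the control flow, and every document and policy detail below is invented.

```python
# Verified knowledge base (invented policy snippets).
knowledge_base = {
    "pto": "Employees accrue 1.5 days of PTO per month.",
    "dental": "The dental plan covers two cleanings per year.",
}

def retrieve(question, min_overlap=2):
    # Find the snippet sharing the most words with the question.
    words = set(question.lower().replace("?", "").split())
    best, best_overlap = None, 0
    for text in knowledge_base.values():
        overlap = len(words & set(text.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = text, overlap
    # Guardrail: weak matches count as "nothing relevant found".
    return best if best_overlap >= min_overlap else None

def answer(question):
    source = retrieve(question)
    if source is None:
        # No verified source -> escalate, don't guess.
        return "I don't have verified information on that. Escalating to HR."
    return f"Per policy: {source}"

print(answer("How many PTO days do I accrue?"))
print(answer("What is the FMLA threshold?"))
```

The second question falls outside the knowledge base, so the system escalates instead of generating a plausible-sounding (and possibly hallucinated) answer. That refusal path is the whole point of the guardrail.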
Non-negotiable: Any AI system that touches employment law, benefits, or compliance must have human review in the loop. No exceptions. The cost of a hallucinated legal claim is orders of magnitude higher than the cost of a human reviewer.
The Bias Pipeline
Where bias enters, how it compounds, and what you can actually do about it
Bias Entry Points
Bias doesn’t come from one place — it accumulates through a pipeline:

1. Historical data: If your past hiring favored certain demographics, that’s baked into the data.
2. Label definitions: If “high performer” correlates with who gets face time with managers (often biased by proximity, not quality), the model learns that.
3. Feature selection: If the model uses zip code as a feature, it’s using a proxy for race.
4. Threshold setting: Where you set the cutoff score can create disparate impact.
5. Deployment context: A model trained on one population may not work for another.
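The proxy problem from step 3 can be checked directly: if a feature like zip code predicts group membership much better than chance, a model can use it as a stand-in for the protected attribute even when that attribute is excluded. The data below is invented and deliberately skewed.

```python
from collections import Counter, defaultdict

# (zip_code, protected_group) pairs -- invented, deliberately skewed.
candidates = [
    ("94101", "group_a"), ("94101", "group_a"), ("94101", "group_a"),
    ("94101", "group_b"),
    ("60601", "group_b"), ("60601", "group_b"), ("60601", "group_b"),
    ("60601", "group_a"),
]

# How well does zip code alone predict group membership?
by_zip = defaultdict(Counter)
for zip_code, group in candidates:
    by_zip[zip_code][group] += 1

# Predict each zip code's majority group; count how often that's right.
correct = sum(counts.most_common(1)[0][1] for counts in by_zip.values())
proxy_accuracy = correct / len(candidates)

# Baseline: always guessing the overall majority group.
base_rate = max(Counter(g for _, g in candidates).values()) / len(candidates)

print(proxy_accuracy)  # 0.75 -- zip code predicts group well
print(base_rate)       # 0.5  -- guessing the majority group
```

When a feature beats the base rate this clearly, it is carrying demographic signal, and an audit should ask whether the model needs it at all.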
Real-World HR AI Bias Cases
Amazon (2018): Internal hiring AI learned to penalize resumes containing the word "women's" (e.g., "women's chess club captain")
  // Trained on 10 years of hiring data in a
  // male-dominated tech workforce
HireVue (2019-2021): Video interview AI scored candidates on facial expressions and tone
  // Criticized for potential disability and
  // racial bias; dropped facial analysis in 2021
iTutorGroup (2023): EEOC settlement over AI that automatically rejected applicants over 55
  // $365,000 settlement, first EEOC case
  // specifically targeting AI hiring bias
Your role: You don’t need to audit the algorithm yourself. But you need to require that it be audited, know what an adequate audit looks like, and know what to do when bias is found. We cover this in depth in Chapter 7 (Compliance & Risk).
Putting It All Together
A mental model for evaluating any AI claim
Your Evaluation Framework
When any vendor, consultant, or executive tells you about an AI capability, run it through this checklist:

1. What data did it learn from? (Training data quality)
2. What is it actually predicting? (Not what they say — what the model optimizes for)
3. How often is it wrong? (Error rates by demographic group)
4. Who reviews the output? (Human oversight architecture)
5. Can it explain why? (Explainability for audit purposes)
6. What happens when it fails? (Fallback process, escalation path)
Chapter 1 Deep Dive Summary
DATA: AI learns from examples. Your data quality directly shapes AI capability and risk.
TRAINING: It's a feedback loop — guess, check, adjust, repeat. Ask how often models retrain.
LLMs: Predict next words, not truth. They can hallucinate. Review all generated content.
DECISIONS: Models output scores, not decisions. You control the governance architecture.
EVALUATION: "95% accurate" is meaningless. Ask for error rates by demographic group.
BIAS: Enters at every stage of the pipeline. Require audits. Know the legal precedents.
Next up: Chapter 2 maps exactly where AI lives in the HR tech stack today — your ATS, HRIS, payroll, benefits, and learning platforms. You’ll learn what’s real, what’s marketing, and what questions to ask your current vendors.