Ch 1 — What AI Actually Is

How the systems work under the hood — explained through operations and process analogies, not code
Under the Hood
Data In → Training → Patterns → Predictions → Evaluation → Failure Modes
It All Starts With Data
What AI actually learns from and why data quality is everything
The Process Analogy
Imagine you’re onboarding a new HR analyst. You give them a binder of the last 3 years of hiring decisions: every resume received, who got interviews, who got offers, who accepted, and how they performed after 1 year. You say “study these and figure out what predicts a good hire.” That binder is training data. The analyst studying it is training a model. The patterns they find are the model’s learned weights.
What Makes Training Data Good or Bad
Volume: More examples = more reliable patterns. 50 resumes isn’t enough. 50,000 might be.
Representativeness: If your data only includes hires from one region, the model won’t generalize to others.
Label quality: “Good hire” means what? Stayed 2 years? Got promoted? Your definition shapes what the model optimizes for.
Freshness: 2019 hiring patterns may not predict 2026 success.
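These four checks can be sketched as simple code. Everything below is illustrative: the records, the field layout, and the thresholds (`MIN_EXAMPLES`, `MAX_AGE_YEARS`) are invented, not recommendations.

```python
from collections import Counter

# Hypothetical hiring records: (region, hire_year, outcome_label)
records = [
    ("west", 2024, "stayed_2_years"),
    ("west", 2023, "stayed_2_years"),
    ("west", 2019, "promoted"),
    ("south", 2024, "stayed_2_years"),
]

MIN_EXAMPLES = 10_000   # volume: below this, patterns are unreliable
MAX_AGE_YEARS = 3       # freshness: older examples may mislead

def data_quality_flags(records, current_year=2026):
    flags = []
    # Volume: enough examples to learn from?
    if len(records) < MIN_EXAMPLES:
        flags.append("volume: too few examples")
    # Representativeness: does one region dominate?
    regions = Counter(r[0] for r in records)
    if max(regions.values()) / len(records) > 0.5:
        flags.append("representativeness: one region dominates")
    # Label quality: is "good hire" defined one way or several?
    labels = {r[2] for r in records}
    if len(labels) > 1:
        flags.append("label quality: mixed definitions of 'good hire'")
    # Freshness: any examples older than the cutoff?
    if any(current_year - r[1] > MAX_AGE_YEARS for r in records):
        flags.append("freshness: stale examples present")
    return flags

print(data_quality_flags(records))
```

This toy dataset trips all four flags, which is the point: a real audit runs the same questions against the full export from your HRIS.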
Data Quality Red Flags for HR
RED FLAG: Training on historical hiring decisions
  // If you historically favored certain schools,
  // the model learns that bias as "truth"
RED FLAG: Small dataset with big claims
  // "Our AI analyzed 200 resumes" is not
  // enough data to learn reliable patterns
RED FLAG: No disclosure of training data
  // If a vendor won't tell you what data
  // their model learned from, walk away
GREEN FLAG: Regular retraining
  // Models retrained on recent data stay
  // aligned with current reality
GREEN FLAG: Diverse, representative data
  // Vendor can show data demographics
  // match or exceed your workforce diversity
HRIS connection: Your data quality in Workday, BambooHR, or whatever system you run directly affects what any AI built on it can do. Garbage in, garbage out isn’t just a cliché — it’s a compliance risk.
How a Model Actually Learns
The training loop explained as a process, not math
The Feedback Loop
Training an AI model is a continuous improvement process — something you already understand from ops. Here’s how it works:

1. Make a guess: The model looks at a data point and makes a prediction.
2. Check the answer: Compare the prediction to the known correct answer.
3. Measure the error: How far off was the prediction?
4. Adjust: Tweak the model slightly to be less wrong next time.
5. Repeat: Do this millions of times across all the training data.

That’s literally it. The model gets a little better with each cycle, eventually finding patterns that reliably predict outcomes.
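The five steps above can be shown as a toy numeric loop. Here the "model" is a single weight `w`, and the hidden true relationship (which the model must discover) is `y = 2 * x`; all numbers are made up for illustration.

```python
# Toy training loop: the "model" is one weight w in y = w * x.
# True relationship (unknown to the model): y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0                  # start with a bad guess
learning_rate = 0.05

for epoch in range(200):                 # 5. repeat many times
    for x, y_true in data:
        y_pred = w * x                   # 1. make a guess
        error = y_pred - y_true          # 2-3. check and measure the error
        w -= learning_rate * error * x   # 4. adjust to be less wrong

print(round(w, 2))  # converges near 2.0
```

The weight drifts toward 2.0 because every cycle nudges it in the direction that reduces the error. Real models do the same thing with millions of weights instead of one.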
Ops Analogy: Quality Control
Think of it like calibrating a new employee’s judgment. They review a benefits dispute, make a call, you tell them whether they got it right, they adjust their approach. After reviewing 500 disputes, they’ve developed reliable instincts. That’s training. After that, when they see a new dispute they’ve never seen before, they apply those learned patterns. That’s inference.
Key Concepts
Training = Learning phase (expensive, done once or periodically)
Inference = Prediction phase (cheap, done every time you use it)
Overfitting = Model memorized the training data
  // Like an employee who can only handle
  // cases identical to ones they've seen
Underfitting = Model didn't learn enough
  // Like an employee who guesses randomly
Vendor question: “How often do you retrain your model?” If the answer is “never” or “we trained it once in 2023,” the model is making predictions based on outdated patterns. Your workforce isn’t static — the model shouldn’t be either.
How LLMs Work Differently
Why ChatGPT feels like magic and where the magic breaks down
Next-Word Prediction at Scale
LLMs were trained on essentially the entire public internet — books, articles, Wikipedia, forums, code. The training task was deceptively simple: given some text, predict the next word. But doing this well at scale requires the model to develop an internal understanding of grammar, facts, logic, tone, and context. The result is a system that can generate coherent, contextual text on virtually any topic.
Why They Sound So Confident
An LLM doesn’t “know” things the way you do. It has learned statistical patterns about what text tends to follow what other text. When it writes something factually wrong, it’s not “lying” — it’s generating the most statistically likely continuation. Confident tone is a feature of the training, not evidence of accuracy.
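A toy illustration of "most statistically likely continuation": a bigram model built from a tiny invented corpus picks the most frequent next word, whether or not the resulting sentence is true for your organization. Real LLMs are vastly more sophisticated, but the core move is the same.

```python
from collections import Counter, defaultdict

# Tiny invented corpus -- the "training data".
corpus = (
    "the policy covers dental the policy covers vision "
    "the policy covers dental the handbook covers leave"
).split()

# Count which word follows which.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def most_likely_next(word):
    # Return the most frequent continuation seen in training.
    return next_counts[word].most_common(1)[0][0]

# The model "confidently" continues with the most common pattern --
# regardless of whether your actual policy covers dental.
print(most_likely_next("covers"))  # -> "dental" (seen 2 of 4 times)
```

Nothing here checks whether dental is actually covered; the word wins purely because it appeared most often after "covers" in the training text.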
The Process Flow
Your prompt:
  "Draft an offer letter for a Senior HRBP in California at $145K base"

What the LLM does:
  1. Breaks your text into tokens (word pieces)
  2. Processes all tokens simultaneously
     // This is the "transformer" architecture
  3. For each position, predicts the most likely next token given all context
  4. Generates text one token at a time
  5. Applies safety filters and formatting

What it does NOT do:
  × Look up California employment law
  × Verify $145K is market-competitive
  × Check your company's offer template
  × Know your benefits package
Critical for HR: An LLM generating offer letters or policy language can produce text that sounds legally correct but contains errors. Always have a human (ideally legal) review AI-generated employment documents. The confident tone makes errors harder to catch, not easier.
How Predictions Become Decisions
The gap between "the model says" and "we decided"
Scores, Not Decisions
Most HR AI models output a score or probability, not a decision. A resume screening model might say “78% match” for one candidate and “42% match” for another. But someone has to decide what to do with that score. Where do you set the cutoff? 70%? 80%? That’s a human decision with legal implications — and it’s the decision that determines adverse impact, not the model itself.
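A sketch of why the cutoff, not the model, determines who advances. The scores below are invented; the model's output never changes, only the human-chosen threshold does.

```python
# Match scores from a hypothetical screening model, one per candidate.
scores = [78, 42, 81, 69, 74, 55, 90, 71]

def selection_rate(scores, cutoff):
    # Fraction of candidates at or above the cutoff.
    selected = [s for s in scores if s >= cutoff]
    return len(selected) / len(scores)

# Same model, same scores -- different human decisions:
print(selection_rate(scores, 70))  # 0.625 -> 5 of 8 advance
print(selection_rate(scores, 80))  # 0.25  -> 2 of 8 advance
```

Moving the cutoff from 70 to 80 cuts the pass rate by more than half. Run that same comparison per demographic group and you are looking at exactly the numbers an adverse-impact analysis examines.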
The Decision Architecture
Human-in-the-loop: AI recommends, human decides. Safest for regulated decisions.
Human-on-the-loop: AI decides, human can override. Faster but riskier.
Fully automated: AI decides with no human review. Only appropriate for low-stakes, non-regulated actions.
Mapping Risk to Oversight Level
LOW RISK — Can automate
  Scheduling interview slots
  Sending reminder emails
  Categorizing help desk tickets

MEDIUM RISK — Human-on-the-loop
  Resume ranking (human reviews top tier)
  Survey sentiment categorization
  Benefits recommendation engine

HIGH RISK — Human-in-the-loop required
  Hiring/rejection decisions
  Performance rating inputs
  Compensation recommendations
  Termination risk flagging
  Promotion candidate ranking
Your call: This risk mapping is an operations decision, not a technology decision. You define the policy, the approval chains, and the audit requirements. The AI team provides the tool; you provide the governance.
How to Evaluate Whether It’s Working
Metrics that matter for HR, explained without statistics jargon
Accuracy Isn’t Enough
A vendor says their resume screener is “95% accurate.” Sounds great, right? But imagine 95% of applicants are unqualified. A model that rejects everyone would be 95% accurate. What you actually need to know:

False positives: How often does it flag unqualified candidates as qualified? (Costs you interview time)
False negatives: How often does it reject qualified candidates? (Costs you talent — and may create adverse impact)

In HR, false negatives are usually the bigger legal risk because they’re where disparate impact hides.
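The "95% accurate" trap is easy to verify with invented numbers: give 100 applicants, 5 of them qualified, to a "model" that rejects everyone.

```python
# 1 = qualified, 0 = unqualified; 5 of 100 applicants are qualified.
truth = [1] * 5 + [0] * 95

# A "model" that simply rejects everyone.
predictions = [0] * 100

# Accuracy: fraction of predictions that match the truth.
accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)

# False negatives: qualified candidates the model rejected.
false_negatives = sum(
    1 for p, t in zip(predictions, truth) if t == 1 and p == 0
)

print(accuracy)          # 0.95 -- sounds great
print(false_negatives)   # 5 -- every qualified candidate rejected
```

The headline metric is 95% while the model delivers zero value and rejects every qualified candidate, which is why you ask for error rates, not accuracy.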
Questions to Ask Vendors
Ask: "What's your false negative rate broken down by demographic group?"
  // If they can't answer this, their model
  // hasn't been audited for bias
Ask: "How do you define 'successful' in your training data?"
  // Their definition shapes everything
  // the model optimizes for
Ask: "What's the four-fifths rate across protected classes?"
  // EEOC's four-fifths rule: selection rate
  // for any group should be at least 80%
  // of the rate for the highest group
Ask: "Can we see the model card?"
  // A document describing the model's
  // training, capabilities, and limitations
  // Good vendors publish these
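One of these questions, the four-fifths rate, is simple enough to compute yourself from selection rates. The group names and counts below are illustrative only.

```python
# (selected, applied) per group -- illustrative counts.
outcomes = {"group_a": (30, 100), "group_b": (18, 100)}

rates = {g: sel / applied for g, (sel, applied) in outcomes.items()}
highest = max(rates.values())

# Four-fifths rule: each group's selection rate should be
# at least 80% of the highest group's rate.
ratios = {g: round(r / highest, 2) for g, r in rates.items()}
flagged = [g for g, ratio in ratios.items() if ratio < 0.8]

print(ratios)   # group_b selected at 0.6 of group_a's rate
print(flagged)  # group_b fails the 80% threshold
```

Here group_b's 18% selection rate is only 60% of group_a's 30%, well under the four-fifths line, so this result would trigger an adverse-impact review.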
Power move: Asking for a model card separates serious vendors from marketers. If they have one, they’ve done the work. If they don’t know what you’re talking about, their “AI” may not be what they claim.
The Hallucination Problem in Detail
Why AI makes things up and what that means for HR documents
Why Hallucinations Happen
LLMs generate text by predicting the most likely next word. When they encounter a topic where their training data is thin, ambiguous, or contradictory, they fill in the gap with plausible-sounding text. They don’t have an internal “I don’t know” signal. This is a fundamental architectural feature, not a bug that will be patched — though newer models hallucinate less often.
HR-Specific Hallucination Risks
Legal citations: An LLM might cite a law or regulation that doesn’t exist, or state an incorrect threshold for a real law.
Policy specifics: It might generate leave policies with details that sound right but don’t match your state’s requirements.
Benefits details: It might state coverage amounts, deductibles, or eligibility rules that are plausible but wrong.
Precedent: It might reference a court case that never happened.
Mitigation Strategies
Risky
Using an LLM to draft FMLA guidance without human legal review. Using AI-generated policy language in an employee handbook without verification. Letting a chatbot answer benefits questions without a knowledge base constraint.
Safer
Using an LLM as a first draft tool with mandatory human review. Using RAG (retrieval-augmented generation) so the AI consults your actual documents. Setting up guardrails that prevent the AI from answering outside its verified knowledge.
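The knowledge-base constraint in the safer column can be sketched minimally. Real RAG systems use embeddings and an LLM; this toy version uses keyword overlap purely to show the control flow, and every document and policy detail below is invented.

```python
# Verified knowledge base (invented policy snippets).
knowledge_base = {
    "pto": "Employees accrue 1.5 days of PTO per month.",
    "dental": "The dental plan covers two cleanings per year.",
}

def retrieve(question, min_overlap=2):
    # Find the snippet sharing the most words with the question.
    words = set(question.lower().replace("?", "").split())
    best, best_overlap = None, 0
    for text in knowledge_base.values():
        overlap = len(words & set(text.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = text, overlap
    # Guardrail: weak matches count as "nothing relevant found".
    return best if best_overlap >= min_overlap else None

def answer(question):
    source = retrieve(question)
    if source is None:
        # No verified source -> escalate, don't guess.
        return "I don't have verified information on that. Escalating to HR."
    return f"Per policy: {source}"

print(answer("How many PTO days do I accrue?"))
print(answer("What is the FMLA threshold?"))
```

The second question falls outside the knowledge base, so the system escalates instead of generating a plausible-sounding (and possibly hallucinated) answer. That refusal path is the whole point of the guardrail.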
Non-negotiable: Any AI system that touches employment law, benefits, or compliance must have human review in the loop. No exceptions. The cost of a hallucinated legal claim is orders of magnitude higher than the cost of a human reviewer.
The Bias Pipeline
Where bias enters, how it compounds, and what you can actually do about it
Bias Entry Points
Bias doesn’t come from one place — it accumulates through a pipeline:

1. Historical data: If your past hiring favored certain demographics, that’s baked into the data.
2. Label definitions: If “high performer” correlates with who gets face time with managers (often biased by proximity, not quality), the model learns that.
3. Feature selection: If the model uses zip code as a feature, it’s using a proxy for race.
4. Threshold setting: Where you set the cutoff score can create disparate impact.
5. Deployment context: A model trained on one population may not work for another.
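The proxy problem from step 3 can be checked directly: if a feature like zip code predicts group membership much better than chance, a model can use it as a stand-in for the protected attribute even when that attribute is excluded. The data below is invented and deliberately skewed.

```python
from collections import Counter, defaultdict

# (zip_code, protected_group) pairs -- invented, deliberately skewed.
candidates = [
    ("94101", "group_a"), ("94101", "group_a"), ("94101", "group_a"),
    ("94101", "group_b"),
    ("60601", "group_b"), ("60601", "group_b"), ("60601", "group_b"),
    ("60601", "group_a"),
]

# How well does zip code alone predict group membership?
by_zip = defaultdict(Counter)
for zip_code, group in candidates:
    by_zip[zip_code][group] += 1

# Predict each zip code's majority group; count how often that's right.
correct = sum(counts.most_common(1)[0][1] for counts in by_zip.values())
proxy_accuracy = correct / len(candidates)

# Baseline: always guessing the overall majority group.
base_rate = max(Counter(g for _, g in candidates).values()) / len(candidates)

print(proxy_accuracy)  # 0.75 -- zip code predicts group well
print(base_rate)       # 0.5  -- guessing the majority group
```

When a feature beats the base rate this clearly, it is carrying demographic signal, and an audit should ask whether the model needs it at all.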
Real-World HR AI Bias Cases
Amazon (2018): Internal hiring AI learned to penalize resumes containing the word "women's" (e.g., "women's chess club captain")
  // Trained on 10 years of hiring data in a
  // male-dominated tech workforce
HireVue (2019-2021): Video interview AI scored candidates on facial expressions and tone
  // Criticized for potential disability and
  // racial bias; dropped facial analysis in 2021
iTutorGroup (2023): EEOC settlement over AI that automatically rejected applicants over 55
  // $365,000 settlement, first EEOC case
  // specifically targeting AI hiring bias
Your role: You don’t need to audit the algorithm yourself. But you need to require that it be audited, know what an adequate audit looks like, and know what to do when bias is found. We cover this in depth in Chapter 7 (Compliance & Risk).
Putting It All Together
A mental model for evaluating any AI claim
Your Evaluation Framework
When any vendor, consultant, or executive tells you about an AI capability, run it through this checklist:

1. What data did it learn from? (Training data quality)
2. What is it actually predicting? (Not what they say — what the model optimizes for)
3. How often is it wrong? (Error rates by demographic group)
4. Who reviews the output? (Human oversight architecture)
5. Can it explain why? (Explainability for audit purposes)
6. What happens when it fails? (Fallback process, escalation path)
Chapter 1 Deep Dive Summary
DATA: AI learns from examples. Your data quality directly shapes AI capability and risk.
TRAINING: It's a feedback loop — guess, check, adjust, repeat. Ask how often models retrain.
LLMs: Predict next words, not truth. They can hallucinate. Review all generated content.
DECISIONS: Models output scores, not decisions. You control the governance architecture.
EVALUATION: "95% accurate" is meaningless. Ask for error rates by demographic group.
BIAS: Enters at every stage of the pipeline. Require audits. Know the legal precedents.
Next up: Chapter 2 maps exactly where AI lives in the HR tech stack today — your ATS, HRIS, payroll, benefits, and learning platforms. You’ll learn what’s real, what’s marketing, and what questions to ask your current vendors.