RAG Architecture for HR
Retrieval-Augmented Generation (RAG) is the architecture that makes HR chatbots accurate. Instead of relying on what the LLM “remembers” from training, RAG first searches your actual documents — benefits guides, policy handbooks, PTO rules — and then generates an answer grounded in that content. The quality of the AI’s answers depends almost entirely on the quality of what you put into the knowledge base.
Chunking Strategies
Documents need to be broken into retrievable chunks. Too large and retrieval is imprecise. Too small and context is lost.
By section: Split on headers. Good for structured handbooks.
By paragraph: Good for policy documents with clear paragraphs.
Overlapping windows: 500-word chunks with 100-word overlap. Preserves context at boundaries.
Semantic: a model detects natural topic boundaries and splits there. Best quality, most complex to implement.
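The overlapping-window strategy above can be sketched in a few lines. A minimal version, assuming whitespace-delimited words; the 500/100 sizes are the illustrative defaults from this section, not fixed requirements:

```python
def chunk_overlapping(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    # Slide a window of `size` words forward by (size - overlap) words,
    # so each chunk repeats the last `overlap` words of the previous one.
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already reached the end of the document
    return chunks

# Example: a 1200-word document yields 3 chunks starting at words 0, 400, 800
doc = " ".join(f"w{i}" for i in range(1200))
parts = chunk_overlapping(doc)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the point of this strategy.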
The RAG Pipeline
INDEXING PHASE (done once, updated periodically)
1. Collect: benefits guide, handbook, PTO policy,
leave policies, 401k docs, org charts...
2. Chunk: Split into ~500 word segments
3. Embed: Convert each chunk to a vector
// embedding model maps text → numbers
// similar content → similar vectors
4. Store: Save vectors in a vector database
// Pinecone, Weaviate, pgvector, etc.
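To make the embed step concrete, here is a toy stand-in for an embedding model: a bag-of-words vector over a small fixed vocabulary, L2-normalized. The vocabulary and sample texts are invented for illustration; a real pipeline calls a trained embedding model, but the property the pipeline relies on (similar content maps to similar vectors) already shows up:

```python
import math

# Invented example vocabulary; a real embedding model needs no vocabulary list.
VOCAB = ["dental", "plan", "copay", "per", "visit",
         "parking", "reimbursement", "quarterly"]

def embed_toy(text: str) -> list[float]:
    # Count known words into a vector, then scale it to unit length.
    vec = [0.0] * len(VOCAB)
    for word in text.lower().split():
        if word in VOCAB:
            vec[VOCAB.index(word)] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

q = embed_toy("dental plan copay")
near = embed_toy("dental copay per visit")
far = embed_toy("quarterly parking reimbursement")
# Overlapping vocabulary produces a markedly higher similarity score.
```

Real embedding models capture synonyms and paraphrases, not just exact word overlap, which is why "vacation days" can retrieve a chunk that only says "PTO".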
QUERY PHASE (every employee question)
1. Employee asks: "What's the dental copay?"
2. Embed the question into same vector space
3. Find top 3-5 most similar document chunks
4. Send to LLM: "Answer using ONLY this context"
5. LLM generates answer grounded in your docs
6. Cite sources so employee can verify
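The query phase above can be sketched end to end with an in-memory index. Everything here is illustrative: `embed_toy` stands in for a real embedding model, the list stands in for a vector database, and the chunk texts are invented; a production system would send the final prompt to an LLM API:

```python
import math

CHUNKS = [
    "Dental plan: $25 copay per visit; two cleanings covered per year.",
    "PTO: 20 days per year, accrued monthly from the start date.",
    "401k: the company matches 50% of contributions up to 6% of salary.",
]

PUNCT = "?:;.,$%"
VOCAB = sorted({w.strip(PUNCT).lower() for c in CHUNKS for w in c.split()})

def embed_toy(text: str) -> list[float]:
    # Bag-of-words over the corpus vocabulary, unit-normalized --
    # a stand-in for a trained embedding model.
    vec = [0.0] * len(VOCAB)
    for word in text.lower().split():
        word = word.strip(PUNCT)
        if word in VOCAB:
            vec[VOCAB.index(word)] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Indexing: store (chunk, vector) pairs; a vector DB does this at scale.
INDEX = [(chunk, embed_toy(chunk)) for chunk in CHUNKS]

def retrieve(question: str, k: int = 2) -> list[str]:
    # Embed the question into the same space, rank chunks by similarity.
    qv = embed_toy(question)
    ranked = sorted(INDEX, key=lambda item: -sum(x * y for x, y in zip(qv, item[1])))
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What's the dental copay?")
# The prompt now carries the dental chunk, so the LLM can ground its answer.
```

The "ONLY this context" instruction is what keeps the model from falling back on training-data guesses; citing which chunks went into the prompt is what lets the employee verify the answer.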
The 80/20 rule: 80% of chatbot accuracy depends on the knowledge base, not the AI model. A mediocre model with a great knowledge base almost always outperforms a great model with a bad one. Invest in document quality, freshness, and coverage first.