How It Works
Pay equity analysis uses multivariate regression to isolate unexplained pay differences. The process:
1. Define the outcome: Total compensation (or base salary, or total cash comp — each tells a different story)
2. Control for legitimate factors: Job level, job family, tenure, location, performance rating, relevant experience. These are factors that can justify pay differences on non-discriminatory grounds.
3. Test for demographic differences: After controlling for legitimate factors, is there a statistically significant difference in pay by gender, race, age, or another protected characteristic?
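4. Assess significance: The same 1.5% gap can be statistically significant in a large population but not in a small one, because the uncertainty around the estimate shrinks as headcount grows. Both practical significance (is the gap large enough to matter?) and statistical significance (is it distinguishable from chance?) matter.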
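The four steps above can be sketched as an ordinary least squares regression. This is a minimal, standard-library-only sketch under stated assumptions, not a production method: the outcome is the natural log of pay (a common convention, assumed here, so the group coefficient reads as an approximate percentage gap), controls are numeric columns, the protected characteristic is a single 0/1 flag, and the p-value uses a normal approximation to the t distribution that is only reasonable for large samples. The helper names `invert` and `adjusted_gap`, and the data, are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix: are two controls collinear?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def adjusted_gap(pay, controls, group):
    """Regress log(pay) on an intercept, the controls, and a 0/1 group flag.

    Returns (gap_pct, std_error, p_value) for the group coefficient, where
    gap_pct = exp(beta) - 1 is the approximate % pay difference for group 1
    holding the controls fixed. The two-sided p-value uses a normal
    approximation to the t distribution (large samples only).
    """
    y = [math.log(p) for p in pay]                                           # step 1: outcome
    X = [[1.0, *map(float, c), float(g)] for c, g in zip(controls, group)]   # steps 2-3
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)                             # residual variance
    se = math.sqrt(sigma2 * xtx_inv[-1][-1])
    z = beta[-1] / se                                                        # step 4: significance
    p_value = math.erfc(abs(z) / math.sqrt(2))                               # = 2 * (1 - Phi(|z|))
    return math.exp(beta[-1]) - 1, se, p_value


# Hypothetical data: pay rises with level and tenure; group 1 is paid 3% less.
controls, group, pay = [], [], []
for i in range(200):
    level, tenure, flag = i % 4 + 1, (i * 7) % 15, i % 2
    noise = 1 + 0.02 * math.sin(i)          # deterministic stand-in for random variation
    controls.append((level, tenure))
    group.append(flag)
    pay.append(60_000 * math.exp(0.15 * level + 0.01 * tenure)
               * (0.97 if flag else 1.0) * noise)

gap_pct, se, p_value = adjusted_gap(pay, controls, group)
print(f"adjusted gap {gap_pct:+.2%}, SE {se:.4f}, p ~ {p_value:.3g}")
```

On this toy data the estimated gap lands close to the built-in 3%. Categorical controls such as job family or location would need to be one-hot encoded (dropping one category each to avoid collinearity with the intercept), and a real analysis would normally rely on a statistics package with exact t-based p-values.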
Legal vs. Statistical Standards
Statistical standard:
p-value < 0.05 (95% confidence)
// "If there were no real gap, a gap at least
// this large would arise by chance less
// than 5% of the time"
Legal standard (US):
"Similarly situated" employees
// Courts define comparator groups
// differently than statisticians
The gap between them:
Statistical: 3.2% gap, p=0.003
// Clearly statistically significant
Legal question: Are these employees
truly "similarly situated"?
// Did the model control for all
// legitimate factors the court would
// recognize? Did it over-control
// for factors that are themselves
// influenced by discrimination?
Warning: Controlling for "job level" can
mask discrimination in promotion decisions.
// If women are promoted less, controlling
// for level hides the gap at its source
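The job-level warning above can be illustrated with a deliberately simple, hypothetical workforce. In this sketch, men and women at the same level are paid identically, but women are concentrated at the lower level, standing in for a promotion gap. Comparing pay only within each level (a simple stratified stand-in for adding job level as a regression control) makes the gap disappear even though it is real at its source. The data and function names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical workforce as (gender, job_level, salary). Within each level,
# men and women earn the same; women are simply promoted to level 2 less often.
employees = (
    [("M", 1, 100_000)] * 2 + [("M", 2, 150_000)] * 8
    + [("W", 1, 100_000)] * 8 + [("W", 2, 150_000)] * 2
)


def mean(values):
    return sum(values) / len(values)


def raw_gap(rows):
    """Women's mean salary relative to men's, with no controls."""
    men = [s for g, _, s in rows if g == "M"]
    women = [s for g, _, s in rows if g == "W"]
    return mean(women) / mean(men) - 1


def within_level_gap(rows):
    """Headcount-weighted average of per-level gaps, i.e. 'controlling for level'."""
    cells = defaultdict(lambda: {"M": [], "W": []})
    for g, level, s in rows:
        cells[level][g].append(s)
    weighted, total = 0.0, 0
    for cell in cells.values():
        if cell["M"] and cell["W"]:            # skip levels with only one gender
            n = len(cell["M"]) + len(cell["W"])
            weighted += n * (mean(cell["W"]) / mean(cell["M"]) - 1)
            total += n
    return weighted / total


print(f"raw gap:          {raw_gap(employees):.1%}")           # raw gap:          -21.4%
print(f"within-level gap: {within_level_gap(employees):.1%}")  # within-level gap: 0.0%
```

The raw comparison surfaces the shortfall created at the promotion stage; the within-level comparison reports no gap at all, which is why a control like job level may itself need scrutiny before it is used.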
Critical nuance: Pay equity analysis requires legal counsel, not just data science. Over-controlling (adding variables that are themselves influenced by discrimination) can mask real discrimination. Under-controlling (omitting legitimate factors) can produce gaps that have legitimate explanations. The right set of control variables is a legal and business judgment, not a purely statistical one.
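The p < 0.05 standard described earlier can be made concrete with a permutation test, swapped in here as an illustration rather than as the regression test from the steps above. This hypothetical sketch shuffles group labels only within each stratum (for example, job level), recomputes the within-stratum gap in log pay, and reports how often a shuffled gap is at least as large as the observed one. That share is the p-value: how often a gap this large would appear if group membership had no bearing on pay. Because the shuffling respects the strata, choosing the strata is itself a control-variable decision, subject to the same legal and business judgment. The function names and data are assumptions.

```python
import random
from collections import defaultdict


def within_stratum_gap(log_pay, group, strata):
    """Headcount-weighted mean of (group 1 mean - group 0 mean) within each stratum."""
    cells = defaultdict(lambda: ([], []))      # stratum -> (group 0 pay, group 1 pay)
    for y, g, s in zip(log_pay, group, strata):
        cells[s][g].append(y)
    weighted, total = 0.0, 0
    for zeros, ones in cells.values():
        if zeros and ones:                      # strata with one group carry no information
            n = len(zeros) + len(ones)
            weighted += n * (sum(ones) / len(ones) - sum(zeros) / len(zeros))
            total += n
    if total == 0:
        raise ValueError("no stratum contains both groups")
    return weighted / total


def stratified_permutation_p(log_pay, group, strata, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the within-stratum gap.

    Labels are shuffled only inside each stratum, so every shuffled dataset
    keeps the same group mix per stratum. The p-value is the share of shuffles
    whose gap is at least as large in magnitude as the observed gap.
    """
    rng = random.Random(seed)
    observed = abs(within_stratum_gap(log_pay, group, strata))
    members = defaultdict(list)
    for i, s in enumerate(strata):
        members[s].append(i)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(group)
        for idx in members.values():
            labels = [group[i] for i in idx]
            rng.shuffle(labels)
            for i, label in zip(idx, labels):
                shuffled[i] = label
        if abs(within_stratum_gap(log_pay, shuffled, strata)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # +1 counts the observed labeling itself


# Hypothetical example: two job levels, five employees of each group per level.
strata = ["L1"] * 10 + ["L2"] * 10
group = ([1] * 5 + [0] * 5) * 2
log_pay = [11.0] * 5 + [11.1] * 5 + [11.4] * 5 + [11.5] * 5
print(within_stratum_gap(log_pay, group, strata))
print(stratified_permutation_p(log_pay, group, strata, n_perm=2000))
```

Running the same per-person gap with more employees per stratum will typically yield a smaller p-value, which is the sample-size effect behind step 4.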