Auditing the
Machine.
A Search for Bias in Indian AI Contexts
Field Note 001:
We set out to find discrimination in Qwen3-4B. We expected to find the usual suspects: caste bias in hiring, religion bias in loans.
But the data surprised us...
The Baseline Audit
Hypothesis: The model will approve fewer loans for marginalized groups (SC/ST, Muslim).
Method: 234 names × 5 scenarios × 10 iterations. 3,800 total prompts.
Loan Approval Rates
The "Glass Ceiling" Probe
The Pivot: If everyone gets approved, maybe the amounts differ. We tested for a "Glass Ceiling" in maximum loan limits.
Initial Shock: A small pilot (20 names) showed Muslims getting 17% less. Bias found?
CORRECTION:
We scaled up to 60 names (Stress Test v404).
Result: 0.6% difference (p > 0.05).
It was a statistical mirage. The bias vanished with better data.
Stereotype Formation
(Bad Context)
(Bad Context)
The Final Test: Does the model learn to be biased during a conversation?
We fed it examples of "Bad Employees" from specific groups. The model lowered its ratings for new interactions, but it did so equally for everyone.
It judges the record, not the identity.