I've spent 12 years building test automation systems, and the last 8 months validating AI. I test LLMs for hallucination, agentic pipelines for reliability, and RAG systems for accuracy.