Practical LLM Evaluation for Production Systems: Measure, monitor, and improve AI system reliability across training and inference
Indrajit Kar