AI Evaluation Drift Response Lab

This lab turns AI model evaluation drift into a practical release-governance exercise. Students compare two evaluation snapshots, decide whether a deployment should continue, and document the evidence needed before automation can safely resume.

What students can learn

How small changes in evaluation results can signal real user-facing release risk.
How to separate model-quality signals from pipeline, dataset, or prompt changes.
How CI/CD gates should react when AI behaviour becomes unstable.
How to communicate an evidence-based rollback, hold, or canary decision.
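
To make the snapshot comparison concrete, here is a minimal sketch of how a gate might turn two evaluation snapshots into a continue, canary, hold, or rollback decision. It assumes metrics where higher is better. The thresholds, the Snapshot fields, and every function name are hypothetical and chosen only for illustration; they are not part of the lab materials.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; each class should pick its own.
HOLD_DROP = 0.02      # absolute metric drop that pauses automated release
ROLLBACK_DROP = 0.05  # absolute metric drop that triggers a rollback


@dataclass(frozen=True)
class Snapshot:
    label: str
    metrics: dict[str, float]  # metric name -> score, higher is better
    context: dict[str, str]    # e.g. dataset version, prompt hash, pipeline commit


def metric_deltas(baseline: Snapshot, candidate: Snapshot) -> dict[str, float]:
    """Candidate minus baseline for every metric present in both snapshots."""
    shared = baseline.metrics.keys() & candidate.metrics.keys()
    return {m: candidate.metrics[m] - baseline.metrics[m] for m in sorted(shared)}


def changed_context(baseline: Snapshot, candidate: Snapshot) -> list[str]:
    """Context keys that differ, i.e. possible non-model causes of a shift."""
    keys = baseline.context.keys() | candidate.context.keys()
    return sorted(k for k in keys if baseline.context.get(k) != candidate.context.get(k))


def gate_decision(baseline: Snapshot, candidate: Snapshot) -> str:
    """Map the worst metric drop and any context changes to a release decision."""
    worst = min(metric_deltas(baseline, candidate).values(), default=0.0)
    if worst <= -ROLLBACK_DROP:
        return "rollback"
    if worst <= -HOLD_DROP:
        return "hold"
    if changed_context(baseline, candidate):
        # Metrics look stable, but the evaluation setup changed: release cautiously.
        return "canary"
    return "continue"
```

A useful discussion point is that changed_context does not explain a drop by itself; it only lists the pipeline, dataset, or prompt differences that students should rule out before blaming the model.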

Recommended classroom use

Give each group the scenario and ask them to classify the release risk within 10 minutes.
Ask groups to map each suspected cause to one verification step and one owner.
Compare proposed CI/CD gate changes, then discuss which controls are useful without over-blocking future releases.
Close with a short incident memo: decision, evidence, remaining uncertainty, and next release condition.
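
The cause mapping and the closing memo can also be captured as a small data structure, which makes gaps easy to spot. The sketch below assumes the four memo fields named above plus a list of suspected causes. The class names, field names, and the allowed decision values are hypothetical, and the example values in the usage note are invented for illustration.

```python
from dataclasses import dataclass, field

# Assumed decision vocabulary, taken from the rollback / hold / canary options in this lab.
ALLOWED_DECISIONS = {"rollback", "hold", "canary"}


@dataclass
class SuspectedCause:
    cause: str
    verification_step: str  # the one check that would confirm or rule out the cause
    owner: str              # the one person or team responsible for that check


@dataclass
class IncidentMemo:
    decision: str
    evidence: list[str]
    remaining_uncertainty: list[str]
    next_release_condition: str
    causes: list[SuspectedCause] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable gaps; an empty list means the memo is complete."""
        found = []
        if self.decision not in ALLOWED_DECISIONS:
            found.append(f"decision must be one of {sorted(ALLOWED_DECISIONS)}")
        if not self.evidence:
            found.append("no evidence listed")
        if not self.next_release_condition.strip():
            found.append("next release condition is empty")
        for c in self.causes:
            if not c.verification_step.strip() or not c.owner.strip():
                found.append(f"cause '{c.cause}' needs a verification step and an owner")
        return found
```

For example, a memo with decision "hold" whose only suspected cause has no owner would report exactly one problem, which gives groups a quick way to review each other's work.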

Extension activity

Ask students to convert the response plan into a lightweight GitHub Actions checklist: evaluation threshold, artifact retention, approval note, rollback trigger, and post-release observation window.
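
As a starting point, the checklist can be prototyped as a small Python check that a workflow step could run, since a step that exits with a non-zero code fails the job. This is a sketch under stated assumptions: the key names below are hypothetical labels for the five checklist items, and the workflow wiring itself is left to the students.

```python
# Hypothetical keys mirroring the five checklist items named above.
REQUIRED_ITEMS = (
    "evaluation_threshold",
    "artifact_retention_days",
    "approval_note",
    "rollback_trigger",
    "observation_window_hours",
)


def missing_items(checklist: dict) -> list[str]:
    """Return required items that are absent or left blank."""
    missing = []
    for item in REQUIRED_ITEMS:
        value = checklist.get(item)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(item)
    return missing


def exit_code(checklist: dict) -> int:
    """0 when the checklist is complete, 1 otherwise (non-zero fails a CI step)."""
    return 1 if missing_items(checklist) else 0
```

A wrapper script could load the checklist from a file in the repository and pass the result of exit_code to sys.exit, so an incomplete checklist blocks the release job instead of being silently skipped.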