This lab turns AI model evaluation drift into a practical release-governance exercise. Students compare two evaluation snapshots, decide whether a deployment should continue, and document the evidence needed before automation can safely resume.
What students can learn
- How small changes in evaluation results can create real, user-facing release risk.
- How to separate model-quality signals from pipeline, dataset, or prompt changes.
- How CI/CD gates should react when AI behaviour becomes unstable.
- How to communicate an evidence-based rollback, hold, or canary decision.
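To make the second point concrete, here is a minimal Python sketch of how two evaluation snapshots might be compared. Everything in it is an assumption for illustration: the `Snapshot` fields, the hash and version names, the 0.01 tolerance, and the convention that a higher metric is better. It does not describe any particular evaluation tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One evaluation run. All field names are hypothetical."""
    model_version: str
    dataset_hash: str
    prompt_hash: str
    pipeline_version: str
    metrics: dict[str, float]  # assumption: higher is better


def changed_inputs(old: Snapshot, new: Snapshot) -> list[str]:
    """List the non-model inputs that differ between the two runs."""
    fields = ("dataset_hash", "prompt_hash", "pipeline_version")
    return [f for f in fields if getattr(old, f) != getattr(new, f)]


def metric_deltas(old: Snapshot, new: Snapshot) -> dict[str, float]:
    """Return new minus old for every metric present in both snapshots."""
    shared = old.metrics.keys() & new.metrics.keys()
    return {name: new.metrics[name] - old.metrics[name] for name in sorted(shared)}


def attribute_drift(old: Snapshot, new: Snapshot, tolerance: float = 0.01) -> str:
    """Give a first-pass label for a regression; it is a triage hint, not a verdict."""
    regressed = [n for n, d in metric_deltas(old, new).items() if d < -tolerance]
    if not regressed:
        return "stable"
    if changed_inputs(old, new):
        # A dataset, prompt, or pipeline change could explain the drop,
        # so the model cannot be blamed yet.
        return "confounded"
    # No tracked non-model input changed, so the model is the leading suspect.
    return "model-quality"
```

A "confounded" result is the interesting case for discussion: students should name which changed input they would pin or revert first in order to isolate the model's contribution.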
Recommended classroom use
- Give each group the scenario and ask them to classify the release risk within 10 minutes.
- Ask groups to map each suspected cause to one verification step and one owner.
- Compare the groups' proposed CI/CD gate changes, then discuss which controls would catch this kind of drift without over-blocking future releases.
- Close with a short incident memo: decision, evidence, remaining uncertainty, and next release condition.
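Groups that prefer a structured memo could capture it as data. The sketch below is one possible shape, not a required template: the class names, field names, and the three allowed decisions (taken from the rollback, hold, or canary options above) are assumptions. The `problems` method checks that every suspected cause has a verification step and an owner, and that the memo's core fields are filled in.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ALLOWED_DECISIONS = ("rollback", "hold", "canary")


@dataclass
class SuspectedCause:
    """A suspected cause with its verification step and owner (hypothetical shape)."""
    description: str
    verification_step: str
    owner: str


@dataclass
class IncidentMemo:
    """The four memo parts plus the cause mapping from the group exercise."""
    decision: str
    evidence: list[str]
    remaining_uncertainty: list[str]
    next_release_condition: str
    causes: list[SuspectedCause] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return a list of gaps; an empty list means the memo is complete."""
        issues: list[str] = []
        if self.decision not in ALLOWED_DECISIONS:
            issues.append(f"decision must be one of {ALLOWED_DECISIONS}")
        if not self.evidence:
            issues.append("at least one piece of evidence is required")
        if not self.next_release_condition.strip():
            issues.append("next release condition is missing")
        for cause in self.causes:
            if not cause.verification_step.strip():
                issues.append(f"no verification step for: {cause.description}")
            if not cause.owner.strip():
                issues.append(f"no owner for: {cause.description}")
        return issues
```

An empty `remaining_uncertainty` list is deliberately allowed, although in practice an instructor may want to challenge any group that claims to have none.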
Extension activity
Ask students to convert the response plan into a lightweight GitHub Actions checklist: evaluation threshold, artifact retention, approval note, rollback trigger, and post-release observation window.
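As a starting point, the five checklist items could be expressed as a small Python gate that a workflow step runs. This is a sketch under stated assumptions: the `ReleaseChecklist` field names, the score scale, and the "proceed", "hold", and "rollback" outcomes are all hypothetical, and scores are assumed to be higher-is-better. The only GitHub Actions behaviour it relies on is that a step fails when its command exits with a non-zero status.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseChecklist:
    """The five checklist items as data. All names and units are hypothetical."""
    min_eval_score: float          # evaluation threshold
    artifact_retention_days: int   # how long evaluation artifacts are kept
    approval_note: str             # free-text sign-off from a reviewer
    rollback_trigger_drop: float   # drop versus baseline that forces a rollback
    observation_window_hours: int  # post-release observation window


def gate(checklist: ReleaseChecklist, baseline_score: float,
         observed_score: float) -> tuple[str, list[str]]:
    """Return a decision and the reasons behind it."""
    drop = baseline_score - observed_score
    if drop >= checklist.rollback_trigger_drop:
        return "rollback", [f"score dropped by {drop:.3f} versus baseline"]

    reasons: list[str] = []
    if observed_score < checklist.min_eval_score:
        reasons.append("score is below the evaluation threshold")
    if not checklist.approval_note.strip():
        reasons.append("approval note is missing")
    if checklist.artifact_retention_days <= 0:
        reasons.append("artifact retention is not configured")
    if checklist.observation_window_hours <= 0:
        reasons.append("observation window is not configured")
    return ("hold" if reasons else "proceed"), reasons


def exit_code(decision: str) -> int:
    """Map a decision to a process exit status: 0 only for 'proceed'."""
    return 0 if decision == "proceed" else 1
```

A workflow step could call `sys.exit(exit_code(decision))` after printing the reasons, so that a "hold" or "rollback" stops the job. The same function can be run once before release and again during the observation window with fresh scores.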