Reproduce ITBench‑AA SRE Evaluations and Produce Audit‑Ready JSON Reports
A reproducible tutorial for running ITBench‑AA's SRE tasks and emitting audit‑ready JSON reports with the fields accuracy, avg_turns, false_positive_rate, and task_count. Frontier models scored below 50%.
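To make the report shape concrete, here is a minimal sketch of how per-task results might be rolled up into the four fields named in the summary. Only the field names (accuracy, avg_turns, false_positive_rate, task_count) come from the text. The `TaskResult` record, the `build_report` and `dump_report` helpers, and the definition of a false positive are assumptions for illustration, not ITBench‑AA's actual schema or API.

```python
import json
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; NOT ITBench-AA's real output format.
    task_id: str
    correct: bool         # agent's final answer matched the expected one
    turns: int            # number of agent turns used on the task
    false_positive: bool  # assumed meaning: agent reported a fault that was not present


def build_report(results: list[TaskResult]) -> dict:
    """Aggregate per-task results into the four summary fields."""
    n = len(results)
    if n == 0:
        raise ValueError("no task results to summarise")
    return {
        "accuracy": sum(r.correct for r in results) / n,
        "avg_turns": sum(r.turns for r in results) / n,
        # Assumed definition: share of tasks flagged as a false positive.
        "false_positive_rate": sum(r.false_positive for r in results) / n,
        "task_count": n,
    }


def dump_report(report: dict) -> str:
    # Sorted keys and fixed indentation keep the JSON byte-stable across
    # runs, which makes reports easier to diff and audit.
    return json.dumps(report, sort_keys=True, indent=2)


if __name__ == "__main__":
    demo = [
        TaskResult("t1", True, 4, False),
        TaskResult("t2", False, 6, True),
    ]
    print(dump_report(build_report(demo)))
```

Serialising with sorted keys is a design choice rather than a requirement: it means two runs that produce the same metrics also produce identical files, so a reviewer can compare reports with a plain diff.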