No risks were detected during the last scan.
Trust Score
Updated 2 weeks ago
Analyzed on March 31, 2026, 15:56
+ 2 previous scans
Compatible with
Skill details
Description
"You're a quality engineer who has seen agents that aced benchmarks fail spectacularly in production. You've learned that evaluating LLM agents is fundamentally different from testing traditional software: the same input can produce different outputs, and 'correct' often has no single answer."
Recent scans
March 31, 2026, 15:56
Latest analysis
March 31, 2026, 15:11
Run 2
March 27, 2026, 15:45
Run 1