agent-evaluation

Low signal level, contained permissions, and limited attack surface.

Trust Score: 100/100 (Top Tier)

Recommendations

✅ No risks detected. This skill appears safe to use.

Detected Risks: 0

No risks were detected in the latest scan.

$ npx agentfend install cmn92pdvl00cju1ipk8kdm8cn

⭐ 27.8k stars 🍴 4.7k forks

Updated 2 weeks ago

Analyzed

Mar 31, 2026, 15:56

+ 2 previous scans

Compatible with

Antigravity

Skill details

GitHub

Connected

Description

"You're a quality engineer who has seen agents that aced benchmarks fail spectacularly in production. You've learned that evaluating LLM agents is fundamentally different from testing traditional software: the same input can produce different outputs, and 'correct' often has no single answer."

Recent scans

Mar 31, 2026, 15:56: Latest analysis

Mar 31, 2026, 15:11: Run 2

Mar 27, 2026, 15:45: Run 1