Discussion about this post

Alex

The definition of reliability in this post is about 70% reliable :)

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

The original METR graph reports task success, which has very different implications from reliability.

Managing tail risk / TVaR is a real problem, but the proposed solutions and conclusions are problematic because they conflate model reliability with tail performance.

Hasan Salim Kanmaz

Please distinguish between two groups: AI practitioners and AI influencers who follow or generate hype.

As a seasoned AI practitioner, I can’t agree with the title “The AI industry worships at the altar of Accuracy — but humanity answers to a more fickle, demanding god: Reliability.” This is ObviouslyWrong :)

It is a well-known fact that accuracy is not the best metric. That’s precisely why many other evaluation metrics have been defined and used in model development and training.
