Reading Eval Numbers: The PM's Skeptic Kit for Quality Reports
Your engineering team brings you a number. 87 percent. Is that good? Is it real? Is it better than last week? Is it the number you should care about? Here is the skeptic kit for reading any AI quality report without getting fooled.