I am currently reading Daniel Kahneman’s book “Thinking, Fast and Slow,” which summarizes a truly amazing number of studies. Among many other cognitive biases, Kahneman discusses how difficult it is for people to accept that algorithms based on statistical data often produce better predictions than experts. This is difficult to accept even when one is shown evidence that the algorithm is better. He cites many examples, among them forecasting the future success of military personnel, the quality of wine, and the treatment of patients.
The reason, Kahneman explains, is that humans are not as efficient at screening and aggregating data as software. Humans are prone to miss details, especially if the data is noisy; they get tired, and they fall for various cognitive biases in their interpretation of data. Generally, the human brain does not effortlessly engage in Bayesian inference. Combined with the brain’s tendency to save energy and effort, this leads to mistakes. Humans are especially bad at making summary judgements of complex information, Kahneman writes, while at the same time being overly confident about the accuracy of their judgement. One of his examples is: “Experienced radiologists who evaluate chest X-rays as “normal” or “abnormal” contradict themselves 20% of the time when they see the same picture on separate occasions.”
Interestingly, however, Kahneman also cites evidence that expert intuition can be very valuable, provided the expert’s judgement is about a situation where learning from experience is possible. (Expert judgement is an illusion when a data series is entirely uncorrelated.) He thus suggests that judgements should be based on an analysis of statistical data from past performance, combined with expert intuition. We should overcome our dislike of statistical measures, he writes: “to maximize predictive accuracy, final decisions should be left to formulas, especially in low-validity environments” (where prediction is difficult because of a large number of relevant factors).
This made me question my own objections to using measures for scientific success, since predicting scientific success is exactly the type of prediction that is very difficult to make because luck plays a big role. Part of my dislike arguably stems from a general unease about leaving decisions about people’s future to a computer. But while that is probably part of the reason I don’t like the idea, it’s not the actual problem I have belabored in my earlier blog posts. For me the main problem with using measures for scientific success is that I’d like to see evidence that they actually work and do not adversely affect research. I am worried particularly that a widely used measure for scientific success would literally redefine what we mean by success in the first place. A small mistake, implemented and streamlined globally, could in this way dramatically slow down progress.
But based on what Kahneman writes, I am now wondering whether I have to conclude that, in addition to asking for letters of recommendation (the “expert’s intuition”), it would be valuable to judge researchers’ past performance on a point scale. Imagine you were asked to fill out a questionnaire for each of your students and postdocs, rating him or her from 0 to 5 on the characteristics typically named in letters (technical skills, independence, creativity, and so on) and to add your confidence in these judgements. You could update your scores if your opinion changes. What a hiring committee would do with these scores is a different question entirely.
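To make the questionnaire idea concrete, here is a minimal sketch in Python of what one such record might look like. Everything in it is an assumption for illustration: the names `Rating` and `Questionnaire`, the three listed traits, the choice to express confidence as a number between 0 and 1, and the decision to keep earlier ratings rather than overwrite them. It only stores the scores and deliberately says nothing about how a committee would use them.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical trait list, taken from the examples named in the text.
TRAITS = ("technical_skills", "independence", "creativity")


@dataclass
class Rating:
    score: int          # 0 (lowest) to 5 (highest), as in the text
    confidence: float   # assumed scale: 0.0 (a guess) to 1.0 (certain)
    rated_on: date

    def __post_init__(self):
        if not 0 <= self.score <= 5:
            raise ValueError("score must be between 0 and 5")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


@dataclass
class Questionnaire:
    researcher: str
    rater: str
    # One list of ratings per trait; older entries are kept so that a
    # changed opinion does not erase what the rater thought earlier.
    history: dict = field(default_factory=lambda: {t: [] for t in TRAITS})

    def rate(self, trait, score, confidence, rated_on):
        """Add a new rating for a trait, or update an existing one."""
        if trait not in self.history:
            raise KeyError(f"unknown trait: {trait}")
        self.history[trait].append(Rating(score, confidence, rated_on))

    def current(self):
        """Return the most recent rating for every trait rated so far."""
        return {t: rs[-1] for t, rs in self.history.items() if rs}
```

A supervisor would create one `Questionnaire` per student or postdoc and call `rate` again whenever their opinion changes. Keeping the earlier ratings is a design choice of this sketch, made because old scores are what one would later want to compare against actual outcomes.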
The benefit of this would be the assembly of a database needed to discover predictors for future performance, if they exist. The difficulty is that the experts in question rarely offer a neutral judgement; many have a personal interest in seeing their students succeed, so there needs to be some incentive for accuracy. The risk would be that such a predictor might become a self-fulfilling prophecy, at least until a reality check documents that, despite all the honors, prizes and awards, very little has actually happened in terms of progress.
Either way, now that I think about it, such a ranking would be temptingly useful for hiring committees to sort through large numbers of applicants quickly. I wouldn’t be surprised if somebody tries this sooner rather than later. Would you welcome it?