Wednesday, August 01, 2012

Letter of recommendation 2.0

I am currently reading Daniel Kahneman’s book “Thinking, Fast and Slow,” which summarizes a truly amazing number of studies. Among many other cognitive biases, Kahneman discusses how difficult it is for people to accept that algorithms based on statistical data often produce better predictions than experts. This is difficult to accept even when one is shown evidence that the algorithm is better. He cites many examples of that, among them forecasting the future success of military personnel, the quality of wine, or the treatment of patients.

The reason, Kahneman explains, is that humans are not as efficient at screening and aggregating data as software. Humans are prone to miss details, especially if the data is noisy; they get tired, and they fall for various cognitive biases in their interpretation of data. Generally, the human brain does not effortlessly engage in Bayesian inference. Combined with its tendency to save energy and effort, this leads to mistakes. Humans are especially bad at making summary judgements of complex information, Kahneman writes, while at the same time being overly confident about the accuracy of their judgement. One of his examples is: “Experienced radiologists who evaluate chest X-rays as “normal” or “abnormal” contradict themselves 20% of the time when they see the same picture on separate occasions.”

Interestingly, however, Kahneman also cites evidence that expert intuition can be very valuable, provided the expert’s judgement is about a situation where learning from experience is possible. (Expert judgement is an illusion when a data series is entirely uncorrelated.) He thus suggests that judgements should be based on an analysis of statistical data from past performance, combined with expert intuition. We should overcome our dislike of statistical measures, he writes: “to maximize predictive accuracy, final decisions should be left to formulas, especially in low-validity environments” (where prediction is difficult because of the large number of relevant factors).

This made me question my own objections to using measures for scientific success, since scientific success is the type of outcome that is very difficult to predict because luck plays a big role. Part of my dislike arguably stems from a general unease about leaving decisions about people’s future to a computer. But while that unease is real, it’s not the actual problem I have belabored in my earlier blog posts. For me the main problem with using measures for scientific success is that I’d like to see evidence they are actually working, and do not adversely affect research. I am worried particularly that a widely used measure for scientific success would literally redefine what we mean by success in the first place. A small mistake, implemented and streamlined globally, could in this way dramatically slow down progress.

But I am now wondering whether, based on what Kahneman writes, I have to conclude that in addition to asking for letters of recommendation (the “expert’s intuition”) it would be valuable to judge researchers’ past performance on a point scale. Suppose you were asked to fill out a questionnaire for each of your students and postdocs, rating him or her from 0 to 5 on those characteristics typically named in letters: technical skills, independence, creativity, and so on, and also to state your confidence in these judgements. You could update your scores if your opinion changes. What a hiring committee would do with these scores is a different question entirely.
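As a minimal sketch, such a questionnaire could be stored along the following lines. Only the 0 to 5 scale, the example trait names, and the idea of updating a score come from the post; the class names, the 0.0 to 1.0 confidence scale, and the decision to keep superseded ratings are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TraitRating:
    score: int         # 0 (lowest) to 5 (highest), the scale named in the post
    confidence: float  # assumed scale: 0.0 (a guess) to 1.0 (certain)

    def __post_init__(self):
        if not 0 <= self.score <= 5:
            raise ValueError("score must be between 0 and 5")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


@dataclass
class Questionnaire:
    rater: str
    candidate: str
    ratings: dict = field(default_factory=dict)  # trait name -> TraitRating
    history: list = field(default_factory=list)  # superseded (trait, TraitRating) pairs

    def rate(self, trait, score, confidence):
        """Set or update a rating; a replaced rating is moved to history."""
        new_rating = TraitRating(score, confidence)  # validate before changing anything
        if trait in self.ratings:
            self.history.append((trait, self.ratings[trait]))
        self.ratings[trait] = new_rating


# Hypothetical rater and candidate names.
q = Questionnaire(rater="Advisor A", candidate="Postdoc B")
q.rate("technical skills", 4, 0.9)
q.rate("independence", 3, 0.5)
q.rate("independence", 4, 0.7)  # opinion changed, so the score is updated
```

Keeping the superseded ratings is a design choice, not something the post asks for: it would let one later check whether early or revised opinions were the better predictor.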

The benefit of this would be the assembly of a database needed to discover predictors for future performance, if they exist. The difficulty is that the experts in question are rarely offering a neutral judgement; many have a personal interest in seeing their students succeed, so there needs to be some incentive for accuracy. The risk would be that such a predictor might become a self-fulfilling prophecy, at least until a reality check documents that actually, despite all the honors, prizes and awards, very little has happened in terms of actual progress.

Either way, now that I think about it, such a ranking would be temptingly useful for hiring committees to sort through large numbers of applicants quickly. I wouldn’t be surprised if somebody tries this sooner rather than later. Would you welcome it?


  1. Of course, one should be hired because one is expected to perform well in the future; it is not a reward for past accomplishments. Assuming that one has good predictors, then it makes sense to use them only if they are somehow weighted with the difficulty of a person's situation. Often, the opposite happens: had prestigious fellowship---put him on the short list. In other words, people expect less of people who had good working conditions (the prestigious fellowship here) and more of people who didn't (comes from an unknown institute---let's really scrutinize him). Of course, perhaps someone had a prestigious fellowship based on being good in the past. Even if that is true (rather than him having been lucky, or had connections, or bribed, or blackmailed), it doesn't matter at zeroth order for judging his performance during said fellowship. At first order, one should expect more of him, not less. Whoever is hired will have approximately the same working conditions, so if past performance is used as a proxy for future performance, one has to weight it with the difficulty of the situation. (Apart from not having a prestigious fellowship, one could have been ill, had children, been wrongly accused etc.) This is not some sort of sympathy for the candidate who has had a bad time, but sensible strategy if one wants to hire the best person. Often a vicious circle occurs: someone gets, for whatever reason, a good position early on, and this is the ticket to the next position, even though other candidates performed better under worse circumstances.

  2. What you are suggesting is mere quantification of prejudice. That just creates an illusion of data.

    Far more valuable would be to study those measurable things that turn out to correlate with future productivity, whatever they turn out to be: number and citation count of papers published before age 25, IQ, degree of myopia, color of hair, or whatever.

  3. Hi CIP,

    Yes, that's data one also wants to collect if one wants to look for correlations. However, there is information contained in the process before the output that I believe is more relevant than the output itself. And to get that information, the only thing you can do, it seems to me, is to ask others who were involved in producing the output. Best,


  4. Hi Phillip,

    Well, there are always statistical fluctuations because luck plays a role. If somebody has a mean performance of X and you're impressed by their recent performance peak, chances are their performance will indeed go down rather than up because it was a statistical fluke. That's another reason to collect data. Best,


  5. This comment has been removed by the author.

  6. Nothing can statistically infer, model, extrapolate, datamine, spreadsheet, parameterize, PERT chart... discovery. We carefully discard the worst and the best people. A microwave horn fouled with pigeon poop became a Nobel Prize. So did driving up a mountain road while stoned. The latter is worth $billions/year (both PCR and marijuana).

    People rejected for being space alien weird are nuggets amidst dross. "Autoritätsdusel ist der größte Feind der Wahrheit" ("Blind faith in authority is the greatest enemy of truth"), then he denied quantum mechanics. Human Resources enforces mediocrity, a vice of the doomed. Always retain a few Profoundly Gifted. Discard them after emptying, or become Google. Your choice.

  7. I'm also reading Kahneman's book and recently described something from it in a blog post...although different from your discussion. It was his point about moving your attention away from a distraction by focusing intently on something else (involuntary vs. voluntary modes of thinking). I used that strategy during a presentation recently when my attention was distracted by a bizarre interruption.

  8. This comment has been removed by the author.

  9. So you want people to fit in a box?

    I have been thinking lately about our mathematical universe. :) Maybe there is some correlation here as to expectations?

    You want to provide as free a space as possible for new data to enter the world of your information. If you constrain all data then what can really be original?


