Thursday, December 03, 2015

Peer Review and its Discontents [slide show]

I have made a slide show of my Monday talk at the Munin conference and managed to squeeze a one-hour lecture into 23 minutes. Don't expect too much; nothing happens in this video, it's just me mumbling over the slides (no singing either ;)). I was also recorded on Monday, but if you prefer the version with me walking around and talking for 60 minutes, you'll have to wait a few days until the recording goes online.

I am very much interested in finding a practical solution to these problems. If you have proposals to make, please get in touch with me or leave a comment.


Arun said...

To identify transformative research - maybe the following might be useful?

Every contributor to arxiv, as a condition of being able to contribute, is assigned at random some small number of arxiv papers, and must rate them (anonymously) on a 0-5 "might be transformative" scale. Hopefully this is not an onerous task.

Once these ratings are in, arxiv takes the papers with an average score above some threshold and randomly assigns them to a larger group of contributors, who are required to provide an anonymous second 0-5 rating on the "might be transformative" scale. It may be best that people don't know this is the next round of rating, to preclude any bandwagon effects. Because people might deduce this from the interval between when the preprint was published and when they were asked to rate it, the rating system will have to reach back many months into the past.

I don't know how many times this needs to be repeated, what the threshold should be to advance to the next round of rating, or at what point the score a paper has earned should be made visible to all. But the basic idea is that involuntary, random rating with successive selection might help surface the promising ideas.
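The multi-round scheme described above can be sketched in a few lines of Python. This is a hypothetical toy simulation, not anything arxiv provides: the noisy rating model, the threshold, the round count, and the doubling group sizes are all made-up assumptions standing in for the open parameters Arun mentions.

```python
import random

def rating_rounds(papers, raters, threshold=3.0, rounds=3, group_size=5):
    """Successive rounds of involuntary, random 0-5 rating with selection.

    papers: dict mapping paper id -> a hidden 'quality' score, used here
            only to simulate the noisy ratings real raters would give.
    raters: list of rater ids; each round draws a larger random group.
    Returns the ids of papers that survived every round.
    """
    surviving = list(papers)
    for r in range(rounds):
        group = group_size * (2 ** r)  # each round uses a larger group
        scores = {}
        for pid in surviving:
            assigned = random.sample(raters, min(group, len(raters)))
            # simulate each assigned rater's noisy 0-5 judgement
            ratings = [max(0.0, min(5.0, papers[pid] + random.gauss(0, 1)))
                       for _ in assigned]
            scores[pid] = sum(ratings) / len(ratings)
        # only papers above the threshold advance to the next round
        surviving = [pid for pid in surviving if scores[pid] >= threshold]
    return surviving
```

With a strong paper (quality near 5) and a weak one (quality near 0), repeated rounds of averaging over ever-larger random groups reliably separate the two, which is the "successive selection" effect the comment is after.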

akidbelle said...

Hi, your slides are of interest to me, but you can only observe from your own perspective. I think you miss two or three points, or maybe I misunderstood.

1) One cannot publish anything that is not "already" sold to an existing group. That is, only a paper or project that fits an existing niche in the system's ecology will be published or funded (except for some minor papers which nobody reads). As a result, fundamental physics has been spinning around the same concepts for decades.

2) Major progress did not happen as you state. There was always a real leap coming from one or two geniuses, and it did not come from new (recent) experimental data. Right now high-energy physicists expect the new physics to be a cut-and-paste of the old (the same field concept, the same methods of understanding). All that is tested (searched for) is already in theories built on conformity.

3) Any non-conforming researcher is a crackpot; hence career first, then no revolution, then no real progress, only tiny bit-by-bit evolution. The process can only reach a local maximum and maximal entropy. This is where we are.

The problem in my view is not in peer review. It is first in the understanding of the cost of originality - or novelty - as opposed to conformance.

The real systemic goal is to grow groups for funding. That's all. Eventually, the pack of sheep falls off the cliff... so there is still hope; this is natural selection.

I hope this helps

Noa Drake said...

Maybe a little far-fetched today, but as computing capacity increases exponentially, we could have computer algorithms do a lot of the work for us. They could select, filter, and pre-review papers along all sorts of criteria, none of which would be plagued by bias of any kind. Is this a stupid idea? Not if you have read anything about genetic algorithms, which are already being used to design inventions. It's about combining previous generations of solutions into mutations until a superior hybrid is created, and it works. So these algorithms actually filter good ideas out of many ideas. Google Icosystem and Eric Bonabeau. Source: New Scientist, Dutch edition, Nov 2015.
You could have such an algorithm serve the peer reviewers the most promising selection for them to look into more extensively, for instance.
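As a toy illustration of the genetic-algorithm idea mentioned above - selection, crossover, and mutation iterated until a fitter hybrid emerges - here is a minimal sketch. Everything in it is an assumption for illustration (the bit-string genomes, the "count the 1-bits" fitness, the population and mutation scheme); it is not taken from the Icosystem work.

```python
import random

def evolve(fitness, genome_len=10, pop_size=20, generations=50):
    """Minimal genetic algorithm: selection, crossover, point mutation."""
    # start from a random population of bit-string genomes
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # keep the fitter half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, genome_len)  # single-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(genome_len)       # flip one random bit
            child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# toy fitness: number of 1-bits in the genome (the "one-max" problem)
best = evolve(fitness=sum)
```

The same skeleton applies whatever the "genome" encodes; the hard part of Noa Drake's proposal would of course be defining a fitness function for papers, which is exactly the judgement problem under discussion.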

Uncle Al said...

Management is rewarded for enforcing rules, counting things, avoiding risk (overpaying for opportunity). A strongly invested axiomatic system failing in application (e.g., religion) is not falsified, it is process. Experiment to confirm not falsify. Violating rules incurs risk.

Best investments allocate resources to known "truth," whether it is or not (more studies are needed). Discovery was bootlegged off-budget - weekends, below notice embezzlement of funding. Explosive managerial growth in size, authority, and social equity (plus computer monitoring) render science Medieval static, actively excluding insubordination.

"It needs objective expert judgment." It also needs some slop in the gears and a little insanity. Progress traditionally occurred funeral to funeral. 21st century progress is managerial process. Progress is criminal.

Sabine Hossenfelder said...


You are underestimating the difficulty. Nobody would want to rate papers they hadn't read, and they wouldn't have time to read the papers, so they wouldn't rate them. End of story. It wouldn't work.

Sabine Hossenfelder said...


I don't know what you think I missed.

1,3) I said explicitly that the problem is in the way peers rate each other. I said that in the current situation there is no reason to expect that judgement is objective. It is a mistake to try to conclude from this exactly in which way judgement might be non-objective, which is what you are doing. It's not helpful.

2) I didn't say anything about how progress happens.

Sabine Hossenfelder said...

Noa Drake,

Yes, exactly. This could be done, but there isn't any such tool available. The only filters and measures that exist are those that are made for administration or funding purposes. And on top of this many of them are commercial, and one doesn't even know what they measure. (Take this thing called the 'altmetric'. It's a big joke if you ask me.) It's a matter of funding initiative. Best,


akidbelle said...

Dear Sabine,

On points 1,3) I would assume judgement is objective. It means that peers are conscious of their criteria (in my experience, they are) and apply those with minimal bias. The first question is "what is the conceptual context in which the judgement is made?" as it drives some unavoidable bias. The second one is "what is the goal of the editor?". So I guess it may be productive to ask them those questions. (Maybe it would even help).

On point 2) I (mis?)understand that the loop in the slide (and in a recent post) is the meta-mechanism in which scientific progress can happen and can be recognized as such. If that is correct (is it?), you state indirectly in what framework progress happens because this is how it is filtered in or out (since an unpublished paper is worth zero). So I understand that if a difference exists between science and sorcery it comes "within" that loop - an idea with which I agree.

In one of the last slides you ask for data, data, data. Now, all TOEs "only miss data", and the same goes for strings, SUSY and so on. So we are in an epoch where a very large fraction of physics papers live without data - think about it, this is already a huge joke. If you look at an epoch of real progress (say from the beginning of the 20th century up to Feynman), you will see that it systematically comes:
- from existing data,
- with new explanation,
- falsifiable almost immediately,
- with no sorcery (with the exception of GRT where Einstein states at the very beginning that gravitation is interpreted as a curvature of space-time - where at this epoch, space-time is already the accepted interpretation of special relativity equations).

So I'll bet papers can be filtered accordingly. Interestingly, one can already find the probable productive questions: what are the existing unexplained data (not the existing problems, as they already embed (pre-)conceptions in their wording, but really the data)?


Sabine Hossenfelder said...


You totally misunderstand what I explained. Of course you can make progress on the theoretical side and have that confirmed by data later. I never implied anywhere that it has to be otherwise. The 'real leap' from some genius is a modification (first step) which is then assessed by peers. If it's a theoretical modification, this is also where data comes in.

The 'data, data, data' on one of the last slides refers to data about the workings of the scientific system. Scientists need data to know how well connected their communities are, how much their colleagues are affected by financial pressure, how well they believe the system works, and so on and so forth. There isn't any useful data on this (other than my survey), thus scientists have no clue what is going on.

The problem with your suggestions is that they are your opinion. I too have my opinions. Everyone has their opinions. That isn't going to help, because who says your opinion is any better than mine?

You write "It means that peers are conscious of their criteria (in my experience, they are) and apply those with minimal bias."

In my experience they are not. Scientists have all kinds of cognitive and social biases that they are not consciously addressing. Most of them indeed deny they even have any biases. If you tell them that, as a matter of fact, people are influenced by the opinions of others around them, and that spending much time in a group of people with similar beliefs reinforces these beliefs, they'll just be insulted. But look, in-group behavior is an extremely well documented phenomenon. The same goes for biases like loss aversion or motivated cognition. Do scientists make any conscious effort to avoid these biases? No. Do we, therefore, have reason to believe that science is not affected by these biases? No.

Please note that this wasn't a talk addressed to physicists - the vast majority of the audience were not scientists - and it isn't specifically about physics. I just added that I am a physicist to make clear what the background is of my experience.



Arun said...

I'll take your word for it; but I think it would depend on how many additional papers one is asking the arxiv user to read. And this is not a peer-review type of read, just a rating that a paper is likely to be important, which is the kind of filtering you do anyway on the daily crop of preprints.

akidbelle said...

Hi Sabine,

thanks, I had not understood the data, data, data in this way.

I said peers are conscious of their criteria, and I was first thinking of referees and editors. I did not say the criteria are always honest, but they usually express them in questions. Still, I think the main obstacle for a good idea is to pass the filter of publication.

I know about in-group behavior (that is what I called a pack of sheep) and I like Dawkins's idea of the "meme" in "The Selfish Gene", chapter 7 or 9. I think it would be quite interesting to understand science(s) in this way, beginning with particle physics and cosmology. I even think this is a great subject for a thesis in the sociology of science.

About opinion: unexplained data is not an opinion; it is just facts, and there are plenty. The opinion is about the acceptable framework or concept or logic to explain them. This is part of education (yours, mine, anyone's), and in publication acceptable = dominant meme. I do not mean to reject papers because of my criteria; I just think it is possible to classify papers accordingly - and for what I have read... (I think you can guess my opinion).

What I mean (in physics) is that the acceptable paths to fundamental progress are extremely narrow; on the experimental side it amounts to new particles and almost nothing else, even in cosmology, and to different mathematical ideas on the theoretical side. You wrote: "Of course you can make progress on the theoretical side and have that confirmed by data later". What I mean is that even this is not authorized unless you predict a new particle (usually a large bunch) or a tiny deviation from existing theory (e.g. MOND). What is relevant in my view is firstly to work on existing unexplained data - with or without confirmation by later data.
Then, if one gets a single coherent explanation for several pieces of unexplained physical data, what would you think that means?

I'll stop here because I could write 20 pages about that. I am trying to be clear, but it's not obvious.


Sabine Hossenfelder said...


I've thought about this again and maybe I am being too cynical. Randomly assigning papers and requesting a judgement in return for using some service is a good idea. I will keep this in mind.

Sabine Hossenfelder said...


I think I agree with you more than I disagree. Yes, in physics the 'acceptable' path has become very limited. I think that's a combination of cognitive and social biases. I disagree though on the issue of 'unexplained data'. Scientists make a judgement about which data are in need of explanation to begin with, and that's big business. Take the issue of naturalness. Why does some constant need an explanation? What's wrong with it just being a constant? I think a huge amount of effort is wasted on explaining things that don't need any explanation, like certain numerical coincidences or parameters that are just parameters. There is definitely a subjective judgement going into the assessment of what is currently in need of explanation. Best,


John Loop said...

I am flummoxed [wonderful word]... I really enjoy reading [some, many] of your posts - you occasionally have a wonderful gift with words - I am an engineer [failed astronomer/cosmologist/physicist wannabe at 71 yrs :-( ]. But I also enjoy reading Lubos's blog. He is often outrageous, but often wonderfully entertaining, illuminating, and often VERY educational [like you, well, maybe you are not often outrageous...]. He has apparently gone "crazy" responding to your talk on his blog. Stuck in the middle again... Life is not simple. How to deal with all this... John

akidbelle said...

Dear Sabine,

I also think we agree much more than we disagree.

Concerning naturalness and, more generally, constants, I do not think we have the whole picture, far from it. I would feel fine with maybe 2 or 5 constants and parameters, but we really have a lot (19 in the SM, plus 6 for 3 neutrinos, plus gravitation and cosmology). But this is subjective, for sure; on the other hand, all progress in the history of fundamental physics has been triggered by the analysis of coincidences leading to better and better mathematical treatment. An idiot would say that the 1/n^2 in the hydrogen energy levels is just a coincidence - but our IQ is only human. So we cannot count coincidences out, and among the constants and parameters there is no obvious correlation - no trivial structure to see.

For instance, there must be a very fundamental physical reason why we have the electron, muon and tau, and no other charged lepton up to at least 1 TeV (probably more, I did not follow the 4th-lepton exclusion), instead of a continuous spectrum of lepton masses. The problem today is that field physics can accommodate almost anything; add 35 more charged leptons to the theory, the Higgs field is the same and it takes 2 minutes to update its Lagrangian; QED is basically the same. So one way or another I think we are missing something - and it must be something big, much bigger than the Higgs field.

Thanks again,

Don Foster said...


I understand that some years ago they began using neural networks to grade pork carcasses with an acuity that equaled or exceeded that of human inspectors. I am not suggesting a relationship, but if one were to use a neural network to winnow research proposals or submitted papers, what would be the measurable features used to characterize them? Would there be any indicator that would suggest that some research might lead to paradigm shifting results?

And I have also read that adding noise to a weak signal sometimes boosts its meaningful content to the threshold of detection. Perhaps some portion of research money should be awarded at random.

wsilent said...

Dear Sabine,

thank you for the interesting and complete analysis. You identified two points that concern me a lot: "the inertia" and the "live and let live" issues. Your survey on the inertia is striking and amazing!
I would see a huge improvement if the system introduced more "volatility", adding the possibility of dropping your research because it is not promising anymore and moving towards other research lines (a Dyson editorial showed how fruitful that can be!). Science in the end is based on wrong answers. Denying them makes science simply not science. Allowing them allows creativity to work. It would also introduce more randomness, and move the self-organized system away from stagnation (the local minima). Moreover, I would argue that it would reduce the "live and let live" issue:
We cannot forget that behind each paper there are human beings. If the paper contains ideas not suited to improve our understanding of our world, then it should be said loud and clear. Right, but if people have no possibility of changing their paths, they can only protect their positions or "die" trying. Obviously, who is "dying" is not the professor, but the student or the young postdoc whose only fault was choosing a dead-end research line 3 or 5 years before, at an age when you have no idea what research is, and not enough sight to see what research will be in 5 years - something almost impossible even for an expert researcher. The consequence is just killing careers randomly. Whoever remains will protect his/her own garden by fighting, not discussing. New PhD students will arrive - new scientists without a global view (yes, PhD students are scientists; they are who is actually doing science in our universities today) - new lives spent on dead-end ideas, while the experts are looking for grants.
If every scientist killed all the ideas he/she feels are wrong, what you would get in the end is a war, not science. Only professors with important protected positions would keep their ideas protected in one way or another.
Fortunately, sometimes we must face reality. Facts overcome our "being good" and experiments rule out some theory - something that doesn't occur often in our high-energy physics.

Obviously, I do not have a solution for the problem of "inertia". You explained its sources well, and all of them are very hard to overcome. But we can recognize that theoretical studies are rather cheap; they don't need huge grants. Feeding the small would, for me, plant the seeds for change - if the funding institutions would force themselves to provide part of their funds for financing many small groups with long-term, cheap positions based on, e.g., new ideas that are not financed yet.
What's the need of having 1 professor, 10 postdocs and 50 PhD students? The professor will struggle for grants to keep the machinery running; the rest will try to survive. If most of my colleagues had a permanent position, I'd be really happy to point out how bad their ideas are, because they would have the chance of changing their minds, and the same goes for me. Nowadays, I have to work in the one field of which I am an expert. If the field is not promising, I am a useless expert - completely equivalent to a younger, cheaper PhD student who will follow the blind ideas of old professors more easily. It is not hiring people for free; it's increasing the freedom to "fail". In genetics you need mutations to get the best fitness.


Ross Anderson said...

Dear Sabine

This is indeed a problem, and one the historians of science have written about at length. Maxwell discovered his equations in 1861, yet despite the fact that he was already famous, no one but a small group of "Maxwellians" paid attention until Hertz's radio experiments in the 1880s (by which time Maxwell had died), and Lord Kelvin still didn't believe them when he in turn passed away in 1907. Ironically, Kelvin got his peerage for the first transatlantic cable, which only worked because Heaviside (a Maxwellian) figured out transmission lines.

I've seen it on a smaller scale in my own career. On several occasions I've started new lines of research; I'm best known for applying microeconomics and game theory to information security, starting in 2001. In each of these cases it took personal, face-to-face evangelism to get the first few dozen people interested, and then we had to set up our own workshop or conference as otherwise the stuff just wouldn't get published. This was extraordinarily frustrating. Most large complex systems actually fail because of misaligned incentives; if Alice guards a system but Bob pays the costs of failure, he's going to end up unhappy. Yet when we started pointing this out, a typical referee comment was "This paper contains no mathematics. Send it to a management conference."

I had exactly the same experience when we started to apply ideas from signal processing to study covert communications and copyright marking in the mid-1990s, and later in the mid-2000s when we started working with behavioural economists and psychologists on security usability and on persistently irrational risk behaviour. In each case, the key step was to realise that it's not just about having an idea; you need the social entrepreneurship to follow it through.

It does help if you're already eminent enough that you get frequent invitations to give keynote talks, which you can use as a bully pulpit to find new converts. But even so, it can be hard, as Maxwell's story illustrates.

Physics is much worse than computer science, by the way. (My work also crosses into physics as we've developed new semiconductor testing techniques to probe the tamper-resistance of smartcards.) While the typical computer scientist lives by conferences, where the refereeing is blind, physics conferences tend to have many more (or indeed exclusively) invited speakers, so the prospect of an outsider with a brilliant idea getting a hearing is nil. (Even a brilliant grad student had better have a heavyweight adviser to act as his sponsor.)

A further problem is that while in CS we're very entrepreneurial and are quite relaxed about people trying zany ideas to see if they work, physics is much more strait-laced and in some subfields carries almost a religious burden of belief. To take an example we both know a bit about, any young physicist who challenges the current orthodoxy around the Bell tests is just dead in the water – a 'crackpot'. That orthodoxy is too important to the grant proposals and impact arguments of so many people, and dissent can't be tolerated. Yet, as many speakers at the Emergent Quantum Mechanics conference are starting to point out, the Bell tests can as easily be taken to argue for emergent order in the quantum vacuum as for the fashionable epicycles such as multiple universes. See this blog post for more. But if you're a young physicist, you'd better not go there, or you'll never get a job. And meanwhile, we're all supposed to pretend that quantum cryptography is the sunlit future, and smile sweetly at the ever more outrageous claims of the quantum computing crowd.

Arun said...

Dear Bee,
Off-topic, only peripherally related to peer review.

The mathematician Leonhard Euler would be on anyone's list of all-time greats. His published works are available on this website:

I was sort of charmed to find that Euler's earliest publication contains a mistake. The accompanying note says: "The (erroneous) construction of tautochronous curves in media with different forms of resistance. This is Euler's first paper E001; the math in the translation has been clarified (Feb. '07) to account for Euler's mistake. "

I think science and mathematics students are sometimes taught to be mortally afraid of errors, so much so that it constrains their thinking. I think it is better to teach them how to detect and deal with error, how to become self-correcting. In this regard, the more students are made aware that even the greatest of the greats made mistakes, the better.