Monday, February 15, 2016

What makes an idea worthy? An interview with Anthony Aguirre

That science works merely by testing hypotheses has never been less true than today. As data have become more precise and theories have become more successful, scientists have become increasingly careful in selecting hypotheses before even putting them to test. Commissioning an experiment for every odd idea would be an utter waste of time, not to mention money. But what makes an idea worthy?

Pre-selection of hypotheses is especially important in fields where internal consistency and agreement with existing data are very strong constraints already, and it therefore plays an essential role in the foundations of physics. In this area, most new hypotheses are born dead or die very quickly, and researchers would rather not waste time devising experimental tests for ill-fated non-starters. During their careers, physicists must thus constantly decide whether a new idea justifies spending years of research on it. Next to personal interest, their decision criteria are often based on experience and community norms – past-oriented guidelines that reinforce academic inertia.

Philosopher Richard Dawid coined the term “post-empirical assessment” for the practice of hypothesis pre-selection, and described it as a non-disclosed Bayesian probability estimate. But philosophy is one thing, doing research another. For the practicing scientist, the relevant question is whether a disclosed and organized pre-selection could help advance research. This would require the assessment to be performed in a cleaner way than is presently the case, a way that is less prone to errors induced by social and cognitive biases.

One way to achieve this could be to give researchers incentives for avoiding such biases. Monetary incentives are a possibility, but to convince a scientist that their best path of action is putting aside the need to promote their own research would mean incentives totaling research grants for several years – an amount that adverts on nerd pages won’t raise, and thus an idea that seems one of these ill-fated non-starters. But then for most scientists their reputation is more important than money.

Anthony Aguirre.
Image Credits: Kelly Castro.
And so Anthony Aguirre, Professor of Physics at UC Santa Cruz, devised an algorithm by which scientists can estimate the chances that an idea succeeds, and gain reputation by making accurate predictions. On his website Metaculus, users are asked to evaluate the likelihood of success for various scientific and technological developments. In the email exchange below, Anthony explains his idea.

Bee: Last time I heard from you, you were looking for bubble collisions as evidence of the multiverse. Now you want physicists to help you evaluate the expected impact of high-risk/high-reward research. What happened?

Anthony: Actually, I’ve been thinking about high-risk/high-reward research for longer than bubble collisions! The Foundational Questions Institute (FQXi) is now in its tenth year, and from the beginning we’ve seen part of FQXi’s mission as helping to support the high-risk/high-reward part of the research funding spectrum, which is not that well-served by the national funding agencies. So it’s a long-standing question how to best evaluate exactly how high-risk and high-reward a given proposal is.

Bubble collisions are actually a useful example of this. It’s clear that seeing evidence of an eternal-inflation multiverse would be pretty huge news, and of deep scientific interest. But even if eternal inflation is right, there are different versions of it, some of which have bubbles and some of which don’t; and even of those that do, only some subset will yield observable bubble collisions. So: how much effort should be put into looking for them? A few years of grad student or postdoc time? In my opinion, yes. A dedicated satellite mission? No way, unless there were some other evidence to go on.

(Another lesson, here, in my opinion, is that if one were to simply accept the dismissive “the multiverse is inherently unobservable” critique, one would never work out that bubble collisions might be observable in the first place.)

B: What is your relation to FQXi?

A: Max Tegmark and I started FQXi in 2006, and have had a lot of fun (and only a bit of suffering!) trying to build something maximally useful to the community of people thinking about the type of foundational, big-picture questions we like to think about.

B: What problem do you want to address with Metaculus?

A: Predicting and evaluating (should “prevaluating” be a word?) science research impact was actually, for me, the second motivation for Metaculus. The first grew out of another nonprofit I helped found, the Future of Life Institute (FLI). A core question there is how major new technologies like AI, genetic engineering, nanotech, etc., are likely to unfold. That’s a hard thing to know, but not impossible to make interesting and useful forecasts for.

FLI and organizations like it could try to build up a forecasting capability by hiring a bunch of researchers to do that. But I wanted to try something different: to generate a platform for soliciting and aggregating predictions that — with enough participation and data generation — could make accurate and well-calibrated predictions about future technology emergence as well as a whole bunch of other things.

As this idea developed, my collaborators (including Greg Laughlin at UCSC) and I realized that it might also be useful in filling a hole in our community’s ability to predict the impact of research. This could in principle help make better decisions about questions ranging from the daily (“Which of these 40 papers in my ‘to read’ folder should I actually carefully read?”) to the large-scale (“Should we fund this $2M experiment on quantum cognition?”).

B: How does Metaculus work?

A: The basic structure is of a set of (currently) binary questions about the occurrence of future events, ranging from predictions about technologies like self-driving cars, Go-playing AIs and nuclear fusion, to pure science questions such as the detection of Planet 9, publication of experiments in quantum cognition or tabletop quantum gravity, or announcement of the detection of gravitational waves.

Participants are invited to assess the likelihood (1%-99%) of those events occurring. When a given question ‘resolves’ as either true or false, points are awarded depending upon a user's prediction, the community’s predictions, and what actually happened. These points add a competitive game aspect, but serve a more important purpose of providing steady feedback so that predictors can learn how to predict more accurately, and with better calibration. As data accumulate, predictors will also amass a track record, both overall and in particular subjects. This can be used to aggregate predictions into a single, more accurate, one (at the moment, the ‘community’ prediction is just a straight median).
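To make the mechanics above a little more concrete, here is a minimal sketch in Python. The median aggregation follows the description in the answer; the scoring function is purely an illustrative assumption (a log score measured relative to the community), not Metaculus's actual point formula, and all function names are hypothetical.

```python
import math
from statistics import median


def clamp(p, lo=0.01, hi=0.99):
    """Keep a probability inside the 1%-99% range that users can enter."""
    return min(max(p, lo), hi)


def community_prediction(predictions):
    """Aggregate individual predictions with a straight median."""
    return median([clamp(p) for p in predictions])


def relative_log_score(user_p, community_p, outcome):
    """Hypothetical scoring rule (an assumption, not the site's formula).

    Compares the probability the user assigned to what actually happened
    with the probability the community assigned to it. Positive means the
    user did better than the community, negative means worse.
    """
    user = clamp(user_p) if outcome else 1 - clamp(user_p)
    community = clamp(community_p) if outcome else 1 - clamp(community_p)
    return math.log2(user / community)


if __name__ == "__main__":
    predictions = [0.2, 0.6, 0.9]  # made-up example inputs
    crowd = community_prediction(predictions)
    print("community prediction:", crowd)
    print("score if the event happens:", relative_log_score(0.9, crowd, True))
    print("score if it does not:", relative_log_score(0.9, crowd, False))
```

A rule of this shape rewards a predictor only for beating the crowd on the realized outcome, which is one simple way points could depend on all three ingredients mentioned: the user's prediction, the community's prediction, and what actually happened.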

An important aspect of this, I think, is not ‘just’ to make better predictions about well-known questions, but to create lots and lots of well-posed questions. It really does make you think about things differently when you have to come up with a well-posed question that has a clear criterion for resolution. And there are lots of questions where even a few predictions (even one!) by the right people can be a very useful resource. So a real utility is for this to be a sort of central clearing-house for predictions.

B: What is the best possible outcome that you can imagine from this website and what does it take to get there?

A: The best outcome I could imagine would be this becoming really large-scale and useful, like a Wikipedia or Quora for predictions. It would also be a venue in which the credibility to make pronouncements about the future would actually be based on one’s actual demonstrated ability to make good predictions. There is, sadly, nothing like that in our current public discourse, and we could really use it.

I’d also be happy (if not as happy) to see Metaculus find a more narrow but deep niche, for example in predicting just scientific research/experiment success, or just high-impact technological rollouts (such as AI or Biotech).

In either case, it will take continued steady growth of both the community of users and the website’s capabilities. We already have all sorts of plans for multi-outcome questions, contingent questions, Bayes nets, algorithms for matching questions to predictors, etc. — but that will take time. We also need feedback about what users like, and what they would like the system to be able to do. So please try it out, spread the word, and let us know what you think!


  1. This should be available in all fields in which empirical testing of clear hypotheses is (or ought to be) possible.

  2. "what makes an idea worthy?" Sourcing not curve fitting; predict a wowser. Respect prior observation. Cheap, fast, has literature citations. Microwave generators started huge. Cavity magnetrons are softballs. A Gunn diode is a grain of salt. Sub-wavelength optics? Look.

    Axiomatic systems cannot internally correct empirically defective postulates. Newton excludes GR, QM, and stat mech by blowing c, h, and k_B. Physics arises from Equivalence Principle symmetries. Chemistry external to physics says "bench top EP violation at will, six different ways." Look. Delphi polls evolve, individuals create.
    "Exceptional employees don’t possess God-given personality traits; they rely on simple, everyday EQ [emotional intelligence] skills that anyone can incorporate into their repertoire."

    1632, slide rule; 1972, HP-35; 1973, no slide rules.

  3. Seems like you could use a few philosophers in the mix in any case.

  4. It's a misleading use of the word 'prediction'....which adds to an existing damaging effect in which scientists are rapidly losing touch with what words are supposed to mean.
    He's talking about judgement. There's nothing wrong with that, but it's judgement in the same sense that an executive or a gambler can get a reputation for being on the money.
    A prediction is not a judgement, it's a direct consequence of something of a special category defined by the presence also, of such heavy constraints such that the consequence must be true if the theory is true.
    Not a judgement.

  5. "publication of experiments in quantum cognition"

    Hmmm. I see the Metaculus question is actually about experiments in quantum mechanics in cognition rather than about experiments in quantum cognition:

    "Indeed, the mathematical structure of quantum theory, with its non-classical (non-Kolmogorovian) probability calculus, has been used with considerable success in the past decade to model aspects of human cognition, such that a new field of research within cognitive science, referred to as ‘quantum cognition’, emerged"

  6. sabine says "(Another lesson, here, in my opinion, is that if one were to simply accept the dismissive “the multiverse is inherently unobservable” critique, one would never work out that bubble collisions might be observable in the first place.)"

    The thing is, you can opt into multiverse envisioning, but in doing so you are opting out of the traditional scientific instinct. You cannot sustain both. This is because they are mirror opposites down to actual attributes situated around the same thing - which is abstract discovery.

    The scientific instinct is very fragile and mysterious. It's a hard path and immensely unproductive at the level of a personal life. But over historic time because it compounds, it has shaped our reality and world and enlightened us.

    It can't compete with the multiverse in the short term, because the multiverse is the most productive force in the human universe. The breakthroughs, and amazing insights, and euphoria, and public fascination are knock-down superior. When you go in, you get a lot of good vibes and reasons to stay in.

    You also lose the sense in which science sees the world. The scientific instinct makes no sense - literally - from the multiverse instinct. It's not even possible to keep hold of it conceptually. You lose the concepts.

    The reason the colliding bubbles don't matter, is because they arise by intuitive guessing from a vague basis. If they're not there, the multiverse with infinite resources, just morphs.

    And guess what. Eventually something like colliding bubbles will get observed, either as the outcome of a numbers game, or because people eventually wise up to the fact they can actually influence the multiverse in the direction of a prediction that they happen to know to be there.

    Save yourself Sabine, while you still can!

  7. Lucy,

    You misread that, it wasn't me who said that, but Anthony.

  8. Said simpler: People don't make predictions. Theories make predictions. In Science.

  9. Lucy,

    You misunderstand the intention. You can develop a theory, make a prediction and test it, or you can develop a technology and observe what it does. But what do you do if you need to evaluate the chances of success *before* you have been able to test the theory or to create the technology? For this, expert opinion is the only thing you can rely on. It's a PRE-selection of hypotheses.

  10. Hi Sabine - prediction means something specific in Science. Pre-selection of a theory does not involve prediction in that sense, currently. Using the same word in the same domain for different meanings entirely is damaging at a time when there is already a serious problem of, basically, concept desertification, at the methodological level.

  11. I remember Anthony Aguirre from the very first times of FQXi. He was worried about the question "How do we fund Einstein without funding Crackpots?" Years have passed since then and I hope now he has finally devised a rigorous means to distinguish ideas.

  12. Hello again. I'm back to blogging. Nice article Bee.

