Saturday, November 10, 2018

Self-driving car rewarded for speed learns to spin in circles. Or, how science works like a neural net.

When I write about problems with the current organization of scientific research, I like to explain that science is a self-organizing, adaptive system. Unfortunately, that’s when most people stop reading because they have no idea what the heck I am talking about.

I now realize there is a better way to explain it, one which has the added benefit of creating the impression that it’s both a new idea and easy to understand: Science works like a neural network. Or an artificial intelligence, just to make sure we have all the buzzwords in place. Of course that’s because neural networks really are adaptive systems, and neither is a particularly new idea, but then even Coca Cola sometimes redesigns its bottles.

In science, we have a system with individual actors that we feed with data. This system tries to optimize a certain reward-function and gets feedback about how well it’s doing. Iterate, and the system will learn ways to achieve its goals by extrapolating patterns in the data.

Neural nets can be a powerful method to arrive at new solutions for data-intensive problems. However, whether the feedback loop gives the desired result depends strongly on how carefully you configure the reward function. To translate this back to my complaints about the malaises of scientific research: if you give researchers the wrong incentives, they will learn unintended lessons.

Just the other day I came across a list of such unintended lessons learned by neural nets. Example: Reward a simulated car for continuously going at high speed, and it will learn to rapidly spin in a circle.
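A toy version of this failure mode fits in a few lines. The sketch below is hypothetical and not the original experiment’s code: a naive optimizer compares two driving policies under a reward that pays for instantaneous speed rather than distance covered, and duly picks spinning in place.

```python
import random

# Toy illustration of reward hacking: the "car" is rewarded for per-step
# speed, not for distance covered, so spinning in a tight circle scores
# at least as well as actually driving somewhere.
# (Hypothetical setup, not the code from the original experiment.)

def episode_reward(policy, steps=100):
    """Sum of per-step speed over one episode."""
    speed = 10.0  # both policies can sustain the same top speed
    if policy == "drive":
        # driving straight: occasional slowdowns for curves and obstacles
        return sum(speed * random.uniform(0.7, 1.0) for _ in range(steps))
    elif policy == "spin":
        # spinning in place: full speed every step, zero net progress
        return speed * steps

def best_policy(trials=20):
    """Naive optimizer: pick whichever policy scores higher on average."""
    random.seed(0)
    avg = {p: sum(episode_reward(p) for _ in range(trials)) / trials
           for p in ("drive", "spin")}
    return max(avg, key=avg.get)

print(best_policy())  # prints "spin"
```

Nothing here is a bug in the optimizer; it faithfully maximizes exactly the quantity it was given, which is the point of the analogy.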

Likewise, researchers rewarded for producing papers at high frequency will learn to rapidly spin around their own axes by inventing and debating problems that don’t lead anywhere. Some recent examples from my own field are the black hole firewall, the non-naturalness of the Higgs-mass, or the string theory swampland.

Here is another gem: “Agent pauses the game indefinitely to avoid losing.” I see close parallels to the current proliferation of theories that are impossible to rule out, such as supersymmetries and multiverses.

But it could be worse; at least we are not moving backward yet. Because now that I think about it, rediscovering long-known explanations would also be a good way to feign productivity.

Of course I know of the persistent myth that scientific research is evaluated by its ability to describe observations, so I must add some words on this: I know that’s what you were told, but it’s not how it works in practice. In practice, scientists and funding agencies alike must evaluate hypotheses prior to testing, to decide what is worth the time and money of a test to begin with. And the only ones able to evaluate the promise of research directions are researchers themselves.

It follows that there is no external reward function which you can impose on scientists that will optimize the return on investment. The best – indeed the only – method at your disposal is to let scientists make the evaluation internally, and then use their evaluation to distribute funding. In doing this, you may want to impose constraints on how the funding is used, e.g., by encouraging researchers to study specific topics. Such external constraints will reduce the overall efficiency, but this may be justifiable for societal reasons.

In case you missed it, this solution – which I have written and spoken about for more than a decade now – could come right out of the neo-libertarian’s handbook. The current system is over-regulated and therefore highly inefficient. More regulations will not fix it. This is why I am personally opposed to top-down solutions, like requirements coming from funding agencies.

However, the longer the current situation goes on, the more people we will have in the system who are convinced that what they are doing is the right thing, and the longer it will take for the problem to resolve even if you remove the flawed incentives. Indeed, in my impression the vast majority of scientists today already falls into this category: They sincerely believe that publications and citations are reliable indicators for good research.

Why do these problems persist even though they have been known for decades? I think the major reason is that most people – and that includes scientists themselves – do not understand the operation of the systems that they themselves are part of. It is not something that evolution allowed us to develop any intuitive grasp for.

Scientists in particular by and large think of themselves as islands. They do not take into account the manifold ways in which the information they obtain is affected by the networks they are part of, and neither do they consider that their assessment of this information is influenced by the opinions of others. This is a serious shortcoming in the present education of scientists.

Will drawing an analogy between scientific research and neural nets help them see the light? I don’t know. But maybe then in the not-so-far future we will all be replaced by AIs anyway. At least those sometimes get debugged.

15 comments:

sean s. said...

Law of Unintended Consequences. Trying to make science efficient requires figuring out what "efficient science" would be.

sean s.

JimV said...

I saw that item in this site's excellent Twitter log, and also enjoyed it. It also demonstrates, once again, that the universe is smarter than we are. That is, there are often solutions that we didn't think of, which a random search could find. A random search can't fool itself into thinking it has all the bases covered (baseball idiom); we can.

Also it demonstrates that if a system can be gamed, someone or something will game it. ("If you ain't cheating, you ain't trying hard enough," some football coaches say.) Fouls in basketball were put into the rules to prevent certain things; now teams foul deliberately in every game as part of their strategy.

The only solution I can think of (the universe probably knows more) is to increase the penalties. E.g., after the first, say five, team fouls in a half, one more foul shot is added to each foul. A one-shot foul becomes two, a two-shot foul becomes three, and so on. I think this would improve basketball, for me anyway. (Ten seconds on the game clock at the ends of games would no longer take 15 or 20 minutes to play.) Penalize the roomba (in software) for going backwards. (Or reward it for going forward--same thing.)

That's how biological evolution works. You fall behind in adaptation, you die or don't reproduce. What the new penalties/incentives should be in the case of science I don't know, though. Granted, ineffective penalties/incentives should be eliminated at the same time.

The Nobel Committee has done fairly well at identifying real accomplishments, such as LIGO. (I say that assuming LIGO results are real, which I do assume.) Maybe there should be more international prizes for results which meet objective standards, evaluated by impartial referees.

Uncle Al said...

Research is bureaucracy's fungible process fluid. Fundamental research is not a production environment. "The current system is over-regulated and therefore highly inefficient." "The R&D Function" Harvard Business Review 61(6) 195 (1983) Nothing changes.

Husband, "Honey, why do you cut off the roast's end?"
Wife, "My mother does it."
Mother-in-law, "My mother does it."
Grandma, "The pan was too small."

Embrace the awesome if fickle powers of mistaken assumptions, luck, fetish, autism, and transient frank stupidity. Management to Human Resources, there is nothing "transient" about it.

One microwave rotational spectrum sources baryogenesis, Milgrom acceleration, the cosmological constant; and falsifies the Equivalence principle. Day two reduces quantum mechanics to ket-chup. Look.

uair01 said...

This is a nice post and I agree with it. Incentives do matter. But once a system has settled into a wrong equilibrium it may be difficult to move it. Every actor acts rationally but together they waste many resources. There's a convincing analysis of that here: https://equilibriabook.com/molochs-toolbox/
I hope you and your supporters can change that!

Michael John Sarnowski said...

Awesome article. We do get unintended consequences by defining a goal, or a better situation. Hopefully over time it is self correcting.

Marnie said...

Neural networks are limited by their training sequence. They cannot handle problems and adapt beyond the training sequence that has been created (by humans).

Regarding the current state in academia, it is my view that there is an optimum point beyond which competition creates less good research, rather than better research. We should stop thinking that increased competition will produce better outcomes in science.

I was reading that Rainer Weiss actually failed out of MIT during his undergraduate years (he fell in love with a musician and became so involved in music that he wasn't paying as much attention as he should have to physics). He was given a second chance by an open-minded mentor at MIT. Weiss also didn't publish very much in the early part of his career. He spent a lot of time tinkering and thinking about experimentation instead.

It's hard to think of someone having that kind of career path today.

A need for leadership has been mentioned in a number of studies on what to do about the poor working environment that has taken hold in academia in the last thirty years or so.

I do think that some areas of science and engineering currently lack leadership and are basically on autopilot with the wrong training sequence. There seems to be little acknowledgement of how broken things are, and because we've created this huge science machine, which has so much inertia, it is almost impossible for anyone to make any productive corrections.

SRP said...

Welcome to the wonderful world of economics and incentive design. Lots of writing there about tradeoffs between what can be observed and what can be distorted, and the problem of simultaneously dealing with shirking, risk aversion, and task distortion.

Steven Kurtz said...

I love your whole system thinking, Bee. More physicists should learn from you. Nano, micro, engineering, etc are all needed, but they are not fundamental and philosophical.

naivetheorist said...

bee:

i have been a libertarian my entire life and involved with the philosophy and policy of libertarianism for over 50 years, having discussions with its leading exponents such as Rothbard and Block, and i have never heard of neo-libertarianism. i've read the blog entries that you reference but it is still unclear to me what neo-libertarianism is and how it differs from individualist libertarianism. can you state succinctly (in just a few sentences, as is possible for individualist libertarianism, or progressivism for that matter) the ideas of neo-libertarianism?

btw - your example of the car teaching itself to spin is a common feature of government intervention. it is commonly referred to in the so-called social sciences (i.e. pseudo-sciences as Feynman refers to them) as the law of unintended consequences.

thanks

naive theorist

Sabine Hossenfelder said...

naive theorist,

I actually looked this up on Wikipedia before I wrote this blogpost, just to make sure it means what I think it means, so this is the way I use the word. I guess you could sum it up with liberalism for softies. More seriously, optimization of the primary variable (predictability or profit) may simply not be the only thing that society cares for, and for this reason I think it makes sense to allow some regulations even if those limit efficiency. Now, when it comes to politics that opens a can of worms because now you have to debate which regulations are and aren't good ones and just what you want to optimize instead. When it comes to science, scientists don't get to decide anyway, so it's simply an external constraint.

David Müller said...

I think you hit this topic straight on.
A few years ago, while researching neural networks, I came across the EPLEX group at the University of Central Florida. They experimented with neural networks and evolutionary algorithms. Specifically, at one time, they experimented with neural networks without a classical reward function, i.e., without a measure which has to be maximized.

A classical AI example they gave is training a neural network to navigate a maze. They found that the measure [minimum distance to goal after a certain number of steps] was beaten by an algorithm without an explicit goal. They called their new algorithm "novelty search", which only requires each new generation of neural networks to do something different than the ones before.

With this Ansatz, they not only beat the maze, but they also trained a (virtual) robot to walk (i.e., without having walking as a goal and without any measure of "how well" it could walk).

They then thought that they had really stumbled across something deep about creativity and innovation, and went on to publish a popular-science book about this exact topic ("Why Greatness Cannot Be Planned: The Myth of the Objective").

You should really look into this: http://eplex.cs.ucf.edu/noveltysearch/userspage/#motivation

TL;DR: a research group in Florida found that explicitly setting a goal via a proxy measure (such as publications) can be totally inefficient and detrimental to progress, and they have data to back up this view.
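For what it's worth, the core of the idea fits in a short sketch. This is only an illustration of the novelty-search principle (an archive of past behaviors, selection by distance to nearest archived neighbors), not the EPLEX group's actual code, and the 1-d "behaviors" here are a stand-in for real behavior characterizations:

```python
import random

# Simplified novelty search: candidates are scored not by distance to a
# goal, but by how *different* their behavior is from everything seen so
# far. No objective enters the selection at any point.

def novelty(behavior, archive, k=5):
    """Mean distance to the k nearest behaviors in the archive."""
    dists = sorted(abs(behavior - b) for b in archive)
    return sum(dists[:k]) / min(k, len(dists))

def novelty_search(generations=50, pop_size=10, seed=0):
    rng = random.Random(seed)
    archive = [0.0]                 # everything the search has seen
    population = [0.0] * pop_size
    for _ in range(generations):
        # mutate each individual twice, then keep the most novel half
        candidates = [p + rng.gauss(0, 1)
                      for p in population for _ in range(2)]
        candidates.sort(key=lambda b: novelty(b, archive), reverse=True)
        population = candidates[:pop_size]
        archive.extend(population)
    return max(abs(b) for b in archive)  # how far exploration reached

# Novelty pressure alone drives the search far from its starting point:
print(novelty_search() > 5.0)
```

Selection for "be different from what the archive has seen" pushes the population steadily away from explored territory, which is why it can solve mazes that a distance-to-goal measure traps in dead ends.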

Lucas Morton said...

Yes! We need a solution to this problem. Harry Crane at Rutgers has a provocative idea: require scientists who draw probabilistic conclusions (esp. p-values) to place sizeable bets on the replicability of their results. http://harrycrane.com/FPP-final.pdf This idea seems to have been inspired by Nassim Taleb's notion of 'skin in the game.' Crane's solution ultimately founders on the problem of who gets to decide whether a subsequent study actually replicated a finding - this brings us right back to the problem of peer review, when Crane had intended to make the process more empirical and less subjective. But we all agree that the present situation is not satisfactory: there's too much opportunity for groupthink in science as it is being practiced.

David Bailey said...

In some situations - particularly research into potential hazards, such as global warming - I think the best solution might be to organise two equally well funded teams, each trying to demonstrate opposite results. Anything that involves heroically resolving signal from noise would also be suitable for this approach.

Thus if this were applied to LIGO, there would be a team of people trying to prove that the signal was of conventional origin - i.e. a glitch, with the same resources and level of expertise as the LIGO team trying to prove that gravitational waves had been detected. Neither side would be permitted to keep something secret - they would have to share everything, even though they were pushing in opposite directions.

Steven Kurtz said...

Thanks, L.M. Crane has proposed a welcome process. I wonder what the reception has been by the scientific community. There is an existing charitable wager org. which is not limited to hard science. It is an attempt to focus thinking about future outcomes. See:

http://longbets.org/about/

Note that Jeff Bezos is now backing The Long Now Foundation, and that Kevin Kelly (Wired) and Stewart Brand (Whole Earth Catalog) were the founders.

sean s. said...

Adversarial Collaborations:

http://slatestarcodex.com/2018/04/26/call-for-adversarial-collaborations/

sean s.