Sunday, August 18, 2013

Researchers and coffee consumption

You might have seen this collection of 40 world maps in your news feed recently. It's interesting and worth a look. When I scrolled down the list I thought it looked like the number of researchers (per million inhabitants) is correlated with coffee consumption (in kg per capita). So I pulled down the data and plotted it in Excel, and here we go:

Coffee consumption vs number of researchers. The red dot is Germany.

I passionately hate Excel and I have no idea how to convince it to give me a p-value, but I've seen worse correlations being published. More coffee consumption linked to more research!
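For anyone who would rather skip Excel, here is a minimal Python sketch of one way to get a p-value for a correlation coefficient. It integrates the textbook null distribution of Pearson's r, which assumes the two variables are uncorrelated and roughly bivariate normal; that assumption is questionable for data like these, so treat the result as a rough indication. The function names and the sample numbers are made up for illustration and are not the values from the spreadsheet.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson (product-moment) correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

def correlation_p_value(r, n, steps=20000):
    """Two-sided p-value for r measured on n points, under the null
    hypothesis of no correlation (bivariate normal data assumed).

    Integrates the null density
        f(x) = (1 - x^2)^((n - 4) / 2) / B(1/2, (n - 2) / 2)
    from |r| to 1 with Simpson's rule and doubles the result.
    """
    if n < 4:
        raise ValueError("this sketch needs at least 4 data points")
    a = min(abs(r), 1.0)
    if a == 1.0:
        return 0.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    # log of the Beta function B(1/2, (n - 2) / 2)
    log_beta = (math.lgamma(0.5) + math.lgamma((n - 2) / 2)
                - math.lgamma((n - 1) / 2))
    norm = math.exp(-log_beta)
    expo = (n - 4) / 2

    def density(x):
        # max() guards against tiny negative values from rounding
        return max(0.0, 1.0 - x * x) ** expo * norm

    h = (1.0 - a) / steps
    total = density(a) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(a + i * h)
    return 2.0 * total * h / 3.0

# Made-up placeholder values, NOT the numbers from the spreadsheet.
researchers = [50, 120, 300, 900, 2500, 4000, 5200]
coffee_kg = [0.2, 0.1, 1.5, 0.8, 4.0, 6.5, 5.9]
r = pearson_r(researchers, coffee_kg)
print(f"r = {r:.3f}, p = {correlation_p_value(r, len(researchers)):.3g}")
```

To use it on the real data, replace the two lists with the corresponding spreadsheet columns, keeping the rows paired by country.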

If you want to play with the data, you can download the Excel sheet here. I've left out Singapore from the table because I wasn't sure whether the entry "0" meant there's no data, or nobody in Singapore drinks coffee. I've made a second plot where I left out the 15 main coffee-exporting countries (according to Wikipedia), but visually it doesn't make much of a difference so I'm not showing you the graph. (It's in the Excel sheet.) The data on researchers per million inhabitants is reportedly from the UNESCO Institute for Statistics, and the data on coffee consumption is from the World Resources Institute.

Don't take this too seriously. I'd guess that you'd find a similar correlation for many consumer goods. It has some amusement value though :o)
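One way to probe that guess would be a partial correlation: how much of the coffee-researcher correlation is left once a third variable, for instance some measure of wealth, is held fixed. The sketch below uses the standard first-order partial correlation formula. The function name is made up, the choice of control variable is an assumption (no such column is claimed to be in the spreadsheet), and the three input coefficients are placeholders, not measured values.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held fixed.

    r_xy, r_xz, r_yz are the ordinary Pearson coefficients of the
    three pairs of variables.
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom

# Placeholder coefficients for illustration only (not measured values):
# x = coffee consumption, y = researchers, z = hypothetical wealth measure.
print(partial_correlation(r_xy=0.7, r_xz=0.8, r_yz=0.8))
```

A result close to zero would suggest that the third variable accounts for most of the apparent link; a result close to the original coefficient would suggest it does not.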


  1. And what would you say of this one. It even sounds plausible:

    The Effect of Sexual Activity on Wages

    The purpose of this study is to estimate whether sexual activity is associated with wages, and also to estimate potential interactions between individuals’ characteristics, wages and sexual activity. The central hypothesis behind this research is that sexual activity, like health indicators and mental well-being, may be thought of as part of an individual’s set of productive traits that affect wages.

  2. This comment has been removed by the author.

  3. Another similar and amusing food-related correlation can apparently be found between chocolate consumption and Nobel prizes.

  4. Hi Bee,

    What this has me now wondering is which of the two elements one should consider as causal and which resultant; or which fundamental as to having the other emergent :-)

    “I have measured out my life with coffee spoons.”

    -T.S. Eliot, “The Love Song of J. Alfred Prufrock”



  5. There should be a "Format Trendline" and the options in that allow you to display the equation. From your spreadsheet I get

    y = 0.0011x + 0.7341

  6. Paul Erdős' colleague Alfréd Rényi said, "a mathematician is a machine for turning coffee into theorems" (Erdős also did speed). High autists/Aspergers are focussed by stimulants. Some side effects may obtain.

    The US Department of Education identifies the Severely and Profoundly Gifted, to destroy them as social justice for privileged minorities. Thus Warner Bros. Los Angeles County Honorship recipient Kashawn Campbell. Honorship? "He would like to major in communications and pursue a career in broadcasting." Perfect.

  7. The product moment correlation coefficient for this data is 0.608; For 116 data points, the critical value for five sigma is 0.445.

    You have about 8 sigma here: that's p~10^-15

  8. Bob: I have a hard time believing your numbers.

  9. Though, if I think about it, the figure is visually very misleading because one doesn't see that there's like 100 points or so crammed into the lower left corner, basically on top of each other.

  10. Yes, the numbers are quite striking. I've tried to do it a little more reliably, but it isn't very easy. (Apparently, if you have Mathematica 9, you can use CorrelationTest to get a p-value. It doesn't appear to be in 8, which I have.)

    The PMCC is generated by Excel, as Arun said. Actually, r=0.780, which is even higher. It's the squared value that is 0.608.

    If you put 0.445 and 116 in this calculator, you'll see the five sigma (compare with five sigma here).

    r=0.365 gives 4 sigma,
    r=0.404 gives 4.5 sigma,
    r=0.445 gives 5 sigma,
    r=0.482 gives 5.5 sigma.

    The calculator doesn't go much further than that, but the asymptotic behaviour is pretty clear. If you want r=0.780, you'll have about 9.25 sigma. Which is p=2 x 10^-20.

  11. Alternatively, there's a formula here, which integrates to give p=3 x 10^-25.

    It's getting smaller every time I look.

    I think they're correlated.

  12. Well, I guess if the null hypothesis is that each country irrespective of their researchers fraction consumes coffee somewhere in the observed range, then the probability that just by chance the 80 countries with a small fraction of researchers consume basically no coffee is indeed tiny. Of course, as I said in my post, that doesn't make much sense as a null hypothesis. It would make more sense to use correlation with GDP or maybe average income/average cost of living or some other measure of household liquidity or wealth.

  13. Yes, if what you want is a causal hypothesis, it gets a lot trickier.

    You could test it to some extent by claiming that the GDP-coffee correlation and the GDP-researcher correlation are both stronger than the coffee-researcher correlation. If that claim held up, it would pretty much kill any suggestion that there's a direct causal coffee-researcher link, if anyone was silly enough to make one.

    Confirming a causal link - unless you have plenty of evidence of what happens when coffee supplies dramatically change while the background of other possible factors is very stable - is virtually impossible.

    This frees up the communities of political, social and economic scientists to harbour lots of strongly-held and mutually-contradictory beliefs about the most fundamental causal relations, such as whether tightening all government spending tends to increase or decrease a deficit. They're encouraged to appear very sure of themselves if they want to be taken seriously, which amounts to a very effective way of selecting for the deluded.

    I'm sure some of that happens in physics too, but I'd expect it to be a lot less. That's my causal hypothesis, anyway! If someone gave me some coffee, I'd follow it up.

  14. Oh for God's sake, this is really silly.

    Coffee has got nothing to do with productive scientific research. Zip, zilch, zero.

    It's nicotine that does it. We lived in rags and huts until we discovered tobacco. Shortly thereafter we had cars and planes and skyscrapers, and e-lec-tri-city.


  15. Bee - I'm disappointed.. why no log/log plot?

  16. Because I hate Excel! Just typing the word causes me mental pain. I think it's because I associate it with university administration, something I normally try to avoid contact with at all costs.
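On the log/log question from the comments: with most countries crammed into the lower left corner of a linear plot, one option is to correlate the logarithms of the two quantities instead. The following is a minimal sketch of that idea, not something done in the post. The function names are made up, the sample values are placeholders, and rows with a zero or negative entry are simply skipped, since their logarithm is undefined.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson (product-moment) correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

def log_log_correlation(xs, ys):
    """Correlation of log10(x) with log10(y).

    Pairs with a non-positive entry are dropped. Returns the
    coefficient and the number of pairs actually used.
    """
    pairs = [(math.log10(x), math.log10(y))
             for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(pairs) < 2:
        raise ValueError("need at least 2 pairs with positive values")
    log_x, log_y = zip(*pairs)
    return pearson_r(log_x, log_y), len(pairs)

# Made-up placeholder values, NOT the numbers from the spreadsheet.
researchers = [50, 120, 300, 900, 2500, 4000, 5200]
coffee_kg = [0.2, 0.1, 1.5, 0.8, 4.0, 6.5, 5.9]
r_log, used = log_log_correlation(researchers, coffee_kg)
print(f"log/log r = {r_log:.3f} from {used} countries")
```

The same transformed values are what a log/log scatter plot would show, so the spread of the low-consumption countries becomes visible instead of collapsing into one corner.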

