Wednesday, October 13, 2010

test* the hypothes*

I recently came across a study in the sociology of science and have been wondering how to interpret the results:
    Do Pressures to Publish Increase Scientists' Bias? An Empirical Support from US States Data
    By Daniele Fanelli
    PLoS ONE 5(4): e10271.
There are many previous studies showing that papers are more likely to get published and cited if they report "positive results." Fanelli has now found a correlation between the likelihood of reporting positive results and the number of papers published per capita in the corresponding author's state, in a sample of papers with a corresponding author in the USA, published in the years 2000 - 2007, across all disciplines. The papers were sampled by searching the Essential Science Indicators database with the query "test* the hypothes*" and the sample was then separated into positive and negative results by individual examination (both by the author and by an assistant). The result was as follows:
In a random sample of 1316 papers that declared to have “tested a hypothesis” in all disciplines, outcomes could be significantly predicted by knowing the addresses of the corresponding authors: those based in US states where researchers publish more papers per capita were significantly more likely to report positive results, independently of their discipline, methodology and research expenditure... [T]hese results support the hypothesis that competitive academic environments increase not only the productivity of researchers, but also their bias against “negative” results.

When I read that, I was somewhat surprised by the conclusion. Sure, such a result would "support" the named hypothesis in the sense that it didn't contradict it. But it seems to me like jumping to conclusions. How many other hypotheses can you come up with that are also supported by the results? I'll admit that I hadn't even read the whole paper when I made up the following ones:
  • Authors who publish negative results are sad and depressed people and generally less productive.

  • A scientist who finds a negative result wants more evidence to convince himself his original hypothesis was wrong, thus the study takes longer and in toto fewer papers are published.

  • Stefan suggested that the folks who publish more papers are of the sort who hand out a dozen shallow hypotheses to their students to be tested, hypotheses which are likely to be confirmed. (Stefan used the, unfortunately untranslatable, German expression "Dünnbrettbohrer," which means literally "thin board driller.")

After I had read the paper, it turned out Fanelli had something to say about Stefan's alternative hypothesis. Before I come to that however, I have to say that I have an issue with the term "positive result." Fanelli writes that he uses the term to "indicate all results that support the experimental hypothesis." That doesn't make a lot of sense to me, as one could simply negate the hypothesis and find a positive result. If it were that easy to circumvent a more-difficult-to-publish, less-likely-to-be-cited summary of one's research results, nobody would ever publish a result that's "negative" in that sense. I think that in most cases a positive result should be understood as one that confirms a hypothesis that "finds something" (say, an effect or a correlation) rather than one that "finds nothing" (we've generated/analyzed loads of data and found noise). I would agree that this isn't well-defined, but I think in most cases there would be broad agreement on what "find something" means, and a negation of the hypothesis wouldn't make the reader buy it as a "positive result." (Here is a counter-example). The problem is then of course that studies which "find nothing" are as important as the ones that "find something," so the question whether there's a bias in which ones are published is important.

Sticking with his own interpretation, Fanelli considers that researchers who come to a positive result, and in that sense show themselves correct, are just the smarter ones, who are also more productive. He further assumes that the more productive ones are more likely to be found at elite institutions. With his own interpretation this alternative hypothesis doesn't make a lot of sense, because when the paper goes out, who knows what the original hypothesis was anyway? You don't need to be particularly smart to just reformulate it. That reformulation however doesn't make a non-effect into an effect, so let's better consider my interpretation of "positive result." The alternative explanation is then that people smart enough to do an experiment where something is to be found are also the ones who generally publish more papers. Fanelli argues that this doesn't explain the correlation, for two reasons: First, since he assumes these people will be at elite institutions, there should be a correlation with R&D expenditure, which he didn't find. Second, because this explanation alone (without any bias) would mean that in states where 95% - 100% of published results were positive, the smart researchers hardly ever misjudged in advance the outcome of an experiment and the experiment was always such that the result was statistically significant, even though other studies have shown that this is not generally the case.

To the alternative hypothesis that Stefan suggested, Fanelli writes:
A possibility that needs to be considered in all regression analyses is whether the cause-effect relationship could be reversed: could some states be more productive precisely because their researchers tend to do many cheap and non-explorative studies (i.e. many simple experiments that test relatively trivial hypotheses)? This appears unlikely, because it would contradict the observation that the most productive institutions are also the more prestigious, and therefore the ones where the most important research tends to be done.
Note that he is first speaking about "states" (which was what actually went into his study) and then later about "institutions." Is it the case indeed that the more productive states (that would be DC, AZ, MD, CA, IL) are also the ones where the most important research is done? It's not that I entirely disagree with this argument, but I don't think it's particularly convincing without clarifying what "most important research" means. Is it maybe research that is well cited? And didn't we learn earlier that positive results tend to get better cited? Seems a little circular, doesn't it?

In the end, I wasn't really convinced by Fanelli's argument that the correlation he finds is a result of systematic bias, though it does sound plausible, and he did verify his own hypothesis.

Let me then remark something about the sample he's used. While Fanelli has good arguments that the sample is representative of the US states, it is not clear to me that it is in addition also representative of "all disciplines." The term "test the hypothesis" might just be more commonly used in some fields, e.g. medicine, than in others, e.g. physics. The thing is that in physics what is actually a negative result often comes in the form of a bound on some parameter or a higher precision of confirming some theory. Think of experiments that are "testing the hypothesis" that Lorentz-invariance is broken. There's an abundance of papers that do nothing but report negative results and more negative results (no effect, nothing new, Lorentz-invariance still alive). Yet, I doubt these papers would have shown up in the keyword search, simply because the exact phrase is rarely used. More commonly it would be formulated as "constraining parameters for deviations from Lorentz-invariance" or something similar.

That is not to say however that I think there's no bias for positive results in physics. There almost certainly is one, though I suspect you find more of it in theoretical than in experimental physics, and the phrase "testing the hypothesis" again would probably not be used. The thing is that I suspect a great many attempts to come up with an explanation or a model that, when confronted with the data, fails, never get published. And if they do, it's highly plausible that these papers don't get cited very much because it's unlikely very many people will invest further time into a model that was already shown not to work. However, I would argue that such papers should have their own place. That's because at present it very likely happens that many people are trying the same ideas and all find them to fail. They could save time and effort if the failure was explained and documented once and for all. So, I'd be all in favor of a journal for "models that didn't work."

28 comments:

Steven Colyer said...

Hi Bee,

I like that last idea of yours. I think what you're asking for is a place where speculation in the form of theories that were empirically falsified has its own category? Why not jot off a letter or e-mail to arxiv?

As far as theories that don't work click here? I think that may be the best substitute until your idea is implemented.

I like Stefan's alternative very much. Will need a day or two to consider.

Also, you have worked here in what I call "New Europe", aka the US. What in your opinion is the biggest difference between American and European academia? Because heck yeah we're a publish or perish country. At least since the 80's, when I first became aware of it.

Steven Colyer said...

To further clarify my last point, the fact that American academia is strongly "publish or perish" combined with our extremist Positivist attitude more than likely accounts for the pressure to primarily list positive results, IMHO.

Georg said...

What about the paper of Danelli and the question of thin boards?
Georg

Zephir said...

The whole problem is, science is not a hobby of Faraday or Tesla, who paid for their research with their own money - it's a normal industry - just without practical applications convertible into money. So the scientific community postulated its own criterion of effectiveness in order to be able to value the jobs of individuals on a per-month basis.

And because this community is the arbiter of its own criteria of effectiveness, it's evident that its criterion will always be biased and shifted toward this community, in the same way that the laws generated by politicians will always favor the politicians - not the people whom these laws are supposed to serve...

Zephir said...

BTW, for the same reason most of the particles observed inside our Universe by us (finite objects with positive curvature) are of positive curvature, i.e. not antiparticles. This sociological problem therefore has its straightforward geometric formulation: "Simillia simillibus observentur".

It simply means that the criteria of effectiveness of the scientific community should be set from outside this community - regardless of what its members think about having their work judged by laymen.

Zephir said...

/*..no effect, nothing new, Lorentz-invariance still alive..*/

This situation corresponds to the situation of an observer who is testing the constant speed of light during a fall into a black hole.

So (s)he falls... and falls... and becomes elongated and curved in curved space-time, and all his measures of space and time will be curved accordingly, so that he will always believe the light is spreading at a fixed speed through space. This holds even at the moment when the light is circling the black hole in close orbits together with the observer, so that it effectively stays in place, and when the poor observer is evaporated into the whole volume of the black hole, so that there is actually no one left who could prove his invariant truth...

This is an analogy of people, adhering to their pet theories without consideration of the more general perspective. The Universe doesn't care about Lorentz symmetry at all - we do.

Rastus Odinga Odinga said...

Fanelli is obviously biased, because he has produced positive results about his own hypothesis.

"Dünnbrettbohrer". Oh boy, I will treasure that!

I think an alternative definition of "positive" would be more useful: a positive result is one which tells the leading people what they want to hear. It's *very* clear that it is vastly easier to publish "positive" results in this sense, even if the paper is rather lousy. It would be useful if someone could document this in a quantitative way. Of course, what a theorist might call "negative" in this sense might be "positive" for an observational person. I would really like to see a history of the 1998 discovery of dark energy written from this point of view: even now there are plenty of theorists [particularly string theorists] who are still annoyed about this: it was very negative for them, but luckily it was even more positive for the astronomers. It would be even more entertaining if dark energy turns out not to be a cosmological constant....

Bee said...

Hi Rastus,

Yes, you're right of course, the point is that it's what they want to hear, which could be either a negative or a positive result in Fanelli's sense. The problem is "what they want to hear" is even less knowable and thus harder to test for than my suggestion. I was thinking that "found something" would in most cases make a good proxy for "what they want to hear" because it stirs interest and is likely to justify more work (papers/ experiments/ conferences, etc.). Best,

B.

Bee said...

Hi Steven,

In my opinion the biggest difference between (North) American and European academia is that the Americans work overtime, on the weekend, hardly ever take vacation - and actually seem to be proud of being busy, busy, crazy busy. For most Europeans that's nothing to be proud of, that's just stupid. I also don't have the impression it makes much of a difference for the actual output. Sweden is somewhat of an extreme example for that. Pretty much the whole country is on vacation in July. If you want anything to get done that month, forget about it. There are a lot of funny anecdotes about it, including the minister who couldn't be reached when a nuclear power plant had to be shut down due to some (minor) failure etc.

In any case, the publish or perish culture is, in my impression, quite international. It has more to do with the community than with the national cultures. There are some funding agencies that are taking small first steps to not further encourage an increase in the quantity rather than quality of output, see e.g. my post "Publication cut-off", but for this to really have an impact it needs to be adopted pretty much internationally. It is good to see though that these questions have been discussed a lot in the last few years. Best,

B.

Phil Warnell said...

Hi Bee,

I think the whole concept of a result being either positive or negative in the context used is totally subjective and thus renders Fanelli’s whole hypothesis meaningless, rather than wrong. I would say what is basically forgotten is what the scientific method in essence represents, which suggests it is not well understood by many, even some scientists. This method was first generalized by Francis Bacon when he said “In this philosophy particular propositions are inferred from the phænomena, and afterwards rendered general by induction.”

What is not well understood is what “inferred” stands for in the whole process, namely the essential deductive part; the first rule of deduction is that the validity of a deductive statement relies on both the soundness and the relevance of the premise(s) on which it is based. So for instance, if we were to consider Bell’s theorem, either before or after it was tested experimentally, what would the premise be that determines which outcome counts as a positive or a negative result? The thing is, there is no such thing to be discovered, only something that will be decided: not as positive or negative, right or wrong, but as how nature behaves, a truth about a quality of reality rather than a judgment of it.

“But if there be any man who, not content to rest in and use the knowledge which has already been discovered, aspires to penetrate further; to overcome, not an adversary in argument, but nature in action; to seek, not pretty and probable conjectures, but certain and demonstrable knowledge — I invite all such to join themselves, as true sons of knowledge, with me, that passing by the outer courts of nature, which numbers have trodden, we may find a way at length into her inner chambers.”

-Francis Bacon-Novum Organum (New Instrument)[1620]

Best,

Phil

Bee said...

Hi Phil,

I would say the question is how do you make the concept of "positive" or "negative" research results into one that is useful? I think the concept Fanelli has used is, in principle, not what he's looking for (since, as he points out himself, you only have to reformulate a sentence to turn negative into positive according to his definition). The one that should actually be used is what Rastus said: is the result what the reader wants to hear? As I remarked above, I would guess that in most cases scientists want to hear a result that makes their field interesting and opens doors to more research, that's why I suggested to better use "found something" instead of "confirmed hypothesis" as "positive," because I suspect that this is close to "what people want to hear." I further suspect that it is still reasonably close to Fanelli's definition of "positive" and "negative" simply because there likely is a shared idea, rather than a well-defined concept, of what is "positive" and "negative." That, I think, also makes it unlikely many people just reformulate their hypothesis to turn a "negative" into a "positive" result, because it would be obvious it's a cosmetic trick, and therefore Fanelli's criterion is probably close to both Rastus' and mine.

But one would have to check this of course... Which, I think, wouldn't be so complicated actually. You'd just have to take a set of these papers that are negative or positive according to Fanelli's criterion, figure out if they are negative or positive according to my criterion (I'd expect a large overlap). Then you'd have to reformulate some of the positive ones into negative ones and vice versa, and do a survey and ask people who are presented with the result if they would consider it positive or negative in a colloquial sense. If what I said above is right, you'd find that irrespective of the reformulation of a negative into a positive result or vice versa, there should be a large consensus on whether the result is "positive" (what they want to hear) or "negative," and that consensus is close again to what the authors originally wrote.
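The first step of such a check, measuring how much two criteria overlap, could be sketched roughly as follows. This is only an illustration: the eight labels are invented, the variable names are hypothetical, and Cohen's kappa is just one possible agreement measure, not something Fanelli's paper uses.

```python
from collections import Counter


def agreement(labels_a, labels_b):
    """Return (raw agreement, Cohen's kappa) for two equal-length label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of papers on which the two criteria give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each criterion's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both criteria always give the same single label; kappa is degenerate.
        return observed, 1.0
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical labels for eight papers, invented purely for illustration.
# Criterion 1: "confirmed hypothesis"; criterion 2: "found something".
confirmed_hypothesis = ["pos", "pos", "pos", "neg", "pos", "neg", "pos", "neg"]
found_something = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]

observed, kappa = agreement(confirmed_hypothesis, found_something)
print(f"raw agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```

A kappa close to 1 would mean the two criteria classify the papers almost identically; a value near 0 would mean the overlap is no better than chance.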

Best,

B.

T. said...

The bias for positive results is a complete shame - future scientists will laugh at us.

Here are two journals that publish null results, the second being more established than the first.

http://www.jasnh.com/
http://www.jnrbm.com/

Bee said...

Hi T,

Thanks. I know there are a few journals of that sort, mostly in biology and ecology I think. However, I think the problem is only partly one of publication. The maybe more important part is the appreciation of a scientist's peers. And you can bet they will appreciate results more if they like them. Makes me wonder if there are any good options to counteract this, all too human, tendency of wanting to please? It would seem to me it needs an actual incentive to produce negative results. Or maybe, it's just a matter of education, in that scientists should be taught early in their career that healthy science needs negative results as much as positive ones. Best,

B.

Uncle Al said...

A US grant funding proposal is a business plan: Budget, personnel, spreadsheet, PERT chart, zero-risk predicted results. Dünnbrettbohrer then translates as "least publishable bit." The world belongs to bottle washers and button sorters. Guaranteed failure is more fundable than the risk of success.

Senior faculty are safely gorged with cash because they guarantee late-career mediocrity. Young faculty are starved for funding (receiving some 10% of disbursed funds) because there is no telling what spikes they will generate. All discovery is insubordination.

Robert L. Oldershaw said...

If one is scientifically objective, then one is neutral, and does not preferentially attract excess positive or negative bits, like so many styrofoam peanuts.
(;-)

Zephir said...

The ignoring of negative results brings nearly humorous aspects into contemporary physics - for example, physicists are obstinately searching for gravitational waves while purportedly ignoring, or even filtering out, CMB noise. Or they're looking for extra dimensions while ignoring, or even filtering out, the influence of dipole and Casimir forces in their experiments... The search for cosmic strings is a misunderstanding of the same category, as I explained before.

http://www.physorg.com/news/2010-10-gravity-extra-dimensions-microscopic.html

http://physicsworld.com/cws/article/news/17025

Actually such an approach plays well with the paradigm: the results, or even understanding, are not so important - the research and the continuation of grants and a safe job are. The alchemists of the medieval era searched for the philosopher's stone in a similar way. In this way, mainstream physics helps the artificial employment of labour.

Plato said...

Francis Bacon was literally "two faced?":)

Best,

Plato said...

Uncle Al: All discovery is insubordination.

Given their pressures of the environment by which justice and law must abide by, truth by scientific investigation, which must be pushed forward, why not "split your personality" so that the creative side will not falter under the constraints of production?

An uncle to all, for the hopefuls and inspired , who work well under the creative exploration of advice and work well under pressure with such a release

A written play perhaps, or some constructive story in the perfect world?:)

I know what I am talking about as a layman:)As a Shakespeare perhaps?:)

Best,

Plato said...

Publish or Perish?

Is this so?

Or does one have an angle of perspective that had not been accounted for before while considering the dilation field?

Fermi calorimetric measures are of a "configuration space?" While lensing, seems distorted across the universe?:)

Best,

Zephir said...

There is an old paper by Luigi Foschini about the problem of interpretation of quantum physics which has raised a discussion about the effectiveness of science and its limits:

Is Science going through a critical stage? (PDF)

Phil Warnell said...

Hi Bee,

I would agree your definition would better serve as the thing that should be attempted to be discovered, as if it can itself become a premise, rather than to suggest having it taken as one. In essence this relates to how J.S. Bell considered most if not all such propositions should be framed; not as what they are, but rather as what they could be, thus his creation of what he called a “beable”. The thing is that Bell recognized and understood that science is not immune to biases, it being a human endeavour and thus as much subject to the weaknesses of humanity as to its strengths.

However, in the end this whole exercise is primarily concerned with how we might give quality metrics, so as to have it simply be identified, rather than with finding ways in which it can be promoted so as to become the goal. Ultimately I see such things not as a matter of metrics but as relating more to philosophy: first whether it is the correct one, and then how well one holds to it after it has been found that it can be trusted, rather than simply being something that is to be believed in.

“The difference between a good mechanic and a bad one, like the difference between a good mathematician and a bad one, is precisely this ability to select the good facts from the bad ones on the basis of quality. He has to care! This is an ability about which normal traditional scientific method has nothing to say. It's long past time to take a closer look at this qualitative preselection of facts which has seemed so scrupulously ignored by those who make so much of these facts after they are "observed." I think that it will be found that a formal acknowledgment of the role of Quality in the scientific process doesn't destroy the empirical vision at all. It expands it, strengthens it and brings it far closer to actual scientific practice.”
-Robert M. Pirsig- Zen and the Art of Motorcycle Maintenance - page 253

Best,

Phil

Phil Warnell said...

Hi Bee,

“Stefan used the, unfortunately untranslatable, German expression "Dünnbrettbohrer," which means literally "thin board driller.”

This tendency of the German language, bordering on an obsession, to have a specific word for everything rather than risk leaving something ill-defined has always fascinated me. That is as opposed to English, where there are many words each having several meanings, which makes the language rely more on context for things to be understood.

In thinking about this and the phrase Stefan used the "Dünnbrettbohrer," which means literally "thin board driller”, I was thinking that it’s not the abundance of such people in the discipline being the problem, yet rather the lack of those that are essential to put things all together, as to have them in combination have greater meaning and utility; such as Newton served as being in respect to all the previous results that were gathered for centuries or arguably for millennia.

So then I would contend the problem is not with having too many "Dünnbrettbohrer" yet not enough dünnenbrettklebemaschinen:-) To carry the analogy further, when you glue many thin boards together ( and at different angles) you get something much stronger and useful, with that being plywood or simply a composite material, having qualities that exceed those as being simply that as the sum of the parts.

Best,

Phil

Bee said...

Hi Phil,

A nice play on words, though I would argue that we have plenty of thick boards lying around. What is needed then is a "Schlagbohrer," and for that my dictionary does have a translation: impact drill. Best,

B.

Phil Warnell said...

Hi Bee,

You said: “What is needed then is a "Schlagbohrer," and for that my dictionary does have a translation: impact drill.”

Do you mean Schlagbohrer or rather Abbruchhammer? I’m not sure whether you feel something is required to get things through their thick skulls, or rather something they have to swallow so as to get the point. In either case it’s well known that the best are made in Germany:-)

Best,

Phil

Eric said...

I agree that it is a worthy goal to have a community in which "unbiased" refutation is as important as verification. The problem seems to be that those who have had their ideas or constructs verified then become the judges of those ideas that will supersede them. There is then a very biased refutation process that occurs.

There have been countless examples in physics. One of the best known is Bohr's refutation of the EPR idea of correspondence at a distance. The idea was successfully buried for thirty years until Bell came along. And the only reason he got involved was because it was essentially a side interest of his. I think if his main remunerative work had lain in this field, it can safely be said it would have been squelched by this "biased" refutation. It would have been a threat to his career path.

A more recent example is where Richard Feynman, with the help of Wheeler, et al, popularized the notion of an unchanging and vast sea of energy that for some reason we just can't see or experience. This idea was so popular that the fact that it does not make any sense with what we actually observe gained no traction. They made the incorrect assumption that even though the universe is expanding we could just ignore this technicality as far as the mathematics goes. Any ideas to the contrary, such as a changing Planck scale, were then put in the category of pseudoscience to shut the critics up.

So the problem isn't so much that there is not a problem of refutation but that all the authority for it is given to the God-like creatures who have made big contributions to science in the past. It seems to me we ignore changing this system at our peril.


A more relevant and up to date example is where