Some weeks ago, the tree octopus made headlines again. If you have never heard of this creature before, don’t worry, it is an internet hoax used for classes on information literacy. It is easy enough to laugh about the naiveté of students believing in the tree octopus. Or people believing in spaghetti trees for that matter. Scientists in particular are obliged to carefully check all facts they use in their arguments. But in reality, none of us can check all the facts all the time. A lot of what we know is based on trust and an ethereal skill called ‘common sense.’ We’re born trusting that adults tell us the truth – about the binky fairy. Most of us grow up adding a healthy dose of skepticism to any new information, but we still rely heavily on trusted sources and the belief that few people are willfully evil. What happened to that in the age of the internet?
When I write a paper, I usually make an effort to check that the references I am citing do actually show what they claim, at least to some level. Sometimes, digging out the roots of a citation tree holds spaghetti surprises. But especially when it comes to experiments, fact checking comes to a quick halt because it would simply take too much time to put each and every thing under scrutiny. And then peer review has its shortcomings. In my daily news reading, however, I am far less careful. After all, I’m not being paid for it and I have better things to do than figuring out if every story I read (Can you really get stuck on an airplane’s vacuum toilet?) is true. Most of the time it doesn’t actually matter because, you see, urban legends are entertaining even if not true. And, well, don’t flush while you sit.
I think of myself as a very average person, so I guess that most of you use recipes similar to mine to roughly estimate a trust-value of some online resource. The rule of thumb that I use is based on two simple questions: 1) How much effort would one have to make to fake this piece of information in the present form, and 2) How evil would one have to be.
How much effort would one have to make to put up a website about a non-existing animal? Well, you have to invest the time to write the text, get a domain, and upload it. I.e. not so very much. How evil do you have to be? For the purpose of teaching internet literacy, somebody probably believed he was being good. Trust-value of the tree-octopus: Nil. How much effort do you have to make to fake some governmental website? Some. And it’s probably illegal too, so does require some evil. How much effort would you have to make to fake the moon landing?
Of course such truth-value estimates have large error-bars. Faking somebody else’s writing style for example can be quite difficult (if it wasn’t I’d be writing like Jonathan Franzen), but that depends on the writing style to begin with. If you’ve never registered a domain before you might vastly overestimate the effort it takes. And how difficult is it really to convince some billion people the Earth is round? (Well, almost.) Or to convince them some omniscient being is watching over them and taking note every time they think about somebody else’s underwear? There you go. (And Bielefeld, btw, doesn’t exist either.)
The trustworthiness of Wikipedia is a question with more than academic value. For better or worse, Wikipedia has become a daily source of reference for hundreds of millions of people. Its credibility comes from its articles being scrutinized by millions of eyes. Yet, it is very difficult to know how many and which people did indeed check some piece of information, and how much they were influenced by the already existing entry. The English Wikipedia site thus, very reasonably, has a policy that information needs to have a source. Reasonable as that may sound, it has its shortcomings, a point that was made very well in a recent NYT article by Noam Cohen, who reports on a criticism by Achal Prabhala, an Indian advisor to the Wikimedia Foundation.
There is arguably information about the real world that is not (yet?) to be found in any published sources. Think of something trivial like good places in your neighborhood to find blackberries (the fruit).¹ More interestingly, Prabhala offered the example of a children’s game played in some parts of India, and its Wikipedia article in the local language, Malayalam. Though the game is known by about 40 million people, there is no peer reviewed publication on it. So what would have constituted a valid reference for the English version of the website? What counts as a trusted source? Do videos count? Do the authors of the Wikipedia article have to randomly sample and analyze sources with the same care as a scientific publication would require? It seems then, the information age necessitates some rethinking of what constitutes a trusted source other than published works. Prabhala says:
“If we don’t have a more generous and expansive citation policy, the current one will prove to be a massive roadblock that you literally can’t get past. There is a very finite amount of citable material, which means a very finite number of articles, and there will be no more.”
Stefan remarked dryly that they could just add a reference to Ind. J. Anth. Cult. [in Malayalam], and nobody would raise an eyebrow. Among physicists this is, tongue-in-cheek, known as “proof by reference to inaccessible literature” (typically to some obscure Russian journal in the early 1950s). The point is, asking for references is useless if nobody checks even the existence of these references. Most journals do now have software that checks reference lists for accuracy and at the same time for existence. The same software will inevitably spit out a warning if you’re trying to reference a living review.
But to come back to Wikipedia: It strikes me as a philosophical conundrum, a reference work that insists on external references. Not only because some of these references may just not exist, but because with a continuously updated work, one can create circular references. Take as an example the paper “Moisture induced electron traps and hysteresis in pentacene-based organic thin-film transistors” by Gong Gu and Michael G. Kane, Appl. Phys. Lett. 92, 053305 (2008). (Sounds seriously scientific, doesn’t it?) Reference [13] cites Wikipedia as a source on fluorescent lamps. There is a paper published in J. Phys. B that cites Wikipedia as a source for the double-slit experiment, and a PRL that cites the Wikipedia entry on the rainbow. Taemin Kim Park found a total of 139 citations to Wikipedia in the fields of Physics and Astronomy in the Scopus database as of January 2011.²
Citing Wikipedia would not in itself be a problem. But the vast majority of people who cite websites do not add the date on which they retrieved the site. More disturbingly, the book “World Wide Mind” that I read recently had a few “references” to essays that consisted of mentioning they can easily be found by searching for [keywords], totally oblivious to the fact that the results of this search change by the day, depend on the person searching, and that websites move or vanish. (Proof by Google?)
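For what it's worth, recording the retrieval date takes one extra argument. The following is only a minimal sketch: the function name `cite_web`, the citation layout, and the example title, URL and date are my own illustrative assumptions, not any journal's required format.

```python
from datetime import date


def cite_web(title: str, url: str, retrieved: date) -> str:
    """Format a web citation that records when the page was retrieved.

    Hypothetical helper: the layout is an assumption, not a journal style.
    """
    return f"{title}, {url} (retrieved {retrieved.isoformat()})"


# Illustrative values only; the URL and date are made up for the example.
print(cite_web("Rainbow", "https://example.org/rainbow", date(2011, 1, 20)))
```

A date does not stop the page from changing, but it tells the reader which version of a frequently updated source to look for.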
While the risk for citation loops increases with frequently updated sources, it is not an entirely new phenomenon. A long-practiced variant of the “proof by reference” is citing one’s own “forthcoming paper” (quite common if page restrictions don’t allow further elaboration), but in this forthcoming paper - if it comes forth - one references the earlier paper. After ten or so self-referencing papers one claims the problem solved and anybody who searches for the answer will give up in frustration. (See also: Proof by mutual reference.)
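If one had the reference lists in machine-readable form, spotting such a loop would be a textbook graph problem. Here is a rough sketch using a depth-first search; the function `find_citation_loop`, the dictionary layout and the paper names are hypothetical and only serve as illustration.

```python
def find_citation_loop(cites):
    """Return one citation loop as a list of works, or None if there is none.

    `cites` maps each work to the list of works it references
    (an assumed data layout for this sketch).
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # works on the current chain of references

    def visit(work):
        state[work] = IN_PROGRESS
        path.append(work)
        for ref in cites.get(work, []):
            if state.get(ref, UNSEEN) == IN_PROGRESS:
                # ref is already on the current chain: the loop closes here
                return path[path.index(ref):] + [ref]
            if state.get(ref, UNSEEN) == UNSEEN:
                loop = visit(ref)
                if loop:
                    return loop
        path.pop()
        state[work] = DONE
        return None

    for work in list(cites):
        if state.get(work, UNSEEN) == UNSEEN:
            loop = visit(work)
            if loop:
                return loop
    return None


# Made-up example: a paper and its "forthcoming" sequel cite each other.
papers = {
    "paper A": ["forthcoming paper B"],
    "forthcoming paper B": ["paper A"],
    "paper C": ["paper A"],
}
print(find_citation_loop(papers))
# → ['paper A', 'forthcoming paper B', 'paper A']
```

The catch, of course, is that this only works on a frozen snapshot of the references, which is exactly what a continuously updated website does not give you.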
Maybe the Wikipedia entry on the octopus hoax is a hoax?
Take-away message: References in the age of the internet are moving targets and tracing back citations can be tricky. Restricting oneself to published works only leaves out a lot of information. Citation loops by referencing frequently updated websites can create alternate realities. But don’t worry, somewhere in the level 5 multiverse it’s as real as, say, the moon landing.
Have you cited or would you cite a Wikipedia article in a scientific publication? If you did, did you add a date?
1 And why isn't there a website where one can enter locations of fruit trees and bushes that nobody seems to harvest? Because where we live a lot of blackberries, cherries, plums, peas, and apples are just rotting away. It’s a shame, really.
2 From Park's paper, it is not clear how many of these articles citing Wikipedia were also about Wikipedia. The examples I mentioned were dug out by Stefan.