Sunday, January 20, 2008

Growing Mountains

In previous posts I mentioned repeatedly how essential it is that information is ordered, structured and filtered in a sensible way: badly ordered information is no information. The passing on of information from one generation to the next in a useful way is crucial to progress. Unfortunately, it seems to me that the importance of this aspect and the impact of its consequences are not yet appropriately acknowledged. Instead, we are drowning in a sea of information, and social tagging doesn't even remotely address the problem. (Neither do blogs for that matter.) For some aspects of the question, see e.g. The Spirits that we Called, Communication, and The Right not to Know. In other places, I mentioned the problem of increasing specialization in our communities, which makes communication between areas, and consequently the exchange of information, harder; see e.g. Science and Democracy II or The Marketplace of Ideas.

In this regard, I recently came across an article by Vannevar Bush. He writes
    "Science has provided the swiftest communication between individuals; it has provided a record of ideas and has enabled man to manipulate and to make extracts from that record so that knowledge evolves and endures throughout the life of a race rather than that of an individual.

    There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.

    Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose."

He then goes on to praise recent technological developments...
    "Adding is only one operation. To perform arithmetical computation involves also subtraction, multiplication, and division, and in addition some method for temporary storage of results, removal from storage for further manipulation, and recording of final results by printing. Machines for these purposes are now of two types: keyboard machines for accounting and the like, manually controlled for the insertion of data, and usually automatically controlled as far as the sequence of operations is concerned; and punched-card machines in which separate operations are usually delegated to a series of machines, and the cards then transferred bodily from one to another. Both forms are very useful; but as far as complex computations are concerned, both are still in embryo."

The above quotes are from the article As We May Think, which was published in Atlantic Monthly, July 1945! It is worth reading the full text. Bush basically foresaw social tagging
    "When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

    The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain."


What is more interesting, however, is that he suggests remembering not only the associative keywords, but also storing the paths people have taken to arrive at a certain piece of information.

Now I am wondering whether this suggestion would be worth trying for navigation on the web. To begin with, I would sometimes be grateful to find the paths that I myself have taken before. (Never delete the browser history. Curse those who set A:visited = A:link.) Suppose one could visualize the website you are viewing on a map with other people's paths going in and out, some major roads, some smaller side roads. I could imagine this to be useful for the keyword problem I occasionally encounter: what do you do with a search engine if you can't find the right keywords? Well, you guess something that might lead in the direction you hope for. It might be a bad guess though. E.g. if I start this way on the arXiv, I subsequently follow a couple of 'refers to' links, upon which one sooner or later always finds the relevant publications - rivers running to the sea. Now imagine you could instead just select among the paths others have taken - all these people must be good for something; do we have to repeat such path finding over and over again?
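To make the idea of "major roads and smaller side roads" a bit more concrete, here is a minimal sketch in Python. It assumes that trails are simply lists of page identifiers collected from many visitors; the page names, the example trails and the function names are all made up for illustration and do not correspond to any existing browser feature.

```python
from collections import Counter, defaultdict

def build_trail_map(trails):
    """Count how often visitors moved directly from one page to the next."""
    edges = defaultdict(Counter)
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            if src != dst:  # ignore reloads of the same page
                edges[src][dst] += 1
    return edges

def suggest_next(edges, page, limit=3):
    """Most-travelled outgoing paths from `page`, busiest first."""
    return edges.get(page, Counter()).most_common(limit)

def incoming(edges, page):
    """Pages from which visitors arrived at `page`, with visit counts."""
    return Counter(
        {src: outs[page] for src, outs in edges.items() if page in outs}
    )

# Hypothetical trails: three visitors starting from vague search queries.
EXAMPLE_TRAILS = [
    ["search:quantum gravity", "arxiv:A", "arxiv:B", "arxiv:review"],
    ["search:planck scale", "arxiv:C", "arxiv:B", "arxiv:review"],
    ["search:quantum gravity", "arxiv:A", "arxiv:review"],
]

trail_map = build_trail_map(EXAMPLE_TRAILS)
best_from_b = suggest_next(trail_map, "arxiv:B")
```

With these made-up trails, the busiest road out of "arxiv:B" leads to "arxiv:review", and `incoming(trail_map, "arxiv:review")` lists the pages people came from. Whether such counts remain useful on rarely visited pages is exactly the statistics question that comes up in the comments.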

Just some Sunday afternoon random thoughts.

27 comments:

rillian said...

Should be "sea of information".

Bee said...

thanks :-)

stefan said...

Dear Bee,

thank you for digging out that old paper - that's really fascinating!

Hmm, about tracing the trails to data, sounds interesting, it may help if you are not so sure about good keywords. But on the other hand, when looking for reliable information about a bit more obscure stuff along rarely used paths (like Newton's theological views, or the role of GR in spacecraft trajectory integration, as I did today for the Messenger thread...), it's perhaps less useful?

Best, Stefan

Bee said...

Dear Stefan:

I have to admit I didn't dig it out but took the reference out of another article that however focused on a different aspect.

Yes, I agree, the method would be useful only in cases where one could distinguish the more-taken paths. I.e. if statistics is too low, or the situation is too unclear, all paths will look more or less equal. Still, it might be useful for the following reason. The internet is mostly a scale-free network. High traffic websites rarely ever link downwards, which further supports their hub-like status. This is useful in some sense, because it has this aspect I mentioned of 'rivers flowing to the sea' - a lot of smaller websites (e.g. blogs) referring to the larger ones. The problem is however, that you don't have cross-correlations. Like, picture the origin of the paths leading to a certain article, how do you get from one of them to the other? Trackbacks do a part of this, but they go only one step, plus they don't always work, not all websites actually show them, and it depends on the website whether they appear, whereas I was thinking of a browser feature that does not depend on the website's actual content. This could help to de-centralize the network structure. I.e. consider such a feature was available. Now take away the central hub, and you'd still know which sites referred to it, used, or discussed its content.

Best,

B.

Bee said...

Oha - I just noticed the new PI websites are online, check this out!

Geo said...

I find it helpful to distinguish between data and information. It takes work, such as adding structure, context or interpretation, to convert data into information.
Thus, the problem is rather that "we are drowning in a sea of" data, not information.

Thanks for the blogs!

Bee said...

Hi Geo:

Yes, that is true. Sorry if I am somewhat sloppy in my writing, I am still trying to get into the topic. I would have said: data needs a decoding system to be of use - I have touched that topic in the above mentioned post about communication. It doesn't matter what you say, if nobody can (or doesn't want to) extract its meaning it is useless. One should not underestimate the importance of that aspect, especially with regard to the specialization problem (makes decoding increasingly difficult), so thanks for the comment.
Best,

B.

amaragraps said...

Dear Bee, If you are reading up on Vannevar Bush, then here are some references that I think would be useful for you to have. These refer to Vannevar Bush's large influence on the first hypertext project: Xanadu

Ted Nelson / Xanadu

Classic Computer Lib / Dream Machines by Ted Nelson partially reproduced

About Ted Nelson

XanaduSpace .. the software continues

Nelson's Transliterary Standard

Wikipedia's entry on Xanadu: An incomplete story of the Xanadu Project since the 60s.

Some of the original Xanadu folks correcting Wikipedia's many Xanadu mistakes: here and here.

Some of the historical Xanadu papers are being archived.

amaragraps said...

That's odd.. my last three links were converted to something unworkable. Here they are again, not embedding them.

Some of the original Xanadu folks correcting Wikipedia's mistakes:
http://lists.extropy.org/pipermail/extropy-chat/2007-August/037191.html
http://lists.extropy.org/pipermail/extropy-chat/2007-August/037192.html

Some of the historical Xanadu papers are being archived:
http://lists.extropy.org/pipermail/extropy-chat/2007-August/037195.html

amaragraps said...

And the first link, unembedded is:
Ted Nelson / Xanadu
http://escience.anu.edu.au/lecture/comp1710/introduction/history5.en.html

Christine said...

The new PI site looks very nice.

Interesting idea to have a "map of paths" to a given piece of information. Quite often we get to some information starting from a completely different site. So a program could be made to trace possible paths to a given piece of information "under some constraint" given by the user. Otherwise the map could easily end up being too large and useless as well (one more "sea of information" to be dealt with...).

Best,
Christine

amaragraps said...

The Xanadu ideas incorporated bi-directional links and much more. The original Xanadu crew are well aware of the crippled nature of the Web today. If you couldn't manage through my munged links (I still don't know why that happened), then here is another, I hope good, reference by John Walker that describes a subset of the ideas, which he is calling Hack Links.

Another aspect of the Web that is missing is a way to seriously critique the contents of a web page, in a way that makes it a dialog for all readers of that page. Foresight Institute implemented that in what they called their Web Enhancement Project, which eventually became Crit.org. I saw many demos in the mid-1990s (friends of mine were working on that too), which were very cool, and which I wish were used widely and seriously today.

So you see Vannevar Bush's ideas went pretty far, even if many people still have not heard of them.

Uncle Al said...

The small infinity is the countable number of integers. An infinitely larger infinity is the number of points on a line. An infinitely larger infinity than that is the number of functions through a point.

You will need a lot of storage for your paths unless you embrace minimum action paths achieved by summing over all paths. Then, you are damned: Efficiency destroys serendipity.

Copper wire, microwave plumbing, fiberoptic... the fun part is the (evanescent) field not the routing conduit.

Kaleberg said...

When I was a student at MIT I was most impressed by Vannevar Bush. Let's face it, he predicted point of sale terminals and iTunes coverflow. If you read that article, you can see an awful lot of what he proposed has come into use, and there are still a lot of great ideas there.

I actually got to use one of the products of his research when I was doing some research. I needed to go through a fair number of journals, many only available on microfiche. The librarian led me to a comfortable arm chair, slipped the fiche into a box and showed me how to use the electromagnetic controller so I could quickly step through or randomly access any page on any fiche, with the image projected on a large screen in front of me. I felt like a Jules Verne character leaning back on my anti-macassar and smoking a fine cigar as my ship raced to the moon.

Bush's idea of tracing your search paths is still excellent. I actually use a utility program to record the URLs and page titles of every internet page I view. There are hundreds of thousands over the last four years. I still find myself using the search facilities, by date, by keyword, to figure out how I found some item I frequently use or to find something I have half forgotten.
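A log of this kind, searchable by date and by keyword, can be sketched in a few lines of Python with the standard sqlite3 module. This is only an illustration under stated assumptions: the table layout, the function names and the example.org URLs are hypothetical, and it says nothing about how the utility mentioned above actually works.

```python
import sqlite3

def open_log(path=":memory:"):
    """Open (or create) a visit log; the default keeps it in memory."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS visits ("
        " visited_at TEXT NOT NULL,"  # ISO 8601 timestamp, sorts as text
        " url TEXT NOT NULL,"
        " title TEXT NOT NULL)"
    )
    return db

def record_visit(db, visited_at, url, title):
    """Append one viewed page to the log."""
    db.execute(
        "INSERT INTO visits VALUES (?, ?, ?)", (visited_at, url, title)
    )

def search(db, keyword="", since="0000", until="9999"):
    """Find visits whose title or URL contains `keyword` within a date range."""
    pattern = f"%{keyword}%"
    rows = db.execute(
        "SELECT visited_at, url, title FROM visits"
        " WHERE (title LIKE ? OR url LIKE ?)"
        " AND visited_at >= ? AND visited_at <= ?"
        " ORDER BY visited_at",
        (pattern, pattern, since, until),
    )
    return rows.fetchall()
```

For example, after recording a few pages with `record_visit`, calling `search(db, "xanadu", since="2008-01-01")` would return only the matching visits from 2008 onwards, oldest first. Storing timestamps as ISO 8601 text is a deliberate simplification here, since such strings compare in chronological order.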

I would love to see this incorporated into my operating system. That way, when I pull together a paper or project, I can see where things came from. Where did I pull that image or chart from; where did I get those numbers; which items did I look at and reject? There are various pieces of this, URL trackers and file reference trackers, but they are limited and not particularly well integrated.

As We May Think is still an astounding article and chock full of good ideas. As I tell people looking for great new ideas, look in the stacks at the MIT library. Some of those old ideas are still quite novel.

(A good example was Minsky's confocal microscope. He built one for his PhD; the next was made more than 20 years later and it is now a laboratory workhorse).

Tkk said...

Bee,
Your 'random thought' actually touches on a most profound and challenging topic.

As you might know, the idea behind the web, cross-linked dynamic presentation, was invented by a physicist at CERN as part of his research into solving the problem of how to present the immense amount of data from collider experiments. The solution is so innovative and general that it can be extended to the world at-large using Internet technologies - the web was born.

The idea of 'linked and co-related embedded paths and information' is not new to web software researchers and designers. They called it the semantic web and have in fact done a great deal of work on it for some 10 years. A couple of prototypes have been built. Also, a prototype browser which displays relevant web paths based on semantics was released and I even tried it.

For example: I would click on 'Backreaction, Growing Mountains' and the semantic browser would return a map of all articles that originate from, are related to, or are supported by it. I see a map (which can be presented in many ways) related to this information, sorted or filtered by whatever I want. No web addresses are shown unless I drill down. Further, the map of information can be extracted by another piece of software, processed, and organized into a research paper with a click. Amazing.

So far, these remain in research. There are two basic problems: a) The software language and technologies needed to do a commercial-scale semantic web proved to be immensely complex and challenging. (No surprise - anything that mimics the human mind is like that.) Both the server side and browser (client) side are involved, each requiring completely new software and web design expertise. b) The economic case is bad. It is hard even for VC to identify where the money will come from even for a simple semantic web. VC instead has no problem identifying another type of web where a business case is much clearer and you're looking at it - blog, social web, peer-to-peer sharing web. These can be implemented with relatively easy extensions of current web languages and technologies.

Perhaps a young physicist working on the LHC a few years from now will no longer accept the frustration of handling so much new information - and invent a true semantic web solution!

QUASAR9 said...

Having only just seen Predator versus Alien, or was it Alien versus Predator - quite amazing the notion that the perfect hunter would create the 'perfect' prey

And ultimately the prey (alien) manages to take over the hunter to become a super predator, the hunter becoming the prey?

The Universe is a funny place, and the imagination stranger still. I love those predator suits, and I did that spaceship. But then again this is hollywood, where poetic licence can be applied even to the laws of physics, and maths is still king - wearing $dollar signs

Bee said...

Dear Amara:

Thanks so much for the links. In fact, it was, via several detours, a comment of yours that led me to the article above. I find these topics very interesting, though my understanding is somewhat peripheral.

On a meta-level this comment section is actually a very nice example of information collection that is incredibly hard to achieve via an algorithm: very often it happens to me that I have a rather vague idea of what I am looking for. Then I go into a colleague's office, wave my hands around wildly for 10 minutes, and if everything works fine they will produce some keywords for me. These I can then look up in a database. So the keyword problem is kind of an open question to me, especially in areas that are not primarily language oriented. E.g. think about mathematics. One might want to search for a certain type of equation, but what if you don't know what it is called? Or one might wonder whether somebody has examined the problem that arises if one does so and then so and then so before, but what if you don't know the name the problem got known under?

In regard to what you said in an earlier comment, there is the danger that increasingly people seem to think if they can't find it on Google it doesn't exist. I guess you too receive an increasing number of emails from people who send their 'new' solutions to physics problems etc. It's not as if they are all just crazy. In many cases they just don't seem to be aware what work has already been done on the subject, or that the problem was actually solved 30 years ago etc. If one doesn't sit in an institute, or in a department where people can come up with these keywords or references, one gets lost easily. It is not a problem that search engines can trivially solve; the human brain is in this regard still much more useful. They are sometimes annoying, these senior colleagues sitting in seminars saying nothing but 'There was a paper in 197x by Y, which showed Z', but sometimes I wonder what if that information just gets lost - will we reinvent the wheel every 30 years?

So what does this comment section do? I write a vague text, and some of the readers like you, Kaleberg and Tkk link this to further information on the topic. This is one of the aspects that I think blogs can be good for, collecting information or associations to a topic that's not easily key-wordable through many people's contributions. (Though this comment section is a very good example. Unfortunately, in most cases people contribute with links and references that have nothing to do with the topic, noise in which the signal drowns). Best,

B.

Bee said...

Hi Kaleberg,

Who is the Vannevar Bush of today?

Best,

B.

Bee said...

Hi Tkk:

Thanks for your interesting comment! Actually, while writing the post above I was at least 99.9% sure there is already work in this direction. I understand the economical problems. I had been hoping it would be possible to do something like this as a browser addition that could be shareware. A further question to this point: are there any studies on how it could help to improve the information sharing and ordering, on how it would help people cope with the information overflow (especially not being able to find information AGAIN one knows exists I find extremely annoying), in how it could potentially improve knowledge growth in scientific communities through more efficient information processing. I am not talking about studies on the marketing potential, but studies on sociological/political impact? Best,

B.

Phil Warnell said...

Hi Bee,

Interesting post as is usual. Yours and others' thoughts on how to organize, sort and find relevant information are all part of what must be done. There is one aspect of this that I feel has been overlooked, and that is that the primary problem is that the overwhelming proportion of what is on the Web is not information at all, yet rather what I would call babble. The internet is something that has evolved from an entity created to store and share mostly academic material to become the fridge door of the world. Much of what is stored in all the interlinked computers is nothing more than idle gossip, personal records, photos, video files, audio files and the like. The value of this beyond the individuals directly involved and perhaps the connected few is zero; except perhaps for the marketers and identity thieves of the world. I believe the vital first step is to create a second Web from which all such content is excluded. Then we can think of better ways to find, sort and access what would be relevant. As is the first rule of nature, the value of economy shouldn’t be ignored.

Regards,

Phil

RKP said...

As one of the readers said "I actually use a utility program to record the URLs and page titles of every internet page I view...", one might be interested in checking out a web-utility that we've developed, hooeey (www.hooeey.com) , which is a toolbar that records your web hops. The entire hooeey application is based on the idea that browser history is useful and can be re-purposed. Hope this helps.

amaragraps said...

Phil Warnell:
The primary problem is that the overwhelming proportion of what is on the Web is not information at all, yet rather what I would call babble.

However, isn't the Web just a snapshot of the Real World? Do you think that Second Life is different? I think that humans just carry their Babble with them to whatever is their favorite communication medium at the moment.

Bee:
On a meta-level this comment section is actually a very nice example of information collection that is incredibly hard to achieve via an algorithm:.

Dear Bee: Yes! I view the commenting portion of blogs as a diffuse feedback mechanism to Web information, so even though it is not the most efficient bi-directional mechanism, it _is_ a bi-directional mechanism, and moreover, it fits with the high pattern-matching capabilities of the human mind. The Web would be a considerably poorer place without blogs and their commenters... I guess that Vannevar Bush would be pleased about it, as well. :-)

Phil Warnell said...

Hi Amara,
“However, isn't the Web just a snapshot of the Real World? Do you think that Second Life is different? I think that humans just carry their Babble with them to whatever is their favorite communication medium at the moment.”
I agree that for the most part the Web simply has become an extension (sometimes substitution) of the important social aspect of our species and I’m not proposing it be dismantled as I firmly believe it serves as a uniting force. What I was referring to was how one would begin to separate the relevant from irrelevant. What do you think would happen to our libraries if every citizen felt it was important they store, maintain, categorize and cross reference all their personal thoughts, letters, snapshots, preferences, home movies etc.? Do you believe that it would serve to increase their value or utility for those who depend on them for study, research or personal growth? Band width and capacity issues are already becoming serious concerns from both the technical and economic perspectives.

I am one that can remember the times when you would have to pre-compose correspondence and pre-plan searches and queries when using CompuServe, since the clock was ticking as far as money was concerned from the moment one logged on. This was not simply because they wanted to make things inconvenient or more costly, yet also a method necessary to manage the resource. No, I’m afraid the human race has proven its ability and propensity to waste and litter, regardless of whether we are talking about shopping bags, energy or information. I do think that a second Web needs to be created where we strive to maintain what is relevant and of true value.

Regards,

Phil

amaragraps said...

Dear Phil, Bandwidth and the state of technology in computer hardware has _always_ been an issue in what humans desire in their communication! In the middle sixties, I played the game Wampus over a 300 baud acoustic modem with my 8 year old sister when the mainframe computer wasn't being used to compute sailboat race results. Of course I would have liked our game to be more interactive! Twenty years later, when I bought my first personal computer, my friendly correspondences were, as before, limited by my telecommunication technology. My 14.4K modem was certainly a bottleneck through Usenet and the first advent of the World Wide Web. Usenet, one of the first large human Babbles, and the WWW presented the exact same issues that you are raising today. People devised their own filtering and searching strategies through the Babble then, as they do today, through the blogs. And the next step? Some don't have the computer power and bandwidth for Second Life, but some do, as before, when some didn't have the computer power and bandwidth for the 1980s Usenet or my 1960s Wampus!

Plato said...

Data mining is about finding patterns in massive amounts of raw information.

Who knows what that noise can present? Often without a "comprehensive view of the data" what use the data without the view of its writer? :)

So the writers leave us a flavour of what they are thinking, behind what is written? Are we perceptive enough to know the "key words" that would bring the view of the writer together?

With no plan insight, how is it one could then deduce where to go and being left with the impressions received, we form a plan?

While it is never clear as one begins to write, the "solidification of insight" somehow makes its way through, and we are left with this pattern recognition, by drawing together this "ultimate posting?"

So one searches for the right words in translation. Maybe while sitting in the lectures one doodles? Draws pictures at the back of a person's head, and what does this lead us to?

For some it is always apparent, but for others no attachment? :) The second level is already here, and has always been there. The explorative mind asks that use your knowledge to "look through" what is being written?

Then you get the sense of the person, not some noise. Get some sense of the pattern. If you had some geometric object to look at, would it suffice for what was apparent through Garrett Lisi's work?

Maybe "tree patterns on the landscape while being viewed from above" or, some sexual network unfolding? Maps of the internet revealed as routes of some neurological function of the mind.

Phil Warnell said...

Hi Amara,

Well I guess it boils down to this. I thought that when having trouble in finding the needles in the haystack that we not throw them there. On the other hand you believe it more beneficial that we do and let technology compensate for it. It is true that by doing so it will improve our technical ability for searching for needles. It’s just that I believe the time and effort would be better spent on something that we could not avoid.

Regards,

Phil

stefan said...

While I was looking for something else, I just stumbled upon this article about Vannevar Bush and some precursors to him: Before Memex: Robert Hooke, John Locke, and Vannevar Bush on External Memory by Richard Yeo (full text for free). There are a few illustrations of Bush's Memex machine in the paper.

Best, Stefan