Thursday, March 31, 2005

March 2005 Early Indications II: Information Gumbo

The world of data has entered a notably rich period of evolution. Search technologists at a variety of startups and deep-pocketed incumbents are engaged in an arms race, with new tools and capabilities appearing almost weekly. (Examples include A9's Open Search, Ziggs, Picasa, Browster, Oodle, and EVDB.) RSS is expanding beyond news and blog feeds. Tagging and other bottom-up classification methods, including wikis, are growing at a phenomenal rate.

Why do these matter? Taken in the aggregate, they reflect a new set of assumptions about people and what they do with information. Depending on how things unfold, we might get much closer to wide usability than the hard-coded obtuseness of a relational database or enterprise application typically allows. Rather than having to know some arbitrarily defined, precise syntax to get from A to B, for example, people can both name and define something themselves and then trust new search and display techniques to learn what they need to know.

In no logical order, here are some areas of innovation:

1) Data and applications can now interact in new ways. In public examples such as Google Maps and Gmail's spellchecker and lookahead address book, it's easy to see that browser windows can deliver surprisingly rich functionality without plug-ins. The buzzword to describe this is Ajax: Asynchronous JavaScript and XML. I won't dive into technical explanations here, but suffice it to say that Ajax avoids the lag of reloading an entire page on every round trip to the server across an unpredictable network. Instead, scripts embedded in the page exchange small amounts of data with the server in the background and change what the viewer sees in place - seemingly instantaneously. One of many relevant outcomes: dynamic visual representations of information (like Musicplasma) become more feasible. (For more, see this explanation.)

2) Information we look for and information we want to find us generally differ in size, timeliness, and need for context. The need for so-called "glanceable" information drove Microsoft's SPOT watch initiative, and the same kind of information is also now available in a more elegant fashion on glowing cubes and eggs from Ambient Devices: the orb's color indicates the overall health of a stock market or portfolio, with green being healthy and red being dangerous.

The weather cube works exactly like the old John Hancock building spire in Boston on which glowing blue means a nice day in the forecast; by contrast, flashing red means snow.(1) Glanceable information like the time, temperatures, or sports scores needs little context, whereas what might be called "intentional" information - things one looks for - usually requires some scaffolding for it to be meaningful: which analyst rated the stock a "buy"? What's her track record? What's today's stock price? How does that stack up with yesterday or a year ago, or with the sector generally? With a nod to Les McCann and Eddie Harris, this is the "compared to what?" issue.

Once that context is in place, new categories can grow more amenable to glanceable representation. For example, once we get accustomed to a given news or opinion source, it can be nice to have it pushed to a newsreader with RSS, so that when we ask "what did blogger X have to say about the State of the Union Address?" we can pluck the entry out of a list rather than mount a more traditional surf or search. Thus intentionality and glanceability are unstable and personalized categories of information. It's very early, but RSS is evolving into an enterprise tool with uses beyond automating the distribution of corporate communications. Consider sales forces: RSS can be used to push price changes out to Blackberries in the field, or to aggregate inbound orders and lead reports into a format far easier to manage than faxes or e-mail. More convenient access can make formerly cumbersome query data glanceable: the technology can change the usefulness and ease of integration of the information.
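As a rough illustration of the sales-force scenario, the sketch below parses a small RSS 2.0 feed and boils it down to a glanceable list of price changes. The feed contents, the "price-change" category label, and the helper name price_changes are all invented for this example; only the item, title, category, and pubDate element names come from RSS 2.0 itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: products, prices, and the "price-change" category
# are made up for illustration. Only the element names are standard RSS 2.0.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Field sales updates</title>
    <item>
      <title>Widget A now 19.95</title>
      <category>price-change</category>
      <pubDate>Thu, 31 Mar 2005 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Quarterly newsletter posted</title>
      <category>corporate</category>
      <pubDate>Thu, 31 Mar 2005 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Widget B now 4.50</title>
      <category>price-change</category>
      <pubDate>Thu, 31 Mar 2005 11:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def price_changes(feed_xml):
    """Return (title, pubDate) pairs for items tagged as price changes."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        # Keep only the entries a salesperson would want at a glance.
        if item.findtext("category", default="") == "price-change":
            results.append((
                item.findtext("title", default=""),
                item.findtext("pubDate", default=""),
            ))
    return results


if __name__ == "__main__":
    for title, published in price_changes(SAMPLE_FEED):
        print(f"{published}: {title}")
```

A real deployment would fetch the feed over the network and poll it on a schedule; the filtering step is the part that turns a query into something glanceable.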

3) Traditionally, organizing information has been a top-down affair; we've previously discussed the Library of Congress cataloguing scheme as an example. Another source of context for data can come from the bottom up, as people who know something about the datum under discussion contribute what they know. The Flickr photo service provides one example, Wikipedia another. Yet another current buzzword - folksonomies - differentiates bottom-up from top-down information architectures.
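To make the bottom-up idea concrete, here is a minimal sketch of how freely chosen tags might be pooled into a working vocabulary with no committee involved. The users, photo IDs, and tags are invented, and the function names are hypothetical; this is not a description of how Flickr or any other service actually works.

```python
from collections import Counter, defaultdict

# Hypothetical tagging events: (user, item, tag). All values are invented.
TAG_EVENTS = [
    ("ann", "photo1", "Boston"),
    ("bob", "photo1", "boston"),
    ("cat", "photo1", "skyline"),
    ("ann", "photo2", "gumbo"),
    ("bob", "photo2", "food"),
    ("cat", "photo2", "Gumbo "),
]


def tag_counts(events):
    """Count how often each (lightly normalized) tag was applied to each item."""
    counts = defaultdict(Counter)
    for _user, item, tag in events:
        # Normalizing case and whitespace is the only "top-down" rule here.
        counts[item][tag.strip().lower()] += 1
    return counts


def top_tags(events, n=1):
    """Return the n most frequently applied tags per item."""
    return {
        item: [tag for tag, _count in counter.most_common(n)]
        for item, counter in tag_counts(events).items()
    }


if __name__ == "__main__":
    print(top_tags(TAG_EVENTS))
```

The most popular tag per item emerges from use rather than from a predefined scheme, which is also why the result is only as good as the contributors.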

The open-source model shows that groups can in fact be organized, and self-organize, to do amazingly large amounts of work on an ad hoc and often volunteer basis. It's also worth asking, however, how voluntarism translates to commercialization: the Gracenote database that helps make iTunes so easy to use began as a volunteer effort, but the early contributors received nothing from the commercial success of the company. What will happen with Wikipedia when the expenses and perhaps the profit potential of the effort outstrip the donation model? Servers and bandwidth aren't free even if the content is, so when might commercial apparatus like lawyers, bankers, and managers alter the project?

4) The iPod illustrates a complex information dynamic: sometimes what we want isn't amenable to formulation in a search string. As a friend pointed out, Apple brilliantly turned the iPod Shuffle's lack of a display into a feature, selling randomness as a benefit. Serendipity matters for many kinds of information: there are times when the next song in a randomized playlist is "right" for reasons the listener could not have specified beforehand, or the webpage you found while looking for something else can have a major impact.

Another class of information relates to things for which either there is no name (industrial parts known mainly through numbers which do not appear on the part itself, for example) or the name is unavailable to the person who needs to find the thing that bears the name. Here, folksonomies hold both promise and peril: right now the process by which a term becomes standardized is, in Google VP of Engineering Adam Bosworth's term, "sloppy." That's good, in that committees don't have to form for work to get done, but bad in that the sloppiness introduces the prospect of spam and other externalities of an open, networked process.

Right now, the numbers and, more importantly, the culture of the Wikipedia community are manageable, with rare exceptions like the trauma of the George W. Bush entry, which was constantly re-edited by editors with opposing viewpoints. If I need to find the name of something before I can search for it, and the name is unnecessarily arbitrary and/or fluid, it's going to cause problems. There's also the question of scalability: is there a threshold of participation past which there are just too many chefs in the kitchen? At the same time, if search looks more for keywords and semantic context as opposed to precise textual or numerical matches a la SQL, the noise in a community-driven system will aid processor- and algorithm-intensive search engines in steering people to what they need - which may or may not be what they articulated in the search bar. The contrasting strengths and weaknesses of folksonomies and XML namespaces are probably educational.

All in all, it's hard to project where the co-evolution of wikis, tags, XML, search, and databases will lead. Google has shown that relational databases don't scale infinitely, but indexing and search just might. In the other corner of the heavyweight boxing ring, Yahoo's purchase of Flickr gives it access to new technologies and, not incidentally, a way of looking at the world that will certainly bear fruit in the future. On the client side, new technologies in cell phones have the potential to add location to the context equation, with huge implications for both privacy and relevance. Having end-user appliances that are simultaneously a sensor (whether fixed, like the A9 search history, or mobile) and an input/output device changes the game still further.

To a greater degree than in the past few years, the technology market's "blooming, buzzing confusion" (to crib from William James, himself no slouch in the human-use-of-information department) leaves room for some new entrants to make a potentially enormous impact. It's hard to imagine IBM, Oracle, or Microsoft creating the "next big thing" from this emerging toolbox, precisely because they're accustomed to avoiding the very sloppiness that drove its invention in the first place.

(1) A mnemonic for the Boston weather spire, with some relevant addenda:

Steady blue, clear view
Flashing blue, clouds due
Steady red, rain ahead
Flashing red, snow instead.

Remember, however, that in summer, a flashing red light means that the upcoming Red Sox game has been canceled. And if the lights flash blue and red simultaneously, as happened for the first time in October 2004, it means the Red Sox won the World Series.