Thursday, March 31, 2005

March 2005 Early Indications II: Information Gumbo

The world of data has entered a notably rich period of evolution. Search technologists at a variety of startups and deep-pocketed incumbents are engaged in an arms race, with new tools and capabilities appearing almost weekly. (Examples include A9's Open Search, Ziggs, Picasa, Browster, Oodle, and EVDB.) RSS is expanding beyond news and blog feeds. Tagging and other bottom-up classification methods, including wikis, are growing at a phenomenal rate.

Why do these matter? Taken in the aggregate, they reflect a new set of assumptions about people and what they do with information. Depending on how things unfold, we might get much closer to wide usability than the hard-coded obtuseness of a relational database or enterprise application typically allows. Rather than having to know some arbitrarily defined, precise syntax to get from A to B, for example, people can both name and define something themselves and then trust new search and display techniques to learn what they need to know.

In no logical order, here are some areas of innovation:

1) Data and applications can now interact in new ways. In public examples such as Google Maps and Gmail's spellchecker and lookahead address book, it's easy to see that browser windows can deliver surprisingly rich functionality without plug-ins. The buzzword to describe this is Ajax: Asynchronous JavaScript and XML. I won't dive into technical explanations here, but suffice it to say that Ajax avoids the lag of reloading an entire page after every HTTP round trip to the server across an unpredictable network. Instead, scripting embedded in the page exchanges small amounts of data with the server in the background, which allows the browser to change what the viewer sees almost instantaneously. One of many relevant outcomes: dynamic visual representations of information (like Musicplasma) become more feasible. (For more, see this explanation.)

2) Information we look for and information we want to find us generally differ in size, timeliness, and need for context. The need for so-called "glanceable" information drove Microsoft's SPOT watch initiative, and the same kind of information is also now available in a more elegant fashion on glowing cubes and eggs from Ambient Devices: the orb's color indicates the overall health of a stock market or portfolio, with green being healthy and red being dangerous.

The weather cube works exactly like the old John Hancock building spire in Boston on which glowing blue means a nice day in the forecast; by contrast, flashing red means snow.(1) Glanceable information like the time, temperatures, or sports scores needs little context, whereas what might be called "intentional" information - things one looks for - usually requires some scaffolding for it to be meaningful: which analyst rated the stock a "buy"? What's her track record? What's today's stock price? How does that stack up with yesterday or a year ago, or with the sector generally? With a nod to Les McCann and Eddie Harris, this is the "compared to what?" issue.

Once that context is in place, new categories can grow more amenable to glanceable representation. For example, once we get accustomed to a given news or opinion source, it can be nice to have it pushed to a newsreader with RSS so when I ask "what did blogger X have to say about the State of the Union Address?" I can pluck the entry out of a list rather than mount a more traditional surf or search. Thus intentionality and glanceability are unstable and personalized categories of information. It's very early, but RSS is evolving into an enterprise tool with uses beyond automating the distribution of corporate communications. Consider sales forces: RSS can be used to push price changes out to Blackberries in the field, or to aggregate inbound orders and lead reports into a format far easier to manage than faxes or e-mail. More convenient access can make formerly cumbersome query data glanceable: the technology can change the usefulness and ease of integration of the information.
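
As a rough illustration of the sales-force case, here is a minimal Python sketch that reduces an RSS 2.0 feed to a list of glanceable headlines. The feed contents, the example.com address, and the function name glanceable_headlines are all invented for illustration; a real deployment would fetch the feed over the network and would likely use a fuller feed-parsing library.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed: the channel, items, and prices are made up.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Price changes</title>
    <link>http://example.com/prices</link>
    <description>Hypothetical price-change feed for a field sales force</description>
    <item>
      <title>Widget A: now 19.99</title>
      <description>List price reduced from 24.99</description>
      <pubDate>Thu, 31 Mar 2005 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Widget B: now 7.50</title>
      <description>List price raised from 6.75</description>
      <pubDate>Thu, 31 Mar 2005 09:05:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def glanceable_headlines(feed_xml):
    """Return the item titles from an RSS 2.0 document, in feed order."""
    root = ET.fromstring(feed_xml)
    # Each <item> carries its own <title>; the channel-level <title> is skipped
    # because only <item> elements are visited.
    return [item.findtext("title", default="") for item in root.iter("item")]


for headline in glanceable_headlines(SAMPLE_FEED):
    print(headline)
```

The point of the sketch is the reduction itself: a structured feed collapses into a short list that can be scanned on a small screen, which is what moves the data from "query" to "glance."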

3) Traditionally, organizing information has been a top-down affair; we've previously discussed the Library of Congress cataloguing scheme as an example. Another source of context for data can come from the bottom up as people who know something about that datum under discussion contribute what they know. The Flickr photo service provides one example, Wikipedia another. Yet another current buzzword - folksonomies - differentiates bottom-up from top-down information architectures.

The open-source model shows that groups can in fact be organized, and self-organize, to do amazingly large amounts of work on an ad hoc and often volunteer basis. It's also worth asking, however, how voluntarism translates to commercialization: the Gracenote database that helps make iTunes so easy to use began as a volunteer effort, but the early contributors received nothing from the commercial success of the company. What will happen with Wikipedia when the expenses and perhaps the profit potential of the effort outstrip the donation model? Servers and bandwidth aren't free even if the content is, so when might commercial apparatus like lawyers, bankers, and managers alter the project?

4) The iPod illustrates a complex information dynamic: sometimes what we want isn't amenable to formulation in a search string. As a friend pointed out, Apple brilliantly turned the iPod Shuffle's lack of a display into a feature, selling randomness as a benefit. Serendipity matters for many kinds of information: there are times when the next song in a randomized playlist is "right" for reasons the listener could not have specified beforehand, or the webpage you found while looking for something else can have a major impact.

Another class of information relates to things for which either there is no name (industrial parts known mainly through numbers which do not appear on the part itself, for example) or the name is unavailable to the person who needs to find the thing that bears the name. Here, folksonomies hold both promise and peril: right now the process by which a term becomes standardized is, in Google VP of Engineering Adam Bosworth's term, "sloppy." That's good, in that committees don't have to form for work to get done, but bad in that the sloppiness introduces the prospect of spam and other externalities of an open, networked process.

Right now, the numbers and, more important, the culture of the Wikipedia community are manageable, with rare exceptions like the trauma of the George W. Bush entry, which was constantly redacted by editors with opposing viewpoints. If I need to find the name of something before I can search for it, and the name is unnecessarily arbitrary and/or fluid, it's going to cause problems. There's also the question of scalability: is there a threshold of participation past which there are just too many chefs in the kitchen? At the same time, if search looks more for keywords and semantic context as opposed to precise textual or numerical matches a la SQL, the noise in a community-driven system will aid processor- and algorithm-intensive search engines in steering people to what they need - which may or may not be what they articulated in the search bar. The contrasting strengths and weaknesses of folksonomies and XML namespaces are probably educational.
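
To make the contrast between exact matching and keyword-style search concrete, here is a small Python sketch. The file names, the tags, and both function names are hypothetical, and counting overlapping words is only a stand-in for what a real search engine does; it is not how any particular service ranks results.

```python
# Hypothetical photo collection: file names mapped to sloppy, user-supplied tags.
ITEMS = {
    "IMG_0412.jpg": {"boston", "hancock", "tower", "night"},
    "IMG_0413.jpg": {"boston", "fenway", "redsox"},
    "IMG_0977.jpg": {"cat", "sofa"},
}


def exact_match(items, query):
    """SQL-style equality: return items whose name equals the query exactly."""
    return [name for name in items if name == query]


def ranked_by_tags(items, query):
    """Rank items by how many query words appear among their tags."""
    words = set(query.lower().split())
    scored = [(len(words & tags), name) for name, tags in items.items()]
    # Highest overlap first; ties broken by name; items with no overlap dropped.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]


print(exact_match(ITEMS, "hancock tower boston"))
print(ranked_by_tags(ITEMS, "hancock tower boston"))
```

The exact match comes back empty because nobody knows the arbitrary file name, while the tag overlap surfaces the likely photo first and a loosely related one second. The same looseness is what lets noise, and spam, into the results.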

All in all, it's hard to project where the co-evolution of wikis, tags, XML, search, and databases will lead. Google has shown that relational databases don't scale infinitely, but indexing and search just might. In the other corner of the heavyweight boxing ring, Yahoo's purchase of Flickr gives it access to new technologies and, not accidentally, a way of looking at the world that will certainly bear fruit in the future. On the client side, new technologies in cell phones have the potential to add location to the context equation, with huge implications for both privacy and relevance. Having end-user appliances that are simultaneously a sensor (whether fixed, like the A9 search history, or mobile) and an input/output device changes the game still further.

To a greater degree than in the past few years, the technology market's "buzzing, blooming confusion" (to crib from William James, himself no slouch in the human-use-of-information department) leaves room for some new entrants to make a potentially enormous impact. It's hard to imagine IBM, Oracle, or Microsoft creating the "next big thing" from this emerging toolbox, precisely because they're accustomed to avoiding the very sloppiness that drove its invention in the first place.

(1) A mnemonic for the Boston weather spire, with some relevant addenda:

Steady blue, clear view
Flashing blue, clouds due
Steady red, rain ahead
Flashing red, snow instead.

Remember, however, that in summer, a flashing red light means that the upcoming Red Sox game has been canceled. And if the lights flash blue and red simultaneously, as happened for the first time in October 2004, it means the Red Sox won the World Series.
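
The beacon is a tiny encoding scheme, and the mnemonic above can be written out as a lookup table. This Python sketch is only an illustration: the function name read_beacon and the summer flag are invented here, and the simultaneous blue-and-red case is left out.

```python
# The four signals from the mnemonic, keyed by (color, flashing).
FORECASTS = {
    ("blue", False): "clear view",
    ("blue", True): "clouds due",
    ("red", False): "rain ahead",
    ("red", True): "snow instead",
}


def read_beacon(color, flashing, summer=False):
    """Decode one beacon signal; in summer, flashing red has a second meaning."""
    if summer and color == "red" and flashing:
        return "Red Sox game canceled"
    return FORECASTS[(color, flashing)]


print(read_beacon("blue", False))
print(read_beacon("red", True, summer=True))
```

Two bits of signal plus one bit of shared context (the season) cover five messages, which is roughly what makes the information glanceable.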

Friday, March 18, 2005

March 2005 Early Indications I: Being Analog

The march of digital processes and devices to fill spaces formerly occupied by analog technologies proceeds apace. Some examples follow:

-paper memos to e-mail

-"regular" cable to digital cable

-VHS to DVD and TiVo

-film to digital photography

-VGA and component video to DVI and HDMI

-LPs and cassette tapes to CDs and MP3s

-circuit-switched voice to Voice over Internet Protocol

-AM and FM radio to terrestrial digital and satellite services.

Many observers make the mistake of classifying a digital technology as "better" if only by virtue of its modernity. It's more useful, however, to treat any technology comparison as a contrast between different sets of costs and benefits. Furthermore, every development has unintended consequences that its creators could not have predicted, and these need to be considered as well.

A key factor in any digital technology is the ability to move artifacts over a wire. Compared to physical postage or even fax, various services on the Internet can move music, text, and images quickly and at high levels of fidelity. This capability in turn can be regarded as desirable or not. Is digital photography "better" than film? Artistic control over the final image, portability, and cost and speed of print turnaround are pluses, while film may have an edge in equipment cost, image quality, and privacy. (As for the last aspect, run a Google image search on DCP000[fill in a number - it's a default Kodak numbering scheme], then consider how many people want complete strangers viewing their snapshots. What happens when a) the hosting service goes out of business or b) the hosting service leaves up your images after you quit?)

Another core aspect of digital artifacts is their ability to be manipulated. In the case of Voice over IP and e-mail, encryption gives the bad guys an advantage over the law enforcement types who want to be able to monitor them. Digital cable TV, meanwhile, is most noteworthy not for image quality but for compression, which increases the providers' usable bandwidth substantially. Subscribers can get more channels over the same wire, but image quality (until HDTV) was limited primarily by the 50-year-old NTSC standard. Analog signal processing is an entirely different kettle of fish, with fewer possibilities.

As numerous executives have discovered to their dismay, e-mail is not secure, controllable, or ephemeral. Harry Stonecipher's departure from Boeing is difficult to imagine in a paper memo scenario: few people would use a workplace communications medium for romantic correspondence, and tipsters would not have automated (or other) access to it even if they did. In this instance, analog has clear benefits and can be the medium of choice when privacy matters.

For a variety of reasons, analog and digital options often aren't equally available. Music companies did what they could - closed LP pressing plants, for example - in the late 1980s and early '90s to make customers repurchase music they liked. The CD format's limited copy protection, however, made duplication and distribution extremely easy. Now, as the studios want to spur a new age of multi-channel audio, both Super Audio CD and DVD-Audio formats have highly effective copy protection. The tension between improved sound quality and inconvenience - and a competing standard - plays a role in the pathetic adoption rates.

We can see a similar transition in photography. Kodak will no longer process its legendary Kodachrome slide film, for example. Great film cameras are available in the secondary market - you can get a Hasselblad with lens for about $1000 - but the question is how long processing will be cost-effective and convenient. Processors are in a tough spot: as volumes decline, their assumptions about economies of scale have to be refigured, and the true cost of toxic waste disposal gets more explicit every year.

VCR sales are dropping worldwide, and several major electronics retailers have stopped carrying them. The content providers talk about "plugging the analog hole" - links in the chain of components where unencrypted signal can be digitized by "unauthorized" parties. Thus the market dynamic is gladly accelerated by content providers only too happy to try to close the barn door before all the horses escape. The lack of backward compatibility means that dual-drive VCR+DVD machines are still offered, but the gap in image quality between analog and digital video, not to mention the greater permanence of polycarbonate over mylar and ferrous oxide, means that analog VHS has limited appeal.

In consumer markets, the easy mobility of digital artifacts has led to copy protection and encryption being primary engineering criteria for the manufacturers and copyright holders. Customers for such equipment have little choice but to pay for expensive functionality that does nothing to improve - and could possibly impair - the experience of using the equipment. It's obvious that Sony has impaled the fate of the company on the horns of this dilemma, but if anyone can resolve it, Howard Stringer is the guy.

Behind the scenes, the analog-digital transition is in some ways profound. Without consumer-grade copy protection to consider, digital tools are remaking medicine, music recording, and architecture, to name but three fields.

-To take only one area of medicine, digital mammography uses hardware (in which lower radiation doses are needed), software (image manipulation to increase contrast or zoom in), and data storage (including data mining) to improve on the performance of film. It's also easier to move digital files, albeit large ones, than physical films, which is part of the process of outsourcing radiological readings to India and elsewhere. At the same time, merely capturing pictures of physicians' notes or orders means little without metadata to facilitate indexing, searching, and retrieval. Some digital systems are actually harder to use and less reliable than paper files in this phase of their evolution.

-Recording studios (including the Hit Factory in New York) are closing, in part because hard-drive-based editing systems allow musicians to make their own demo and even master tapes using software like Digidesign's ProTools. Even though it is favored by many respected engineers and performers, analog magnetic tape is getting scarce: Quantegy, the last manufacturer of pro-grade audio tape, shut down operations late last year. As studios convert to digital, pro-grade gear from such manufacturers as Studer and Otari is readily available - but with what future?

-The work of architects and designers has been reinvented. Just as word processors allowed writers numerous opportunities to edit and move text without the tedium of retyping, CAD tools make erasing and redrawing tasks of the past. 3D renderings of finished spaces and structures have become incredibly realistic. Construction documents and specifications have also become more automated.

The undeniable benefits of every digital technology raise important follow-up questions. Do we have better buildings because of AutoCAD? Are breast cancer detection and cure rates changing as a result of different diagnostic technology? Does getting copied on thousands of communications that would be impossible to distribute in paper make anyone more efficient or effective? Finally, it would appear that there are digital technologies with nearly unalloyed benefits (such as mammography), while many (e-mail, voice over IP, digital audio) are more complicated in their impact.

As the tools change, it's a fact of anthropological life that the tool-users will change along with them, but this is something we're less good at studying. Nicholas Negroponte deserves plenty of credit for the thinking that culminated in Being Digital (which is ten years old!), but now we have the far more difficult task of untangling what it meant to live in analog when it was the only option, and what it now means to be hybrids between the worlds of bits and atoms.