Wednesday, August 20, 2008

The Paradox of Data Visualization

It has been a full quarter-century since the publication of Edward Tufte’s landmark book, The Visual Display of Quantitative Information. In that time, computer screens and other projection tools have emerged as a powerful medium challenging the primacy of paper, previously the default tool of choice. Information visualization is now exploiting new display technologies (think flexible OLED), new computing platforms (iPhones and their kin), and ever-increasing computing power (PlayStation 3 et al.). For visualization to capitalize on the power of these and other technologies, information architecture must increase in sophistication, usability, and explanatory leverage.

This task is deceptively difficult: on screen, even more than on paper, it is far too easy to create bad information displays, whether with desktop tools such as the ubiquitous Excel or with enterprise “business intelligence” packages. Going forward, the task will get harder even as it becomes more necessary.

Why?

Several factors are responsible. First, data is generated by more sources and available to more users every year. “Data glut” may be a cliché, but tools such as search can intensify it: Google recently announced that its systems had discovered a trillion unique web pages. For context, if each of the 32 million books in the Library of Congress averages 300 pages, that’s less than 10 billion physical pages, and those pages, unlike the web’s, reside in nothing resembling a unified, organized, searchable repository.
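
The arithmetic behind that comparison fits in a few lines (a sketch; the 32-million-book and 300-page figures are the rough estimates used above):

```python
# Rough scale comparison: the indexed web vs. the Library of Congress.
# Figures are the estimates from the text, not precise counts.
books = 32_000_000
pages_per_book = 300
loc_pages = books * pages_per_book     # 9.6 billion physical pages
web_pages = 1_000_000_000_000          # Google's announced trillion

print(f"{loc_pages:,} physical pages")                  # 9,600,000,000
print(f"web/LoC ratio: {web_pages / loc_pages:.0f}x")   # ~104x
```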

More important, the task of data visualization is difficult because good displays must create spatial representations of non-spatial data. This is not new: linear representations have conveyed time for millennia, and pie charts have become handy shorthand for subsets of a whole. Good maps remain the gold standard, but enjoy the advantage of being a spatial representation of, well, space rather than something less tangible. Consult a UK Ordnance Survey map, or a fine 19th-century sample from any number of countries, and compare the quality to the non-spatial representations we encounter every day: USA Today visuals, executive dashboards, or owner’s manuals. In most cases, the antique remains superior to the modern.

Going forward, information architects are challenged to create readable, repeatable conventions for such abstractions as risk, intellectual property (patents are a poor proxy for human capital, for example), and attitudinal information such as customer satisfaction. Semi-arbitrary lists of text-string matches remain hard to make visual: concepts are notoriously difficult to map spatially, in contrast to the elegance of the periodic table of the elements, to take a classic example. Current social network maps, especially those of large social graphs such as Facebook, quickly grow useless, as this example illustrates.

Color presents a further difficulty. Even outside a visual context, meanings are inconsistent: operating a business in the black is good, but a black mark on your record is bad. A person green with envy, or looking slightly green on a cruise ship, is conceptually opposite to a green-lighted script in Hollywood; an environmentally conscious activity is another matter entirely. In displays, the situation is worse yet: colors seldom hold stable meanings from one chart to the next. Color-blindness is a further fact of life, one overlooked by many applications.
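
One partial remedy is simply to standardize on palettes designed around color-vision deficiency. Below is a minimal sketch using matplotlib and the Okabe-Ito palette, a well-known colorblind-safe set of eight hues (the plotting library and demo layout are my assumptions, not anything prescribed above):

```python
import matplotlib.pyplot as plt

# The Okabe-Ito palette: eight hues chosen to remain distinguishable
# under the common forms of color-vision deficiency.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

fig, ax = plt.subplots()
for i, color in enumerate(OKABE_ITO):
    ax.plot([0, 1], [i, i], color=color, linewidth=6)
ax.set_yticks(range(len(OKABE_ITO)))
ax.set_title("A colorblind-safe default palette")
plt.show()
```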

Given all of these challenges, it’s important to get things right. Human powers of pattern recognition remain more formidable than anything powered by Moore’s law: ask a 5-year-old to find objects in her toy box that belong on a dinner table and she’ll likely outperform even laboratory-grade artificial intelligence. In the face of the information volumes noted above, rising at an accelerating rate because of video, and given the need for more decisions in shorter times than ever, humans need the augmentation that good displays can provide.

How?

In his book Envisioning Information, Tufte suggests five tactics for increasing information density and “escaping flatland” – conveying more than two dimensions of meaning on paper. These are:

- Micro/macro readings (relating both wholes and parts as distinct entities)
- Layering and separation (often by use of color and graphic weight, as in a technical drawing)
- Small multiples (to show often subtle differences within elements of a system: a good lunar chart is an example; see the sketch after this list)
- Color and information (sensitivity to the palette as color labels, measures, represents reality, and enlivens)
- Narratives of space and time (compressing the most powerful human dimensions onto flatland).
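
Of the five, small multiples translate most directly into everyday charting tools. A minimal sketch (matplotlib, with synthetic data standing in for something like lunar phases):

```python
import numpy as np
import matplotlib.pyplot as plt

# Small multiples: the same axes repeated across a grid, so the eye
# compares subtle differences between panels instead of decoding
# each chart from scratch. The data here is synthetic.
x = np.linspace(0, 2 * np.pi, 100)
phases = np.linspace(0, np.pi, 8)

fig, axes = plt.subplots(2, 4, sharex=True, sharey=True, figsize=(8, 4))
for ax, phase in zip(axes.flat, phases):
    ax.plot(x, np.sin(x + phase), linewidth=1)
    ax.set_title(f"phase = {phase:.2f}", fontsize=8)
plt.tight_layout()
plt.show()
```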

For all of the wisdom in these suggestions, and the beauty of Tufte’s examples (it’s no accident he’s both a statistician and a working artist), good information visualizations remain rare. For information to convey meaning in standard, predictable ways, we need tools: “tools” as in grammars and lexicons rather than more software widgets. A noteworthy effort directly in line with this need is Leland Wilkinson’s The Grammar of Graphics (New York: Springer, 2005).
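
To make the “grammar” idea concrete, here is a minimal sketch of what a grammar-based specification looks like, using plotnine, a Python implementation of Wilkinson’s ideas by way of ggplot2 (the library choice and toy data are mine; Wilkinson’s book predates both):

```python
import pandas as pd
from plotnine import ggplot, aes, geom_point, facet_wrap

# Toy data: three hypothetical groups of measurements.
df = pd.DataFrame({
    "x": list(range(30)),
    "y": [i * 0.5 + (i % 7) for i in range(30)],
    "group": ["a", "b", "c"] * 10,
})

# A grammar composes a chart from nouns (data, aesthetics) and verbs
# (geometries, facets) rather than picking from a menu of chart types.
plot = (ggplot(df, aes(x="x", y="y", color="group"))
        + geom_point()
        + facet_wrap("~group"))
plot.save("grammar_demo.png")
```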

Some precedents may be useful. The history of sailing and shipping is rich with examples of various parties agreeing on conventions (port and starboard do not vary from country to country the way rules for automobiles do) and solving problems of conveying information. Shipping containers interlock regardless of carrier while being handled at countless global ports. The Beaufort wind scale arose from the need for agreed-upon metrics for measuring wind aboard a ship, a matter of great practical importance. Even today, with satellites and computerized navigation systems, a Beaufort 0 (“Calm; smoke rises vertically”) is the same around the world, while a 12 (“Air filled with foam; sea completely white with driving spray; visibility greatly reduced”) spells disaster no matter how fast the hurricane winds are actually blowing.
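
The scale’s virtue is that it is a fixed, shared mapping, which makes it trivial to encode. A sketch, using the commonly published thresholds in knots (descriptions abbreviated):

```python
# The Beaufort scale as a shared convention: commonly published upper
# bounds in knots for each force, with abbreviated descriptions.
BEAUFORT = [
    (1, 0, "Calm; smoke rises vertically"),
    (4, 1, "Light air"), (7, 2, "Light breeze"), (11, 3, "Gentle breeze"),
    (17, 4, "Moderate breeze"), (22, 5, "Fresh breeze"),
    (28, 6, "Strong breeze"), (34, 7, "Near gale"), (41, 8, "Gale"),
    (48, 9, "Strong gale"), (56, 10, "Storm"), (64, 11, "Violent storm"),
]

def beaufort(knots):
    """Map a wind speed in knots to its Beaufort force and description."""
    for upper_bound, force, label in BEAUFORT:
        if knots < upper_bound:
            return force, label
    return 12, "Hurricane force"

print(beaufort(0.5))   # (0, 'Calm; smoke rises vertically')
print(beaufort(70.0))  # (12, 'Hurricane force')
```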

Closely related to navigation is weather. From early tabular and graphic representations of temperature and precipitation to 1982, when USA Today made a full-page set of color maps and tables a trademark of the upstart publication, tools for understanding weather and climate have helped lead the state of information visualization. More recently, graphics workstations were overrepresented in television studios as an arms race in weathercasting helped advance the state of the field. The weathercasters can boast results: compare the number of people who can understand a Doppler radar image to those who can grasp binomial distributions, bid-ask spreads, or treemaps.

Musical notation is another relevant example. Easily transportable, relatively impervious to language, and yet a representation (rather than a reproduction) of a performance, scores have the kinds of conventions that information visualization for the most part still lacks. At this point, good visualizations are featured in “galleries,” as befits works of art. They are created by artists and artisans, not by people who merely have something to say. At the risk of a strained analogy, we are at the stage where latter-day monks painstakingly hand-letter sacred texts, still awaiting both Gutenberg and the typewriter.

As the references suggest, this is a rich field with many fascinating byways and footnotes. For a superb historical overview, see Friendly and Denis, Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization.

Current Directions

There is cause for optimism, however, on several fronts:

1) Hollywood and the gaming market are leading a charge toward effective, practical, and multidimensional visual tools. A quick look at the most recent SIGGRAPH conference program illustrates the overlap, as featured speakers come from both computer science powers like Carnegie Mellon and animation houses such as Pixar.

2) 3-D has become a day-to-day reality in architecture, manufacturing, and, again, gaming, and this trickles down to consumer applications such as kitchen design tools. Even state-of-the-art roller coasters are being rendered in CAD (how, I don’t know) for mass enjoyment.

3) Since the release of its Flex version 2 product in 2006, which extended Flash functionality to more programming environments, Adobe has supported the rapid development (in both senses of the term) of visually attractive, data-driven visualizations: search the Flex application showcase and see a wide variety of database-driven shopping, monitoring, configuration, and wayfinding tools. Some are extremely handsome and useful, and even some of the visually “flat” examples possess a high information density.

4) Mapping remains important, and through the release of APIs from the likes of the UK Ordnance Survey, ESRI, Microsoft, and Google, developers can build data-rich, geographically useful tools more easily than ever before. Once again, Flex can accelerate the process.

5) In the labs, more senses are being enlisted in human information processing. Just as force-feedback enhances a driving game, so too can users navigate 3-D data volumes with haptic tools. Sonification is another emerging tool: when two or more elements interact (for example, sensor or observational reports regarding some phenomenon), the user can hear sounds of varying harmony to indicate different kinds of similarity or coherence. If two reports have plausible time stamps but fail to corroborate each other, there may be a) bad information in the system or b) two observed targets rather than one at different times. A dissonant tone keeps the user from overlooking the potential inconsistency. (For more see here.)
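
As a sketch of that mapping, the snippet below renders “agreement” as a consonant interval and “conflict” as a dissonant one, using only the Python standard library (the frequencies and file names are illustrative assumptions, not any production sonification scheme):

```python
import math
import struct
import wave

RATE = 44100  # samples per second

def tone_pair(f1, f2, seconds=1.0):
    """Mix two equal-amplitude sine waves into one mono signal."""
    n = int(RATE * seconds)
    return [0.4 * (math.sin(2 * math.pi * f1 * t / RATE) +
                   math.sin(2 * math.pi * f2 * t / RATE))
            for t in range(n)]

def write_wav(path, samples):
    """Write 16-bit mono PCM from samples in the range [-1, 1]."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(RATE)
        w.writeframes(b"".join(
            struct.pack("<h", int(s * 32767)) for s in samples))

# Corroborating reports: a consonant perfect fifth (3:2 frequency ratio).
write_wav("agree.wav", tone_pair(440.0, 660.0))
# Conflicting reports: a dissonant minor second (roughly 16:15).
write_wav("conflict.wav", tone_pair(440.0, 466.2))
```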

6) Transparency in real or implied 3-D data volumes can allow exceptions to stand out more clearly. These approaches can score well on several of Tufte’s implied indices. In 2-D, transparency lets an overlay preserve the baseline information beneath it.
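
In two dimensions the same trick is a one-parameter change. A minimal matplotlib sketch with synthetic data (the outlier rule and colors are arbitrary choices for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 5000))
outlier = np.hypot(x, y) > 3  # arbitrary rule flagging far-out points

fig, ax = plt.subplots()
# Low alpha lets the dense baseline read as a soft cloud...
ax.scatter(x[~outlier], y[~outlier], s=8, color="steelblue", alpha=0.15)
# ...while opaque marks make the exceptions stand out on top of it.
ax.scatter(x[outlier], y[outlier], s=25, color="crimson")
plt.show()
```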

7) Time can be manipulated via sliders and other intuitive tools, effectively creating animations. True data richness, as in this example from the Boston Federal Reserve Bank, is well served by easy comparison across time and county, instantly obvious navigation, subtle but clear use of color for information, and appropriate scale: building such a tool at the national level would be infeasible and would lose appropriate granularity.
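
A time slider of the sort described takes only a few lines in today’s desktop plotting tools. A sketch using matplotlib’s built-in Slider widget (the per-county series is fabricated purely for illustration, not the Boston Fed’s data):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

years = np.arange(1990, 2009)
rng = np.random.default_rng(1)
# Fabricated per-county series, one row per year, purely for illustration.
data = np.cumsum(rng.normal(size=(len(years), 12)), axis=0)

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)  # leave room for the slider
bars = ax.bar(range(data.shape[1]), data[0])
ax.set_ylim(data.min(), data.max())
ax.set_xlabel("county")

slider_ax = fig.add_axes([0.15, 0.05, 0.7, 0.04])
slider = Slider(slider_ax, "Year", years[0], years[-1],
                valinit=years[0], valstep=1)

def update(_):
    # Redraw the bar heights for the selected year.
    for bar, height in zip(bars, data[int(slider.val) - years[0]]):
        bar.set_height(height)
    fig.canvas.draw_idle()

slider.on_changed(update)
plt.show()
```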

There’s no shortage of activity, samples of which can be experienced at Flex.org, Visual Complexity, or this IBM site. Some of the work is truly stunning, and global centers of design leadership are emerging. Even so, the fundamental tension quickly becomes evident: words like “galleries” suggest that we are viewing works of art, and in many instances the work belongs in museums. But art by definition is unique; visualization needs to be brought to the masses of managers, citizens, and students who have something to say but lack the tools, grammar, and training to create the beautiful. In short, the task is to help high levels of information visualization migrate from the artist to the worker.