Tuesday, June 07, 2005

May 2005 Early Indications II: Power laws for fun and profit

(shipped May 26, posted at www.guidewiregroup.com, and archived here)

Five years ago, the Internet sector was in the middle of a momentous
slide in market capitalization. Priceline went from nearly $500 a
share to single digits in three quarters. CDnow fell from $23 to
$3.40 in about 9 months ending in March 2000. Corvis, Music Maker,
Dr. Koop: 2000 was a meltdown the likes of which few recreational
investors had ever seen or imagined. Science was invoked to explain
this new world of Internet business.

Bernardo Huberman, then at Xerox PARC, and others found that web
traffic was concentrated far more heavily than the 80/20 rule of
thumb would suggest: as of December 1, 1997, the top 1% of the
website population accounted for over 55% of all traffic. This kind
of distribution was not new, as it turned out. A Harvard linguist
with the splendid name of George Zipf counted words, and found that a
tiny percentage of English words account for a disproportionate share
of usage. A Zipf distribution, plotted on a log-log scale, is a
straight line from upper left to lower right. On a linear scale, it
plunges from the top left and then goes flat for the characteristic
long tail of the distribution: twosies and then onesies occupy most of
the x-axis.
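The shape Zipf described is easy to reproduce. What follows is a minimal sketch, not a reconstruction of Huberman's data. It assumes a hypothetical population of 10,000 sites whose traffic follows a pure Zipf law with exponent s = 1, so the site of rank r gets traffic proportional to 1/r. It then measures how much of the total the top 1% captures and checks that the points fall on a straight line with slope -s on a log-log plot. The function names and the population size are illustrative choices only.

```python
import math


def zipf_frequencies(n_items: int, s: float = 1.0) -> list[float]:
    """Normalized Zipf frequencies: the item of rank r gets weight 1 / r**s."""
    weights = [1.0 / (r ** s) for r in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_share(freqs: list[float], fraction: float) -> float:
    """Share of the total held by the top `fraction` of ranked items."""
    k = max(1, round(len(freqs) * fraction))
    return sum(freqs[:k])  # freqs are already ordered by rank, largest first


def loglog_slope(freqs: list[float]) -> float:
    """Slope of the line through the first and last points on a log-log plot."""
    n = len(freqs)
    return (math.log(freqs[-1]) - math.log(freqs[0])) / (math.log(n) - math.log(1))


# Hypothetical population of 10,000 sites with exponent s = 1 (an assumption).
freqs = zipf_frequencies(10_000, s=1.0)
print(f"top 1% share: {top_share(freqs, 0.01):.1%}")
print(f"log-log slope: {loglog_slope(freqs):.2f}")
```

Under these assumptions, the top 100 sites take a little over half of all traffic. A steeper exponent concentrates it further. The thousands of low-ranked sites, each with a sliver of traffic, make up the long tail.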

Given such "scientific" logic, investors began to argue that the
Internet was a new kind of market, with high barriers to entry that
made incumbents' positions extremely secure. Michael Mauboussin, then
at CS First Boston and now at Legg Mason, wrote a paper in late 1999
called "Absolute Power." In it he asserted that "power laws . . .
strongly support the view that on-line markets are winner-take-all."
Since that time, Google has challenged Yahoo, weblogs have cut
markedly into online news sites' traffic, and the distinction between
"on-line markets" and plain old markets is getting harder to maintain.
Is the Zipf distribution somehow changing? Were power laws wrongly
applied or somehow misunderstood?

Chris Anderson, editor of Wired, has a different reading of the graph
and focuses instead on the long tail. In an article last fall that's
being turned into a book, Anderson explains how a variety of web
businesses have prospered by successfully addressing the very large
number of niches in any given market. Jeff Bezos, for instance,
estimates that 30% of the books Amazon sells aren't carried by physical
retailers. Unlike Excite, which couldn't make money on the mostly
unique queries that came into the site, Google uses AdWords to sell
almost anything to the very few people who search for something
related to it. As of March, every iTunes song in inventory (that's
over 1 million) had been purchased at least once. Netflix carries far
more inventory than a neighborhood retailer can, and can thus satisfy
any film nut's most esoteric request.

At the same time, producers of distinctive small-market goods (like
weblogs, garage demo CDs, and self-published books) can reach a paying
public through a variety of mechanisms. These include
word of mouth, search-driven technologies, and public performance
tie-ins; digital distribution can also change a market's economics.
Thus the news is good for both makers and users, buyers and sellers;
in fact, libertarian commentator Virginia Postrel has written for the
last several years on the virtues of the choice and variety we
currently enjoy.

There's currently a "long tail" fixation in Silicon Valley. Venture
capitalists report seeing a requisite power law slide in nearly any
pitch deck. Google CEO Eric Schmidt showed a long tail slide at the
company's shareholder meeting. Joe Kraus, formerly of Excite and now at
JotSpot, tries to argue for a long tail in software development, on
which his product of course capitalizes. The term has made USA Today
and The Economist. In some ways this feels like the bubble again, for
better and for worse.

At one level, the Internet industry seems to need intense bursts of
buzzword mania: you no longer hear anyone talking about push,
incubators, portals, exchanges, or on-line communities, even though
each of these was once projected to be a multibillion-dollar market.
The visual appeal of a Zipf distribution may also lend Anderson's long
tail a quasi-scientific aura that simple terms like "blog," "handheld,"
or "broadband" lack. Netflix, Amazon, and Google lacked power law
graphs, I'm pretty certain, in their startup documents and have
managed to thrive regardless. Anderson's own evidence illustrates
what a long way it is from explanation to prediction: showing how some
firms can profitably address niches doesn't prove that a startup will
similarly prosper in an adjacent market. To his credit, he focuses
primarily on entertainment, where digitization is most prevalent.

The recourse to supposed mathematical precision to buttress something
as unscientific as a business plan is not new. Sociologists
investigating networks of people have been overshadowed by physicists
who bring higher math horsepower to the same sets of problems, yet
it's still difficult to understand Friendster's revenue model.
Complex adaptive systems research was very hot in the 90s, following
in the path of the now barely visible "artificial intelligence."
The problem extends beyond calculus to spreadsheets: much of what
passes for quantitative market research rests on barely legitimate
data. For a simple 5-point questionnaire response to be reduced to a
single semi-reliable number, the answer choices should be spaced at
regular intervals, yet words rarely behave this way. Is "most of the
time" 8 times out of 10 or 95 times out of 100? Who remembers to count
before someone asks? Purchase intent rarely translates to purchase.
Yet executives make decisions every day based on customer satisfaction
scores, opinion surveys, and focus groups, all of which reduce noisy
variation to apparently clinical precision.

Make no mistake: Chris Anderson has identified something important and
widespread when he groups superficially dissimilar businesses to spot
their shared reliance on the medium's powerful capability for matching
big, sparse populations to things they want and will pay for.
To return to our opening question of what has changed since 2000:
the necessary preconditions for successful long tail models include
large populations and strong search, a relatively new capability.
What will disrupt today's incumbents by 2010? New kinds
of batteries? Flexible displays? Enforced shutdown of the
peer-to-peer networks, possibly by a massive worm/virus of unknown
origin?

It's also important to see the both/and: just because quirky tastes
can constitute a profitable audience in new ways does not preclude
hits like The Da Vinci Code, let's say, from being major news. And
power laws still apply to traffic (and presumably revenue): Google and
Amazon profitably handle massive volumes of site visits whereas Real's
download service, about which Anderson rhapsodizes, still loses money.
At the end of the day, no algorithm in the world can negate the most
powerful "law" of business, that of cash flow.