Tuesday, November 02, 2010

Early Indications October 2010: The Analytics Moment: Getting numbers to tell stories

Thanks in part to vigorous efforts by vendors (led by IBM) to bring
the idea to a wider public, analytics is coming closer to the
mainstream. Whether in ESPN ads for fantasy football, or
election-night slicing and dicing of vote and poll data, or the
ever-broadening influence of quantitative models for stock trading and
portfolio development, numbers-driven decisions are no longer the
exclusive province of people with hard-core quantitative skills.

Not surprisingly, the definition is problematic. At the
simple end of the spectrum, one Australian firm asserts that
"Analytics is basically using existing business data or statistics to
make informed decisions." At the other end of a broad continuum,
TechTarget distinguishes, not completely convincingly, between data
mining and data analytics:

"Data analytics (DA) is the science of examining raw data with the
purpose of drawing conclusions about that information. Data analytics
is used in many industries to allow companies and organization to make
better business decisions and in the sciences to verify or disprove
existing models or theories. Data analytics is distinguished from data
mining by the scope, purpose and focus of the analysis. Data miners
sort through huge data sets using sophisticated software to identify
undiscovered patterns and establish hidden relationships."

To avoid a terminological quagmire, let us merely assert that
analytics uses statistical and other methods of processing to tease
out business insights and decision cues from masses of data.

In order to see the reach of these concepts and methods, consider a
few examples drawn at random:

-The "flash crash" of May 2010 focused attention on the many forms and
roles of algorithmic trading of equities. While firm numbers on the
practice are difficult to find, it is telling that the regulated New
York Stock Exchange has fallen from executing 80% of trades in its
listed stocks to only 26% in 2010, according to Bloomberg. The
majority occur in other trading venues, many of them essentially
"lights-out" data centers; high-frequency trading firms, employing a
tiny percentage of the people associated with the stock markets,
generate 60% of daily U.S. trading volume of roughly 10 billion
shares.

-In part because of the broad influence of Michael Lewis's bestselling
book Moneyball, quantitative analysis has moved from its formerly
geeky niche at the periphery to become a central facet of many sports.
MIT holds an annual conference on sports analytics that draws both
sell-out crowds and A-list speakers. Statistics-driven fantasy sports
continue to rise in popularity all over the world as soccer, cricket,
and rugby join the more familiar U.S. staples of football and
baseball.

-Social network analysis, a lightly practiced subspecialty of
sociology only two decades ago, has surged in popularity within the
intelligence, marketing, and technology industries. Physics, biology,
economics, and other disciplines all are contributing to the rapid
growth of knowledge in this domain. Facebook, Al Qaeda, and countless
startups all require new ways of understanding cell phone, GPS, and
friend/kin-related traffic.

Why now?

Perhaps as interesting as the range of its application are the many
converging reasons for the rise of interest in analytics. Here are
ten, among perhaps many others.

1) Total quality management and six-sigma programs trained a
generation of production managers to value rigorous application of
data. That six-sigma has been misapplied and misinterpreted there can
be little doubt, but the successes derived from a data-driven approach
to decisions are, I believe, informing today's wider interest in
statistically sophisticated forms of analysis within the enterprise.

2) Quantitative finance applied ideas from operations research,
physics, biology, supply chain management, and elsewhere to problems
of money and markets. In a bit of turnabout, many data-intensive
techniques, such as portfolio theory, are now migrating out of formal
finance into day-to-day management.

3) As Eric Schmidt said in August, we now create in two days as much
information as humanity did from the beginning of recorded history
until 2003. That's measuring in bits, obviously, and as such Google's
estimate is skewed by the rise of high-resolution video, but the
overall point is valid: people and organizations can create data far
faster than any human being or process can assemble, digest, or act on
it. Cell phones, seen as both sensor and communications platforms,
are a major contributor, as are enterprise systems and image
generation. More of the world is instrumented, in increasingly
standardized ways, than ever before: Facebook status updates, GPS,
ZigBee and other "Internet of things" efforts, and barcodes and RFID
on more and more items merely begin a list.

4) Even as we as a species generate more data points than ever before,
Moore's law and its corollaries (such as Kryder's law of hard disks)
are creating a computational fabric which enables that data to be
processed more cost-effectively than ever before. That processing, of
course, creates still more data, compounding the glut.

5) After the reengineering/ERP push, the Internet boom, and the
largely failed effort to make services-oriented architectures a
business development theme, vendors are putting major weight behind
analytics. It sells services, hardware, and software; it can be used
in every vertical segment; it applies to every size of business; and
it connects to other macro-level phenomena: smart grids, carbon
footprints, healthcare cost containment, e-government, marketing
efficiency, lean manufacturing, and so on. In short, many vendors
have good reasons to emphasize analytics in their go-to-market
efforts. Investments reinforce the commitment: SAP's purchase of
Business Objects was its biggest acquisition ever, while IBM, Oracle,
Microsoft, and Google have also spent billions buying capability in
this area.

6) Despite all the money spent on ERP, on data warehousing, and on
"real-time" systems, most managers still cannot fully trust their
data. Multiple spreadsheets document the same phenomena through
different organizational lenses, data quality in enterprise systems
rarely inspires confidence, and timeliness of results can vary widely,
particularly in multinationals. I speak to executives across
industries who have the same lament: for all of our systems and
numbers, we often don't have a firm sense of what's going on in our
company and our markets.

7) Related to this lack of confidence in enterprise data, risk
awareness is on the rise in many sectors. Whether in product
provenance (Mattel), recall management (Toyota, Safeway, or CVS),
exposure to natural disasters (Allstate, Chubb), credit and default
risk (anyone), malpractice (any hospital), counterparty risk (Goldman
Sachs), disaster management, or fraud (Enron, Satyam, Société
Générale), events of the past decade have sensitized executives and
managers to the need for rigorous, data-driven monitoring of complex
situations.

8) Data from across domains can be correlated through such ready
identifiers as GPS location, credit reporting, cell phone number, or
even Facebook identity. The "like" button, by itself, serves as a
massive spur to inter-organizational data analysis of consumer
behavior at a scale never before available to sampling-driven
marketing analytics. What happens when a "sample" population includes
100 million individuals?

9) Visualization is improving. While the spreadsheet is ubiquitous in
every organization and will remain so, the quality of information
visualization has improved over the past decade. This may result
primarily from the law of large numbers (1% of a boatload is bigger
than 1% of a handful), or it may reflect the growing influence of a
generation of skilled information designers, or it may be that such
tools as Mathematica and Adobe Flex are empowering better number
pictures, but in any event, the increasing quality of both the tools
and the outputs of information visualization reinforces the larger
trend toward sophisticated quantitative analysis.

10) Software as a service puts analytics into the hands of people who
lack the data sets, the computational processing power, and the rich
technical training formerly required for hard-core number-crunching.
Some examples follow.

Successes, many available as SaaS

-Financial charting and modeling continue to migrate down-market:
retail investors can now use Monte Carlo simulations and other tools
well beyond the reach of individuals at the dawn of online investing
in 1995 or thereabouts.

-Airline ticket prices at Microsoft's Bing search engine are rated
against a historical database, so purchasers of a particular route and
date are told whether to buy now or wait.

-Wolfram Alpha is taking a search-engine approach to calculated
results: a stock's price/earnings ratio is readily presented on a
historical chart, for example. Scientific calculations are currently
handled more readily than natural-language queries, but the tool's
potential is considerable.

-Google Analytics brings marketing tools formerly unavailable anywhere
to the owner of the smallest business: anyone can slice and dice ad-
and revenue-related data from dozens of angles, as long as it relates
to the search engine in some way.

-Fraud detection through automated, quantitative tools holds great
appeal because of both labor savings and rapid payback. Health and
auto insurers, telecom carriers, and financial institutions are
investing heavily in these technologies.
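
To make the first of these examples concrete, here is a minimal sketch
of the kind of Monte Carlo simulation a retail investing tool might
run. Everything in it is an assumption for illustration: the function
names, the starting balance, the 6% mean return, the 15% volatility,
and above all the choice to draw annual returns independently from a
normal distribution. Real tools use richer models.

```python
import random
import statistics


def simulate_final_values(start_value, years, mean_return, volatility,
                          trials, seed=0):
    """Simulate ending portfolio values over many random market paths.

    Hypothetical model: each year's return is an independent draw from
    a normal distribution. This is a simplification, not a forecast.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    finals = []
    for _ in range(trials):
        value = start_value
        for _ in range(years):
            annual_return = rng.gauss(mean_return, volatility)
            # A portfolio cannot lose more than 100% in a single year.
            value *= max(0.0, 1.0 + annual_return)
        finals.append(value)
    return finals


def summarize(finals, target):
    """Reduce the simulated outcomes to a few numbers a person can read."""
    ordered = sorted(finals)
    n = len(ordered)
    return {
        "median": statistics.median(ordered),
        "p10": ordered[int(0.10 * (n - 1))],  # roughly the 10th percentile
        "p90": ordered[int(0.90 * (n - 1))],  # roughly the 90th percentile
        "prob_below_target": sum(v < target for v in ordered) / n,
    }


# Illustrative inputs only: $10,000 held for 20 years.
finals = simulate_final_values(
    start_value=10_000, years=20, mean_return=0.06, volatility=0.15,
    trials=5_000,
)
summary = summarize(finals, target=10_000)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

The point of the exercise is the spread rather than any single
number: the gap between the 10th and 90th percentile outcomes tells
the investor how much the answer depends on luck.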

Practical considerations: Why analytics is still hard

For all the tools, all the data, and all the computing power, getting
numbers to tell stories is still difficult. There are a variety of
reasons for the current state of affairs.

First, organizational realities mean that different entities collect
the data for their own purposes, label and format it in often
non-standard ways, and hold it locally, usually in Excel but also in
e-mails, or PDFs, or production systems. Data synchronization efforts
can be among the most difficult of a CIO's tasks, with uncertain
payback. Managers in separate but related silos may ask the same
question using different terminology, or see a cross-functional issue
through only one lens.

Second, skills are not yet adequately distributed. Database
analysts can type SQL queries but usually don't have the managerial
instincts or experience to probe the root cause of a business
phenomenon. Statistical numeracy, often at a high level, remains a
requirement for many analytics efforts; knowing the right tool for a
given data type, or business event, or time scale, takes experience,
even assuming a clean data set. For example, correlation does not
imply causation, as every first-year statistics student knows, yet
temptations to let it do so abound, especially as scenarios outrun
human understanding of ground truths.
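
The correlation trap is easy to reproduce. The sketch below uses
entirely made-up data: a hidden driver (labeled temperature here)
pushes two series that have no direct link to each other. The series
names, coefficients, and noise levels are hypothetical, chosen only to
show the effect. The two series correlate strongly until the hidden
driver is controlled for, at which point the relationship should
largely vanish.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(42)  # seeded so runs are repeatable

# Hypothetical hidden driver that an analyst may not have in the data set.
temperature = [rng.uniform(0, 35) for _ in range(1000)]

# Both series depend on temperature plus their own independent noise.
# Neither one appears in the other's formula.
ice_cream_sales = [3.0 * t + rng.gauss(0, 10) for t in temperature]
ac_repair_calls = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson(ice_cream_sales, ac_repair_calls)

# Partial correlation: correlate what remains once temperature is removed.
controlled = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(ac_repair_calls, temperature),
)

print(f"raw correlation:             {raw:.2f}")
print(f"controlling for temperature: {controlled:.2f}")
```

An analyst who sees only the two downstream series has no way to tell
this situation from a genuine causal link, which is why process
knowledge matters as much as the arithmetic.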

Third, odd as it sounds in an age of assumed infoglut, getting the
right data can be a challenge. Especially in extended enterprises but
also in extra-functional processes, measures are rarely sufficiently
consistent, sufficiently rich, or sufficiently current to support
robust analytics. Importing data to explain outside factors adds
layers of cost, complexity, and uncertainty: weather, credit, customer
behavior, and other exogenous factors can be critically important to
either long-term success or day-to-day operations, yet representing
these phenomena in a data-driven model can pose substantial
challenges. Finally, many forms of data do not readily plug into the
available processing tools: unstructured data is growing at a rapid
rate, adding to the complexity of analysis.

In short, getting numbers to tell stories requires the ability to ask
the right question of the data, assuming the data is clean and
trustworthy in the first place. This unique skill requires a blend of
process knowledge, statistical numeracy, time, narrative facility, and
both rigor and creativity in proper proportion. Not surprisingly,
such managers are not technicians, and are difficult to find in many
workplaces. For what analytics actually delivers to match its
promise, the biggest breakthroughs will likely come in education and
training rather than algorithms or database technology.