Saturday, August 01, 2020

Early indications July 2020: Our digital twins?

Seeing the major US tech CEOs testify before Congress earlier this week is a useful prompt to consider just what those companies sell to have become so powerful. Amazon sells household goods, information goods, groceries, computing capability, and now eyeballs: 2019’s $14 billion in ad revenue was a 40% jump on the prior year. (For perspective, that’s more than the company made on cloud services as recently as 2016.) Apple sells high-margin hardware and, increasingly, services: at about $50 billion a year (annualized), the App Store, iTunes, cloud storage, and the like outperformed most of the company’s hardware lines, though not the iPhone. Google and Facebook, however, are less diversified: each sells some version of us.

What we will consider this month is the degree to which the digital representations of us that are modeled and manipulated by the ad giants mimic a notion with its origins in heavy industry: the digital twin. Briefly, a GE, Boeing, or Caterpillar can aspire to accumulate and crunch sensor data from thousands or millions of Internet-connected Things such as jet engines, MRI machines, airframes, or excavators. Identifying safety risks, optimizing predictive maintenance, and improving other business processes is the grail in this world: it’s far better for BNSF to have a digital locomotive fail in a simulation than to have a physical one break down 500 miles from the nearest breakdown crane. As with self-driving cars, every unit in a fleet of data-powered products (think of Tesla’s over-the-air software upgrades) can theoretically be made as capable as the most capable unit. As of now, the industrial digital twin is closer to whiteboard aspiration (see this new book about GE’s failures) than to profitable reality.
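The predictive-maintenance idea at the heart of the industrial digital twin can be sketched very simply: compare each new sensor reading against a rolling baseline and flag drift before the physical unit fails. The function, data, and thresholds below are all invented for illustration; no vendor's actual system looks like this.

```python
def drift_alerts(readings, window=5, tolerance=2.0):
    """Return indices of readings that deviate from the rolling mean
    of the previous `window` readings by more than `tolerance`."""
    alerts = []
    for i in range(window, len(readings)):
        baseline = sum(readings[i - window:i]) / window
        if abs(readings[i] - baseline) > tolerance:
            alerts.append(i)
    return alerts

# A (made-up) bearing temperature: steady, then spiking ahead of a failure.
temps = [70.1, 70.3, 69.9, 70.0, 70.2, 70.1, 70.0, 74.5, 78.2]
print(drift_alerts(temps))  # → [7, 8]: the spike and its aftermath are flagged
```

A real twin would fuse many sensor channels and a learned model of the asset, but the payoff is the same: the alert fires in software, not 500 miles from a breakdown crane.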

It’s pretty obvious after spending any time on a modern digital platform that its algorithms are easily fooled. Run a few searches for birthday presents for someone, and your ad feed quickly starts resembling your giftee’s demographic rather than your own. Now that I have a physician in the household accessing Epic and other clinical systems, I get “overspray,” apparently by virtue of sharing a network: ads aimed at prescribers of IUDs. The reference librarian knows I’m writing a research paper on government policy regarding Puerto Rico; Google responds as though I want to vacation there.

At one level this slippage is reassuring: I used to get ads for industrial ropes and slings in my Gmail header, not to mention ads from malpractice lawyers representing patients with complications from transvaginal mesh implants. At the same time, the uncanny accuracy of some ads has fed widespread suspicion that open microphones are capturing spoken conversation. There’s also the head-fake to consider: Target used to send coupons that tracked a person’s interests or shopping list too closely, and customers were creeped out. Target’s solution, if I recall correctly, was to add “noise” coupons to calm suspicious consumers: there’s nothing like a lawn mower ad to distract from how much the store knows about your health and beauty purchasing habits.

At the same time that we are (mis-)represented by behavioral data, collected both on- and off-line in unfathomable quantities, that can vary widely from our “real” selves, there are data representations of us of much higher fidelity. None of these is _currently_ aimed at getting me to do something, though as we will see, the lure of ad revenue extends farther and farther, to include ISPs for example. Verizon/AT&T have an extremely accurate map of my daily movement, given that my phone is within a few feet of my person most of the time. Smart TVs and cable boxes track viewing habits, and the data is shared in sometimes-objectionable ways. (Devices from multiple manufacturers share behavioral data with Google, Facebook, and Netflix, for example, and opting out is predictably difficult.) Personal fitness trackers and exercise trainers are another source of high-fidelity data that could someday feed a “digital twin.”

Google and Facebook get paid when we click on ads. Aetna, ideally, should want me to live a long and healthy life with few expensive conditions or episodes, a goal I presumably share. How much will these two trajectories — digital twin as behavioral experiment versus digital twin as predictive maintenance — diverge, and how much will they converge?

This is pure speculation, but I think the player to watch is the richest CEO from the congressional hearing the other day. Amazon has 1) a vast store of behavioral, social-network (in the form of our address books and gifting history), and purchase data, 2) unsurpassed computational power and algorithmic talent, 3) designs on medical markets, as evidenced by its PillPack acquisition, and 4) strong motivation to lower health care costs for its enormous — soon to approach one million — workforce. (I missed it, but Harvard surgeon and New Yorker author Atul Gawande left the CEO post at the Amazon/Berkshire/JPMorgan Haven Healthcare startup back in May.) Amazon warehouse workers already wear wristbands that track movement and, allegedly, productivity. If a digital twin of an employee has already been built, the e-commerce behemoth is as likely as anyone to have built it.

Where might we go from here? Behavioral nudges — to lose 5 pounds, to get up from the desk and stretch, to eat more vegetables — would seem to be a perfect marriage of the two trajectories. Back in the early days of e-commerce, Pets.com discovered a powerful predictive question: visitors who had bought their pet a present on its last birthday were substantially more likely to make a purchase than visitors without that behavior. Where else can big data expose similar minimally invasive but predictively powerful indicators of long-term well-being? If I were inventing a “dream” college graduate right now, she’d have some combination of algorithmic aptitude, behavioral economics, and engineering training to understand big data, human motivation and reward, and systems thinking.
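The Pets.com question is, in analytics terms, a "lift" calculation: how much more likely is one group to convert than another? A toy version, with entirely invented visitor records, looks like this:

```python
def purchase_lift(visitors):
    """visitors: list of (bought_birthday_gift: bool, purchased: bool).
    Returns the ratio of the gifters' purchase rate to everyone else's."""
    def rate(group):
        return sum(purchased for _, purchased in group) / len(group)
    gifters = [v for v in visitors if v[0]]
    others = [v for v in visitors if not v[0]]
    return rate(gifters) / rate(others)

# Invented records: 3 of 4 birthday-gifters buy, 1 of 4 others do.
visitors = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]
print(purchase_lift(visitors))  # → 3.0: gifters convert at triple the rate
```

The point of such an indicator is exactly what the paragraph above suggests: one cheap, minimally invasive signal that predicts behavior far better than chance.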

Bearing in mind that the man behind Facebook’s explosive growth between 2007 and 2011 won’t let his own children near the “short-term, dopamine-driven feedback loops that . . . are destroying how society works,” how might the future be better? The first thought is algorithmic transparency, a phrase that has yet to be operationally defined, as Microsoft’s danah boyd has shown: few of us could read the algorithm, and the algorithm is an abstraction without user data, which raises privacy hurdles. Second, there has to be a working definition of ownership: at some point, a person’s data footprint should be under his or her influence rather than remote and inaccessible. Realistically, this would mean FTC- or FDA-like regulation. Third, we need better sensors, better sensor protocols (including for privacy), and better sensor-data analytics: if AM General can’t yet predict when a Humvee transmission will fail in Afghanistan versus Alabama, Mass General is still a long way from identifying when I will have a stroke.

Last, I would be in favor of intensifying training in critical thinking. The echo chambers so powerfully created and manipulated by Facebook (among others, obviously) would gain less traction if more people could sniff out hoaxes and self-serving propaganda. Scientific literacy appears to be in retreat, in part because of those same “short-term, dopamine-driven feedback loops”: people, it turns out, are incredibly easy (and profitable) to game.

How can we as parents, as educators, as citizens, as humans demand — and model — better? It sounds paradoxical, but better critical thinking and digital literacy skills will help us build new kinds of organizations — of learning, of governance, of news and media — to replace today’s so visibly broken ones. Restoring credibility and cognitive authority (in short: trust) in our institutions, and nurturing humanistic leaders who grasp the realities of today’s vast machines of data collection and behavioral manipulation, will be a long road, but one I believe is worth hiking, one careful step at a time.