* * *
Back in the lab at MIT, Mershin pulls a blue box off a shelf. It’s filled with a jumble of green, blue, and black wires. It looks like one of those boxes we all keep in a closet somewhere, filled with cords and cables that belong to gadgets we’ve lost or upgraded. But plugged into several of these wires, I see a white, plastic credit card–shaped object. This is their new Nano-Nose, revised and dramatically shrunk down from the metal-clad, DARPA-tested box. (The wires and cords are all peripherals, meant for pumping odors and electrical current into the nose.)
Over the past few years, Zhang has continued to tinker with the olfactory receptors he and Mershin use in their Nano-Nose. Most importantly, he’s stopped growing them in embryonic cells, having devised a way to cultivate them in a biologically inert form. It all happens in a test tube now. The receptors are still tricky to handle—Mershin says they are by far the most difficult aspect of the device—but these are more stable and malleable than their organic counterparts. Mershin and Zhang have also progressively shrunk the Nano-Nose’s circuit boards. That means the entire apparatus could now be attached to a port on a bioreactor to sniff what’s happening inside. It could go inside a factory and smell products for quality control or be put inside a grain silo to smell for food spoilage. But Mershin and Zhang say they have no interest in turning their research into a business at the moment.
So far, the only company daring enough to design a commercial technology that uses olfactory receptors—with a design very similar to the Nano-Nose—is a small Silicon Valley startup called Aromyx. In some ways, it is even more ambitious than Mershin and Zhang's project. The Nano-Nose uses only about twenty kinds of receptors and customizes each nose depending on its purpose. But Aromyx wants to pack all 400 human olfactory receptors onto its EssenceChip, a three- by five-inch plastic plate dotted with small wells to hold the receptors. When the EssenceChip is exposed to an odor, the receptors fire and the chip records that activation pattern. What's the smell of Coca-Cola? Or Chanel No. 5? The answer, again, isn't a list of molecules. "It's a pattern of receptor responses," says Aromyx founder Chris Hanson. Thus far, Aromyx has stabilized only a few of those 400 receptors. As they add receptors, the thinking goes, their digital olfactory rendering will become finer and more detailed.
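One way to picture such a pattern (a purely illustrative sketch: the receptor count, activation numbers, and similarity measure here are my own inventions, not Aromyx's actual data format) is as a numeric fingerprint that can be compared between odors:

```python
# Illustrative only: an odor encoded as a vector of receptor activations.
# Values and receptor choices are invented, not Aromyx's real format.
import math

def cosine_similarity(a, b):
    """Compare two activation patterns; values near 1.0 mean similar smells."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical activation patterns over four (of 400) human receptors.
cola    = [0.91, 0.02, 0.40, 0.13]   # "smell of Coca-Cola"
perfume = [0.05, 0.88, 0.31, 0.72]   # "smell of Chanel No. 5"

print(cosine_similarity(cola, perfume))  # low value: distinct smells
```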
“This is a window into human sensory experience,” Hanson says. If so, it’s a fragile one. Aromyx still grows its receptors in yeast cells, and the company has struggled to put together a basic product for a demo. When Aromyx recently changed offices and moved seven miles from Palo Alto to Mountain View, some of its cell lines were destroyed in the shuffle.
As for Mershin, he is embarrassed by how messy the Nano-Nose still looks, but his curls start bouncing when we talk about its potential applications. Right now, the Nano-Nose is just a detector. It can't interpret the data it collects. But Mershin and Zhang want to make it smart—like a dog. And that's where Mershin's tormentors, the prostate cancer–sniffing dogs from the video, come in. It turns out Mershin is not just competing with the canines; he's also collaborating with them.
In his office, Mershin gives me the place of honor: a black velour chair where Florin, another prostate cancer–detection dog, sat when she came to visit. Florin and Lucy belong to a group in the U.K. called Medical Detection Dogs, which has trained many of the animals that have been able to sniff out cancers.
Right now, Mershin and Zhang are training an AI system on a bunch of data, some of it collected by Medical Detection Dogs on how their animals responded to specific urine samples—whether they alerted to cancer, how long they lingered, and the like—and some of it collected by Mershin and Zhang when they ran the same urine samples through a gas chromatograph/mass spectrometer. Mershin says these streams of data will help them select which receptors they need to put into the Nano-Nose. But the main event will come when he runs those same urine samples through the Nano-Nose and begins collecting data on its responses. Then he'll mine all three data sets for correlations. Mershin already has all the urine frozen in his lab, ready to go.
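It is worth being concrete about what mining three data sets for correlations could look like. A minimal sketch, assuming (hypothetically) that each urine sample contributes one row from each stream; the column names and join scheme are assumptions, not Mershin and Zhang's actual pipeline:

```python
# Hypothetical sketch: every sample ID links a dog's behavior, the GC/MS
# readings, and the Nano-Nose response for the same urine sample.
import pandas as pd

dogs = pd.DataFrame({"sample": [1, 2, 3], "alerted": [1, 0, 1], "linger_s": [9.2, 1.1, 7.5]})
gcms = pd.DataFrame({"sample": [1, 2, 3], "compound_a": [0.8, 0.1, 0.9], "compound_b": [0.2, 0.3, 0.1]})
nose = pd.DataFrame({"sample": [1, 2, 3], "receptor_7": [0.71, 0.12, 0.66]})

merged = dogs.merge(gcms, on="sample").merge(nose, on="sample")
# Mine all three streams at once for correlated signals.
print(merged.drop(columns="sample").corr())
```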
The idea is to ultimately run a kind of Turing test, but for smell—to imitate the dogs’ results until no one can differentiate between the Nano-Nose’s reactions and a canine’s. If all goes well, the Nano-Nose will become more than just a sensing device; it will be a true diagnostic tool. The richer the database, the better the nose will be.
Ultimately, Mershin wants to see the Nano-Nose incorporated into your cell phone. He imagines using this intimate version of his device—one that rests at all hours against its owner’s body—to collect longitudinal data about its wearer’s health. Eventually, the nose would be able to alert you to get that mole on your thigh checked out, or warn you that your blood sugar is dropping dangerously low, or perhaps that you’ve started emitting the woody, musky odor of Parkinson’s disease. The Nano-Nose could accompany you everywhere and keep tabs on you in ways that doctors never could. Everything that a dog can detect via smell, it would detect.
That’s a powerful idea, but it’s also an unsettling one. How much control over your odor profile data would you retain? And if your phone is capable of sniffing you, what other devices would do the same? In a world where digital olfactory sensors have become small enough to fit into your pocket, presumably they’ll end up elsewhere—much the way video cameras did before them. If your diseases and mental states leave suddenly legible reports in the air, no doubt people besides you and your doctor will be curious to read them. (Your insurance company, for instance.)
Poppy Crum, chief scientist at Dolby Labs, is rooting for technologies like the Nano-Nose, which she believes could democratize the early diagnosis of disease. But she also sees artificial olfaction as one of a host of rising technologies—some much farther along than others—that use sensors and data to suss out otherwise hidden inner states. Those technologies all require new standards for transparency and user control of data—standards that aren’t going to come from companies or researchers. “I think that’s something that has to be legislated,” Crum says.
Mershin, for his part, isn’t so worried about the dawn of an olfactory surveillance state. Instead, as a consummately overstimulated person, he dreads a world where devices start sending you odors. “I would be very supportive of all the technologies that smell you. I would be very leery of technologies that want you to smell them,” he says. “Don’t let the phones start putting scents in your head. Bad idea.” In other words, let your phone be the dog; you be the handler.
PATRICK HOUSE
I, Language Robot
from Los Angeles Review of Books
As a child, when I received a new toy or pet, I would immediately visualize the worst that might happen: the novelty matches burning down the house; the parakeet flying straight through the glass door; the Lego bricks melting into the carpet; the Laser Tag gun mistaken for a real one. It was a psychological tic that served, probably, to inoculate me against loss.
I recognized the stirrings of that same tic earlier this year, when I was hired to write short fiction at OpenAI, a San Francisco–based artificial-intelligence research lab. I would be working alongside an internal version of the so-called "language bot," which produces prose style-matched to any written prompt it's fed. The loss that I feared was not that the robot would be good at writing—it is, it will be—nor that I would be comparatively less so, but rather that the metabolites of language, which give rise to the incomparable joys of fiction, story, and thought, could be reduced to something merely computable.
The worst outcome was clear in my mind: two copies of the exact same short story, each only a single printed page in length—one written by me and the other, starting from the same opening sentence and nothing more, by the robot.
The greatest potential loss in our relations to machines is not runaway GDP or disinformation, but rather the existential right to enjoy the surprise and uniqueness of human effort. I know this loss well. For about fifteen years, I played on average at least one game of Go, an ancient Chinese board game, per day. My joy from the game depended on what I believed to be things the human brain was uniquely good at: aesthetics, patterns, sparse rewards, ambiguity, and intuition. However, since the release, in 2016, of an AI better at Go than the best human will ever be, I have played the game, listlessly, only once or twice.
Could a similar deflation happen to language itself? To prose? To story? Does it matter to our enjoyment or interpretation of language whether the words are generated in the same way from brains or bots? As a neuroscientist as well as a writer, I started to wonder how I worked. How different was the robot’s training from how a human learns language? How constrained am I by the probabilities and structure of the language I’ve learned through a lifetime of observation and error?
Am I, are we all, just language robots?
I was reminded of something Carl Sagan once said—“If you wish to make an apple pie from scratch, you must first invent the universe”—and wondered about its analogue. If you wish to write a short story from scratch, what must you first invent?
* * *
The logic behind the language robot is that word choice, like temperature, is entropic. That is, every word changes the likelihood of the eventual distribution of future and past words in much the same way that temperature both changes and is the distribution of future and past atoms in a room. The language robot fills in words as one might a sheet of Mad Libs by estimating the probabilities of a new word given a rolling tally of the words before and after it.
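To see the Mad Libs logic in miniature, here is a toy bigram version of that probability estimate. (The real model is a neural network conditioned on far longer contexts; this is only a cartoon of the principle, with an invented corpus.)

```python
# Cartoon of next-word probability estimation from raw counts; the actual
# model is a neural network conditioned on much longer context windows.
from collections import Counter, defaultdict

corpus = "the dog smells the sample the dog alerts".split()

following = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    following[prev][word] += 1

def next_word_probs(prev):
    """Estimate P(next word | previous word) from the running tally."""
    counts = following[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # {'dog': 0.67, 'sample': 0.33}, roughly
```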
The version I used, through a cloud-connected app on my phone—the ideal form of a twenty-first-century Muse—learned to write prose by having to guess, one word at a time, a missing word from the text of more than 40 gigabytes of online writing, or about 8 million total documents. Despite being trained on the conversational language of the internet, it was able nimbly to imitate many literary styles. For instance, when I gave a professor of comparative literature at Stanford a robot-infused version of “The Short Happy Life of Francis Macomber,” he failed to correctly note where Ernest Hemingway ended and the robot began. Next I tried a British man in a bar—Cambridge-educated, with a degree in English—and he, too, failed a similar test with Douglas Adams and The Hitchhiker’s Guide to the Galaxy. A Shakespeare scholar, told to ignore verse, couldn’t point out what was King Lear and what was robot. (Yes, it is that good.)
Subjectively, a final written or spoken word can feel like it came from somewhere effortless and preconscious or like it simply happened, de novo. Of course, it didn’t. Something had to cause it. One possibility is that word choice is a process of selection from a palette of probable words—the final choice simply that which remains, like a sculpture emerging from a reduced block of marble. Another possibility is that each word is somehow generated thermally, like life, and the writer or speaker is pulled along, as if on a gradualist’s leash, as the words evolve and change together.
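That thermal intuition corresponds to a real knob in such models, usually called temperature, which flattens or sharpens the palette of probable words before one is drawn. A minimal sketch, with an invented palette of candidate words and scores:

```python
# Temperature sampling from a palette of candidate words.
# Low temperature: the most probable word almost always wins.
# High temperature: flatter distribution, more surprising choices.
import math
import random

def sample(scores, temperature=1.0):
    words = list(scores)
    weights = [math.exp(scores[w] / temperature) for w in words]
    return random.choices(words, weights=weights, k=1)[0]

palette = {"marble": 2.0, "granite": 1.2, "cheese": 0.1}  # invented scores
print(sample(palette, temperature=0.5))   # nearly always "marble"
print(sample(palette, temperature=2.0))   # more adventurous
```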
The Oxford professor and poet Hannah Sullivan, author of The Work of Revision, a book about writers, technology, and editing, once described her side of this debate to me, in the context of poetry: “However elegant a final sentence might be or a poem might be, it is not something that has been made out of a massive set of possibilities,” said Sullivan. It has been created from zero, from nothing, she continued: “I don’t think poets are in fact choosing one word out of a hundred possible words when they’re thinking about a rhyme. That’s not actually how language works. The rhyme is kind of there first, and then the other words sort of happen around it.”
When I asked the former poet laureate of the United States Robert Pinsky where that first word might come from, he mentioned the important link between poetry and movement. “Poetry is very physical,” said Pinsky. “It’s bodily. It has to do with breath. It has to do with the tongue and the lips and the pharynx and all the things you do to produce the sounds and little bones in your ear.” He connects poetry’s physicality to early-in-life acoustic and linguistic training in his book, The Sounds of Poetry, where he argues that the “hearing-knowledge” brought to poetry—say, the tidal prosody of Thomas Hardy’s lines on the doomed Titanic: “Dim moon-eyed fishes near / Gaze at the gilded gear / And query: ‘What does this vaingloriousness down here?’”—is trained on what Pinsky calls “peculiar codes from the cradle.” These codes are the sonic patterns in speech learned preconsciously in youth and which are “acquired like the ability to walk and run.”
Pinsky is more than figuratively right. All of language is a physical act. Vocalized speech is the end of a muscular sequence involving hundreds of coordinated muscles in the stomach, lungs, throat, lips, and tongue. Likewise, our internal voices, the base of most of what we call thinking, are simulated speech acts handled deftly by the parts of the brain that plan possible action. Our capacity for language likely repurposed other, preexisting functions of the mammalian brain involving movement; supporting this theory, so-called "language" areas of the brain also show activity in brain-imaging studies during nonlinguistic tasks such as grasping or viewing hand-based shadow puppets.
What, if anything, precedes the kernel of that first word somewhere between a writer typing it and the invention of the universe? (What, if anything, made it so likely that the final word of the previous sentence—and, as you may by now have intuited, also this one—was almost certainly going to be “universe”?)
The short answer, for robots and humans alike, is training.
Rich Ivry, a professor of psychology and neuroscience at Berkeley, recently coauthored two papers, in Nature Neuroscience and Neuron, that support the idea that a particular brain region called the cerebellum is responsive to both motor and some language tasks. Ivry told me that the region, which contains more than two-thirds of the total number of neurons in the human brain, probably started out by fine-tuning movement and sensation early in mammalian evolution but has since expanded its role—at least in humans—to help with more general tasks like cognition and language. As one might offload heavy arithmetic to a calculator during tax preparation, the fancier, conscious parts of the brain can, in general, offload to the cerebellum some of the immensely complex sensory and motor coordination involved in moving the human hand and body.
One theory, said Ivry, is that the region is predicting what’s going to happen in the very near future based on a lifetime of tracking what has worked and what hasn’t in the past and adjusting its movements accordingly. The cerebellum, said Ivry, is a “giant pattern recognition system” containing, perhaps, all the peculiar codes of the cradle. “In a sense you could say it is capable of basically storing all possible patterns we’ve ever experienced,” said Ivry.
Imagine reaching for a coffee mug in freefall, catching a fly out of the air with one’s bare hand, or returning a surprise drop serve in tennis. For dynamic motor tasks, where the goal is also moving as the arm does, the brain must be able to update its reach as it reaches. One need only apply this same motoric prediction to the statistics of a sentence to see how the brain might coordinate both.
One possibility is that the cerebellum is performing a crudely similar Mad Libs–style prediction to what the language robot does. "As I hear a sentence, the cerebellum is generating a predictive model of the likely words I'm about to hear," said Ivry. In other words, some of the base calculations thought to be performed best by the cerebellum might underlie its contribution to both movement and language, and to the process through which each becomes honed with practice.
In what is either a coincidence or a remarkable sufficiency condition for language learning, the parallelized structure of the cerebellum—which Ivry described as "the exact same processing unit repeated billions and billions of times"—is analogous to that of the massively parallel hardware the language robot was trained on, called a graphics processing unit, or GPU. (The Canadian neuroscientist and computer scientist Jörn Diedrichsen sees similarity in their evolution, as well: "The GPU is a kind of special purpose circuit designed to very quickly do a specific type of computation. And now the GPU is being reused in many ways that the original inventor didn't anticipate.")
Robert Pinsky, building on an idea of Ezra Pound's, once said that if prose writing is like shooting an arrow, poetry is like doing so from atop a horse. Likewise, there are differences between a word and le mot juste, Gustave Flaubert's "right word," just as there are between a bull's-eye and a mounted archer's bull's-eye. ("I dare say there are very good marksmen who just can't shoot from a horse," wrote Pound.) How would the cerebellum elicit such expert differences? Ivry said that it might not. The cerebellum could be just a cliché generator for movement or language, as likely to give a merely probable linguistic continuation as it is a stereotyped racquet swing or lazy grab at a stumbling mug.
High aesthetics is the cliché’s opposite—the low-probability event, what Sullivan called the “high-wire act” of surprise within constraint—which operates outside statistics and cradle codes to create an entirely new set of expectations. It remains to be seen which training architecture—cerebellum or GPU—is capable, long term, of the highest of the high-wire acts. For now, the language bot can only imitate the literary greats, which is small comfort. There was, after all, a time in even Shakespeare’s youth when all he could do was babble the imitative sounds of his mother.
* * *
The language robot and I never wrote the same story. It became, eventually, a writing tool: a curator of new ideas and a splint for bad ones. When, after a few months, I lost access to it, my brain felt like it had been lesioned. I missed it. I had found myself jumping into its head, asking what it might write, as one would with a human writing partner, which made me wonder: Should I give it the same theory-of-mind, inductive respect as I give to other thinking things? Could I even help but do so?