The Inevitable

by Kevin Kelly


  Lanier popularized the term “virtual reality,” but he was not the only person working on immersive simulations in the late 1980s. Several universities, a few startups, and the U.S. military had comparable prototypes, some with slightly different approaches for creating the phenomenon. I felt I had seen the future during my plunge into his microcosmos, and I wanted as many of my friends and fellow pundits as possible to experience what I had. With the help of the magazine I was then editing (Whole Earth Review), in the fall of 1990 we organized the first public demo of every VR rig in existence. For 24 hours, from Saturday noon to Sunday noon, anyone who bought a ticket could stand in line to try out as many of the two dozen or so VR prototypes as they could. In the wee hours of the night I saw the psychedelic champion Timothy Leary compare VR to LSD. The overwhelming impression spun by the buggy gear was total plausibility. These simulations were real. The views were coarse, the vision often stuttered, but the intended effect was inarguable: You went somewhere else. The next morning William Gibson, an up-and-coming science fiction writer who had stayed up the night testing cyberspace for the first time, was asked what he thought about these new portals to synthetic worlds. It was then that he first uttered his now famous remark: “The future is already here; it’s just not evenly distributed.”

  VR was so unevenly distributed, however, that it faded. The next steps never happened. All of us, myself included, thought VR technology would be ubiquitous in five years or so—at least by the year 2000. But the needed advances didn’t arrive until 2015, 25 years after Jaron Lanier’s pioneering work. The particular problem with VR was that close enough was not close enough. For extended stays in VR longer than 10 minutes, the coarseness and stuttering motion caused nausea. The cost of gear powerful, fast, and comfortable enough to overcome nausea was many tens of thousands of dollars. So VR remained out of reach of consumers, and also out of reach of the many startup developers who needed to jump-start the creation of VR content in order to spark purchases of the gear.

  Twenty-five years later a most unlikely savior appeared: phones! The runaway global success of smartphones drove the quality of their tiny hi-res screens way up and their cost way down. The eye screens for a VR goggle are approximately the size and resolution of a smartphone screen, so today VR headsets are basically built out of cheap phone-screen technology. At the same time, motion sensors in phones followed the same path of increasing performance and decreasing cost, until these sensors could be borrowed by VR displays to track head, hand, and body positions for very little money. In fact, the first consumer VR models from Samsung and Google use a regular smartphone slipped into an empty head-mounted display unit. Put on a Samsung Gear VR and you look into a phone; the phone tracks your movements and sends you into an alternative world.
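
  To make that sensor borrowing concrete, here is a minimal sketch of a complementary filter, one common way phone-grade gyroscope and accelerometer readings can be fused into a head-orientation estimate; the blend constant, sample numbers, and function name are illustrative assumptions, not any shipping headset’s code.

```python
# Minimal sketch of a complementary filter: phone-grade motion sensors
# fused into a head-pitch estimate. ALPHA and the sample data are
# illustrative assumptions for this sketch only.
ALPHA = 0.98  # trust the gyro short-term, the accelerometer long-term

def update_pitch(pitch, gyro_rate, accel_pitch, dt):
    """Blend integrated gyro rotation (deg/s) with the accelerometer's
    gravity-based pitch reading (deg) to suppress gyro drift."""
    return ALPHA * (pitch + gyro_rate * dt) + (1 - ALPHA) * accel_pitch

pitch = 0.0
for gyro_rate, accel_pitch in [(10.0, 0.5), (9.0, 1.2), (0.0, 1.5)]:
    pitch = update_pitch(pitch, gyro_rate, accel_pitch, dt=0.01)
print(round(pitch, 2))  # a headset would feed this estimate to the renderer
```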

  It’s not difficult to see how VR will soon triumph in the movies of the future, particularly in visceral genres like horror, erotica, or thrillers, where your gut is caught up in the story too. It’s also easy to imagine VR occupying a prime role in video games. No doubt hundreds of millions of avid players will eagerly don a suit, gloves, and helmet and then teleport to a faraway place to hide, shoot, kill, and explore, either solo or in tight bands of friends. Of course, the major funder of consumer VR development today is the game industry. But VR is much bigger than this.

  * * *

  Two benefits propel VR’s current rapid progress: presence and interaction. “Presence” is what sells VR. All the historical trends in cinema technology bend toward increased realism, from sound to color to 3-D to faster, smoother frame rates. Those trends are now being accelerated inside VR. Week by week the resolution increases, the frame rate jumps, the contrast deepens, the color space widens, and the high-fidelity sound sharpens, all of it improving faster than it does on big screens. That is, VR is getting more “realistic” faster than movies are. Within a decade, when you look into a state-of-the-art virtual reality display, your eye will be fooled into thinking you are looking through a real window into a real world. It’ll be bright—no flicker, no visible pixels. You will feel it is absolutely real. Except it isn’t.

  The second generation of VR technology relies on a new, innovative “light field” projection. (The first commercial light field units are the HoloLens, made by Microsoft, and Magic Leap, funded by Google.) In this design the VR is projected onto a semi-transparent visor, much like a hologram. This permits the projected “reality” to overlay the reality you see normally, without goggles. You could be standing in your kitchen and see the robot R2-D2 right before you in perfect resolution. You could walk around it, get closer, even move it to inspect it, and it would retain its authenticity. Because the artificial part is added to your ordinary view of the world, your eyes focus at its apparent depth rather than on a screen inches away, so this technological illusion is packed with presence. This overlay is called augmented reality (AR). You could almost swear it is really there.

  Microsoft’s vision for light field AR is to build the office of the future. Instead of workers sitting in cubicles in front of walls of monitor screens, they sit in an open office wearing HoloLenses and see a huge wall of virtual screens around them. Or they click to be teleported to a 3-D conference room with a dozen coworkers who live in different cities. Or they click to a training room where an instructor will walk them through a first-aid class, guiding their avatars through the proper procedures. “See this? Now you do it.” In most ways, the AR class will be superior to a real-world class.

  Cinematic realism is advancing faster in VR than in cinema itself because of a neat trick performed by head-mounted displays. To fill a gigantic IMAX screen with the proper resolution and brightness to convince you it is a mere window into reality requires a massive amount of computation and luminosity. To fill a 60-inch flat screen with the same window-clear realism is a smaller challenge, but still daunting. It is much easier to get a tiny visor in front of your face up to that quality. Because a head-mounted display follows your gaze no matter where you look—it is always in front of your eyes—you see full realism all the time. Make fully 3-D, clear-as-a-window vision and keep it in view wherever you look, and you have created a virtual IMAX inside the VR. The entire 360-degree virtual world appears in the same ultimate resolution as what’s in front of your eyes. And since what is in front of your eyes is just a small surface area, it is much easier and cheaper to magnify small improvements in quality. This tiny little area can invoke a huge disruptive presence.

  But while “presence” will sell it, VR’s enduring benefits spring from its interactivity. It is unclear how comfortable, or uncomfortable, we’ll be with the encumbrances of VR gear. Even the streamlined Google Glass (which I also tried), a very mild AR display not much bigger than sunglasses, seemed like too much trouble for most people in its first version. Presence will draw users in, but it is the interactivity quotient of VR that will keep them there. And interactivity, in all its degrees, will spread out from VR to the rest of the technological world.

  * * *

  About 10 years ago, Second Life was a fashionable destination on the internet. Members of Second Life created full-body avatars in a simulated world that mirrored “first life.” A lot of their time was spent remaking their avatars into beautiful people with glamorous clothes and socializing with other members’ incredibly beautiful avatars. Members devoted lifetimes to building super beautiful homes and slick bars and discos. The environment and avatars were created in full 3-D, but due to technological constraints, members could only view the world in flat 2-D on their desktop screens. (Second Life is rebooting itself as a 3-D world in 2016, code-named Project Sansar.) Avatars communicated via text balloons floating over their heads, typed by their owners. It was like walking around in a comic book. This clunky interface held back any deep sense of presence. The main attraction of Second Life was the completely open space for constructing a quasi-3-D environment. Your avatar walked onto an empty plain, like the blank field at a Burning Man festival, and could begin constructing the coolest and most outrageous buildings, rooms, or wilderness places. Physics didn’t matter, materials were free, anything was possible. But it took many hours to master the arcane 3-D tools. In 2009 a game company in Sweden, Mojang, launched Minecraft, a similar construction world in quasi-3-D that employed idiot-easy building blocks stacked like giant Legos. No learning was necessary. Many would-be builders migrated to Minecraft.

  Second Life’s success rose from the ability of kindred creative spirits to socialize, but when the social mojo moved to the mobile world, no phones had enough computing power to handle Second Life’s sophisticated 3-D, so the biggest audiences moved on. Even more headed to Minecraft, whose crude low-res pixelation allowed it to run on phones. Millions of members are still loyal to Second Life, and today at any hour about 50,000 avatars are simultaneously roaming the imaginary 3-D worlds built by users. Half of them are there for virtual sex, which relies more on the social component than on realism. A few years ago the founder of Second Life, Phil Rosedale, started another VR-ish company, trying to harness the social opportunities of an open simulated world and to invent a more convincing VR.

  Recently I visited the offices of Rosedale’s startup, High Fidelity. As the name implies, the aim of the project is to raise the realism in virtual worlds occupied by thousands—maybe tens of thousands—of avatars at once: to create a realistic, thriving virtual city. Jaron Lanier’s pioneering VR permitted two occupants at once, and the thing that I (and everyone else who visited) noticed was that other people in VR were far more interesting than other things. Experimenting again in 2015, I found that the best demos of synthetic worlds are the ones that trigger a deep presence not with the most pixels per inch, but with the most engagement of other people. To that end, High Fidelity is exploiting a neat trick. Taking advantage of the tracking abilities of cheap sensors, it can mirror the direction of your gaze in both worlds. Not just where you turn your head, but where you turn your eyes. Nano-small cameras buried inside the headset look back at your real eyes and transfer your exact gaze onto your avatar. That means that if someone is talking to your avatar, their eyes are staring at your eyes, and yours at theirs. Even if you move, requiring them to rotate their head, their eyes continue to lock onto yours. This eye contact is immensely magnetic. It stirs intimacy and radiates a felt presence.

  Nicholas Negroponte, head of MIT’s Media Lab, once quipped in the 1990s that the urinal in the men’s restroom was smarter than his computer, because it knew he was there and would flush when he left, while his computer had no idea he was sitting in front of it all day. That is still largely true today. Laptops and even tablets and phones are mostly ignorant of their owners’ use of them. That is starting to change with cheap eye tracking mechanisms like the ones in VR headsets. The newest Samsung Galaxy phone contains eye tracking technology, so the phone knows precisely where on the screen you are looking. Gaze tracking can be used in many ways. It can speed up screen navigation, since you often look at something before your finger or mouse moves to confirm it. Also, by measuring the duration of thousands of people’s gazes on a screen, software can generate maps that rank areas of greater or lesser attention. Website owners can then discern what parts of their front page people actually study and what parts they merely glance over, and use that information to improve the design. An app maker can use the gaze patterns of visitors to find which parts of an app’s interface demand too much attention, suggesting a difficulty that needs to be fixed. Mounted in a car’s dashboard, the same gaze technology can detect when drivers are drowsy or distracted.
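
  As a rough sketch of how such an attention map might be computed, the few lines below sum gaze fixations into screen cells and rank the hottest ones; the cell size, sample fixations, and helper names are invented for illustration.

```python
# Sketch: aggregate gaze fixations into a screen "attention map".
# Each fixation is (x_pixel, y_pixel, duration_seconds); values are invented.
from collections import defaultdict

CELL = 40  # carve the screen into 40x40-pixel cells (assumed granularity)

def attention_map(fixations):
    """Sum the total gaze duration that lands in each screen cell."""
    heat = defaultdict(float)
    for x, y, duration in fixations:
        heat[(x // CELL, y // CELL)] += duration
    return heat

def hottest(heat, n=3):
    """Return the n cells that drew the most total attention."""
    return sorted(heat.items(), key=lambda kv: kv[1], reverse=True)[:n]

fixations = [(120, 80, 1.4), (130, 85, 0.9), (640, 400, 0.3)]
print(hottest(attention_map(fixations)))
# A designer would read the top cells as the page regions people actually study.
```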

  The tiny camera eyes that now stare back at us from any screen can be trained with additional skills. First the eyes were trained to detect a generic face, a skill used in digital cameras to assist focusing. Then they were taught to detect particular faces—say, yours—as an identity password. Your laptop looks into your face, and deeper into your irises, to be sure it is you before it opens its home page. Recently, researchers at MIT have taught the eyes in our machines to detect human emotions. As we watch the screen, the screen is watching us, noting where we look and how we react. Rosalind Picard and Rana el Kaliouby at the MIT Media Lab have developed software so attuned to subtle human emotions that they claim it can detect if someone is depressed. It can discern about two dozen different emotions. I had a chance to try a beta version of this “affective technology,” as Picard calls it, on Picard’s own laptop. The tiny eye in the lid of her laptop peering at me could correctly determine if I was perplexed or engaged with a difficult text. It could tell if I was distracted while viewing a long video. Because this perception happens in real time, the smart software can adapt what I’m viewing. Say I am reading a book and my frown shows I’ve stumbled on a certain word; the text could expand a definition. Or if it realizes I am rereading the same passage, it could supply an annotation for that passage. Similarly, if it knows I am bored by a scene in a video, it could jump ahead or speed up the action.
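
  A minimal sketch of the kind of adaptive loop that behavior implies might look like the following; the emotion labels and the classify_emotion stub are hypothetical stand-ins for a trained affective model, not Picard’s actual software.

```python
# Hypothetical sketch of an emotion-adaptive reader.
# classify_emotion() stands in for a trained affective-computing model;
# it is not a real API, and the labels are assumptions for illustration.

def classify_emotion(webcam_frame):
    """Hypothetical model call: returns 'engaged', 'perplexed', or 'bored'."""
    return "perplexed"  # stub so the sketch runs end to end

def adapt_reading(webcam_frame, view):
    """Adjust what the reader sees in response to their face."""
    emotion = classify_emotion(webcam_frame)
    if emotion == "perplexed":
        view["expand_definition"] = True   # explain the stumbled-on word
    elif emotion == "bored":
        view["skip_ahead"] = True          # jump past the dull passage
    return view

print(adapt_reading(webcam_frame=None, view={}))
```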

  We are equipping our devices with senses—eyes, ears, motion—so that we can interact with them. They will not only know we are there, they will know who is there and whether that person is in a good mood. Of course, marketers would love to get hold of our quantified emotions, but this knowledge will serve us directly as well, enabling our devices to respond to us “with sensitivity” as we hope a good friend might.

  In the 1990s I had a conversation with the composer Brian Eno about the rapid changes in music technology, particularly its sprint from analog to digital. Eno made his reputation by inventing what we might now call electronic music, so it was a surprise to hear him dismiss a lot of digital instruments. His primary disappointment was with the instruments’ atrophied interfaces—little knobs, sliders, or tiny buttons mounted on square black boxes. He had to interact with them by moving only his fingers. By comparison, the sensual strings, table-size keyboards, or meaty drumheads of traditional analog instruments offered more nuanced bodily interactions with the music. Eno told me, “The trouble with computers is that there is not enough Africa in them.” By that he meant that interacting with computers using only buttons was like dancing with only your fingertips, instead of your full body, as you would in Africa.

  Embedded microphones, cameras, and accelerometers inject some Africa into devices. They provide embodiment in order to hear us, see us, feel us. Swoosh your hand to scroll. Wave your arms with a Wii. Shake or tilt a tablet. Let us embrace our feet, arms, torso, head, as well as our fingertips. Is there a way to use our whole bodies to overthrow the tyranny of the keyboard?

  One answer first premiered in the 2002 movie Minority Report. The director, Steven Spielberg, was eager to convey a plausible scenario for the year 2050, and so he convened a group of technologists and futurists to brainstorm the features of everyday life in 50 years. I was part of that invited group, and our job was to describe a future bedroom, or what music would sound like, and especially how you would work on a computer in 2050. There was general consensus that we’d use our whole bodies and all our senses to communicate with our machines. We’d add Africa by standing instead of sitting. We think different on our feet. Maybe we’d add some Italy by talking to machines with our hands. One of our group, John Underkoffler, from the MIT Media Lab, was way ahead in this scenario and was developing a working prototype using hand motions to control data visualizations. Underkoffler’s system was woven into the film. The Tom Cruise character stands, raises his hands outfitted with a VR-like glove, and shuffles blocks of police surveillance data, as if conducting music. He mutters voice instructions as he dances with the data. Six years later, the Iron Man movies picked up this theme. Tony Stark, the protagonist, also uses his arms to wield virtual 3-D displays of data projected by computers, catching them like a beach ball, rotating bundles of information as if they were objects.
  It’s very cinematic, but real interfaces in the future are far more likely to use hands closer to the body. Holding your arms out in front of you for more than a minute is an aerobic exercise. For extended use, interaction will more closely resemble sign language. A future office worker is not going to be pecking at a keyboard—not even a fancy glowing holographic keyboard—but will be talking to a device with a newly evolved set of hand gestures, similar to the ones we already use: pinching our fingers together to shrink something, spreading them apart to enlarge it, or holding up two L-shaped hands to frame and select something. Phones are very close to perfecting speech recognition today (including the ability to translate in real time), so voice will be a huge part of interacting with devices. If you’d like a vivid picture of someone interacting with a portable device in the year 2050, imagine them using their eyes to visually “select” from a set of rapidly flickering options on the screen, confirming with lazy audible grunts, and speedily fluttering their hands in their laps or at their waists. A person mumbling to herself while her hands dance in front of her will be the signal in the future that she is working on her computer.

  Not only computers. All devices need to interact. If a thing does not interact, it will be considered broken. Over the past few years I’ve been collecting stories of what it is like to grow up in the digital age. As an example, one of my friends had a young daughter under five years old. Like many other families these days, they didn’t have a TV, just computing screens. On a visit to another family who happened to have a TV, his daughter gravitated to the large screen. She went up to the TV, hunted around below it, and then looked behind it. “Where’s the mouse?” she asked. There had to be a way to interact with it. Another acquaintance’s son had access to a computer starting at the age of two. Once, when she and her son were shopping in a grocery store, she paused to decipher the label on a product. “Just click on it,” her son suggested. Of course cereal boxes should be interactive! Another young friend worked at a theme park. Once, she took a little girl’s picture, and afterward the girl told her, “But it’s not a real camera—it doesn’t have the picture on the back.” Another friend had a barely speaking toddler take over his iPad. She could paint and easily handle complicated tasks on apps almost before she could walk. One day her dad printed out a high-resolution image on photo paper and left it on the coffee table. He noticed his toddler came up and tried to unpinch the photo to make it larger. She tried unpinching it a few times, without success, and then looked at him, perplexed. “Daddy, broken.” Yes, if something is not interactive, it is broken.

 
