The Inevitable

by Kevin Kelly


  These examples can only hint at the outburst and sheer frenzy of new forms appearing in the coming decades. Take any one of these genres and multiply it. Then marry and crossbreed them. We can see the nascent outlines of the new ones that might emerge. With our fingers we will drag objects out of films and remix them into our own photos. A click of our phone camera will capture a landscape, then display its history in words, which we can use to annotate the image. Text, sound, motion will continue to merge. With the coming new tools we’ll be able to create our visions on demand. It will take only a few seconds to generate a believable image of a turquoise rose, glistening with dew, poised in a trim golden vase—perhaps even faster than we could write these words. And that is just the opening scene.

  The supreme fungibility of digital bits allows forms to morph easily, to mutate and hybridize. The quick flow of bits permits one program to emulate another. To simulate another form is a native function of digital media. There’s no retreat from this multiplicity. The number of media choices will only increase. The variety of genres and subgenres will continue to explode. Sure, some will rise in popularity while others wane, but few will disappear entirely. There will still be opera lovers a century from now. But there will be a billion video game fans and a hundred million virtual reality worlds.

  The accelerating fluidity of bits will continue to overtake media for the next 30 years, furthering a great remixing.

  * * *

  At the same time, the cheap and universal tools of creation (megapixel phone cameras, YouTube Capture, iMovie) are quickly reducing the effort needed to create moving images and upsetting a great asymmetry that has been inherent in all media. That is: It is easier to read a book than to write one, easier to listen to a song than to compose one, easier to attend a play than to produce one. Feature-length classic movies in particular have long suffered from this user asymmetry. The intensely collaborative work needed to coddle pieces of chemically treated film and paste them together into movies meant that it was vastly easier to watch a movie than to make one. A Hollywood blockbuster can take a million person-hours to produce and only two hours to consume. To the utter bafflement of the experts who confidently claimed that viewers would never rise from their reclining passivity, tens of millions of people have in recent years spent uncountable hours making movies of their own design. Having a ready and reachable audience of potential billions helps, as does the choice of multiple modes in which to create. Because of new consumer gadgets, community training, peer encouragement, and fiendishly clever software, the ease of making video now approaches the ease of writing.

  This is not how Hollywood makes films, of course. A blockbuster film is a gigantic creature custom built by hand. Like a Siberian tiger, it demands our attention—but it is also very rare. Every year about 600 feature films are released in North America, or about 1,200 hours of moving images. As a percentage of the hundreds of millions of hours of moving images produced annually today, 1,200 hours is minuscule. It is an insignificant rounding error.

  We tend to think the tiger represents the animal kingdom, but in truth a grasshopper is a truer statistical example of an animal. The handcrafted Hollywood film is a rare tiger. It won’t go away, but if we want to see the future of motion pictures, we need to study the swarming critters below—the jungle of YouTube, indie films, TV serials, documentaries, commercials, infomercials, and insect-scale supercuts and mashups—and not just the tiny apex of tigers. YouTube videos are viewed more than 12 billion times in a single month. The most viewed videos have been watched several billion times each, more than any blockbuster movie. More than 100 million short video clips with very small audiences are shared to the net every day. Judged merely by volume and the amount of attention the videos collectively garner, these clips are now the center of our culture. Their craftsmanship varies widely. Some are made with the same glossiness as a Hollywood movie, but most are made by kids in their kitchen with a phone. If Hollywood is at the apex of the pyramid, the bottom is where the swampy action is, and where the future of the moving image begins.

  The vast majority of these non-Hollywood productions rely on remixing, because remixing makes it much easier to create. Amateurs take soundtracks found online, or recorded in their bedrooms, cut and reorder scenes, enter text, and then layer in a new story or novel point of view. Remixing of commercials is rampant. Each genre often follows a set format.

  For example, remixed movie trailers. Movie trailers themselves are a recent art form. Because of their brevity and compact storytelling, movie trailers can be easily recut into alternative narratives—for instance, movie trailers for imaginary movies. An unknown amateur may turn a comedy into a horror flick, or vice versa. Remixing the soundtrack of the trailer is a common way to mash up these short movies. Some fans create music videos by matching and mixing a pop song soundtrack with edited clips from obscure cult hit movies. Or they clip scenes from a favorite movie or movie star, which are then edited to fit an unlikely song. These become music videos for a fantasy universe. Rabid fans of pop bands take their favorite songs on video and visibly add the song’s lyrics in large type. These lyric videos eventually became so popular that some bands started releasing official music videos with lyrics. As the words float over visuals in sync with the sounds, this is a true remixing and convergence of text and image—video you read, music you watch.

  Remixing video can even become a kind of collective sport. Hundreds of thousands of passionate anime fans around the world (meeting online, of course) remix Japanese animated cartoons. They clip the cartoons into tiny pieces, some only a few frames long, then rearrange them with video editing software and give them new soundtracks and music, often with English dialogue. This probably involves far more work than was required to draw the original cartoon, but far less work than it would have required to create a simple clip 30 years ago. The new anime vids tell completely new stories. The real achievement in this subculture is to win the Iron Editor challenge. Just as in the TV cookoff contest Iron Chef, the Iron Editor must remix videos in real time in front of an audience while competing with other editors to demonstrate superior visual literacy. The best editors can remix video as fast as you might type.

  In fact, the habits of the mashup are borrowed from textual literacy. You cut and paste words on a page. You quote verbatim from an expert. You paraphrase a lovely expression. You add a layer of detail found elsewhere. You borrow the structure from one work to use as your own. You move frames around as if they were phrases. Now you will perform all these literary actions on moving images, in a new visual language.

  An image stored on a memory disk instead of celluloid film has a liquidity that allows it to be manipulated as if the picture were words rather than a photo. Hollywood mavericks like George Lucas embraced digital technology early (Lucas founded Pixar) and pioneered a more fluent way of filmmaking. In his Star Wars films, Lucas devised a method of moviemaking that has more in common with the way books and paintings are made than with traditional cinematography.

  In classic cinematography, a film is planned out in scenes; the scenes are filmed (usually more than once); and from a surfeit of these captured scenes, a movie is assembled. Sometimes a director must go back and shoot “pickup” shots if the final story cannot be told with the available film. With the new screen fluency enabled by digital technology, however, a movie scene is something more malleable—it is like a writer’s paragraph, constantly being revised. Scenes are not captured (as in a photo) but built up incrementally, like paint, or text. Layers of visual and audio refinement are added over a crude sketch of the motion, the mix constantly in flux, always changeable. George Lucas’s last Star Wars movie was layered up in this writerly way. To get the pacing and timing right, Lucas recorded scenes first in crude mock-ups, and then refined them by adding more details and resolution till done. Lightsabers and other effects were digitally painted in, layer by layer. Not a single frame of the final movie was left untouched by manipulation. In essence, his films were written pixel by pixel. Indeed, every single frame in a big-budget Hollywood action film today has been built up with so many layers of additional details that it should be thought of as a moving painting rather than as a moving photograph.

  In the great hive mind of image creation, something similar is already happening with still photographs. Every minute, thousands of photographers are uploading their latest photos on websites and apps such as Instagram, Snapchat, WhatsApp, Facebook, and Flickr. The more than 1.5 trillion photos posted so far cover any subject you can imagine; I have not yet been able to stump the sites with an image request that cannot be found. Flickr offers more than half a million images of the Golden Gate Bridge alone. Every conceivable angle, lighting condition, and point of view of the Golden Gate Bridge has been photographed and posted. If you want to use an image of the bridge in your video or movie, there is really no reason to take a new picture of this bridge. It’s been done. All you need is a really easy way to find it.

  Similar advances have taken place with 3-D models. On the archive for 3-D models generated in the software SketchUp, you can find insanely detailed three-dimensional virtual models of most major building structures of the world. Need a street in New York? Here’s a filmable virtual set. Need a virtual Golden Gate Bridge? Here it is in fanatical detail, every rivet in place. With powerful search and specification tools, high-resolution clips of any bridge in the world can be circulated into the common visual dictionary for reuse. Out of these ready-made “phrases” a film can be assembled, mashed up from readily available clips or virtual sets. Media theorist Lev Manovich calls this “database cinema.” The databases of component images form a whole new grammar for moving images.

  After all, this is how authors work. We dip into a finite database of established words, called a dictionary, and reassemble these found words into articles, novels, and poems that no one has ever seen before. The joy is recombining them. Indeed, it is a rare author who is forced to invent new words. Even the greatest writers do their magic primarily by remixing formerly used, commonly shared ones. What we do now with words, we’ll soon do with images.

  For directors who speak this new cinematographic language, even the most photorealistic scenes are tweaked, remade, and written over frame by frame. Filmmaking is thus liberated from the stranglehold of photography. Gone is the frustrating method of trying to capture reality with one or two takes of expensive film and then creating your fantasy from whatever you get. Here reality, or fantasy, is built up one pixel at a time as an author would build a novel one word at a time. Photography exalts the world as it is, whereas this new screen mode, like writing and painting, is engineered to explore the world as it might be.

  But merely producing movies with ease is not enough, just as producing books with ease on Gutenberg’s press did not fully unleash text. Real literacy also required a long list of innovations and techniques that permitted ordinary readers and writers to manipulate text in ways that made it useful. For instance, quotation symbols make it simple to indicate where one has borrowed text from another writer. We don’t have a parallel notation in film yet, but we need one. Once you have a large text document, you need a table of contents to find your way through it. That requires page numbers. Somebody invented them in the 13th century. What is the equivalent in video? Longer texts require an alphabetic index, devised by the Greeks and later developed for libraries of books. Someday soon with AI we’ll have a way to index the full content of a film. Footnotes, invented in about the 12th century, allow tangential information to be displayed outside the linear argument of the main text. That would be useful in video as well. And bibliographic citations (invented in the 13th century) enable scholars and skeptics to systematically consult sources that influence or clarify the content. Imagine a video with citations. These days, of course, we have hyperlinks, which connect one piece of text to another, and tags, which categorize using a selected word or phrase for later sorting.

  All these inventions (and more) permit any literate person to cut and paste ideas, annotate them with her own thoughts, link them to related ideas, search through vast libraries of work, browse subjects quickly, resequence texts, refind material, remix ideas, quote experts, and sample bits of beloved artists. These tools, more than just reading, are the foundations of literacy.

  If text literacy meant being able to parse and manipulate texts, then the new media fluency means being able to parse and manipulate moving images with the same ease. But so far, these “reader” tools of visuality have not made their way to the masses. For example, if I wanted to visually compare recent bank failures with similar historical events by referring you to the bank run in the classic movie It’s a Wonderful Life, there is no easy way to point to that scene with precision. (Which of several sequences did I mean, and which part of them?) I can do what I just did and mention the movie title. I might be able to point to the minute mark for that scene (a new YouTube feature). But I cannot link from this sentence to only those exact “passages” inside an online movie. We don’t have the equivalent of a hyperlink for film yet. With true screen fluency, I’d be able to cite specific frames of a film or specific items in a frame. Perhaps I am a historian interested in oriental dress, and I want to refer to a fez worn by someone in the movie Casablanca. I should be able to refer to the fez itself (and not the head it is on) by linking to its image as the hat “moves” across many frames, just as I can easily link to a printed reference of the fez in text. Or even better, I’d like to annotate the fez in the film with other film clips of fezzes as references.

  With full-blown visuality, I should be able to annotate any object, frame, or scene in a motion picture with any other object, frame, or motion picture clip. I should be able to search the visual index of a film, or peruse a visual table of contents, or scan a visual abstract of its full length. But how do you do all these things? How can we browse a film the way we browse a book?

  It took several hundred years for the consumer tools of text literacy to crystallize after the invention of printing, but the first visual literacy tools are already emerging in research labs and on the margins of digital culture. Take, for example, the problem of browsing a feature-length movie. One way to scan a movie would be to super-fast-forward through the two hours in a few minutes. Another way would be to digest it into an abbreviated version in the way a theatrical movie trailer might. Both these methods can compress the time from hours to minutes. But is there a way to reduce the contents of a movie into imagery that could be grasped quickly, as we might see in a table of contents for a book?

  Academic research has produced a few interesting prototypes of video summaries, but nothing that works for entire movies. Some popular websites with huge selections of movies (like porn sites) have devised a way for users to scan through the content of full movies quickly in a few seconds. When a user clicks the title frame of a movie, the window skips from one key frame to the next, making a rapid slide show, like a flip book of the movie. The abbreviated slide show visually summarizes a few-hour film in a few seconds. Expert software can be used to identify the key frames in a film in order to maximize the effectiveness of the summary.
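  The key-frame selection behind such a slide show can be sketched in a few lines. Assuming each frame has already been reduced to a difference score against its predecessor (a hypothetical simplification of what the expert software actually computes), picking key frames is just ranking those scores:

```python
def keyframe_indices(frame_diffs, num_keys):
    """Return indices of the num_keys frames that differ most from
    their predecessors -- a crude stand-in for real shot detection."""
    ranked = sorted(range(len(frame_diffs)),
                    key=lambda i: frame_diffs[i],
                    reverse=True)
    # Present the chosen frames in chronological order.
    return sorted(ranked[:num_keys])

# Frame-to-frame difference scores; the spikes mark likely scene cuts.
diffs = [0.1, 0.2, 9.0, 0.1, 0.3, 8.5, 0.2, 0.1, 7.9, 0.4]
print(keyframe_indices(diffs, 3))  # -> [2, 5, 8]
```

A real system would compute the difference scores from pixel or color-histogram changes between frames, but the ranking step is the same idea.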

  The holy grail of visuality is findability—the ability to search the library of all movies the same way Google can search the web, and find a particular focus deep within. You want to be able to type key terms, or simply say, “bicycle plus dog,” and then retrieve scenes in any film featuring a dog and a bicycle. In an instant you could locate the moment in The Wizard of Oz when the witchy Miss Gulch rides off with Toto. Even better, you want to be able to ask Google to find all the other scenes in all movies similar to that scene. That ability is almost here.
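  Once an AI has labeled each scene, the “bicycle plus dog” query reduces to an inverted index, the same structure behind text search. A minimal sketch (the scene identifiers and labels here are invented for illustration):

```python
from collections import defaultdict

def build_scene_index(scene_labels):
    """Map each label to the set of scenes it appears in."""
    index = defaultdict(set)
    for scene_id, labels in scene_labels.items():
        for label in labels:
            index[label].add(scene_id)
    return index

def find_scenes(index, *terms):
    """Return the scenes containing every search term."""
    hits = [index.get(t, set()) for t in terms]
    return set.intersection(*hits) if hits else set()

# Hypothetical AI-generated labels for a few scenes in a film.
labels = {
    "oz_017": {"dog", "bicycle", "woman"},
    "oz_018": {"tornado", "house"},
    "oz_042": {"dog", "basket"},
}
print(find_scenes(build_scene_index(labels), "bicycle", "dog"))  # -> {'oz_017'}
```

The hard part, of course, is the labeling itself; the lookup is trivial once the dark matter of the pixels has been turned into words.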

  Google’s cloud AI is gaining visual intelligence rapidly. Its ability to recognize and remember every object in the billions of personal snapshots that people like me have uploaded is simply uncanny. Give it a picture of a boy riding a motorbike on a dirt road and the AI will label it “boy riding a motorbike on a dirt road.” It captioned one photo “two pizzas on a stove,” which was exactly what the photo showed. Both Google’s and Facebook’s AI can look at a photo and tell you the names of the people in it.

  Now, what can be done for one image can also be done for moving images, since movies are just a long series of still images in a row. Perceiving movies takes a lot more processing power, in part because there is the added dimension of time (do objects persist as the camera moves?). In a few years we’ll be able to routinely search video via AI. As we do, we’ll begin to explore the Gutenberg possibilities within moving images. “I consider the pixel data in images and video to be the dark matter of the Internet,” says Fei-Fei Li, director of the Stanford Artificial Intelligence Laboratory. “We are now starting to illuminate it.”

  As moving images become easier to create, easier to store, easier to annotate, and easier to combine into complex narratives, they also become easier to be remanipulated by the audience. This gives images a liquidity similar to words. Fluid images flow rapidly onto new screens, ready to migrate into new media and seep into the old. Like alphabetic bits, they can be squeezed into links or stretched to fit search engines and databases. Flexible images invite the same satisfying participation in both creation and consumption that the world of text does.

 
