The Twin Tides of Change
Timo Hannay
Founder, SchoolDash; managing director, Digital Science (Macmillan); co-organizer, Sci Foo Camp
News stories are by their nature ephemeral. Whipped up by the media (whether mass or social), they soon dissipate, like ripples on the surface of the sea. More significant and durable are the great tides of social change and technological progress on which they ride. It is these that will continue to matter for generations to come. Fortunately, like real tides, they tend to be predictable.
One such inexorable trend is our changing relationship with the natural world—most vividly represented by the ongoing debate about whether humanity’s impact has been so profound as to justify the christening of a new geological epoch: the Anthropocene. Whether or not a consensus emerges in the next few years, it will do so eventually, for our effect on the planet will only grow. This is in part because our technological capabilities continue to expand, but an even more important driver is our evolving collective psyche.
Since Darwin showed us that we are products of the natural world rather than its divinely appointed overlords, we’ve been reluctant to fully impose our will, fearful of our own omnipotence and concerned that we’ll end up doing more harm than good. But we might as well be trying, Canute-like, to hold back the sea. For whatever is beyond the pale today will eventually come to seem so natural that it will barely register as news—even if it takes the death of the old guard to usher in a new way of thinking. To future generations, genetic engineering of plants and animals (and humans) will seem as natural as selective breeding is today, and planetary-scale geoengineering will become as necessary and pervasive as the construction of dams and bridges.
As with our place in nature, so too with our relationship to technology. Recent progress in artificial intelligence and bionics, in particular, has led to a lot of soul-searching about who—or what—is in charge, and even what it means to be human. The Industrial Revolution saw machines replace human physical labor, but now that they are replacing mental labor too, what’s left for people to do? Even those who don’t fear for their jobs might be angry when they discover that their new boss is an algorithm.
Yet since the invention of the wheel, humans have lived in happy and productive symbiosis with their technologies. Despite our ongoing appetite for scare stories, we’ll continue to embrace innovations as the primary source of improvements in our collective well-being. In doing so, we’ll come to see them as natural extensions of ourselves—indeed, as enablers and enhancers of our humanity—rather than as something artificial or alien. A life lived partly in virtual reality will be no less real than one seen through a pair of contact lenses. Someone with a computer inserted in their brain rather than merely in their pocket will be seen as no less human than someone with a pacemaker. And traveling in a vehicle or aircraft without a human at the controls will be seen not as reckless but as reassuring. We are surely not too far from the day when Edge will receive its first contribution from a genetically enhanced author or an artificial intelligence. That, too, will be big news, but not for long.
Thus, humanity is subject to two inexorably rising tides: a scientific and technological one in which the magical eventually becomes mundane, and a psychological and social one in which the unthinkable becomes unremarkable. News stories will come and go like breaking waves; meanwhile, beneath them, inconspicuous in their vastness, the twin tides of technological and social change will continue their slow but relentless rise, testing and extending the boundaries of human knowledge and acceptance. This will be the real story of our species and our age.
Imaging Deep Learning
Andy Clark
Philosopher and cognitive scientist, University of Edinburgh; author, Surfing Uncertainty
The world is increasingly full of deep architectures—multilevel artificial neural networks employed to discover (via deep learning) patterns in large data sets, such as images and texts. But the power and prevalence of these deep architectures mask a major problem—the problem of knowledge opacity. Such architectures learn to do wonderful things, but they do not (without further coaxing) reveal just what knowledge they are relying upon when they do them.
This is both disappointing (theoretically) and dangerous (practically). Deep learning and the patterns it extracts now permeate every aspect of our daily lives, from online search and recommendation systems to bank-loan applications, healthcare, and dating. Systems that have that much influence over our destinies ought to be as transparent as possible. The good news is that new techniques are emerging to probe the knowledge gathered and deployed by deep-learning systems.
In June 2015, Alexander Mordvintsev and co-authors published online a short piece entitled “Inceptionism: Going Deeper into Neural Networks.” Named after a specific architecture, “Inceptionism” was soon trending on just about every geeky blog in the universe. The authors took a trained-up network capable of deciding what is shown in a given image. They then devised an automatic way to get the network to enhance an input image in ways that would tweak it toward an image that would be classified, by that network, as some specific item. This involved essentially running the network in reverse (hence the frequent references to “networks dreaming” and “reverse hallucination” in the blogs). For example, starting with random noise and a target classification, and constraining the enhancement to respect the statistical profile of the real images the network had been trained on, the process yields a vague, almost impressionistic image that reveals how the network thinks that kind of item (“banana,” “starfish,” “parachute,” or whatever) should look.
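To make that procedure concrete, here is a minimal sketch of the reverse-running idea in Python (using PyTorch). The tiny untrained network, the label index, and the optimizer settings are illustrative stand-ins only; the original work used a pretrained classifier (the “Inception” architecture that gives the technique its name) and a richer natural-image prior.

```python
# A minimal sketch of the idea: start from noise and nudge the image, by
# gradient ascent, toward whatever the network would classify as the target.
import torch
import torch.nn as nn

# Stand-in classifier (untrained); in practice this would be a pretrained
# network such as GoogLeNet, with real class labels.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

target_class = 3                                      # e.g. "banana" in a real label set
img = torch.randn(1, 3, 64, 64, requires_grad=True)   # start from random noise
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    score = model(img)[0, target_class]   # how strongly the net sees the target
    # Maximize the class score; the small L2 penalty is a crude stand-in for
    # the "respect the statistics of real images" constraint described above.
    loss = -score + 1e-3 * img.pow(2).sum()
    loss.backward()
    optimizer.step()
# img now shows what this particular network thinks the target class looks like.
```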
There were surprises. The target “barbell,” for example, led the network to hallucinate two-ended weights all right—but every barbell still had the ghostly outline of a muscular arm attached. That tells us that the network has not quite isolated the core idea yet, though it got pretty close. Most interestingly, you can now feed in a real image, pick one layer of your multilevel network, and ask the system to enhance whatever is detected. This means you can use inceptionism to probe and visualize what’s going on at each processing layer. Inceptionism is thus a tool for looking into the network’s multilevel mind, layer by layer.
Many of the results were psychedelic—repeated enhancements at certain levels resulted in images of fractal beauty, mimicking trippy artistic forms and motifs. This was because repeating the process results in feedback loops. The system is (in effect) being asked to enhance whatever it sees in the image as processed at some level. So if it sees a hint of birdiness in a cloud, or a hint of faceness in a whirlpool, it will enhance that, bringing out a little more of that feature or property. If the resulting enhanced image is then fed in as input, and the same technique applied, those enhancements make the hint of birdiness (or whatever) even stronger, and another round of enhancement ensues. This rapidly results in some image elements morphing toward repeating, dreamlike versions of familiar things and objects.
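The enhancement-and-feedback loop can be sketched the same way. Again, the small network and the step sizes below are illustrative assumptions; the released DeepDream code adds refinements such as processing the image at several scales.

```python
# A sketch of the loop described above: pick a layer, amplify whatever it
# already responds to, feed the enhanced image back in, and repeat.
import torch
import torch.nn as nn

layers = nn.Sequential(                    # toy stand-in for a pretrained network
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)
chosen = 2                                 # index of the layer to "enhance"

img = torch.rand(1, 3, 64, 64)             # a real photo would go here
for _ in range(5):                         # each pass feeds the result back in
    img = img.detach().requires_grad_(True)
    x = img
    for i, layer in enumerate(layers):
        x = layer(x)
        if i == chosen:
            break
    x.norm().backward()                    # "enhance whatever you see at this layer"
    with torch.no_grad():
        img = img + 0.1 * img.grad / (img.grad.abs().mean() + 1e-8)
        img = img.clamp(0, 1)
# Repeating this drives hints of features toward the dreamlike images described above.
```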
If you haven’t yet seen these fascinating images, you can check them out online in the “inceptionism gallery,” and even create your own using the released DeepDream code. Inceptionist images turn out to be objects of beauty and contemplation in their own right, and the technique may thus provide a new tool for creative exploration—not to mention suggestive hints about the nature of our own creative processes. But this is not just, or even primarily, image-play. Such techniques are helping us understand what kinds of things these opaque, multilevel systems know—what they rely upon layer by layer as their processing unfolds.
This is neuroimaging for the artificial brain.
The Neural Net Reloaded
Jamshed Bharucha
Psychologist; president emeritus, Cooper Union
The neural network has been resurrected. After a troubled sixty-year history, it has crept into the daily lives of hundreds of millions of people, in the span of just three years.
In May 2015, Sundar Pichai announced that Google had reduced errors in speech recognition to 8 percent, from 23 percent only two years earlier. The key? Neural networks, rebranded as deep learning. Google reported dramatic improvements in image recognition just six months after acquiring DNN Research, a startup founded by Geoffrey Hinton and two of his students. Back-propagation is back—with a Big Data bang. And it’s suddenly worth a fortune.
The news wasn’t on the front pages. There was no scientific breakthrough. Nor was there a novel application. Why is it news? The scale of the impact is astonishing, as is the pace at which it was achieved. Making sense of noisy, infinitely variable, visual and auditory patterns has been a Holy Grail of artificial intelligence. Raw computing power has caught up with decades-old algorithms. In just a few short years, the technology has leapt from laboratory simulations of oversimplified problems to cell-phone apps for the recognition of speech and images in the real world.
Theoretical developments in neural networks have been mostly incremental since the pioneering work on self-organization in the 1970s and back-propagation in the 1980s. The tipping point was reached recently not by fundamentally new insights but by processing speeds that make possible larger networks, bigger data sets, and more iterations.
This is the second resurrection of neural networks. The first was the discovery by Hinton and Yann LeCun that multilayered networks can learn nonlinear classification. Before this breakthrough, Marvin Minsky and Seymour Papert had all but decimated the field with their 1969 book Perceptrons. Among other things, they proved that Frank Rosenblatt’s perceptron could not learn classifications that are nonlinear.
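For readers who would like to see the limitation rather than take it on faith, here is a small illustration using the standard example, XOR: no straight line separates its two classes, so a single-layer perceptron cannot learn it, while a network with one hidden layer trained by back-propagation can. The layer sizes, learning rate, and iteration counts are arbitrary choices, and with an unlucky random seed the little network may need reinitializing.

```python
# XOR in NumPy: a single-layer perceptron cannot learn it; a two-layer network
# trained with back-propagation can.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])          # XOR targets: not linearly separable

# 1. Rosenblatt-style perceptron: one linear threshold unit.
w, b = np.zeros(2), 0.0
for _ in range(100):
    for xi, yi in zip(X, y[:, 0]):
        pred = float(w @ xi + b > 0)
        w, b = w + (yi - pred) * xi, b + (yi - pred)
acc = np.mean([float(w @ xi + b > 0) == yi for xi, yi in zip(X, y[:, 0])])
print("perceptron accuracy on XOR:", acc)       # never reaches 1.0

# 2. One hidden layer, trained by back-propagation (sigmoid units, squared error).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sig = lambda z: 1 / (1 + np.exp(-z))
for _ in range(10000):
    h = sig(X @ W1 + b1)                        # forward pass
    out = sig(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)         # backward pass: chain rule
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)
print("two-layer net on XOR:", (out > 0.5).astype(int).ravel())   # typically [0 1 1 0]
```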
Rosenblatt developed the perceptron in the 1950s. He built on foundational work in the 1940s by McCulloch and Pitts, who showed how patterns could be handled by networks of neurons, and Donald Hebb, who hypothesized that the connection between two neurons is strengthened when both are active. The buzz created by the perceptron can be relived by reading “Electronic ‘Brain’ Teaches Itself” in the July 13, 1958, New York Times. The Times quoted Rosenblatt as saying that the perceptron “will grow wiser as it gains experience,” adding that “the Navy said it would use the principle to build the first Perceptron ‘thinking machines’ that will be able to read or write.”
Minsky and Papert’s critique was a major setback, if not a fatal one, for Rosenblatt and neural networks. But a few people persisted quietly, among them Stephen Grossberg, who began working on these problems while an undergraduate at Dartmouth in the 1950s. By the 1970s, Grossberg had developed an unsupervised (self-organizing) learning algorithm that balanced the stability of acquired categories with the plasticity necessary to learn new ones.
Hinton and LeCun addressed Minsky and Papert’s challenge and brought neural nets back from obscurity. The excitement about back-propagation drew attention to Grossberg’s model, as well as to the models of Fukushima and Kohonen. But in 1988, Steven Pinker and Alan Prince did to neural nets what Minsky and Papert had done two decades earlier, with a withering attack on the worthiness of neural nets for explaining the acquisition of language. Once more, neural networks faded into the background.
After Hinton and his students won the ImageNet challenge in 2012, with a quantum improvement in performance on image recognition, Google seized the moment, and neural networks came alive again.
The opposition to deep learning is gearing up already. All methods benefit from powerful computing, and traditional symbolic approaches also have demonstrated gains. Time will tell which approaches prevail, and for what problems. Regardless, 2012–2015 will have been the time when neural networks placed artificial intelligence at our fingertips.
Differentiable Programming
David Dalrymple
Computer scientist, neuroscientist; research affiliate, MIT Media Lab
Over the past few years, a raft of classic challenges in artificial intelligence that had stood unmet for decades were overcome almost without warning, through an approach long disparaged by AI purists for its “statistical” flavor: It was essentially about learning probability distributions from large volumes of data rather than examining humans’ problem-solving techniques and attempting to encode them in executable form. The formidable tasks it has solved range from object classification and speech recognition to generating descriptive captions for photos and synthesizing images in the style of famous artists—even guiding robots to perform tasks for which they were never programmed!
This newly dominant approach, originally known as “neural networks,” is now branded “deep learning,” to emphasize a qualitative advance over the neural nets of the past. Its recent success is often attributed to the availability of larger data sets and more powerful computing systems, or to large tech companies’ sudden interest in the field. These increasing resources have indeed been critical ingredients in the rapid advancement of the state of the art, but big companies have always thrown resources at a wide variety of machine-learning methods. Deep learning in particular has seen unbelievable advances; many other methods have also improved, but to a far lesser extent.
So what is the magic that separates deep learning from the rest and can crack problems for which no group of humans has ever been able to program a solution? The first ingredient, from the early days of neural nets, is a timeless algorithm, rediscovered again and again, known in this field as “back-propagation.” It’s really just the chain rule—a simple calculus trick—applied in an elegant way. It’s a deep integration of continuous and discrete math, enabling complex families of potential solutions to be autonomously improved with vector calculus.
The key is to organize the template of potential solutions as a directed graph (e.g., from a photo to a generated caption, with many nodes in between). Traversing this graph in reverse enables the algorithm to automatically compute a “gradient vector,” which directs the search for increasingly better solutions. You have to squint at most modern deep-learning techniques to see any structural similarity to traditional neural networks; behind the scenes, this back-propagation algorithm is crucial to both old and new architectures.
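A toy version of this machinery fits in a few lines of Python. The scalar Node class below is an illustrative stand-in for the tensor-valued graphs real frameworks build, but the reverse traversal and the chain-rule update at each node are the same idea.

```python
# A toy reverse-mode differentiator: build a directed graph of operations, then
# walk it backward, applying the chain rule at every node, to get the gradient
# of the output with respect to every input.
class Node:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # incoming edges of the graph
        self.local_grads = local_grads  # d(this node)/d(each parent)
        self.grad = 0.0

def add(a, b): return Node(a.value + b.value, (a, b), (1.0, 1.0))
def mul(a, b): return Node(a.value * b.value, (a, b), (b.value, a.value))

def backprop(output):
    # Collect nodes in topological order, then sweep through them in reverse.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for p in node.parents:
                visit(p)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):        # the reverse traversal described above
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local   # chain rule

x, w = Node(3.0), Node(-2.0)
y = add(mul(x, w), w)                   # y = x*w + w
backprop(y)
print(x.grad, w.grad)                   # the "gradient vector": -2.0, 4.0
```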
But the original neural networks using back-propagation fall far short of newer deep-learning techniques, even using today’s hardware and data sets. The other key piece of magic in every modern architecture is another deceptively simple idea: Components of a network can be used in more than one place at the same time. As the network is optimized, every copy of each component is forced to stay identical (this idea is called “weight-tying”). This enforces an additional requirement on weight-tied components: They must learn to be useful in many places all at once, not specialize to a particular location. Weight-tying causes the network to learn a more generally useful function, since a word might appear at any location in a block of text, or a physical object might appear at any place in an image.
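In a modern framework, weight-tying amounts to simply reusing the same object. The sketch below (PyTorch, with arbitrary sizes) applies one linear layer at every position of a short sequence: there is a single set of weights, and its gradient accumulates a contribution from every place the component was used.

```python
# Weight-tying in miniature: one component, used at several positions at once.
import torch
import torch.nn as nn

shared = nn.Linear(8, 8)               # a single set of weights
seq = torch.randn(5, 8)                # 5 positions, 8 features each

outputs = [shared(seq[t]) for t in range(5)]   # same weights at every position
loss = torch.stack(outputs).pow(2).mean()
loss.backward()
# shared.weight.grad now sums contributions from all five uses, which is what
# pushes the component to be useful everywhere rather than at one location.
print(shared.weight.grad.shape)        # torch.Size([8, 8])
```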
Putting a generally useful component in many places of a network is analogous to writing a function in a program and calling it in multiple spots—an essential concept in a different area of computer science, functional programming. This is more than just an analogy: Weight-tied components are actually the same concept of reusable function as in programming. And it goes even deeper! Many of the most successful architectures of the past couple of years reuse components in exactly the same patterns of composition generated by common “higher-order functions” in functional programming. This suggests that other well-known operators from functional programming might be a good source of ideas for deep-learning architectures.
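As a small illustration of that correspondence, a plain recurrent pattern is just the higher-order function fold (reduce in Python) applied to a weight-tied step function. The cell and sizes below are arbitrary stand-ins rather than any particular published architecture.

```python
# A recurrent pattern written as a functional fold over a sequence, with one
# tied cell applied at every step.
from functools import reduce
import torch
import torch.nn as nn

cell = nn.Linear(8 + 4, 4)             # tied weights: the same cell at each step

def step(hidden, x):
    # one application of the shared cell: combine the current input and state
    return torch.tanh(cell(torch.cat([x, hidden])))

inputs = [torch.randn(8) for _ in range(6)]
h0 = torch.zeros(4)
h_final = reduce(step, inputs, h0)     # fold/reduce is the recurrence pattern
h_final.sum().backward()               # gradients flow through every reuse of the cell
```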
The most natural playground for exploring functional structures trained as deep-learning networks would be a new language that can run back-propagation directly on functional programs. As it turns out, hidden in the details of implementation, functional programs are compiled into a computational graph similar to what back-propagation requires. The individual components of the graph need to be differentiable too, but Grefenstette et al. recently published differentiable constructions of a few simple data structures (stack, queue, and deque), which suggests that further differentiable implementations are probably just a matter of clever math. More work in this area may open up a new programming paradigm—differentiable programming. Writing a program in such a language would be like sketching a functional structure with the details left to the optimizer; the language would use back-propagation to automatically learn the details according to an objective for the whole program—just like optimizing weights in deep learning but with functional programming as a more expressive generalization of weight-tying.
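No such language yet exists in mature form, but the flavor can be imitated with today's autodiff tools. The sketch below (PyTorch, with a toy objective of learning to compute 2x + 1) writes a tiny program as a composition of functions, leaves its numeric details as parameters, and lets back-propagation fill them in against an objective for the whole program; all names and settings here are illustrative assumptions.

```python
# A taste of "differentiable programming" with today's tools: sketch a program
# as a composition of functions and let the optimizer learn its details.
import torch

a = torch.randn((), requires_grad=True)   # details left to the optimizer
b = torch.randn((), requires_grad=True)

def program(x):
    scale = lambda v: a * v               # the functional structure is fixed...
    shift = lambda v: v + b               # ...the numbers inside it are learned
    return shift(scale(x))

opt = torch.optim.SGD([a, b], lr=0.05)
xs = torch.linspace(-1, 1, 20)
target = 2 * xs + 1                       # toy objective for the whole program
for _ in range(500):
    opt.zero_grad()
    loss = ((program(xs) - target) ** 2).mean()
    loss.backward()
    opt.step()
print(a.item(), b.item())                 # approaches 2 and 1
```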
Deep learning may look like another passing fad, in the vein of “expert systems” or “Big Data.” But it’s based on two timeless ideas—back-propagation and weight-tying—and although differentiable programming is a new concept, it’s a natural extension of those ideas which may prove timeless itself. Even as specific implementations, architectures, and technical phrases go in and out of fashion, these core concepts will continue to be essential to the success of AI.