Consider this photo. What is it?
Very few people would be able to tell that the object in the photo is a fossil. Even fewer know that the fossil is of a swift-swimming turtle from Germany. Certain paleontologists, however, would understand it immediately. Their visual experience of this image, as a result of their knowledge, is richer and more certain. Most of us have that sort of richness and certainty only when we see a fossil to which we can attach meaning, such as the one below:
Notice how our visual experience of the first fossil feels very different from our visual experience of the second. That is because we have knowledge of fish, but not of German swift-swimming turtles, least of all as fossils. We can imagine that to the infant, almost all visual experiences feel like our experience of the first fossil rather than the second.
Here are two more objects for which we lack sufficient knowledge for true vision:
Wire-stretching device similar to the type found in some hardware stores.
Corneal epithelial stem cells of the kind transplanted into May’s eye (as seen under an electron microscope).
To the infant, who lacks knowledge of the world and its objects, the universe must seem a vast collection of these colorful, meaningless shapes, a panorama of fossils and strange tools. One can imagine that it must feel overwhelming to the infant, this overflow of visual information he cannot even begin to sort.
How is the very young child to make sense of this jumble of visual data? How is he to translate these shapes into three dimensions and give them meaning, to make them more than just a collection of colorful blobs? How is he to build the knowledge of the world and its objects that is so essential to vision? It’s not as if anyone can explain it to him.
There is only one way for the very young child to do this. He must interact with the things he sees. He must experiment with them, investigate them, explore them, probe them, play with them, touch, taste, smell, and hear them. He must handle everything, manipulate everything, go to and reach for everything. He must make his nursery his laboratory, a place in which his endless tests and trials with things—especially by touch—lead to a knowledge of their textures, shapes, purposes, and functions, to an understanding of their natures. Without that constant and direct interaction and experimentation with things, he cannot begin to form his set of assumptions about the world and its objects. Without touching a glass of water—tipping it, dropping it, breaking it, spilling it, shaking it, hearing its sounds, watching its water levels, observing the change of light and shade on its side as it’s lifted, seeing it used to take a drink—he can’t know a glass of water as anything but just a shape, a random fossil, and he might not even interpret it as a shape in three dimensions. In the laboratory of the young child, it makes perfect sense to eat carpet because that’s yet another way to know the carpet’s visual nature.
Interaction with the world and its objects is so critical during early development that without it a person might never see properly. In a landmark experiment, Richard Held and Alan Hein at MIT raised two kittens in total darkness. For a short period during each day the kittens were placed in baskets that hung just above the ground from opposite ends of a pole. Holes were cut in one basket to allow one kitten’s paws to reach the ground, but not in the other. The apparatus was constructed such that when one basket moved, the other moved identically.
When the lights were turned on, the kitten in the basket with holes was allowed to run along the ground, moving both its own basket and its mate’s in exactly the same way and giving the two kittens the same visual experience.
At the end of the experiment, only the kitten that had been allowed to actively move the basket could move around the world using vision. The passive kitten—though it had exactly the same visual experience—was functionally blind. Seeing the world passively was not enough; interaction with the world was essential for vision to be useful.
Other studies show similar results. There is evidence that people brought up in iron lungs—in which their interaction with objects is severely restricted—cannot see things properly beyond the range of their body movements, beyond what they could touch. In one strange case, a boy was raised in a pawnshop and surrounded by all manner of objects. To ensure that he didn’t touch the items or meddle with their tags, he was kept in a playpen. When removed, he simply could not judge distances beyond what he’d experienced by touch in the playpen.
The lesson in all this is clear: interaction with the world and its objects is critical to building the knowledge necessary for useful vision.
Mike May had that crucial interaction with the world. He had built his knowledge and formed his set of assumptions. By the time of his accident, at age three, he could see nearly as well as an adult.
Yet after his stem cell transplant surgery, he found himself with a strange and different kind of vision. He could perceive motion and color almost perfectly, but he could not make sense of faces, perceive things in depth (unless they were moving), or readily recognize objects. What explains this dichotomy? Is there something about perceiving motion and color that’s different from perceiving faces, depth, and objects? And how does that relate to knowledge?
Ione Fine set out to answer those questions by examining how people learn to perceive these things. She believed the answer might go a long way toward explaining May’s vision. And maybe toward helping him improve.
FACE PERCEPTION
To most people, every human face looks unique, the most personal and nuanced object in the world. In reality, faces are very similar to one another. Differences of just a millimeter or two in symmetry, in the space between the eyes, or in eyebrow curvature, cheekbone angle, or forehead height can make two otherwise similar faces look vastly different.
Animal faces are distinguished by the same kinds of tiny variations. Yet to us, chimpanzee faces look alike and sheep faces seem identical. Why do we see such profound differences in human faces but not in animal faces?
The answer lies in learning. Through intense practice that begins in early childhood, we make ourselves into experts on human faces.
Practice and learning are everything. That’s why shepherds can identify their sheep by faces—they’ve practiced all their lives with sheep faces, and now they’re experts. And that’s why people sometimes struggle to distinguish among faces from different ethnic or age groups—they haven’t sufficiently practiced and interacted with them.
Practice with human faces doesn’t just help a person identify and recognize faces. It also makes it possible to judge a person’s gender, read her expressions, assess her interest in us, predict her mood. Often, the difference between a smile and a frown is just a tiny change in the angle of—or even the shadow on—the corner of the mouth. A one-millimeter shift in the pupils of a person standing across the room can tell us whether that person is looking at us or just over our shoulder; a one-millimeter shift in her eyebrow can tell us if she’s interested or angry—all this at a distance of thirty feet. People would never be able to attach meaning to those minuscule differences without the benefit of massive practice—and massive learning.
That kind of learning takes years of intense practice; children are still developing their face-perception skills at five or six years of age.
DEPTH PERCEPTION
When we open our eyes, a two-dimensional image falls on our retinas. Yet we perceive the world robustly and in three dimensions; its depth feels absolutely real to us, not at all a trick of the brain. How does that happen? How do we translate our flat retinal images into the majestic three-dimensional world in which we move and interact so confidently?
There seem to be three kinds of cues that the visual system uses to perceive depth:
• Pictorial cues
• Motion cues
• Stereopsis
Pictorial Cues to Depth
Pictorial cues are features in a photograph or painting or other two-dimensional representation that can produce the impression of depth. They are the ones Italian painters discovered in the early Renaissance. The most important are:
Occlusion
When one object partly hides another, the object doing the hiding is seen as being closer.
Relative Height
The closer an object is to the horizon, the farther away the object appears.
Note that this is true both for the ships in the illustration (which are below the horizon) and for the balloons (which are above the horizon).
Cast Shadows
Shadows can indicate an object’s depth. (The two photos are identical but for the addition of shadows in the photo on the right.)
Relative Size
Of two objects of equal size, the farther one occupies less of one’s field of view.
Familiar Size
Our knowledge of an object’s size affects how we perceive that object’s distance—and the distance and size of other objects around it.
The familiar size of the dolphins in the photo affects how we perceive their distance. Most of us would estimate that distance to be about ten feet. If, however, dolphins were the size of football fields, we might estimate that they were several thousand feet away in this photo. If dolphins were the size of insects, we might estimate their distance in this photo to be just a few inches.
Aerial Perspective
The air contains minuscule particles of water, dust, and pollution. The farther away an object is, the more particles we must look through, and therefore the hazier that object appears.
(Incidentally, aerial perspective doesn’t occur on the moon, which has no atmosphere and therefore no particles. Astronauts struggled to judge distance on the moon.)
Linear Perspective
Parallel lines converge on the retina as they recede in depth.
Texture Gradient
As a surface gets farther away from us, its texture gets smaller and appears smoother.
Shape from Shading
When an object has a three-dimensional shape, some surfaces will be in the light and others will be in shadow.
These are just a few examples of the cues our brains use to transpose the two-dimensional images on our retinas into the perception of a three-dimensional world. One can hardly imagine the immense amount of knowledge about the world required to process these pictorial cues to depth, and to do it instantaneously, automatically, and unconsciously.
It turns out that these pictorial cues are themselves based on knowledge—a kind of statistical knowledge about what the world is like most of the time. Such pieces of knowledge are called “priors.” They represent what we believe about the world when we come upon a new visual scene. Here are some examples:
• Adults are between five and seven feet tall.
• Light tends to fall from above.
• Physical objects create shadows.
• Certain objects are a certain color.
• In our built environment, lines are often at right angles to each other (as with buildings).
Consider this photo:
The inclusion of a barn, a boat, and a creek in this photograph greatly helps us judge the windmill’s size and distance. That’s because we possess prior knowledge that barns, boats, and creeks are almost always a certain size. If the windmill were the only object in the photo, we might think it a toy, or we might judge it to be several times larger than it really is.
How does a person go about learning these pictorial cues and priors? By now, we’ve seen that much of visual learning is done in early childhood, through constant interaction and experimentation with the world and its objects. It’s the same with learning depth. A baby reaches, crawls, observes, tests, falls short, and goes too far, constantly calibrating visual cues against its tactile experience until the two-dimensional image on the retina translates automatically into a visual experience of depth. Infants aren’t even sensitive to the pictorial cues to depth until they’re about six months old, the age at which they start grabbing for objects. After that, the process of understanding and using pictorial depth cues takes years to perfect. The task is astoundingly difficult: engineers still can’t build a machine that computes depth as accurately and robustly as humans do. Yet the child does it without any help from its parents and over just a few years, all from interacting with its environment.
Motion Cues to Depth
Pictorial cues, remember, are just one of the ways in which the visual system goes about perceiving the world in depth. Another set of cues becomes available when the observer or the object is in motion. These are known as motion cues. Two of the most important are:
Motion Parallax
Nearby objects move faster on the retina than distant objects do.
Motion parallax can be observed by watching the passing scene from inside a moving car. Nearby objects—like houses—appear to fly past, while more distant objects—like mountains—seem hardly to move at all. We perceive the faster-moving objects to be nearer to us than the ones that are moving more slowly.
Kinetic Depth Effect
The motion of a two-dimensional representation can create a perception of its three-dimensional form.
It was the kinetic depth effect that occurred when May saw a square on Fine’s computer monitor leap into three dimensions as a cube when it began rotating on-screen.
Motion cues also rely on priors, though these priors are much simpler, and babies learn them more quickly and at a younger age than they learn the pictorial cues to depth. Babies can perceive moving objects within a few weeks of birth. Depth from motion is understood by the age of four months or perhaps even earlier.
Stereopsis
Stereopsis creates an impression of depth by comparing the small differences in the images produced by each eye.
Look at a nearby object. Cover one eye, then the other, then the first again. The object appears to move back and forth. The brain compares those two slightly different images to compute—and then perceive—the object’s depth.
Stereopsis, of course, occurs only in people who have two working eyes, and so is not applicable to May’s case. Stereopsis is not thought to be critical to good depth perception in humans, as it is useful only for objects about a yard from the body or closer. Beyond that distance, the separation between the two eyes is so small compared with the distance to the object that the images in the two eyes are essentially identical. Many people think that stereopsis is the reason people are able to see in depth, but if you shut one eye you can still reach out and pick up a coffee cup, and you can still drive. About 10 percent of the general population doesn’t have good stereoscopic vision, and even some professional athletes have been known to lack it.
OBJECT RECOGNITION
Human beings must be able to recognize objects in order to interact with them. That alone requires massive learning—there’s an endless number of objects in the world to know. But it’s even harder than that. We must also recognize each of the objects in the world from every possible viewing angle. How can that be possible? Consider this picture. What does it show?
We recognize that object as an elephant. It is the most readily recognized view of an elephant—called by some its “canonical” view. Now consider the next picture. What does it show?
We also recognize this object as an elephant, despite its decidedly noncanonical view. How do we recognize something from a noncanonical view? After all, the picture above presents a very different two-dimensional shape. Why don’t we see it as a different object from the one shown in the first photo?
It is thought that a primary reason we can identify an object from its various noncanonical views is that we already understand its depth from its canonical view. Once the brain understands an object’s depth, its robust three-dimensional form, it can infer how that object would look from different angles.
That ability is critical because most of the objects we encounter in our daily routines are not conveniently positioned for us at their canonical angles. And even if they were, their shapes would change the moment they moved or we moved. Our ability to see in three dimensions allows us to understand objects from virtually any viewpoint. If a person could see only in two dimensions, he would need to learn to recognize not just the objects in the world but myriad different views of each of those objects, an impossible task.
There are many other factors involved in recognizing objects, but without depth perception the rest are moot. Object recognition, like much of depth perception, develops later in the infant and can take years to perfect.
MOTION
Infants perceive motion as early as two weeks after birth. By the time they’re ten or twelve weeks old they can smoothly follow moving objects. They seem able to do this almost instinctively, without much need for experimentation or interaction with the world. Motion, it appears, is simply there to be perceived, and making sense of it is a relatively easy task that is nearly complete by six months of age.
COLOR
Infants have considerable color vision by the age of about two months. The development of color vision seems to depend merely on seeing color in the world and doesn’t require the baby to interact with the world. Color, like motion, seems simply to be there to be seen.
This understanding of the various parts of vision raises a critical question: Is there a difference between the things May perceives well (motion and color) and the things he perceives poorly (faces, depth, and objects)?
Crashing Through Page 25