Magritte’s picture of a pipe is, indeed, not actually a pipe. But how do we know? In part, because our perception-action loop can tell that it’s just a picture: it’s contained on a flat surface; it’s a realistic depiction, but not nearly realistic enough to be mistaken for a photograph; and it has a caption written underneath it, on the same surface plane.
The final test: the viewer’s body can’t actually pick it up, put tobacco in it, and smoke from it. That is, what makes it not a pipe is largely its lack of pipe-related affordances.
Figure 13-3. Magritte’s The Treachery of Images (© Herscovici, Brussels/Artists Rights Society [ARS], New York)
And yet, the caption strikes the new viewer as strange. “Well, of course it’s a pipe, what does he mean?” The conceit of Magritte’s painting works because we don’t go around separating semantic and physical information every day. We often use them as if they were interchangeable. As in the duck-rabbit example in Chapter 12, this pipe is something about which we normally say, “That’s a pipe,” rather than “That’s a painting that depicts the physical aspects of an object with affordances that are related to a category of objects we call ‘pipe’.” We’d get very little done in the human world if we had to talk that way.
Digital technology has the ability to take this natural tendency toward conflation and use it to simulate physical objects and their behaviors. I can’t smoke this pipe, but on a digital device’s screen, I can “press” a pixel-rendered picture of a “button” to make the system do something. It’s not a physical button, but within the rules of a digitally enabled interface, it might as well be. It’s a picture of a pipe, but one I can smoke. It presents semantic information on a display in such a way as to behave as if it were a physical object. In other words, this is semantic function, simulating physical affordance.
An example of a big button lots of people use is in the Shazam mobile app; the app’s main purpose is to help users figure out what songs are playing, wherever they happen to be. To do that, it presents a simulated-affording structure—a picture that looks like a button, as depicted in Figure 13-4. The drop-shadow and slight gradient make it appear like a raised, convex object, similar to many other physical buttons we encounter in our surroundings. Of course, Shazam also adds a bit of text helping to nudge the user toward understanding that this is not merely a decorative picture, but an interactive object. Interestingly, if you touch anywhere on the screen other than the button, nothing happens. The button is only a picture, but it presents itself as a physical object that demands we aim specifically for it—a way for this digital agent to make an educated guess that we want it to listen to a song.
Figure 13-4. The Shazam mobile app’s primary control is a big simulated button that, when touched, scans the environment for musical patterns
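For readers who build interfaces, here is a minimal TypeScript sketch of what that demand to aim for the button amounts to in code: a picture plus a hit test over a rectangle of screen coordinates. The names, dimensions, and handler are hypothetical illustrations, not Shazam’s actual implementation.

// A simulated button is just pixels plus a hit test; touches outside
// the region are ignored, as with Shazam's big button.
interface Rect {
  x: number; // left edge, in screen pixels
  y: number; // top edge
  width: number;
  height: number;
}

function hitTest(region: Rect, touchX: number, touchY: number): boolean {
  return (
    touchX >= region.x &&
    touchX <= region.x + region.width &&
    touchY >= region.y &&
    touchY <= region.y + region.height
  );
}

// Hypothetical stand-in for the app's "listen" behavior.
function startListening(): void {
  console.log("Listening for a song...");
}

const listenButton: Rect = { x: 120, y: 240, width: 180, height: 180 };

function onScreenTouch(touchX: number, touchY: number): void {
  if (hitTest(listenButton, touchX, touchY)) {
    startListening();
  }
  // Touches anywhere else on the screen do nothing, by design.
}

onScreenTouch(200, 300); // inside the region: "Listening for a song..."
onScreenTouch(10, 10);   // outside the region: nothing happens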
An on-screen button is just one way semantic information can simulate physical affordances, making them “interactive”—a word that wasn’t in prevalent use until the 1970s and 80s, when technology enabled previously inert parts of our environment to become things that, when we poked them, started poking us back.
Interaction, and the design of interactions, largely originated with the emergence of electronic devices. And digital technology enabled those devices to richly simulate interfaces in ways that physical buttons on electrical gadgets couldn’t achieve.
Figure 13-5. Google’s Ngram Viewer shows usage of “interactive” climbing fast along with the rise of computing interfaces
Now, we have very complex interfaces full of many nested layers of objects. There are debates about how closely these objects and their controls should be rendered to resemble their physical counterparts. On one side of the spectrum, there’s the skeuomorphic approach, which presents a literal translation of the surfaces and objects we encounter in the physical world. Sometimes, that copying can be gratuitous and unnecessary. Other times, it’s important to re-create the physicality of analog objects, to provide familiar clues for learning the system’s simulated affordances.
For example, with the Animoog app, users can play with an interface simulating an old Moog synthesizer—a culturally significant device whose form is a big part of the experience of using it. Abstracting those wonderfully retro controls (see Figure 13-6) into something less literal would disrupt the entire purpose of playing a simulated Moog. And yet, because the knobs are not physical, there is no “twist” gesture that feels like using real knobs. Instead, the interface relies on an up/down finger “scrub” gesture to make the setting go higher or lower. Simulation on glass digital screens can go only so far.
Figure 13-6. Animoog simulates the controls of a small Moog synthesizer
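To make the scrub convention concrete, here is a brief TypeScript sketch of one plausible way to map a vertical drag to a knob’s value. The sensitivity constant and value range are illustrative assumptions, not Animoog’s actual code.

// Dragging up (negative deltaY in screen coordinates) raises the value;
// dragging down lowers it. 200px of drag sweeps the knob's full range.
const PIXELS_PER_FULL_SWEEP = 200;

class SimulatedKnob {
  constructor(
    public value: number = 0.5, // normalized 0..1
    private min: number = 0,
    private max: number = 1
  ) {}

  scrub(deltaY: number): void {
    const range = this.max - this.min;
    this.value -= (deltaY / PIXELS_PER_FULL_SWEEP) * range;
    // Physical knobs stop at their limits; the simulation must clamp.
    this.value = Math.min(this.max, Math.max(this.min, this.value));
  }
}

const cutoff = new SimulatedKnob();
cutoff.scrub(-50); // drag up 50px: value rises from 0.5 to 0.75
console.log(cutoff.value);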
In recent years, there’s been a raging debate in the design community about skeuomorphic versus flat design. The watershed moment was Apple’s release of iOS 7, which went far in the direction of these so-called flat interface surfaces and objects.
Unfortunately, this polarity is misleading, because it is not a binary choice; the design approaches sit on more of a spectrum. In Figure 13-7, which compares iOS 6 to iOS 7, notice the big difference between the Cancel and Send signifiers. Both versions simulate the same action, but iOS 7 has eschewed the faux-convex-recessed-button treatment. Apple can remove the button simulation because most users have learned, through repeated exposure and growing convention, that the words “Cancel” and “Send,” in this situated context, are action verbs that signify their respective functions. Outside of this context, these words could convey many other messages, but in software this sort of placement has an established, near-invariant meaning.
Figure 13-7. The Apple iOS 6 interface (left) compared to iOS 7 (right)
But notice that the Add button with the “+” symbol has a circle around it, giving a clear boundary that makes it read more as a button than the “+” character alone would. Evidently, designers at Apple realized that the “+” wasn’t enough by itself. Also note how iOS 7 adds a camera icon, making it possible for users to more quickly send a picture when messaging. It doesn’t have three-dimensional gradients, but it’s definitely an iconic signifier, mimicking the shape of a physical object, which is probably a better, more compact signifier than “Add Picture.” Of course, given the changing form of photography devices, the conventional camera icon might soon become an anachronism, the way the floppy-disk icon (for “Save”) already has. Culture and language infuse all interfaces, where ingredients mix into semiotic stews of semantic interaction.
Interestingly, many users were having trouble with the new label-only controls, so Apple added an “Accessibility” option to turn on “Button Shapes,” like the gray button-like shape shown in Figure 13-8. It clarifies the simulated affordance by subtly mimicking the physical information we associate with mechanical buttons.
Figure 13-8. An option in iOS 7 adds button shapes
Part of what this illustrates is that the context of what a simulated control does, and how our bodies should interact with it, is fundamentally a linguistic question. Why? Because the information on the screen is all semantic in one way or another; it’s either simulating objects or presenting text for reading. It’s using signifiers that are “drawn” on a surface. Whether a label or shape does enough work to signify its function is a matter of learned environmental convention, just like the meaning of words and sentences. Digital information is behind this transformation of our environment, and therefore the transformation of our users’ contextual experience.
In the examples presented in Figure 13-9, we see a sort of spectrum of how objects use invariants to inform our behavior, from physical to semantic.
Figure 13-9. A range of physical to semantic invariant cues[266]
Here are the significant characteristics of each object (from left to right):
The stairs are directly perceived by our bodies as affording upward motion. Their affordance is intrinsic to their invariant structure, nested within a building, nested within a city. And in keeping with the multifaceted way we perceive nestedness, the stairs’ “meaning” to our bodies changes when we want to go down them or sit on one to look at a book for a moment.
The Engine Start button also has intrinsic structure that, with experience, we’ve learned means it’s an object that can be pressed into its surrounding surface. But that’s as far as the intrinsic, physical affordance goes here. Without pressing it, we don’t know what pressing this button ultimately does, unless we read the label—a signifier with semantic function, supplementing the object, informing us how this object is actually nested within a broader system that is otherwise invisible to us.
The old version of the Windows Start button is similar to the Engine Start button in every way, except that it isn’t a physical button (and in this case, not one we touch with our fingers). It’s using graphical information to simulate the contours of conventional, physical buttons. It’s also connected to a vastly more complex environment than what we find in even the newest automotive ignition systems.
Last, we have a hyperlink on a bookstore’s website. Just as with the Windows Start button, there’s a cartoon hand icon that functions as an avatar for the user’s own hand (that is, linguistically, it’s behaving as an actual iconic signifier), indicating (that is, acting also as an indexical signifier) where a click of a mouse will engage the digital surface. That surface has other signifiers on it—the words in the menu. These words equate to buttons because of learned conventions, such as the fact that a web layout with a list of words placed around the edges of the interface is most often a menu meant for navigating that environment. It may also use a convention of color-change when the user’s hand-avatar (cursor) hovers over the label, as it does here (see the sketch after this list). The same words elsewhere—outside a context that aligns with the menu layout convention—might not be recognized as hyperlinks. Labels used as links put a lot of pressure on the semantic context of the label, to signify not only its interactive nature, but also where it will take us or what it will do when we tap or click it.
This isn’t a comprehensive spectrum; it just shows how information can range from the simply intrinsic physical to the highly abstract semantic. Designing such objects based mainly on concerns of aesthetics and style runs the risk of ignoring the most important challenges of simulating affordance through semantic function.
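Here is a small TypeScript/DOM sketch of the hover conventions from the hyperlink example above—the hand-shaped cursor and the color change. The element id and colors are hypothetical; the DOM calls themselves are standard.

// Assume an HTML page containing: <a id="nav-books">Books</a>
const link = document.getElementById("nav-books");

if (link) {
  // Summon the hand-shaped cursor, the avatar of the user's hand.
  link.style.cursor = "pointer";

  // Color change on hover: a purely semantic, learned cue that
  // this bit of text is "pressable."
  link.addEventListener("mouseover", () => {
    link.style.color = "#cc0000";
  });
  link.addEventListener("mouseout", () => {
    link.style.color = "#000000";
  });
}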
Unlike our interactions with physical objects, such as opening a kitchen drawer or swinging a hammer, we never directly control what software does; it’s always mediated through layers of abstraction. The illusion of touching a thing in software and seeing it respond to our touch is a construct of semantic function, not a direct physical affordance. So, the cause-and-effect rules of software aren’t perceivable the way we can see the action-and-reaction of physical objects and events, and they’re not readable the way semantic rules are expressed in documents. We tap a button labeled X and hope the system will do X; what happens behind the scenes—how X is defined, and whether X also means Y and Z—is up to the system.
Email serves as a good example of how semantic function can approach but not quite touch the physical affordances of actual objects and surfaces. As James J. Gibson argued, a mailbox is really a complex, compound invariant. Its function provides for more than just putting an object into another object—we perceive it as part of a cultural system of postal mail, so its function is to send a letter to an addressee. But perceiving that function depends on having learned a great deal about one’s cultural environment and what one can expect of it.
Digital technology takes the physical affordances of how mailboxes and mail work and fully abstracts them into something made entirely of language—rule-based functions presented as metaphorical labels and iconography. Email has an “inbox” and “outbox,” but there are no physical boxes. Not unlike the way digital clocks have virtually replaced the intricacies of clockwork-driven timepieces, email dissolves formerly complex physical systems into the abstractions of software.
Email gives us valuable abilities that we didn’t have before, but it also loses some of the qualities we enjoy by working with physical mail objects. Especially now that we’re so overwhelmed with the stuff, some innovators are trying to reintroduce a bit of physicality to the work of managing our email.
In 2012, America Online (AOL) started rolling out Alto, a new email platform (Figure 13-10). One of the principles behind its design is something called stacks that design lead Bill Wetherell says were inspired by noticing how his wife sorted physical mail from the post office:
She would separate it into piles based on the type of mail it was, such as catalogues, correspondence, coupons, and bills.
She’d place each pile in a part of the home where it was most relevant and likely to be used for its main purpose (coupons in the kitchen, for example).
Figure 13-10. A sample of the simulated physical-stack approach for the Alto email platform
This is a textbook example of extended cognition: using the environment not just to sort the mail, but to arrange those sorted objects within the home in such a way that they afford further action for managing their contents.
Wetherell says, “We started to wonder if we could re-create that same physical process but in the digital world.” So, Alto presents—as one of the main interaction modes—graphically simulated stacks of mail. It also tries to sniff out what stack incoming email should go to so that the application can handle a lot of that piling for the user.[267]
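As a rough illustration of what “sniffing out” a stack might involve, here is a TypeScript sketch of rule-based sorting. The rules, stack names, and message shape are hypothetical simplifications, not Alto’s actual algorithm.

// Minimal rule-based stack sorting, in the spirit of automatic piling.
interface EmailMessage {
  from: string;
  subject: string;
  listUnsubscribe: boolean; // bulk senders usually include this header
}

type Stack = "Daily Deals" | "Retailers" | "Personal";

function chooseStack(msg: EmailMessage): Stack {
  const subject = msg.subject.toLowerCase();

  // Coupons and promotions: the digital analog of the kitchen coupon pile.
  if (subject.includes("% off") || subject.includes("coupon")) {
    return "Daily Deals";
  }
  // Bulk mail from stores: the catalog pile.
  if (msg.listUnsubscribe) {
    return "Retailers";
  }
  // Everything else defaults to correspondence.
  return "Personal";
}

const incoming: EmailMessage = {
  from: "deals@example-store.com",
  subject: "Weekend sale: 40% off everything",
  listUnsubscribe: true,
};

console.log(chooseStack(incoming)); // "Daily Deals"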
There are some interesting challenges in trying to replicate these physical behaviors within an email application.
Recognizing the difference between real mail and junk mail
We’re pretty good at quickly sizing up what sort of postal-service-delivered mail we’re holding in our hands, because the physical qualities of the objects provide a lot of physical information to work from. Catalogs are heftier than personal letters or retail flyers. An envelope is obviously either handwritten or bulk-printed.
In email, all those physical cues disappear, and we’re left with the representational semantic information deliverable by SMTP servers and readable by email software. This makes it harder to recognize what each incoming item really is just by seeing it in our inbox, and it leaves even fewer cues for quickly sizing up each piece of mail. This is one reason why Alto tries to sort the mail behind the scenes: a software algorithm can actually be more literate in the digital-information-based cues of digital mail than people can.
Stacking is cognition
In stacking physical mail, we touch each piece and think about it just enough to do some prework with it—a crucial part of the cognitive work we do with physical objects, especially when organizing them for later action. I know that when I’ve done this, it helps me to remember what’s in each stack even days later, without having to go through each item in the stack again. By having the system automatically sort the stacks behind the scenes, the user misses this opportunity.
Stacks in physical places
As mentioned earlier, Wetherell observed that his wife would place the mail in various parts of the house where it was most likely to be processed later, in connection with its larger corresponding physical tasks. That is, these stacks are nested within other environmental structures and physical activities. A digital interface interrupts this house-as-categorizer function. Of course, it’s possible our homes could eventually be pervasively digitized—with bedside tables and countertops where we can “put” stacks of our sorted, simulated mail. But that’s probably more complicated than it’s worth.
These are certainly not criticisms of Alto, which is to be applauded for tackling the challenges of improving how we use email. Alto is just a useful example of how a digital interface can go only so far in replicating the cognition-extending affordances of the physical environment. Allowing the user to work with simulated stacks is a worthwhile experiment. Think of how useful this capability would be in any search interface, for example, where you could pile and mark items the way college students can do in a library’s carrels and stacks.
Modes and Meaning
For interface design, a “mode” is a condition in which the same action produces a different result than it would in another condition. Complex machines and digital devices often have modes so that users can accomplish more things with the same number of controls. Modes are important to context because they establish rules of cause and effect for user action. They literally change the context of an object, or even a place.
A simple example of this is the Caps Lock key on a typewriter or computer keyboard, similar to the one depicted in Figure 13-11. When it’s engaged, the keyboard remains in a mode in which all the letter keys produce capital versions of their corresponding letters; when it’s disengaged, typing goes back to lowercase. We make mistakes with this mode all the time, so much so that software password fields often remind us after a few tries that we have Caps Lock engaged, which might be causing password failure.
Figure 13-11. A MacBook Caps Lock key, with its “mode on” indicator lit
Mode is a big challenge when designing understandable contexts. As technology becomes more complex, we ask a limited number of affording objects (keys, buttons, and so on) to do more kinds of work. In early word processors, pressing a particular key would cause the keyboard to change from a text-input device to a text-formatting or file-management device. It worked like the Caps Lock key but effected a more radical shift of function. This sort of mode is known as sticky—it persists after you engage it.
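A short TypeScript sketch shows how little machinery a sticky mode requires—and why it so easily trips us up: nothing about the action itself changes, only the hidden state behind it. The class and method names are illustrative, not any particular system’s API.

class Keyboard {
  private capsLock = false; // the hidden state that defines the mode

  // Sticky: the mode persists until the user toggles it again.
  pressCapsLock(): void {
    this.capsLock = !this.capsLock;
  }

  // The same physical action yields a different result per mode.
  pressLetter(letter: string): string {
    return this.capsLock ? letter.toUpperCase() : letter.toLowerCase();
  }
}

const kb = new Keyboard();
console.log(kb.pressLetter("a")); // "a"
kb.pressCapsLock();               // engage the mode (indicator lights up)
console.log(kb.pressLetter("a")); // "A" — same keypress, different result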