by Matt Garrish
CSS3 Speech
You might be thinking that the global definition power of PLS lexicons, combined with the granular override abilities of SSML, is sufficient to cover all cases, so why a third technology? But you’d be only partly right.
The CSS3 Speech module is not about word pronunciation, however. It includes no phonetic capabilities, but instead defines how you can use CSS style sheets to control such aspects of synthetic speech rendering as the gender of the voice to use, the amount of time to pause before and after elements, when to insert aural cues, etc.
The CSS3 Speech module also provides a simpler entry point for some basic voicing enhancements. The ability to write X-SAMPA or IPA pronunciations requires specialized knowledge, but the speak-as property masks the complexity for some common use cases.
You could use this property to mark all acronyms that are to be spelled out letter-by-letter, for example. If we added a class called ‘spell’ to the abbr elements we want spelled, as in the following example:
<abbr class="spell">IBM</abbr>
we could then define a CSS class to indicate that each letter should be voiced individually using the spell-out value:
.spell { -epub-speak-as: spell-out }
It’s now no longer left to the rendering engine to determine whether the acronym is “wordy” enough to attempt to voice as a word.
Note
Note that the properties are all prefixed with “-epub-” because the Speech module was not a recommendation at the time that EPUB 3 was finalized. You must use this prefix until the Speech module is finalized and reading systems begin supporting the unprefixed versions.
The speak-as property provides the same functionality for numbers, ensuring they get spoken one digit at a time instead of as a single number, something engines will not typically do by default.
.digits { -epub-speak-as: digits }
Adding this class to the following number would ensure that readers understand you’re referring to the North American emergency line when listening to TTS playback:
<span class="digits">911</span>
The property also allows you to control whether or not to read out punctuation. Only some punctuation ever gets announced in normal playback, as it’s generally used for pause effects, but you could require all punctuation to be voiced using the literal-punctuation value:
.punctuate { -epub-speak-as: literal-punctuation }
This setting would be vital for grammar books, for example, where you would want all the punctuation in each example read out to the student. Conversely, to turn punctuation off, you’d use the no-punctuation value.
The speak-as property isn’t a complex control mechanism, but definitely serves a useful role. Even if you are fluent with phonetic alphabets, there’s a point where it doesn’t make sense to have to define or write out every letter or number to ensure the engine doesn’t do the wrong thing, and this is where the Speech module helps.
Where the module excels, however, is in providing playback control. But this is also an area where you may want to think twice before adding your own custom style sheet rules. Most reading systems have their own internal rules for playback so that the synthetic speech rendering doesn’t come out as one long, uninterrupted stream of monotone narration. When you add your own rules, you risk interfering with the reader’s default settings. But in the interests of thoroughness, we’ll take a quick tour.
The first stop is the ability to insert pauses. Pauses are an integral part of the synthetic speech reading process, as they provide a non-verbal indication of change. Without them, it wouldn’t always be clear whether a new sentence or a new paragraph were beginning, or when one section ends and another begins.
The CSS3 Speech module includes a pause property that allows you to control the duration of the pause before and after any element. For example, we could define a 50-millisecond pause before headings, followed by a 25-millisecond pause after, by including the following rule:
h1 { -epub-pause: 50ms 25ms }
Aural cues are equally helpful when it comes to identifying new headings, as the pause alone may not be interpreted by the listener as you expect. The Speech module includes a cue property for exactly this purpose:
h1 { -epub-pause: 50ms 25ms; -epub-cue: url('audio/ping.mp3') none }
(Note the addition of the none value after the audio file location. If it were omitted, the cue would also sound after the heading was finished.)
And finally, the rest property provides fine-grained control when using cues. Cues occur after any initial pause before the element (as defined by the pause property), and before any pause after. But you may still want to control the pause that occurs between the cue sounding and the text being read and between the end of the text being read and the trailing cue sounding (i.e., so that the sound and the text aren’t run together). The rest property is how you control the duration of these pauses.
We could update our previous example to add a 10-millisecond rest after the cue is sounded to prevent run-ins as follows:
h1 { -epub-pause: 50ms 25ms; -epub-cue: url('audio/ping.mp3') none; -epub-rest: 10ms 0ms }
But again, if I didn’t say it forcefully enough earlier, it’s best not to tweak these properties unless you’re targeting a specific user group, know their needs, and know that their players will not provide sufficient quality “out of the box.” Tread lightly, in other words.
A final property, and slightly more of an aside, is voice-family. Although not specifically accessibility-related, it can provide a more flavorful synthesis experience for your readers.
If your ebook contains dialogue, or the gender of the narrator is important, you can use this property to specify a voice of the appropriate gender. We could set a female narrator as follows:
body { -epub-voice-family: female }
and a male class to use as needed for male characters:
.male { -epub-voice-family: male }
If we added these rules to a copy of Alice’s Adventures in Wonderland, we could now differentiate the Cheshire Cat using the male voice as follows:
<p>Alice: But I don't want to go among mad people.</p>
<p class="male">The Cat: Oh, you can't help that. We're all mad here. I'm mad. You're mad.</p>
You can also specify different voices within the specified gender. For example, if a reading system had two male voices available, you could add some variety to the characters by indicating the number of the voice to use:
.first-male { -epub-voice-family: male 1 }
.second-male { -epub-voice-family: male 2 }
At worst, the reading system will ignore your instruction and only present whatever voice it has available, but this property gives you the ability to be more creative with your text-to-speech renderings for those systems that do provide better support.
Note
The CSS3 Speech module contains more properties than I’ve covered here, but reading systems are only required to implement the limited set of features described in this section. You may use the additional properties the module makes available (e.g., pitch and loudness control), but if you do, your content may not render uniformly across platforms. Think carefully before using innovative or disruptive features in your content, as they may hinder interoperability across reading systems.
Whatever properties you decide to add, it is always good practice to separate them into their own style sheet. You should also define them as applicable only for synthetic speech playback using a media at-rule as follows:
@media speech { .spell { -epub-speak-as: spell-out } }
As I noted earlier, reading systems will typically have their own defaults, and separating your aural instructions will allow them to be ignored and/or turned off on systems where they’re unwanted.
For completeness, you should also indicate the correct media type for the style sheet when linking to it from your content document, by setting the media attribute of the link element to speech.
And that covers the full range of synthetic speech enhancements. You now have a whole arsenal at your disposal to create high-quality synthetic speech.
The Coded Word: Scripted Interactivity
Whether you’re a fan of scripted ebooks or not, EPUB 3 has opened the door to their creation, so we’ll now take a look at some of the potential accessibility pitfalls and how they can be avoided.
One of the key new terms you’ll hear in relation to the use of scripting in EPUB 3 is progressive enhancement. The concept of progressive enhancement is not original to EPUB, however, nor is it limited to scripting. I’ve actually been making a case for many of its other core tenets throughout this guide, such as separating content from style and making content convey meaning. Applied to scripting, it means that scripts must only enhance your core content.
We’ve already covered why structure and semantics should carry all the information necessary to understand your content, but that presupposes that the content is all available. The ability of scripts to cut off access to content for anyone without a JavaScript-enabled reading system is a major concern not just for persons using accessible devices, but for all readers.
And that’s why using scripts to restrict access to content is forbidden in EPUB 3. If you try to circumvent the specification requirement and treat progressive enhancement as just an “accessibility thing,” you’re underestimating the readership that is going to rely on your content rendering properly without scripting. Picture buying a book that has pages glued together and you’ll get an idea of how excited your readers will be that you thought no one would notice.
Note
Don’t assume that EPUB 3 reading systems will support JavaScript. Scripting support will undoubtedly become widespread in time, but it is an optional feature that vendors and developers can choose not to implement.
Meeting the general requirement to keep your text accessible is really not asking a lot, though. As soon as you turn to JavaScript to alter (or enable) access to prose, you should realize you’re on the wrong path. To this end:
Don’t include content that can only be accessed (made visible) through scripted interaction.
Don’t script-enable content based on a reader’s preferences, location, language, or any other setting.
Don’t require scripting in order for the reader to continue moving through the content (e.g., choose your own adventure books).
Whether or not your prose can be accessed is not hard to test, even if it can’t be done reliably by validators like epubcheck. Turn off JavaScript and see if you can navigate your document from beginning to end. You may not get all the bells and whistles when scripting is turned off, but you should be able to get through with no loss of information. If you can’t, you need to review why prose is not available or has disappeared, why navigation gets blocked, etc., and find another way to do what you were attempting.
Don’t worry that this requirement takes all the potential scripting fun out of ebooks, though. Games, puzzles, animations, quizzes, and any other secondary content you can think of that requires scripting are all fair game for inclusion. But when including them, there are two considerations to make, very similar to those for deciding when to describe images:
Does the scripted content you’re embedding include information that will be useful to the reader’s comprehension (demos, etc.), or is it included purely for pleasure (games)?
Can the content be made accessible in a usable way and can you provide a fallback alternative that provides the same or similar experience?
The answer to the first question will have some influence on how you tackle the second. If the scripted content provides information that the reader would otherwise not be able to obtain from the prose, you should consider alternative ways of making that information available, for example:
If you script an interactive demo using the canvas element, consider also providing a transcript of the information for readers who may not be able to interact with it.
If you’re including an interactive form that automatically evaluates the reader’s responses, also include access to an answer key.
If you’re adding a problem or puzzle to solve, also provide the solution so the reader can still learn the steps to its completion.
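To make the answer-key suggestion more concrete, here is a minimal progressive-enhancement sketch in JavaScript. It is one possible approach under stated assumptions, not a required technique: it assumes the answer key is already present, and visible, in the content document’s markup, and that a native button element carrying the hidden attribute sits next to it to act as a toggle. The function name enhanceAnswerKey is hypothetical. Because only the script itself ever collapses the key, a reader without scripting simply sees the answers.

```javascript
// Minimal progressive-enhancement sketch (hypothetical helper, not an EPUB API).
// Assumes the markup already contains the answer key, visible by default,
// plus a native <button hidden> to serve as the toggle. Without scripting,
// the button stays hidden and the answer key stays readable.
function enhanceAnswerKey(answerKey, button) {
  // Only now that script is running (and can undo it) do we collapse the key.
  answerKey.hidden = true;
  button.hidden = false;
  button.setAttribute('aria-controls', answerKey.id);
  button.setAttribute('aria-expanded', 'false');

  button.addEventListener('click', () => {
    const expanded = button.getAttribute('aria-expanded') === 'true';
    // Flip the reported state and show or hide the key to match it.
    button.setAttribute('aria-expanded', String(!expanded));
    answerKey.hidden = expanded;
  });
}
```

In a content document, it might be called once the markup has loaded as enhanceAnswerKey(document.getElementById('key-1'), document.getElementById('key-1-toggle')), where both ids are placeholders. Using a real button element, rather than a scripted image, keeps the toggle keyboard-accessible without any extra ARIA role work.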
None of the above suggestions is intended to remove the responsibility to try to make the content accessible in the first place, though. Scripting accessible forms, for example, should be a trivial task for developers familiar with WAI-ARIA (we’ll look at some practices in the coming section). But trivial or not, because scripting will not necessarily be available, it’s imperative that you provide other means for readers to obtain the full experience.
If the scripted content is purely for entertainment purposes, however, and it absolutely cannot be made accessible natively, create a fallback commensurate with the value of that content to the overall ebook. As with decorative images, a reader unable to interact with non-essential content is not going to be hugely interested in reading a five-page dissertation on each level of your game. A simple idea of what it does will usually suffice.
A Little Help: WAI-ARIA
Although fallbacks are useful when scripting is not available, you should still aim to make your scripted content accessible to all readers. Enter the W3C Web Accessibility Initiative’s (WAI) Accessible Rich Internet Applications (ARIA) specification.
The technology defined in this specification can be used in many situations to improve content accessibility. We’ve already encountered the aria-describedby attribute in looking at how to add descriptions and summaries, for example.
I’m now going to pick out three common cases for scripting to further explore how ARIA can enhance the accessibility of EPUBs: custom controls, forms, and live regions.
Custom Controls
To be clear, custom controls are not standard form elements that you style to suit your needs. Those are the good kinds of custom controls (if you want to call them custom), as they retain their inherent accessibility traits whatever you style them to look like. Readers will not have problems interacting with these controls, as they natively map to the underlying accessibility APIs and so will work regardless of the scripting capabilities a reading system has built in.
A custom control is the product of taking an HTML element and enhancing it with script to emulate a standard control, or building up a number of elements for the same purpose. Using images to simulate buttons is one of the more common examples, as custom toolbars are often created in this way. There is typically no native way for a reader using an accessible device to interact with these kinds of custom controls, however, as they are presented to them as whatever HTML element was used in their creation (e.g., just another img element in the case of image buttons).
It would be ideal if no one used custom controls, and you should try to avoid them unless you have no other choice, but the existence of ARIA reflects the reality that these controls are also ubiquitous. The increase in native control types in HTML5 holds out hope for a reduction in their use, but it would be neglectful not to cover some of the basics of their accessible creation. Before launching out on your own, it’s good to know what you’re getting into.
Note
There are widely available toolkits, like jQuery UI, that bake ARIA accessibility into many of the custom widgets they allow you to create. You should consider using these if you don’t have a background in creating accessible controls.
If you aren’t familiar with ARIA, a very quick, high-level introduction for custom controls is that it provides a map between the new control and the standard behaviors of the one being emulated (e.g., allowing your otherwise-inaccessible image to function identically to the button element as far as the reader is concerned). This mapping is critical, as it’s what allows the reader to interact with your controls through the underlying accessibility API. (The ARIA specification includes a graphical depiction that can help visualize this process.)
Or, put differently, ARIA is what allows the HTML element you use as a control to be identified as what it represents (button) instead of what it is (image). It also provides a set of attributes that can be set and controlled by script to make interaction with the control accessible to all. As the reader manipulates your now-identifiable control, the changes you make to these attributes in response get passed back to the underlying accessibility API. That in turn allows the reading system or assistive technology to relay the new state to the reader, completing the cycle of interaction.
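The cycle just described can be sketched in JavaScript. This is a minimal illustration under stated assumptions, not a complete widget: makeToggleButton and its onToggle callback are hypothetical names, and aria-pressed is used on the assumption that the control is a toggle (such as play/pause). The script assigns the role, makes the element keyboard-focusable, and updates the ARIA state each time the reader activates it; that state change is what the accessibility API relays to the reader. An img used this way still needs alt text that names its function.

```javascript
// Minimal sketch: expose an arbitrary element (for example, an <img> used as
// a play/pause control) as an ARIA toggle button. makeToggleButton and
// onToggle are hypothetical names; aria-pressed is one possible state choice.
function makeToggleButton(el, onToggle) {
  el.setAttribute('role', 'button');        // identify it as a button, not an image
  el.setAttribute('tabindex', '0');         // put it in the keyboard tab order
  el.setAttribute('aria-pressed', 'false'); // initial state reported to assistive technology

  function toggle() {
    const pressed = el.getAttribute('aria-pressed') !== 'true';
    el.setAttribute('aria-pressed', String(pressed)); // new state passed back via the accessibility API
    onToggle(pressed);
  }

  el.addEventListener('click', toggle);
  el.addEventListener('keydown', (event) => {
    // A native button can be activated with Enter or Space; emulate that here.
    if (event.key === 'Enter' || event.key === ' ') {
      event.preventDefault();
      toggle();
    }
  });
}
```

Handling the keyboard explicitly is the part most often forgotten: a native button gets focus and key activation for free, whereas a scripted image gets neither until you add them.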
But to get more specific, the role an element plays is defined by attaching the ARIA role attribute to it. The following is a small selection of the values you can use in this attribute:
alert
button
checkbox
dialog
menuitem
progressbar
radio
scrollbar
tab
tree
Here’s how we could now use this attribute to define an image as an audio clip start button: