This template is constantly evolving, but when we encounter a new image that needs to be described, we typically split the transcription from the description. That means all existing text should be copied and pasted into the UniD system, so it can easily be heard as part of the Audio Description. That's the easy part.
The next step is remediating (aka translating) the purely visual piece of media into a purely audible form (in this case, into digital text, which can be read by screen readers or heard as MP3 files).
One aspect of a visual image that complicates this process is its typical lack of a single narrative thread or a single meaning. Most images give everything at once (all of the possible storylines and all of the possible meanings, forcing a viewer to quickly decide on the interpretation). In other words, images can be interpreted in many ways, based on the perspective, interests, and context of the viewer.
In the case of Audio Description, though, the describer must choose that perspective to transform the media from visual to audio for the secondary listener. This choice becomes an inherent filter, which affects the reception of the description in many significant ways. If the describer and the listener are aligned on the choice, then the process might be relatively seamless. But if the describer takes a perspective that, for whatever reason, does not align with the listener's, a fog of confusion can easily be created. For that reason, we suggest that describers first determine the purpose of the image. Why is it being used? What is it being used to illustrate? If you can clearly determine the purpose of the image, that can help you decide on your describing approach.
Once you have determined the purpose, and what you think this image description needs to do for the listener, I recommend a journalistic approach to Audio Description. Journalism has a long history of using text to convey imagery and meaning. Journalists aspire to be fair and objective about what they see, by not taking sides or tilting the scales, and so should an audio describer. Journalists aim for the heart of the matter and strive to tell the truth. These are all reasonable and potentially valuable positions to take on this subject.
In practice, I think, that means that the describers should start their descriptions either with a fact-focused summary, with the most-important facts first, or with a narrative approach that tells "the most important" story about the image, meaning the story that the describer has chosen to best reflect the image's purpose.
For the former style, the facts-first approach, the inverted-pyramid technique (in which facts are provided in descending order of importance) has been used for more than a century for utilitarian purposes. It gets the job done.
For the latter, the storytelling style, which we hypothesize as the style with the most potential for creating motivating and engaging Audio Description, there has been some research (and a lot of speculation) about how mental images are formed from words and how narratives engage our minds. This type of conjuring happens all of the time, for example, in novels. But what about in description form, when a particular image exists in reality, and someone wants to hear about it, specifically?
There certainly are opportunities for poetic and creative forms of Audio Description that follow no template. We are working on just such an experiment with the National Endowment for the Arts and The Goldsworthy Walk in San Francisco. But, as a workhorse model, I propose that describers connect with the long-established journalistic traditions of Who What When Where How and Why. I think this approach will work well in this field of Audio Description, too.
For example, when the describer encounters an image of a person or people doing something (which is what most images show), the description could easily convey Who is doing What, When, and Where. ... This still needs to be empirically tested, but I hypothesize that a return loop is then warranted to unspool the Who (what does the Who look like, in more detail?) and the What (what does it look like, more specifically, when the Who does that thing?). At that point, the How might come into play, or the How can come later. The When might also need some further description (how do we know, from looking, when this is?), as might the Where (again, how do we know, from looking, where this is?). Lastly, if the How has already been described in depth, the description should address the Why: Why is this person doing this thing in this time period in this place? And how? I think if a describer can do all of that, in this type of orderly manner, descriptions will be easier to understand (and also to write).
What if the image doesn't have a person? A description of an animal might use the same approach (what's its motivation?). This approach, of course, can become quite complicated with a collage of, say, a National Park ecosystem shared by people, animals, and plant life. In some scenes we have encountered, there are dozens of potential starting points and mini-narratives to tell. The key, in those cases, is to create a strategy for your approach and then carry it all the way through (such as, I'm going to start by describing all of the things the people do in this place; then, I'm going to describe all of the animals in action; then the plant life, or in some other order, depending on what's most important in that particular place).
A type of flower, though, would not necessarily have a motivating action to attribute (unless you are focusing on describing photosynthesis or seed spreading). Neither would an image of a piece of machinery. So for an artifact or any type of visual protagonist that does not have human or animal motivations, I suggest simply clipping out the Who (agent or actor) part of the approach and focusing instead on the What, When, and Where. What is this thing, and when and where is it? Such a contextualization process will help to render meaning and to put the artifact into its place. A How and Why also probably exist in this scenario, so those can be teased out as well.
But what if there is no person or thing? One of the toughest challenges we have faced as describers is describing a map (check out the paper we wrote about that issue on our Research page). A map, at least theoretically, has no fact that is more important than any other and no clear narrative to tell. It does, though, have a purpose, and we recommend first identifying the purpose of the map. If you can do that, then you can probably develop a strategy to communicate that purpose. For example, maybe the map is shared to show tourists the highlights of the area, so the description would take a "highlights" approach. Or maybe the map is designed to help a person navigate a complex area, so the description would take a "navigation" approach. Or maybe the map isn't really about highlights or navigation; instead, it just intends to show people the way a place used to be, or how something was done, with no expectation that the viewer of the map will walk in those footsteps. If that's the case, a cultural-history approach or a natural-history approach might be the best choice.
Once all of that has been settled, the describer still needs to determine what comes first, second, third, etc., since an audible experience is linear while a visual experience is not.
To approach this part of the Audio Description challenge, we have created a template for describing that goes in this order, and in this style:
1. COMPONENT NAME: Start with the type of image, such as MAP: (we found the inclusion of MAP, and the like, to help set the stage for the listener). This label then should include the basic information to tell the listeners what they will get by selecting this description, such as the title of the image being described (if it has one), who made it (if that seems important), the year it was created (if that seems important), and its physical location at the place (if that's relevant).
2. DESCRIBING: How would you describe the artifact you are describing? In this order: Size (small / medium / large) / Shape (horizontal / vertical / square / cut-out / oval / circle) / Type (e.g., photograph, chart, or map; see hierarchy below), distinctive characteristics (like the primary or only image on the page), and the point of view that the listener has (through what frame is this image being conveyed?) ... note only if in black and white (not if in color)
3. If multiple types of media in a package, this is the hierarchy we use to stack the descriptions (as UniD style, not based on empirical study):
A. COLLAGE / IMAGE(S) = photo or illustration /
B. MAP /
C. TIMELINE /
D. CHART /
E. QUOTE /
F. TEXT
4. If more than 1 of any of these, then signal with a label, like:
IMAGE 1 of 6 over the first one, IMAGE 2 of 6 over the second one, and so on ...
5. If only one of a kind, then just describe it ... as such:
DESCRIPTION: Description goes here
UniD Narrative Style: Who is doing what to whom, when and where and why and how?
CAPTION: Caption goes here
CREDIT: Credit goes here
RELATED TEXT: Related text goes here
NOTE: Remove all document navigation directions in the texts, which are likely to cause confusion when disassociated from the document design. For example, in the text below, I would remove "(above left)", "(above)" and "(right)."
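
For anyone preparing descriptions in bulk, the stacking, labeling, and clean-up steps in this template could be roughed out in a few lines of Python. This is only a minimal sketch, not part of the UniD system: the names `RANK`, `NAV_DIRECTION`, `strip_navigation`, and `stack_and_label` are hypothetical, the list of direction words goes beyond the three examples in the NOTE, and labeling a one-of-a-kind component as DESCRIPTION is one reading of step 5.

```python
import re
from collections import Counter

# Stacking order from step 3 (A through F). COLLAGE and IMAGE share rank A.
RANK = {"COLLAGE": 0, "IMAGE": 0, "MAP": 1, "TIMELINE": 2,
        "CHART": 3, "QUOTE": 4, "TEXT": 5}

# Assumption: a navigation direction is a parenthetical made up only of
# direction words, such as "(above left)", "(above)", or "(right)".
NAV_DIRECTION = re.compile(
    r"\s*\((?:(?:above|below|left|right|top|bottom|center)\b\s*)+\)",
    re.IGNORECASE,
)


def strip_navigation(text):
    """Remove layout-dependent directions that confuse a listener."""
    return NAV_DIRECTION.sub("", text)


def stack_and_label(components):
    """Order (kind, description) pairs by the hierarchy and label them.

    Kinds that appear more than once get "KIND n of m" (step 4);
    a one-of-a-kind component gets "DESCRIPTION" (step 5).
    """
    # sorted() is stable, so items of the same rank keep their input order.
    ordered = sorted(components, key=lambda item: RANK[item[0]])
    totals = Counter(kind for kind, _ in ordered)
    seen = Counter()
    lines = []
    for kind, description in ordered:
        seen[kind] += 1
        if totals[kind] > 1:
            label = f"{kind} {seen[kind]} of {totals[kind]}"
        else:
            label = "DESCRIPTION"
        lines.append(f"{label}: {strip_navigation(description)}")
    return lines


if __name__ == "__main__":
    package = [
        ("MAP", "Trail map (right) of the valley."),
        ("IMAGE", "A ranger (above left) waves."),
        ("IMAGE", "Two hikers rest."),
    ]
    for line in stack_and_label(package):
        print(line)
```

With the sample package above, the two images come first, labeled IMAGE 1 of 2 and IMAGE 2 of 2, followed by the single map, and the parenthetical directions are dropped. A pattern like this will also remove a parenthetical such as "(right)" when it is meant in some other sense, so the output still needs a human read-through.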

Last updated by: Brett Oppegaard, Nov. 1, 2020