This needs more empirical research, but we have adopted and adapted Art Beyond Sight's Verbal Description guidelines (full guidelines linked), which argue for the following, in this order:
First, give the basic information: the title of the work, the artist’s name, the medium, and, where relevant, the year it was made and where it can be seen.
Then, describe the artifact as an object of observation (what it looks like as an image), including its shape, dimensions, and point of view.
Then, describe the content of the image. Who is doing what to whom, when and where and why and how?
In what order should that information be presented? This is a matter of significant debate.
Marzà Ibáñez (2010, p. 147) suggests that the content unspools in the following order for dynamic AD. We are not aware of ordering studies in static AD, so we think ordering there needs more research:
1. Where and When: The spatio-temporal setting is recognized first. Where: What is the setting, and what are the spatial relationships between characters? In other words, set the scene. When: When is this visual happening (during a specific event, sometime in history, during an unarticulated time in the recent past)?
2. Who: Who are the characters in the scene? What do they look like? What facial and corporeal expressions are they making? What types of clothing are they wearing?
3. What: What action is happening here? What are these people doing, and why?
4. How: How are they carrying this out? With what sort of intentionality and attitudes?
Marzà Ibáñez, A. (2010). Evaluation criteria and film narrative. A frame to teaching relevance in audio description. Perspectives: Studies in Translatology, 18(3), 143-153.