Being Vocal – The Commandments of Voice User Interface Design

In a previous job, my focus was on wireless and voice technologies. I architected a multi-modal application platform, whose initial purpose was to support Personal Information Management across a disparate range of interaction devices. At that time I immersed myself in the world of Voice User Interface design – an area not only driven by technology, but also by psychology. In fact, the dominant factors in producing a successful voice interface are quite distant from the underlying technology. Key factors of persona, dialog flow, and how to create an engaging user experience derive a lot from human psychology.

Where am I going with this, you may well ask? Well, during some downtime at the end of last year, I put together a collection of notes that I named the “Ten Commandments of Voice User Interface Design”. These notes were the result of my own education process, which included a diverse range of reading, attendance at some of the more popular voice and speech technology conferences, and my personal experience of working with the team to build our VUI. My goal back then had been to produce a nicely formalized document, turn it into a PDF, and let it loose on the web. Well, that never happened. Fast-forward nine months: I was sorting through some old files on my machine and came across the same document. So, rather than letting my (unedited) effort go to waste, I’ve decided to publish it here, in its raw form, for your consumption.

Feedback and comments are most welcome. Hopefully for people new to the world of VUI design, this’ll provide some interesting pointers on the road to successful Voice User Interface design.

  1. Usability testing – it’s not a big phrase, and you need to embrace it from the beginning of any voice project.
  2. If the client has an existing Interactive Voice Response (IVR) system, research it, and ask the users what could be or should be done better.
  3. The Wizard of Oz isn’t only a classic movie, but also a testing methodology through which you can tell your app is on the right road to Kansas. (e.g. SUEDE)
  4. Real testing with speech recognition doesn’t require a full production application: Prototype your dialogs and grammars using VoiceXML, and dummy up the back-end data. Your users will still have a realistic experience without you having to build the whole application.
  5. It’s not only what the test users say… When you’re observing test users running tasks on your application, don’t just listen to them; watch them as well. Recording test sessions on closed-circuit television may provide feedback that cannot be garnered from audio recordings alone.
  6. Record what the users think happened, and analyze what actually did happen. Post-task questionnaires are a great way of collecting a user’s feedback on how they thought a task progressed. Capturing call flows, utterances, and system error feedback is also a great way to see how the call actually went. Use the two together to improve your design. Questionnaires provide quick feedback to high-level managers about how their potential users think the system is progressing. Call analysis provides system designers with great feedback on how users are reacting to the system.
  7. Give users the opportunity to provide free-form feedback. A couple of ways to do this:
    • In a questionnaire, offer the user some space for their general comments about the system. You may capture useful data here that is outside the scope of your set of questions.
    • Where the application permits, give the user a chance to record a feedback message at the end of their call. This may introduce a bias, as only those users who get to that stage, or choose the menu option, will be exposed to this process – but it is still a useful channel for feedback.
  8. Application evaluation and usability evaluation are both ongoing processes. Once the system reaches pilot, and even production, there is still great room for improvement. Implement review processes that capture both user feedback and sample interactions against the system. Perform grammar reviews to ensure your grammars capture the users’ standard range of commands.
  9. Remember: You are guiding the user. Do not give the user the impression that the system can do more than it is actually capable of. Be very careful with open-ended questions (“What can I do for you today?”) unless you have a well-defined grammar and some great back-tracking logic. Open-ended questions also place the onus on the caller to say the right thing. You can end up with quickly disappointed users unless your grammars are great or your error handling is appropriate.
  10. Error handling is crucial. If a user does not understand their options, or is distracted, you need to re-establish with them where they are, what their options are, and what they can do next.
  11. It is not only what the user hears the system saying… the non-speech audio counts as well. Audible cues can help users understand where they are in the system (landmarks), or when the system is expecting input. Advanced users can be prompted about barge-in capability through use of a simple alerting tone.
  12. Persona does matter – survey results have indicated that different voices can instill different feelings within the users of a system. As part of your usability testing, try out a few dialogs with different personas, and record characteristic preferences about each of the personas.
  13. The terminology that management uses may never be the same as the terminology used by customers. Build the system for the people who will use it – the customers! – and make sure that your grammars are built with their language-set in mind.
  14. Customer Service Representatives (CSRs) are a fountain of knowledge. In an existing environment, monitoring of the most frequent activities provides a great source of ideas for automation. In addition, CSRs can provide information about user information patterns: the ways in which users usually provide information (e.g. when booking a travel ticket, always providing the destination first, then usually being prompted for the point of origination).
  15. Map out your users’ mental model. This process is key to developing a system that is compatible with your users. Activities include card sorting – writing out the various activities your application is intended to offer, and getting the users to sort them into groups. Get the users to label those groups – this may provide a valuable insight into how users view your organization, and what they would expect of an interactive system. Know as much as possible about the target users of the system before you start to design the application. This model is crucial.
  16. Be careful with text-to-speech (TTS): Test your users’ acceptance levels, and define boundary points where usage is acceptable. Determine migration plans where TTS prompts can be replaced with recorded audio. Test in conjunction with your personas to ensure no detrimental effects. Don’t forget that there are many regionalized TTS engines out there, so if TTS has to be used, try to find a match that fits well with your users.
  17. Test users should not come from your own company or, if you’re a consulting organization building a VUI for a client, from your client’s company either. While some of those people may be customers as well as employees, a degree of separation ensures that the results are unaffected by knowledge of corporate structure or terminology; i.e. get the results of the average customer.
  18. Make sure your users are aware of universal commands, especially ‘Help’, and the ability to return to a ‘Main Menu’. This should be an integral part of any error management / backtracking system. Do not place users in a state where they feel they cannot escape.
  19. 0 is typically used to drop through to the operator. If your users do drop through to the operator, consider capturing the reason why this happened. Or, at least, capture enough information to review why the user did not proceed in using the speech interface. Again, great usability feedback.
  20. When getting users to test the system, task ordering can be important. While some randomization (or controlled randomization – Latin Square Design) is a good thing, it may be worth keeping some tasks until the end. Making the drop-through to the operator the first task may have a serious effect on the rest of your results.
  21. Avoid cognitive overload – guide your users as necessary – don’t make them forget what they’re supposed to be doing, or what options they have.
  22. Start testing early. Real early. See Wizard of Oz.
  23. Voice verification can offer a high security environment, without requiring the user to say lots of information. Think about the potential for up-selling/cross-selling, extra personalization, and reduction of fraudulent activity.
  24. When testing new systems with existing customers, consider setting up ‘bogus’ accounts – users will feel more comfortable knowing that they’re testing without their personal data being exposed to third parties.
  25. The world is a wireless place – make sure that when you are testing, you are not only using people on landlines, but from a wide range of sources. Have a fair share of people calling from landlines (corded and cordless phones), from cellphones, and also, where appropriate, from overseas. Make sure that the application works right, and also make sure that the speech recognition rate is acceptable.
  26. Diagramming applications (e.g. Visio) are your friend. When mapping out voice dialog flows, in addition to using a standard white board, why not project Visio onto the board? You can quickly capture and rearrange flows during design meetings, and instantly have them in a distributable form.
  27. Too much of something can be a bad thing – while you may want to offer every single one of your services through a voice interface, that may be a bad idea. Too many options can make the system very confusing for the user, especially if only 20% of the choices are used 80% of the time. Look to eliminate complex, low-volume functionality where appropriate.
  28. What’s the time? Well, if it’s Friday night, it might actually be Saturday morning. Watch out for the 12am timing window. A calendar classes tomorrow as beginning at 12am. Most humans class it as when they wake up. If you’re accepting times and relative dates from users, factor this into your design, and test out your assumptions in your usability tasks.
  29. Don’t be negative about errors, especially when it’s your users who experience them. Make an error scenario (a no-match or a no-input) a positive experience: give the user reinforcing information about what is expected of them at that stage, and the available options.
  30. While it’s good to confirm, it isn’t good to confirm everything one step at a time – unnecessary confirmation could double the length of your user’s dialog. Use features such as n-best and stop lists to be intelligent about interpreting input, then ask for confirmation where necessary.
  31. Usability is great, but unless you do something with the results, it is worthless. Usability testing and the design process should be intrinsically linked. Without one, the other is useless. Feed information from your usability testing into your design process. Retest the usability of improved designs.
  32. Regionalization, Localization, Internationalization. Be aware of the requirements for your application to serve people from more than one location, or more than one country, speaking more than one language. This will not only affect your grammars and prompts, but could impact your dialog flow as well. Test with appropriate groups, to capture essential feedback.
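
As a rough illustration of the controlled randomization mentioned in commandment 20, here is a minimal Python sketch of a cyclic Latin square for task ordering. The function names, the `pinned_last` parameter, and the sample task names are all hypothetical, chosen only for this example. Each rotatable task appears exactly once in every position across the generated orders, while pinned tasks (such as dropping through to the operator) always come last. Note that a plain cyclic square balances position only; it does not balance which task follows which.

```python
from typing import List, Sequence


def latin_square_orders(tasks: Sequence[str],
                        pinned_last: Sequence[str] = ()) -> List[List[str]]:
    """Return one task order per participant group.

    Tasks not listed in `pinned_last` are arranged as a cyclic Latin
    square: row k is the task list rotated left by k positions, so each
    task occupies each position exactly once across the rows. Tasks in
    `pinned_last` are appended, in the given order, to every row.
    """
    rotatable = [t for t in tasks if t not in pinned_last]
    n = len(rotatable)
    orders = []
    for shift in range(n):
        rotated = [rotatable[(shift + i) % n] for i in range(n)]
        orders.append(rotated + list(pinned_last))
    return orders


def order_for_participant(orders: List[List[str]], participant_index: int) -> List[str]:
    """Assign orders to participants round-robin (assumes `orders` is non-empty)."""
    return orders[participant_index % len(orders)]


if __name__ == "__main__":
    # Hypothetical usability tasks; "reach operator" is kept until the end.
    tasks = ["check balance", "transfer funds", "find branch", "reach operator"]
    for row in latin_square_orders(tasks, pinned_last=["reach operator"]):
        print(row)
```

Recruiting participants in multiples of the number of rows keeps the position balance intact.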
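
The 12am window from commandment 28 can be sketched in Python along these lines. This is only one possible approach, not a recommendation from any particular platform: the `day_start_hour` cutoff of 4am, the `RelativeDate` type, and the function name are assumptions made for the example. Rather than guessing silently, the sketch computes both the literal calendar reading and the "my day hasn't ended yet" reading, and flags the input for confirmation when the two disagree.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta

# Relative-day words this sketch understands (an illustrative subset).
OFFSETS = {"today": 0, "tomorrow": 1}


@dataclass
class RelativeDate:
    calendar: date            # literal reading: the day changes at 12am
    perceived: date           # reading that assumes the caller's day changes at day_start_hour
    needs_confirmation: bool  # True when the two readings disagree


def resolve_relative_day(word: str, now: datetime,
                         day_start_hour: int = 4) -> RelativeDate:
    """Resolve 'today' or 'tomorrow' relative to `now`.

    `day_start_hour` is an assumed cutoff: before that hour, the caller
    is treated as possibly still thinking in terms of the previous day.
    Raises KeyError for words outside OFFSETS.
    """
    offset = timedelta(days=OFFSETS[word])
    calendar = now.date() + offset
    perceived_today = (now - timedelta(hours=day_start_hour)).date()
    perceived = perceived_today + offset
    return RelativeDate(calendar, perceived, calendar != perceived)


if __name__ == "__main__":
    late_friday_night = datetime(2024, 1, 6, 0, 30)  # 12:30am on a Saturday
    result = resolve_relative_day("tomorrow", late_friday_night)
    if result.needs_confirmation:
        print(f"Did you mean {result.perceived:%A} or {result.calendar:%A}?")
```

The right cutoff hour, and whether callers accept the extra confirmation, is something to test in your usability tasks rather than assume.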

I certainly cannot take credit for all the ideas mentioned here. These ideas and thoughts are derived from a collective body of research that goes back many years; research that still continues to be refined to this day. And, while I’m less involved in voice-related solutions today, I still find it a fascinating application of technology.

Finally, if you’re just getting into the world of Voice User Interfaces – and their associated technologies, such as VoiceXML, SALT, and CCXML for call control – you might want to check out the following vendors who provide development communities: Voxeo, Voxpilot, BeVocal, and Tellme.

This entry was posted in Technology.

7 Responses to Being Vocal – The Commandments of Voice User Interface Design

  1. Stephen Bull says:

    Commandments should be limited to approximately 10. Commandments should be brief. Yours seem to preach to the choir rather than the congregation of those who only know that the Wizard of Oz is a movie. The Google link is clever but only makes matters worse because the cognoscenti brought forth don’t explain the Wizard of Oz procedure. Et cetera.

  2. Jason Brome says:

    Absolutely in agreement. Original plan had been to refine the content down to something more manageable. Unfortunately other constraints inhibited that process. Therefore, rather than letting the content disappear into the ether, I thought it would be more appropriate to capture it, in its raw form, for future review.
    Fortunately, however, this content is open to refactoring and a process of continuous improvement. If you have any suggestions for a suitable Wizard of Oz tutorial, I’d love to update the link to point to it.

  3. Jason Brome says:

    Wizard of Oz link updated to point to J Kelley’s own description of the term. Also added a link to the SUEDE project, a very interesting implementation of a Wizard of Oz testing tool from the Group for User Interface Research at Berkeley.

  4. With regard to Visio, did you create your own stencils or did you find stencils that would be useful for dialogs? I haven’t found any that quite meet our needs.

  5. Jason Brome says:

    I’m afraid we never really got too advanced with our modeling of VUIs within Visio. It was mostly a combination of standard flow-chart symbols for mapping out the high-level VUI, along with additional textual markup for capturing meta information.
    I’ve seen a few vendors who have custom Visio stencils that directly link into their application runtime, but haven’t seen any productized VUI-focused stencils for the purpose of design. If you come across anything, I’d be very interested to hear about it.

  6. John Sala says:

    I’m seeking participants for 2 2-hour discussion in Berkeley, California, on Thursday morning, January 29th.
    All must be currently involved in a VoiceXML project. The purpose is improvement of these tools. All will receive a $150 cash honorarium.
    For details, call (510) 482-2524 and ask for John Sala.

  7. Phil Shinn says:

    There’s a VUID yahoo group over at
    join the party!
