Tips for Building Voice User Interfaces

A graphical user interface (GUI) is the way a user interacts with a graphical application, such as a mobile app or a web app. Similarly, voice assistants such as Alexa have an interface made up of the voice interactions you can have with them. We call this a voice user interface, or VUI.

User interface design on the graphical side has many decades of research and best practices behind it. On the voice side, however, we are still in the early days of understanding what makes a good voice program.

The following are some of the things I have found valuable while building for voice.

Why Voice is Used

Before getting into specifics about how to design a voice interface, it's first worth going through the reasons why a user might choose to use a voice assistant. Common reasons are:

  1. To answer questions more quickly. Instead of getting out their phone to look something up, they can just ask their voice assistant.
  2. To assist with a task while their hands are busy. This could be things like:
    • Adding an ingredient to their shopping list while looking at a recipe.
    • Ordering a pizza while watching a football game.
    • Playing music while working at a computer.
  3. To get a summary of relevant information, such as:
    • The weather while picking out clothes.
    • The news stories of the day.
    • Which holiday is coming up.

It's also worth thinking about things that voice is not used for:

If you keep in mind why your user is using the voice skill, you're more likely to build a better experience. Avoid building things the way you might in a mobile app.

With these things in mind, here are some tips to better serve the above needs:

Limit Interactions

It might seem odd for a blog post about building voice interfaces to recommend limiting interactions, but it's true! Voice is a back-and-forth interface and every extra turn costs the user time, so you have to be deliberate about getting things done as quickly as possible.

Try to limit the number of interactions with the user to 2 or 3. With Alexa these come in the form of intents. The first intent might be to open your skill, or it might be to ask a direct question.

If possible, answer the user's command immediately. You might be tempted to have a confirmation prompt before answering their command, but this just adds one more step to the process.
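To make the idea concrete, here is a minimal sketch of a handler that answers a direct question in a single turn and only asks a follow-up when a required detail is missing. The handler name, the `city` slot, and the `FORECASTS` data are made up for illustration, and the response dictionary is only loosely modeled on the JSON an Alexa skill returns, so treat it as a sketch rather than a drop-in implementation.

```python
# Hypothetical forecast data standing in for a real weather service.
FORECASTS = {
    "seattle": "rainy with a high of 54",
    "austin": "sunny with a high of 88",
}


def build_response(text, end_session):
    """Build a response dict loosely modeled on Alexa's response JSON."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_weather_intent(slots):
    """Answer right away; no 'Did you want the weather?' confirmation step."""
    city = slots.get("city")
    if city is None:
        # The one case where another turn is justified: required info is missing.
        return build_response("Which city?", end_session=False)

    forecast = FORECASTS.get(city.lower())
    if forecast is None:
        return build_response(
            f"Sorry, I don't have a forecast for {city}.", end_session=True
        )

    # Direct answer, then close the session so the user is done in one turn.
    return build_response(
        f"Today in {city} it will be {forecast}.", end_session=True
    )
```

Calling `handle_weather_intent({"city": "Seattle"})` produces the answer and ends the session, while an empty slot dictionary keeps the session open for a single follow-up question.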

Avoid Extra Steps

Always be thinking about ways to achieve your task without bothering the user. You can often find sneaky ways to do so.

Above we talked about avoiding confirmation prompts. A case where one might be necessary is when the user is purchasing an item. You wouldn't want them to receive the wrong toppings on their pizza because Alexa misheard what they said.

In this case a confirmation prompt is probably necessary. Still, it's usually better to take direct action. You can have your cake and eat it too by adding the item directly to their cart.

This is what happens when you purchase something from Amazon. It adds the item to your cart so that you can confirm and complete the purchase from the app. It also gives you the option of confirming the purchase by voice. This is completely optional, though: if you say nothing, the item simply remains in the cart.

Taking a direct action that is easy to undo when you are unsure of the user's request is an effective way to limit the number of interactions required to use the skill.
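The add-to-cart pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Cart` class, the handler names, and the accepted phrases are all hypothetical, and a real skill would persist the cart and hand the purchase off to an actual ordering system.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical in-memory cart; a real skill would persist this."""
    items: list = field(default_factory=list)
    purchased: bool = False


def handle_order(cart, item):
    """Take the direct action first: the item goes into the cart immediately."""
    cart.items.append(item)
    return (
        f"I added {item} to your cart. "
        "Say 'buy it' to order now, or finish later in the app."
    )


def handle_follow_up(cart, utterance):
    """Optional voice confirmation. `utterance` is None if the user says nothing."""
    if utterance is not None and utterance.strip().lower() in {"buy it", "yes"}:
        cart.purchased = True
        return "Done. Your order is placed."
    # Silence or anything else: end quietly. The item is still in the cart,
    # so nothing is lost and nothing was bought by mistake.
    return None
```

The design choice here is that the risky step (spending money) stays behind an explicit confirmation, while the safe step (adding to the cart) happens without asking.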

Don't Limit Functionality

One particular pizza company has a low review score for their Alexa skill. The major complaint from users is that you can only order pizzas from a saved favorites list.

It might be tempting to release a skill with a limited number of features just to "check the box" and say that you are available on the platform. But you can't fool users, and providing a bad experience will come back to bite you.

Try to include all of the features available on other platforms in your voice skills, within reason. A user should be able to order a pizza where they:

Give Users a Reason To Come Back

Just like on any platform, you want to keep your users coming back to your skill. In the pizza example you might have daily specials that they can listen to and choose from.

It always makes sense to keep in mind the reasons people use voice assistants and to try to work your skill into the user's routine. For example, if your skill is a database about wine (and maybe you sell wines for local pickup), you could include a feature that suggests wine pairings for the meal they are planning to make that night. This provides more purchase opportunities than relying on the moments when the person is already thinking of buying wine.

Be Fun

This one is harder to pull off, but one thing users like about voice assistants is getting them to say funny things. Some of the most downloaded Alexa skills are games and jokes.

Don't overdo it, and definitely stay focused on the skill's core functionality, but if you can throw in a humorous comment here and there, maybe at the conclusion of a particular interaction, it will leave a smile on your user's face.


Building voice experiences becomes dramatically simpler when you focus on why the user is selecting this medium. Voice assistants are useful in situations where pulling out your phone is inconvenient. So keep interactions quick, provide useful information, and try to remain as featureful as possible.

If your company needs experienced consultants for Alexa, Google Home or any IoT project, please contact me and let's chat.