VUI Design: Off on the Wrong Foot

"We need a voice application to be on the voice assistants!" Sooner or later this call sounds in every company.

Some agencies have heard this call and are collecting the corresponding budgets. For applications that do not require much linguistic intelligence (e.g. questions & answers, quizzes, simple controls), this works quite well. However, as soon as the application needs even a little more intelligence, the next poorly rated voice app ends up unused in the store.

Even with the best developers in the world, it is currently not possible to build a reasonably intelligent voice app, because the underlying systems simply do not have that intelligence yet. In addition, developers and designers who have not yet grasped the essence of spoken language carry their traditional approaches over into the production of voice apps, and in some places this is fundamentally wrong.

That spoken language is different shows in many places. Some of them…


// The ear is the most emotional organ of perception we have. The same cannot be said of the eye: however fast we can see, we cannot experience emotions at that speed. So if you try to land "formally" in the user's ear, you will not stay there.

// Likewise, the ear works serially and slowly, whereas the eye can take in information much faster and almost in parallel.

// Spoken language is also extremely situation-dependent: what and how people speak always depends on the situation they currently find themselves in.

// As if that were not enough, language is also very ambiguous and personal.


And yes, exactly these extremely important points are rarely, if ever, considered in current voice application development. The result is usually correspondingly poor: budgets are burned, and clients and users are left frustrated.

It doesn't have to be that way. It is enough to take a step back and create an ideal-typical user (= persona) for the planned voice application, precisely because language depends so strongly on the user's situation.

This persona is described by the following attributes (a small data sketch follows the list)…

1.) Characteristics of the user, e.g.:

  • Man or woman?
  • Adult, teen, child or senior?
  • Age?
  • Interests?
  • Residence?
  • Language?

2.) Time of the touchpoint

  • Time of day?
  • Good or bad mood?
  • Stressed, bored or sleepy?

3.) Location of the touchpoint

  • At work
  • While doing sport
  • While cooking

Once you have worked out the persona, you also need a goal that manifests the use/benefit/purpose of the voice app you are about to design, so that this benefit can then be created with the functions of the voice assistant.

There is a difference between "entertaining the user" and "selling them the latest sneaker". So, what is the goal?

Once goal and persona are known, the benefit/purpose of the voice application has to be worked out. This naturally happens within the possibilities the voice assistant platform offers, i.e. its functions, which the programmer then implements as Skills (Alexa) or Actions (Google).

The platform defines what the assistant can do, e.g. send an email, switch a lamp or produce speech output. Of course, the language design must always be sensitive; it must be emotionally appropriate and, above all, oriented towards the user's intention.
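
To show how such a function ends up in a Skill, here is a minimal sketch of an Alexa request handler in TypeScript, assuming the official ask-sdk-core package. The intent name SwitchLampIntent and the switchLamp helper are hypothetical stand-ins for whatever function the goal and persona actually call for; the spoken response is where persona and situation show up.

```typescript
import * as Alexa from 'ask-sdk-core';

// Hypothetical helper that would talk to the user's smart-home backend.
async function switchLamp(on: boolean): Promise<void> {
  // call the real lamp API here
}

// Handler for a hypothetical "SwitchLampIntent": the wording of the
// answer is where persona and situation come in, not the plumbing.
const SwitchLampIntentHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'SwitchLampIntent';
  },
  async handle(handlerInput) {
    await switchLamp(true);
    return handlerInput.responseBuilder
      .speak('Done, the light is on. Enjoy your evening!') // tone chosen for the persona
      .getResponse();
  },
};

// Wire the handler into the skill entry point (AWS Lambda).
export const handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(SwitchLampIntentHandler)
  .lambda();
```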

"The customer does not want a drill, he wants the hole in the wall!" (Henry Ford)

Ultimately, the goal and the persona determine the design of the functions (Skills/Actions) of a voice application, and thus also the benefit for the user as well as for the provider!