Build Design Systems With Penpot Components
Penpot's new component system for building scalable design systems, emphasizing designer-developer collaboration.

Inside Intercom | destraynor
Voice is either a genius technology whose time has finally come, or the most overhyped waste of time we’ve seen since bots, blockchain, or, winding back the clock, gamification.
The reality is less dramatic, more nuanced. There is now a new broadly available input/output interface to use and design for, and the most useful thing product and design folk can do is learn when and how that matters.
The recent emergence of Alexa, Siri, Cortana and “Okay Google” doesn’t mean voice has “finally” arrived. Quite the opposite, it means we’re finally getting going. The phase of concept demos, hype-cycles, and over-promising has ended. From here onwards, it’s real technology supporting real use cases, or pack up and go home.
There is a famed “long nose” of innovation that every significant new technology must pass through. Bill Buxton, principal researcher at Microsoft Research, has lived through every new UI form and estimates that it takes 30 years to go from “research project” to full maturity (defined as generating a billion-dollar business).
So these things will take a while, and when they arrive we shouldn’t expect them to conquer every existing input mechanism; they complement them.
New input devices don’t kill their predecessors, they stack on top of them. Voice won’t kill touchscreens. Touchscreens didn’t kill the mouse. The mouse didn’t kill the command line. Analysts yearn for a simple narrative where the birth of every new technology instantly heralds the death of the previous one, but interfaces are inherently multimodal. The more the merrier. Every new technology starts in a new underserved niche and slowly expands until it finds all the areas it’s best suited for. And voice has a great niche to start in…
Bill Buxton introduced the concept of a “place-ona”, adapting the concept of a persona (which we all love to hate) to show how a location places limits on the types of interaction that make sense. There is no “one best input” or “one best output”. It all depends on where you are, which in turn defines what you have free to use.
At a very simple level, humans have hands, eyes, ears and a voice. (Let’s ignore the ability to ‘feel’ vibrations, as that’s alert-only for the moment.) Consider some real-world scenarios: driving occupies your hands and eyes but leaves your voice and ears free, while a library leaves your hands and eyes free but rules out speaking or making noise.
From scenarios like these, you can see where voice UIs are useful and, more generally, the role of voice as an input mechanism.
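To make the place-ona idea concrete, here is a toy model. The scenario values and names below are illustrative assumptions (mine, not Buxton’s or the article’s), mapping the channels a location leaves free to the interfaces that remain viable:

```typescript
// Toy place-ona model: a location constrains which human I/O channels are free.
// All names and scenario values here are illustrative assumptions.
interface Placeona {
  name: string;
  handsFree: boolean; // can the user touch/type?
  eyesFree: boolean;  // can the user look at a screen?
  earsFree: boolean;  // can the user listen to audio (e.g. via headphones)?
  voiceFree: boolean; // can the user speak aloud?
}

// Pick viable input/output modalities from whatever channels are free.
function viableInterfaces(p: Placeona): { inputs: string[]; outputs: string[] } {
  const inputs: string[] = [];
  const outputs: string[] = [];
  if (p.handsFree) inputs.push("touch/keyboard");
  if (p.voiceFree) inputs.push("voice");
  if (p.eyesFree) outputs.push("screen");
  if (p.earsFree) outputs.push("audio");
  return { inputs, outputs };
}

// Driving: hands and eyes are busy, so voice in / audio out is all that is left.
console.log(viableInterfaces({ name: "driving", handsFree: false, eyesFree: false, earsFree: true, voiceFree: true }));
// A library: silence required, so typing in, with screen (or headphone audio) out.
console.log(viableInterfaces({ name: "library", handsFree: true, eyesFree: true, earsFree: true, voiceFree: false }));
```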
If you think voice UIs are the future, verbally describe, aloud, everything you see and touch on your phone today.
— Benedict Evans (@BenedictEvans) January 12, 2017
While Benedict Evans is going for his signature cocktail of insight mixed with snark in this tweet, it’s safe to say that’s not the point of voice. Or rather, voice isn’t optimal in most place-onas.
Speed and accuracy are worse with voice than with all other user interfaces. First, yes, we can talk faster than we can type, but even the most advanced audio processing still relegates us to slower, over-enunciated speech, and still results in errors. Second, listening is far slower than reading, especially listening to a digital voice; we can scan and skip through text far more quickly than we can listen to it. This is why Visual voicemail was such a hit (as Benedict again pointed out).
One of the killer features of the original iPhone was Visual voicemail. Dumping the audio UI was a huge step forward. How fast we forget…
— Benedict Evans (@BenedictEvans) January 12, 2017
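A quick back-of-the-envelope comparison makes the gap concrete. The rates below are rough, commonly cited ballpark figures (my assumption, not numbers from the article):

```typescript
// Ballpark words-per-minute rates for each channel (assumed figures, not from the article).
const WPM = { speaking: 150, typing: 40, reading: 250, listening: 150 };

// Minutes to produce or consume a message of a given word count at a given rate.
const minutes = (words: number, wpm: number): number => words / wpm;

const words = 300; // a longish voicemail or email
console.log(`speak it: ${minutes(words, WPM.speaking).toFixed(1)} min`);  // 2.0 min
console.log(`type it:  ${minutes(words, WPM.typing).toFixed(1)} min`);   // 7.5 min
console.log(`listen:   ${minutes(words, WPM.listening).toFixed(1)} min`); // 2.0 min
console.log(`read it:  ${minutes(words, WPM.reading).toFixed(1)} min`);  // 1.2 min
```

Speaking beats typing for input, but reading beats listening for output, which is exactly the asymmetry the two tweets above are pointing at.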
So two things are clear: talking to our devices is still slower and less accurate than typing, and listening is far slower than reading.
So when is voice actually useful? This question has been asked at countless conference panels, and the answer is typically ‘it depends’, but I think it’s better to ask more specific questions:
Today it seems that driving and “playing music while walking around your house” lend themselves well to a voice interface, but how many other scenarios will present themselves, and will the use cases move towards productivity or remain casual? Will people ever want to have their email read out through their AirPods?
The vast majority of the world can speak faster than they can type, but today’s technologies can’t keep up reliably. How far away is that from changing?
While most messaging products today include asynchronous voice clips, they require that messages be received in the same way they were composed. Users have to agree on a medium for the conversation, which doesn’t work when they’re in different contexts. This leads to what I call the “library-driver problem”: if Michelle is in a library and Alice is driving a car, how can they communicate?
Alice is driving, so she can’t use her hands or eyes; Michelle is in a library, so she can’t speak or make noise. In an ideal messaging app, users could compose messages any way they want and consume them any way they want, without the mismatch ever blocking the conversation.
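A minimal sketch of what that decoupling could look like (all names here are hypothetical; this is one possible shape, not Intercom’s design, and the transcribe/synthesize stubs stand in for real speech-to-text and text-to-speech services):

```typescript
// One canonical message, decoupled from how it was composed or will be consumed.
type Medium = "text" | "audio";

interface Message {
  text: string;        // canonical form; transcribed if the sender spoke
  audio?: ArrayBuffer; // the original recording, when there is one
}

// Stand-in media services; a real app would call speech-to-text / text-to-speech APIs here.
async function transcribe(audio: ArrayBuffer): Promise<string> {
  return `[transcript of a ${audio.byteLength}-byte recording]`; // placeholder
}
async function synthesize(text: string): Promise<ArrayBuffer> {
  return new TextEncoder().encode(text).buffer as ArrayBuffer; // placeholder "audio"
}

// The sender composes in whatever medium their place-ona allows.
async function compose(input: string | ArrayBuffer): Promise<Message> {
  if (typeof input === "string") return { text: input };
  return { text: await transcribe(input), audio: input };
}

// The receiver consumes in whatever medium *their* place-ona allows,
// regardless of how the message was composed.
async function deliver(msg: Message, preferred: Medium): Promise<string | ArrayBuffer> {
  if (preferred === "text") return msg.text;        // Michelle, reading in the library
  return msg.audio ?? (await synthesize(msg.text)); // Alice, listening while driving
}

// Alice speaks a message while driving; Michelle reads it in the library.
compose(new ArrayBuffer(16 * 1024))
  .then((msg) => deliver(msg, "text"))
  .then((out) => console.log(out)); // "[transcript of a 16384-byte recording]"
```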
Bringing voice into normal ubiquitous messaging would represent a tipping point of sorts, normalising the idea of people talking at their devices to control them.
So whilst it’s true that voice isn’t a platform, or, as is often claimed, the new UI paradigm, it is another new interface that we must design for and deliver on. Otherwise, we risk sounding like some of these folks…
The post What voice UI is good for (and what it isn’t) appeared first on Inside Intercom.
AI-driven updates, curated by humans and hand-edited for the Prototypr community