Voice technology is not exactly under the radar. Most people have become acquainted with it in the form of the voice-activated assistants that the world’s biggest tech companies have launched in recent years – Amazon’s Alexa, Apple’s Siri and Alphabet’s Google Assistant. However, it’s probably fair to say that voice technology isn’t fully on the radar either. At least, not on a level commensurate with the influence experts believe it will have on the way we interact with our environment day-to-day at home, at work and in public.
For many, voice technology is probably something we still associate with that friend or guy at work who always has the latest gadgetry. There’s always that guy. At the risk of gender stereotyping, it is usually a guy. The one that 15 years ago would have walked around all day with his Bluetooth earpiece in, even in the bar or while sitting down to lunch. The first to have an iPad – mainly used as a high-tech, expensive newspaper while visiting the little boys’ room. Probably has a Fitbit whether he is a keen exerciser or not. Don’t you know how crucial it is to know how many steps you’ve taken and your sleep patterns? Don’t dwell on the fact no action is taken to increase that step count or improve those sleep patterns. Very likely to have sauntered into the office with an Apple Watch as soon as they were released because checking social media notifications on a smartphone isn’t nearly as convenient.
That’s the guy that now has an Alexa, Google Assistant or regularly has a chat with Siri on his iPhone. He will extol the virtues of a voice-activated assistant to anyone willing to listen, however reluctantly. And if no one is, he’ll probably just tell Siri.
It can be easy to poke fun at ‘early adopters’, especially when they are always the early adopter. Many, if not most, new tech trends die out when it becomes clear the problem the tech is solving isn’t really a problem, it isn’t really solving it, or the improvement isn’t significant enough to justify the expense or a change in behavioural patterns. Others are relatively quickly overtaken by newer, superior generations of technology with a longer shelf-life.
But voice technology isn’t expected to be one of them. Rather, most tech experts and analysts, as well as the companies and investors pouring hundreds of millions into the technology, believe that voice technology will represent not a trend but a paradigm shift in our relationship with technology.
Where is voice technology currently at in its development path, why will it prove to be so influential, what are the concerns around the technology, who are the sector’s main players and what are their ambitions?
Voice Technology – The Story So Far
Adoption of voice-activated assistants is most advanced on the other side of the Atlantic in the USA. Data provided by advertising and marketing research firm eMarketer shows that 35.6 million Americans used a standalone voice-activated assistant such as Alexa at least once a month in 2017. That figure represents growth of almost 130% on 2016 and swells to 60.5 million when voice-assistant software built into smartphones is also counted. That’s over 27% of smartphone users, with ‘older millennials’ aged between 25 and 34 the demographic that makes most use of the technology. By 2021 the number of people using voice-activated assistants worldwide is predicted to reach 1.8 billion.
From a market value of an estimated $3 billion last year and $5.21 billion this year, it is expected to grow to $15.79 billion by 2021. That’s a phenomenal rate of growth and demonstrates how quickly the integration of voice technology into our everyday private, professional and public lives is now expected to advance.
Technology based on voice recognition has been around for some time now. As far back as the 1950s, Bell Labs, named after Alexander Graham Bell of telephone-inventing fame and now owned by Nokia, built a machine named Audrey that was able to recognise the spoken digits 0–9 with 90% accuracy. By the 1980s, IBM’s Tangora was able to recognise up to 20,000 English words, albeit spoken slowly and clearly, and a handful of full sentences.
It wasn’t until 1997 that a voice-recognition technology, software called NaturallySpeaking developed by Dragon Systems, was able to recognise something approaching natural, continuous speech. A pause between each word was no longer necessary and the software could understand up to 100 words a minute. The software, in a greatly upgraded form, is still on the market and popular with doctors, who use it for notation.
However, it was advances in AI, specifically machine learning, that took natural speech recognition to a new level. Cloud-computing-enabled big data also means that machine-learning algorithms have the vast amounts of input data required to effectively learn patterns and contextual exceptions – essential given the many nuances of spoken language, from tone of voice to accent.
The generation of voice technology now breaking through into mainstream use was launched with the 2008 release of the first iteration of Google’s Voice Search app. In 2010, Apple acquired Siri, an AI-powered voice-recognition app that had launched on the iPhone earlier that year, and integrated it into iOS in 2011. These early versions of voice-activated assistants were still too stiff and lacking in flexibility to really gain traction and were seen as more of a novelty than something that most smartphone users genuinely made regular use of.
However, being present on smartphones meant that even if use of the software was limited, Google and Apple were able to compile huge voice databases. This data ‘fuel’, in combination with the strides being made in machine learning itself, resulted in a rapid acceleration in the sophistication of voice technology. After Siri, Amazon launched Alexa, which has risen to become the market-leading voice-activated assistant, and Microsoft launched Cortana. Big tech is now battling it out for market share of what will be a pervasive and hugely valuable new market for voice technology.
Why Voice Technology Has So Much Potential
Anthropologically speaking, humans are hardwired to communicate through a combination of speech and body language. As naturally as millennials may hammer out text on a smartphone, laptop or tablet, written language has only been around for a few thousand years and widespread for far less. From an evolutionary point of view, we are speakers, not writers or typists. The average person can type 40 words a minute but can say 150.
Voice control is also hugely inclusive. Children can operate technology by voice long before they have the dexterity and joined-up thinking necessary to use a touch-screen menu, especially one that involves an ‘options tree’. The same applies to the elderly and infirm.
As soon as the social stigma of talking to a machine fades away, and it very quickly will, interacting with technology through the spoken rather than typed word, or even clicking of a series of physical or digital switches or buttons, will come much more naturally to us. It’s a reality of hundreds of thousands of years of evolutionary biology versus a few generations of writing.
Potential Bottlenecks to Voice Technology Adoption
However, text-based interaction with technology and text-based consumption of content also have advantages over spoken input and audio output. Text and visual input and output offer privacy, as well as the freedom to consume and create content at our own pace. These qualities cannot be dismissed and are likely to mean voice technology will complement rather than replace visual and text-based interactions with technology.
Additionally, part of the cycle of new technology reaching mainstream adoption is overcoming resistance that often manifests itself in the form of suspicion. In the case of voice technology this is the suspicion that devices and software may be ‘eavesdropping’. This concern is accentuated by the fact that the newest generations of voice technology can identify individuals through voice memory. It is also often not completely clear if, when and what data voice-activated assistants are gathering while theoretically in ‘sleep’ mode, before being activated by users.
There are numerous reports of owners of voice-activated assistants noticing they are the focus of targeted marketing for products and services they believe could only have been deemed relevant to them based on conversations they have had in the same room as the technology. That would mean data being passively gathered from conversations taking place within hearing range, prompting privacy and security concerns: what is being recorded, when, where is that data stored and who potentially has access to it?
Makers of voice-activated assistants insist – and solid evidence suggests this is the case – that the technology only records once activated by its ‘wake’ words. However, suspicion, fuelled by viral stories that proliferate online, remains. While there will always be a minority whose suspicions cannot be assuaged, building trust around privacy is clearly a challenge voice-activated assistants must overcome.
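The wake-word mechanism can be illustrated with a toy sketch. This is plain Python with no real audio processing – the wake words and transcripts are invented for illustration, and real assistants use on-device acoustic models rather than text matching:

```python
# Toy illustration of wake-word gating: the assistant ignores all speech
# until a wake word is heard, then treats the next utterance as a command.
# (Hypothetical wake-word list; real devices match audio, not text.)
WAKE_WORDS = {"alexa", "hey siri", "ok google"}

def gate_utterances(transcripts):
    """Yield only the utterances that follow a wake word."""
    awake = False
    for text in transcripts:
        lowered = text.lower().strip()
        if any(lowered.startswith(w) for w in WAKE_WORDS):
            awake = True
            continue  # the wake word itself is not a command
        if awake:
            yield text
            awake = False  # go back to 'sleep' after one command

commands = list(gate_utterances([
    "what a lovely day",  # ignored: device is 'asleep'
    "Alexa",              # wake word: start listening
    "dim the lights",     # captured as a command
    "thanks",             # ignored again
]))
print(commands)  # ['dim the lights']
```

The point of the sketch is the privacy-relevant design choice: everything before the wake word is discarded rather than stored, which is precisely the behaviour manufacturers say their devices implement.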
There is also, at least for now, no ‘incognito’ mode for voice searches, which means users do not have the option to exclude requests from their or their device’s digital profile. Voice-command logs can, however, subsequently be deleted manually. Another search-related weakness of voice is that the lack of a visual component means ‘browsing’ for information isn’t a practical option. Browsing is only possible when a screen is incorporated into the mix of a voice search.
Applications of Voice Technology
The above limitations and bottlenecks do not, of course, mean that voice technology is not on the verge of becoming a central pillar of the ways in which we interact with technology. Simply that, as with any technology that involves re-casting social and personal norms, mainstream adoption will be a process rather than the flicking of a switch. Building the software, or ‘skills’ – the voice-technology equivalent of apps – that will increasingly popularise use of the technology will involve trial and error, especially when it comes to optimising more complex functionalities.
Search is one of the most obvious and wide-ranging applications of voice technology. ‘Search’ can mean simple requests for information such as ‘who won the FA Cup in 1984?’ or ‘which is the nearest tube/metro station to me?’ It can also be part of a more complex task, such as the user providing contextual information like the details of a legal case and the software finding precedents that match, or possible illnesses being matched to symptoms and exposure to external factors.
Executing Commands is the other obvious core functionality of voice technology. In the domestic environment, along with basic search, this is how voice activated assistants are currently mainly used. Users can ask a voice activated assistant to send a dictated email, call someone, play a particular song, album or playlist or, if connected to smart technology, dim the lights or turn up the heating.
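How such commands might be routed to actions can be sketched in a few lines. This is a simplified assumption of how dispatch works – the intents, keyword matching and handlers below are invented for illustration, whereas real assistants use trained natural-language-understanding models rather than keyword tables:

```python
# Toy command dispatcher: map a recognised utterance to a handler by keyword.
def play_music(utterance):
    return f"Playing: {utterance.replace('play', '').strip()}"

def set_lights(utterance):
    return "Dimming the lights" if "dim" in utterance else "Lights on"

def set_heating(utterance):
    return "Turning up the heating"

# Keyword -> handler table (a stand-in for a trained intent classifier)
INTENTS = [
    ("play", play_music),
    ("light", set_lights),
    ("heating", set_heating),
]

def dispatch(utterance):
    utterance = utterance.lower()
    for keyword, handler in INTENTS:
        if keyword in utterance:
            return handler(utterance)
    return "Sorry, I didn't understand that."

print(dispatch("Play Abbey Road"))        # Playing: abbey road
print(dispatch("Dim the lights please"))  # Dimming the lights
```

Even this toy version shows why ‘skills’ are built and tested incrementally: every new capability means new intents, new phrasings to recognise and a fallback for everything the system cannot yet handle.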
More interesting is the new environments in which voice technology can be expected to become integrated in coming years and to which increasingly complex combinations of the core search and command functionalities will be applied.
Automobiles are one obvious place where voice technology can be expected to become fully integrated. Drivers are limited in what they can do by touch while maintaining safety and voice technology provides a very obvious way to solve that problem. Even in a future where most or at least many journeys do not involve a human driver, the environment of a car, like the home, is naturally suited to voice commands for requests such as activating entertainment options, changing the temperature or even opening a window.
Wearables like Fitbits and smartwatches are also a natural fit with voice technology. Their small screens can be an inconvenient interface, and an entire or partial switch to an audio interface would in most cases improve usability. Voice technology will also almost certainly jump-start the development of wearables technology. The failure of Google Glass is largely put down to users feeling conspicuous wearing the tech, as well as a less-than-smooth user experience. Both issues would be considerably improved through voice.
Translators are likely to be in trouble as a profession as natural language processing and automated translation becomes more sophisticated and merges with voice technology and wearables. Language has proven one of the trickiest tasks for machine and deep learning AI because of its refusal to consistently adhere to rules-based structures. However, significant progress is now being made with huge data sets allowing translation software to learn language in a contextual way similar to how we do as infants.
Large swathes of the customer service industry will become dominated by voice technology. We’re already used to navigating voice-activated menus or providing simple, preliminary information orally when banking by phone or interacting with utility providers. Soon, however, entire and far more complex customer-service processes, resembling natural back-and-forth conversations, will be taken over by voice-technology software.
From voice-activated assistants in operating theatres to public information systems, the applications of voice technology will be numerous and pervasive. Text and visual interfaces will not disappear but will adapt to their new role in a mix that heavily features voice and audio. This will necessitate a re-imagining of usability design, some of the consequences of which are almost certainly not immediately apparent – in the way GPS and Google Maps gave birth to a whole range of new technologies, services and business models, such as ride-hailing apps like Uber.
The rise of voice technology can also be expected to throw up new and unexpected tangents that will have an as yet unimaginable impact on our everyday and professional lives.
What is certain is that ten years from now voice technology will be a huge influence and will be well into the process of reshaping our relationship with technology.