In 1977, Luke Skywalker had a conversation with C-3PO and R2-D2, and society has been obsessed with talking to its machines ever since. Our fascination with real-time, meaningful vocal interactions with our machines goes back decades. And we are close, very close, to turning movie magic into real life thanks to mobile apps and the newest voice recognition software. Advances in voice recognition technology have come in leaps and bounds over the last fifteen years. Companies such as Nuance, Google, Apple and Microsoft have brought the technology from specialized applications to your desktop and smartphone.
The players of today
Before Google Voice Search and Siri, there was Dragon NaturallySpeaking. First released in 1997 and now owned by Nuance Communications, “Dragon” is the world’s best-selling speech recognition system, currently available in English and six other languages, including Dutch and Japanese.
With Dragon, Nuance brought an efficient, effective and user-friendly speech recognition system to the masses. With the dawn of the smartphone revolution, Nuance brought its award-winning software to mobile with Dragon Mobile Assistant.
Google Voice Search evolved into Google Now, which has been touted as the most efficient and user-friendly speech recognition tool. However, it was Apple’s Siri, which now recognizes 22 languages, that changed the way the public saw voice recognition software. In a way, Apple made more than a voice recognition tool; it created an entity, one that can speak and understand.
Microsoft was late to the dance, only bringing Cortana to the desktop in the summer of 2015, but it built on the headway and technologies of its predecessors. Microsoft’s biggest move is making Cortana available across multiple platforms, which is sure to accelerate adoption.
The missing link
The future of voice recognition is an exciting one, but it is inexorably tied to other fields. Take Natural Language Understanding (NLU), for example. It is widely considered the missing link in the progress of voice recognition, which is why mobile app development companies must keep an eye on this technology.
NLU is the key to the natural interactions that have been so elusive all these years. It allows the machine to discern the nuances, accents and inflections that often trick the software currently available.
However, this requires the software to be ever present to gather the data it needs and it requires the ability to learn progressively. In short, what we need is the integration of speech-recognition coding in more and more devices, we need people to interact with them more and we need these devices to have a working Artificial Intelligence (AI) to learn.
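To make the gap concrete, here is a toy Python sketch of the rigid, keyword-driven matching that simpler voice systems fall back on, which is exactly the approach NLU aims to move beyond. The intents and phrasings below are hypothetical illustrations, not any real assistant's vocabulary.

```python
# Toy command matcher: the rigid, keyword-driven approach that NLU
# aims to replace. All intents and phrasings here are hypothetical
# examples for illustration only.

INTENTS = {
    "weather": {"weather", "forecast", "rain", "temperature"},
    "music": {"play", "song", "music"},
    "timer": {"timer", "remind", "alarm"},
}

def match_intent(utterance: str) -> str:
    """Return the first intent whose keywords appear in the utterance."""
    words = set(utterance.lower().split())
    for intent, keywords in INTENTS.items():
        if words & keywords:
            return intent
    return "unknown"

print(match_intent("will it rain tomorrow"))      # weather
print(match_intent("could you put something on")) # unknown
```

The first command matches only because it happens to contain a keyword; the second fails even though a human would understand it instantly. Bridging that gap, inferring meaning across phrasing, accent and context from constant interaction data, is what NLU and on-device learning are meant to supply.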
The role of cloud-based APIs
The spread of speech recognition technology is due in part to the arrival of MindMeld and other cloud-based APIs. These allow mobile app development companies to work with and adapt the code necessary to build an intelligent voice interface for mobile apps and sites, even though they themselves do not hold advanced degrees in natural language processing or neuroscience. This democratization of the development process is accelerating the growth of the entire voice recognition field.
Unsurprisingly, this has given rise, and will continue to give rise, to a number of competitors to the reigning leaders in the space (Nuance, Google, Apple and Microsoft), but that should not be seen as a bad thing.
Competition breeds innovation, and the space can use more of it at this point. Any movement of the needle should be welcomed, and every mobile app development company should be able to innovate and participate in this transformation.
Wearables and mobile apps
Who can forget the moment when Dick Tracy or Michael Knight raised their watches to their mouths and started speaking into them? Sure, they were just comic strips and TV shows, but they struck a chord with the public and showed that we not only like technology, we have a desire to be connected all the time. In the last five years, products like Google Glass, the Apple Watch and Samsung Gear have pushed the wearables market into the mainstream. In the autumn of 2015, Apple applied for a patent for what it described as a “ring computing device”.
All of these devices will, at some point, have to highlight the importance of voice-activated commands in mobile apps; the user experience (UX) demands it. Google Glass learned this lesson the hard way. It had great functional potential, but its makers did not foresee the high social costs of wearing it. It looked bad, felt bad, was expensive and made others feel uncomfortable.
In short, it was the technological marvel that no one wanted.
Voice commands can address these UX concerns and make everything easier through a simple mobile app. They won’t eliminate other forms of interaction, but they will be the easiest.
For watches and other products with small screens, voice lets users interact with mobile applications without obscuring the display. The technology is, at some level, available today, but adoption rates are still low; this is set to change as voice and gesture commands are more elegantly combined.
However, the industry does admit that this seamless integration is highly dependent on the development and evolution of improved voice control software technologies.
If Henry Ford could see how much technology is in today’s cars, his head would explode. He wouldn’t even recognize them aside from the four tires and the steering wheel.
More and more of today’s automobiles are being integrated with voice command technology. This move is reinforced by mounting social pressures and government regulations that are forcing drivers to keep their hands on the wheel.
Advances in voice recognition technology will allow greater use not only of mobile device and mobile app functions but also of the car’s own functions, such as changing the cabin temperature, opening and closing the windows or changing the music. People have already become comfortable with their mobile devices; they don’t want to change, and now they want a seamless interface across all of their devices, including their car.
Google’s self-driving car and Tesla’s range of vehicles already incorporate much of this technology.
They have a limited ability to learn and respond. They can drive themselves but your preferences and behaviors can also be programmed into them, giving you an unparalleled driving experience.
The future of voice-recognition and the development of artificial intelligence are intertwined.
The days of conversing with JARVIS à la Tony Stark, or with the Enterprise’s computer like Picard, are still in the distant future, but we are on our way.
Deep Learning, a technique used to build systems with high accuracy on tasks like speech recognition and language analysis, is essential to propel the current technology forward. But a couple of hurdles need to be overcome first. The first is that leaders in the space, companies like DeepMind and Vicarious, need to make their platforms available to customers and get the ball rolling. The second is that more APIs relying on Deep Learning need to be offered. Both would lead to more rapid development and growth of the technology.
No need to worry though, Skynet is still decades away. But to even have a shot at that level of AI, the idea of a generalized intelligence should be courted.
The notion of a “connected home” is a good starting point. Imagine a home that is always watching, listening and learning. A home that you can not only command but have real-time, rich interactions with. It won’t just shutter the windows at your command; it will ask whether you’re in the mood for a steak or a hot shower. With a constant stream of audio input, the home will be able to recognize and respond to your vocal cues regardless of whether you have an accent or a sore throat.