One of Google’s best-known features is voice commands, which let you search the web and perform actions on your smartphone using only your voice. If you’re one of the people who regularly use their voice to control their phone, you’ll be glad to learn that Google is making the feature better.
Google says it has built new neural network acoustic models using Connectionist Temporal Classification (CTC) and sequence discriminative training techniques. These are the latest models powering Google’s voice search, succeeding the Gaussian Mixture Model (GMM) and Deep Neural Network (DNN) approaches that came before them.
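To make the CTC idea concrete, here is a minimal sketch in plain Python of CTC’s core decoding rule: the network emits one label (or a special “blank”) for every audio frame, and that frame-level sequence is collapsed into a transcription by merging repeated labels and dropping blanks. The symbols and example word here are illustrative assumptions, not Google’s actual code or alphabet:

```python
BLANK = "-"  # the special CTC blank symbol

def ctc_collapse(frame_labels):
    """Collapse a per-frame label sequence: merge adjacent
    repeats, then remove blanks."""
    collapsed = []
    prev = None
    for label in frame_labels:
        if label != prev:  # merge runs of the same label
            collapsed.append(label)
        prev = label
    return "".join(l for l in collapsed if l != BLANK)

# Nine audio frames' worth of per-frame outputs for "hello".
# Note the blank between the two l's: without it, the repeated
# l's would be merged into one.
frames = ["h", "h", "e", "-", "l", "l", "-", "l", "o"]
print(ctc_collapse(frames))  # -> "hello"
```

This is why CTC pairs well with acoustic models: the network can take as many frames as it needs to commit to each sound, and the collapse step turns its frame-by-frame output into a clean label sequence.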
Google’s new models are built on Recurrent Neural Networks (RNNs). In a blog post, Google explains how the technology works:
“Our improved acoustic models rely on Recurrent Neural Networks (RNN). RNNs have feedback loops in their topology, allowing them to model temporal dependencies: when the user speaks /u/ in the previous example, their articulatory apparatus is coming from a /j/ sound and from an /m/ sound before. Try saying it out loud - “museum” - it flows very naturally in one breath, and RNNs can capture that. The type of RNN used here is a Long Short-Term Memory (LSTM) RNN which, through memory cells and a sophisticated gating mechanism, memorizes information better than other RNNs. Adopting such models already improved the quality of our recognizer significantly.”
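To make the “memory cells and a sophisticated gating mechanism” concrete, here is a minimal single-timestep LSTM sketch in plain NumPy. The weight names, sizes, and random inputs are illustrative assumptions for the example, not details of Google’s recognizer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One timestep: gates decide what to forget, what to write
    into the memory cell, and what to expose as the new state."""
    z = W @ x + U @ h_prev + b        # all four gates in one affine map
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input/forget/output gates
    g = np.tanh(g)                    # candidate cell update
    c = f * c_prev + i * g            # memory cell: keep old + write new
    h = o * np.tanh(c)                # gated output / new hidden state
    return h, c

# Toy sizes: 3-dimensional input frame, 4-dimensional hidden state.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for frame in rng.standard_normal((5, n_in)):  # 5 audio frames
    h, c = lstm_step(frame, h, c, W, U, b)    # h, c feed back each step
print(h)
```

The loop at the end is the feedback Google describes: the hidden state and memory cell produced for one audio frame are fed back in with the next frame, which is how the network models the temporal dependencies in a word like “museum.”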
So what does all this mean for users? Google says voice search and voice commands in the Google app will now be more accurate, respond faster, and use fewer computational resources. That applies to the Google app on both Android and iOS, and dictation on Android devices benefits from today’s change as well.