Apple Says 'Hey Siri' Detection Briefly Becomes Extra Sensitive If Your First Try Doesn't Work
A new entry in Apple's Machine Learning Journal provides a closer look at how hardware, software, and internet services work together to power the hands-free "Hey Siri" feature on the latest iPhone and iPad Pro models.
Specifically, a very small speech recognizer built into the embedded motion coprocessor runs all the time and listens for "Hey Siri." When just those two words are detected, Siri parses any subsequent speech as a command or query.
The detector uses a Deep Neural Network to convert the acoustic pattern of a user's voice into a probability distribution. It then uses a temporal integration process to compute a confidence score that the phrase uttered was "Hey Siri."
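Apple does not publish code for this step, but the idea of integrating per-frame acoustic probabilities into a single confidence score can be sketched roughly as follows. The window length and the windowed-average scoring rule here are assumptions for illustration, not Apple's actual method:

```python
# Hypothetical sketch of temporal integration: the DNN emits a probability
# per audio frame that the frame belongs to the "Hey Siri" phrase; we smooth
# those per-frame probabilities over a sliding window and keep the best window.

def confidence_score(frame_probs, window=20):
    """Return the highest windowed average of the per-frame probabilities."""
    if len(frame_probs) < window:
        # Not enough frames for a full window; average what we have.
        return sum(frame_probs) / max(len(frame_probs), 1)
    best = 0.0
    for i in range(len(frame_probs) - window + 1):
        avg = sum(frame_probs[i:i + window]) / window
        best = max(best, avg)
    return best
```

A sustained run of high per-frame probabilities then yields a high score, while a brief spike from background noise averages out.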
If the score is high enough, Siri wakes up and proceeds to complete the command or answer the query automatically.
If the score exceeds Apple's lower threshold but not the upper one, however, the device enters a more sensitive state for a few seconds, so that repeating the phrase is much more likely to invoke Siri, even without any extra effort from the user.
"This second-chance mechanism improves the usability of the system significantly, without increasing the false alarm rate too much because it is only in this extra-sensitive state for a short time," said Apple.
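The two-threshold logic Apple describes can be sketched as a small state machine. The threshold values and the length of the sensitive window below are invented for the example; Apple does not publish its actual numbers:

```python
import time

# Illustrative two-threshold trigger with a temporary "second chance" window.
# NORMAL_THRESHOLD, LOWER_THRESHOLD, and SECOND_CHANCE_SECONDS are assumptions.

NORMAL_THRESHOLD = 0.8
LOWER_THRESHOLD = 0.5
SECOND_CHANCE_SECONDS = 4.0


class TriggerDetector:
    def __init__(self):
        self.sensitive_until = 0.0  # time until which the lower bar applies

    def should_wake(self, score, now=None):
        """Decide whether this confidence score wakes Siri."""
        now = time.monotonic() if now is None else now
        threshold = (LOWER_THRESHOLD if now < self.sensitive_until
                     else NORMAL_THRESHOLD)
        if score >= threshold:
            self.sensitive_until = 0.0
            return True
        if score >= LOWER_THRESHOLD:
            # Near miss: briefly lower the bar in case the user repeats the phrase.
            self.sensitive_until = now + SECOND_CHANCE_SECONDS
        return False
```

Because the lowered threshold only applies for a few seconds after a near miss, the false-alarm rate stays close to that of the single-threshold detector, which matches Apple's stated rationale.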
To reduce false triggers from strangers, Apple invites users to complete a short enrollment session in which they say five phrases that each begin with "Hey Siri." The examples are saved on the device.
Apple explains: "We compare the distances to the reference patterns created during enrollment with another threshold to decide whether the sound that triggered the detector is likely to be 'Hey Siri' spoken by the enrolled user."
This process not only reduces the probability that "Hey Siri" spoken by another person will trigger the iPhone, but also reduces the rate at which other, similar-sounding phrases trigger Siri.
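A minimal sketch of that enrollment check might look like the following. The feature vectors, the Euclidean distance metric, and the threshold are all placeholders; Apple's actual speaker representation and distance measure are not public:

```python
import math

# Hypothetical speaker check against the ~5 reference patterns saved
# during enrollment. SPEAKER_DISTANCE_THRESHOLD is an invented value.

SPEAKER_DISTANCE_THRESHOLD = 1.5


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_enrolled_speaker(candidate, enrollment_patterns):
    """Accept the trigger only if the candidate utterance is close enough
    to the nearest pattern recorded during enrollment."""
    nearest = min(euclidean(candidate, p) for p in enrollment_patterns)
    return nearest <= SPEAKER_DISTANCE_THRESHOLD
```

An utterance far from every enrolled pattern, whether another speaker saying "Hey Siri" or the same speaker saying something similar-sounding, is rejected by the same distance test.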
Apple also says it made "Hey Siri" recordings at both close and far distances, in various environments such as the kitchen, the car, the bedroom, and a restaurant, by native speakers of many languages around the world.
For many more technical details about how "Hey Siri" works, be sure to read Apple's full article on its Machine Learning Journal.
Top Rated Comments
My biggest problem was described beautifully in a recent article I read somewhere: a voice assistant with 10 possible working commands is great, and one with unlimited working commands is great, but one with hundreds of working commands is terrible, because the user will never know what all those commands are. They will just use the core few that they know, and if they try a command and it isn't one of those hundreds, it causes confusion and doubt.
Except her ability to do stuff.
Apple would help Siri’s reputation a lot by keeping an active wiki going for the service and what commands it will respond to. Because many people have tried to use Siri once for a specific task, found it didn’t work, and since given up. It is really hard for a general user to discover new tricks.
Places like iMore do a decent job of documenting, but it would be AWESOME if the source had a good manual for it.
Me: "How many files have I opened today?"
Siri: "on the internet I found 'how big is Allah.'"
maybe next year