Apple's Greg Joswiak: Siri Wasn't Engineered to Be Trivial Pursuit
In iOS 11, Apple's AI-based personal assistant Siri has a much more natural voice that goes a long way towards making Siri sound humanlike. Siri speaks with a faster, smoother cadence with elongated syllables and pitch variation, a noticeable departure from the more machine-like sound in iOS 10.
The team behind Siri, including Siri senior director Alex Acero, has worked for years to improve the way Siri speaks, according to a new Wired interview with Acero and Apple VP of marketing Greg Joswiak. While Siri's voice recognition capabilities were powered by a third-party company early on in Siri's life, Acero's team took over Siri development a few years back, leading to several improvements to the personal assistant since then.
Siri is powered by deep learning and AI, technology that has greatly improved its speech recognition capabilities. According to Wired, Siri's raw voice recognition can now correctly identify 95 percent of users' speech, on par with rivals like Alexa and Cortana.
Apple is still working to overcome negative perceptions about Siri, and blames many of the early issues on the aforementioned third-party partnership.
"It was like running a race and, you know, somebody else was holding us back," says Greg Joswiak, Apple's VP of product marketing. Joswiak says Apple always had big plans for Siri, "this idea of an assistant you could talk to on your phone, and have it do these things for you in a more easy way," but the tech just wasn't good enough. "You know, garbage in, garbage out," he says.
Joswiak says Apple's aim from the beginning has been to make Siri a "get-s**t-done" machine. "We didn't engineer this thing to be Trivial Pursuit!" he told Wired. Apple wants Siri to serve as an automated friend that can help people do more.
One unique Siri attribute is its ability to work in multiple languages. Siri supports English, French, Dutch, Mandarin, Cantonese, Finnish, Hebrew, Malay, Arabic, Italian, Spanish, and more, including dialect variants (like English in the UK and Australia) and accents. The Siri team combines pre-existing databases of local speech with local voice talent and on-device dictation, transcribing and dissecting the content to find all of the individual sounds in a given language and all of the ways those sounds are pronounced.
In areas where Apple offers spoken dictation but no Siri support, it's gathering data for future Siri support, and in places where Siri is already available, spoken interactions between user and device (gathered anonymously) are used to improve algorithms and train the company's neural network.
Creating the right voice for Siri in a given language hinges on the proper voice talent, and Apple uses an "epic search" with hundreds of people to find someone who sounds helpful, friendly, spunky, and happy without overdoing it. Once the right person is found, Apple records them for weeks at a time to create the right sound. So far, Apple has repeated this process for all 21 languages Siri supports.
Ultimately, Acero and his Siri team are aiming to make Siri sound more like a trusted person than a robot, creating an attachment to the AI that will "make Siri great" even when Siri fails to answer a query properly. Apple also wants to make people more aware of what Siri can and can't do and that it exists in the first place, which is why iOS 11 includes Siri-centric features like cross-device syncing and a better understanding of user interests and preferences.
Wired's full piece, which goes into much more detail on how Siri recognizes various aspects of speech and how Apple chooses voice talent, can be read over on the site.