“Personalized Hey Siri”


Apple Machine Learning Journal:

In addition to the speaker vectors, we also store on the phone the “Hey Siri” portion of their corresponding utterance waveforms. When improved transforms are deployed via an over-the-air update, each user profile can then be rebuilt using the stored audio.

The most Apple-like way to continuously improve that I can think of. More interesting, though, is this bit later on:

The network is trained using the speech vector as an input and the corresponding 1-hot vector for each speaker as a target.

To date, ‘personalized Hey Siri’ has meant “the system is trained to recognize only one voice.” That quote, though, sounds like they’re working on multiple-user support, which, with the HomePod, they really should be.
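The quoted training setup can be sketched in a few lines: a speaker vector goes in, and the target is a one-hot vector identifying which enrolled speaker produced it. Everything below is illustrative — the dimensions, the toy data, and the single linear layer are my assumptions, not Apple's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

num_speakers = 3   # enrolled users (e.g. a HomePod household); illustrative
vec_dim = 16       # speaker-vector dimensionality; illustrative

def one_hot(index, size):
    """Target vector: 1 at the speaker's index, 0 elsewhere."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy data: each speaker's vectors cluster around a distinct mean.
means = rng.normal(size=(num_speakers, vec_dim))
X, Y = [], []
for s in range(num_speakers):
    for _ in range(50):
        X.append(means[s] + 0.1 * rng.normal(size=vec_dim))
        Y.append(one_hot(s, num_speakers))
X, Y = np.array(X), np.array(Y)

# One linear layer trained with cross-entropy by gradient descent.
W = np.zeros((vec_dim, num_speakers))
b = np.zeros(num_speakers)
lr = 0.5
for _ in range(200):
    logits = X @ W + b
    probs = np.apply_along_axis(softmax, 1, logits)
    grad = probs - Y                      # dL/dlogits for cross-entropy
    W -= lr * (X.T @ grad) / len(X)
    b -= lr * grad.mean(axis=0)

# After training, the argmax over outputs identifies the speaker.
pred = np.argmax(X @ W + b, axis=1)
true = np.argmax(Y, axis=1)
accuracy = (pred == true).mean()
```

With multiple enrolled speakers trained this way, identifying who said “Hey Siri” is just an argmax over the output vector — which is what would make per-user responses on a shared HomePod possible.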
