Kardome unlocks new opportunities for voice assistants with location-based speech clustering
A video demonstration of Kardome's location-based speech-clustering technology using Amazon's Alexa in a car to provide personalized results for several passengers.
The success of deep learning and artificial intelligence (AI) in voice applications has driven the penetration of voice user interfaces (VUIs) and voice assistants into our everyday lives.
Virtual assistants create opportunities for new kinds of experiences. From kitchens to cars to malls and airports, voice-enabled devices are making their way into every imaginable environment.
However, the acoustic complexity of many of these environments demands improved automatic speech recognition (ASR) performance.
Currently, ASR systems perform poorly in scenarios where competing sounds make it difficult for a speaker to be heard and understood.
A widespread method of addressing the auditory-competition challenge is beamforming, which steers the microphone array toward the direction of the speech source.
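To make the idea concrete, here is a minimal sketch of the classic delay-and-sum beamformer. This is a textbook illustration, not Kardome's algorithm: each microphone channel is time-shifted by a steering delay matched to an assumed source direction and then averaged, so that sound arriving from that direction adds up coherently. All signal names and delay values below are toy assumptions.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Align each microphone channel by its steering delay and average.

    mic_signals: array of shape (num_mics, num_samples)
    delays_samples: integer delay (in samples) applied to each channel
    to time-align the assumed source direction across all microphones.
    """
    num_mics, num_samples = mic_signals.shape
    aligned = np.zeros((num_mics, num_samples))
    for m in range(num_mics):
        d = delays_samples[m]
        aligned[m, d:] = mic_signals[m, :num_samples - d]
    return aligned.mean(axis=0)

# Toy example: a 4-mic array receives the same wavefront with arrival
# delays of 0, 2, 4, and 6 samples. Steering with the complementary
# delays re-aligns the channels, so the sum reinforces the source.
fs = 16000
t = np.arange(fs) / fs
source = np.sin(2 * np.pi * 440 * t)
mics = np.zeros((4, fs))
for m, d in enumerate([0, 2, 4, 6]):
    mics[m, d:] = source[:fs - d]
output = delay_and_sum(mics, delays_samples=[6, 4, 2, 0])
```

After alignment, every channel carries the same (uniformly delayed) copy of the source, so the averaged output reproduces it; interference arriving from other directions would remain misaligned and partially cancel.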
Unfortunately, indoors or in any closed environment, sound travels not only along the line of sight but also strikes every reflective surface in the environment and bounces back to the device.
This phenomenon is typically referred to as reverberation or multipath.
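A common toy model of this phenomenon (an illustration assumed here, not Kardome's implementation) treats the reverberant signal as the dry speech convolved with a room impulse response (RIR): a direct-path impulse followed by delayed, attenuated reflections. The tap delays and gains below are arbitrary example values.

```python
import numpy as np

fs = 16000                          # sample rate (Hz)
rir = np.zeros(fs // 10)            # 100 ms room impulse response
rir[0] = 1.0                        # direct (line-of-sight) path
rir[320] = 0.6                      # reflection arriving 20 ms later
rir[800] = 0.3                      # reflection arriving 50 ms later

rng = np.random.default_rng(0)
dry = rng.standard_normal(fs)       # 1 s noise stand-in for speech
wet = np.convolve(dry, rir)[:fs]    # reverberant signal at the device
```

Because the pattern of reflections depends on where the talker is located in the room, each speech source arrives with its own multipath signature, which is the cue that location-based approaches can exploit.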
Beamforming-based speech source separation performs poorly in reverberant environments, such as cars, offices, living rooms, or other indoor or closed environments.
Kardome’s technology solves this problem by turning the disadvantage into an advantage: Kardome uses the multipath pattern itself to discriminate between speech sources. We achieve this by listening not in a single direction but in all directions in three dimensions.
The following video compares Amazon’s virtual assistant Alexa in a car with and without Kardome’s location-based speech clustering. We implemented the system on an ARM Cortex-A7, using about 20% of the available CPU power.
We show how a location-based speech separation approach can provide a personalized user experience for multiple passengers in a car by making Alexa react separately to each person using personal accounts, preferences, and history.
We base the demo on Kardome’s algorithms for localization, source separation, and noise reduction.
Two passengers ask Alexa different questions. Kardome attributes each request to the specific user, and Alexa reacts accordingly.
For this demo, we configured Alexa to respond in Spanish to inquiries from the rear passenger and in English to queries from the driver. This language personalization is just one example of the unique applications Kardome enables.
Watch the video: