Speech Recognition in Cars - An Experimental VUI Study

Kardome’s technology enables VUIs to work in real life, where there are multiple speakers and ambient sounds.

Dr. Dani Cherkassky
CEO, Co-founder

A recent study by Voicebot.AI revealed that about 60% of consumers say voice assistants are a factor in their new car purchase criteria. Over 20% of all consumers say that the in-car voice assistant experience is a “significant consideration” or a “requirement.”

The same research also revealed that about 50% of in-car voice interface users believe there has been no significant improvement in voice user interface (VUI) performance over the last two years.

Today, most people still regard in-car VUIs as a gadget rather than a robust interface that can replace traditional touch screens and buttons. While speech recognition engines have improved over the last decade, VUI performance in cars still suffers from limited reliability, because interfering speech signals and driving noise challenge even state-of-the-art speech recognition engines.

Fig 1: Voice Control Manual

Today, automakers supply a manual for VUI systems in cars, similar to BMW’s. These manuals put the responsibility on the user to create a suitable environment for the VUI to work.

Typical instructions for the user are: “Please avoid background noise,” “Please ask your passengers not to speak while voice command is issued,” and similar instructions. Understandably, users find themselves frustrated by not being understood by machines, which reduces trust and engagement. 

This is where Kardome steps in. Kardome’s technology enables VUIs to work in real life, where there are multiple speakers and ambient sounds. Kardome’s software makes voice technology work for people in noisy cars by allowing multiple users to communicate with their devices simultaneously without interference from their fellow passengers. The aim is to support road safety and deliver a superior driver and passenger VUI experience.

This article shares the results of an experimental study carried out by HEAD acoustics GmbH, a leader in acoustic solutions and sound vibration analysis. The study’s goal is to compare the speech recognition rate (SRR) obtained by the Google Speech-to-Text (GSTT) engine in a car traveling at 120 km/h with two types of speech processing systems: a standard Hands-Free Telephony (HFT) audio stack, and Kardome’s AI-driven signal separation and noise reduction technology, packaged in its Gavel Evaluation Kit.

The results show that Kardome’s technology improved speech recognition for in-car voice interaction in every sound environment tested, from a single speaker to four simultaneous speakers.

SRR Evaluation Setup 

Kardome’s Gavel Evaluation Kit was installed in the overhead compartment of a Renault Megane Grandtour, next to the standard HFT system’s microphones. We installed four Artificial Head Measurement Systems in the car to deliver speech through a full-band-capable artificial mouth.

Fig 2: Evaluation setup with four Artificial Head Measurement Systems and Kardome’s Gavel Evaluation Kit.

We considered three scenarios:

  1. Sole speech by the driver
  2. Two simultaneous speakers, driver and co-driver
  3. Four simultaneous speakers

In all the scenarios, the existing in-car HFT system and Kardome’s Gavel Evaluation Kit recorded the speech signals. HEAD acoustics’ engineers tested the speech recognition rate by comparing the GSTT output text to the driver’s actual speech.
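The comparison of recognized text against the driver’s actual speech can be illustrated with a minimal Python sketch. The article does not state how HEAD acoustics scored the transcripts, so the definition used here (SRR as one minus the word error rate, floored at zero) is an assumption, and the function names and example sentences are hypothetical.

```python
def word_edit_distance(reference_words, hypothesis_words):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hypothesis_words) + 1))
    for i, ref_word in enumerate(reference_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def speech_recognition_rate(reference, hypothesis):
    """Assumed SRR: 1 - word error rate, clamped to the range [0, 1]."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    errors = word_edit_distance(ref_words, hyp_words)
    return max(0.0, 1.0 - errors / len(ref_words))


# Hypothetical example: what the driver said vs. what the engine returned.
spoken = "navigate to the nearest charging station"
recognized = "navigate to the nearest station"
print(f"SRR: {speech_recognition_rate(spoken, recognized):.1%}")
```

In a setup like the one described, the same scoring would be applied to the transcript of each recording (HFT and Kardome) in each scenario, always against the driver’s reference text, so that interfering speakers count as errors rather than as valid speech.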

The Results

Fig. 3 shows the SRR performance for each of the considered scenarios. Each bar in the plot represents the SRR obtained by one of the two signal processing methods (HFT and Kardome) in one of the three scenarios. The green bars correspond to the HFT system and the blue bars to Kardome’s speech clustering system.

Fig 3: GSTT Speech Recognition Rates when using Kardome and standard HFT systems in the car.

In Summary

Kardome’s AI-driven signal separation and noise reduction technology significantly improved the SRR in all the considered scenarios. With the hands-free telephony system, interfering speech signals degraded the results more and more as the number of speakers increased. By contrast, Kardome maintained a consistent SRR regardless of the number of speakers and interfering signals.

Improving in-car voice recognition with VUI technology such as Kardome’s will help voice assistant manufacturers overcome inefficiencies in speech recognition. It will also help automakers compete effectively in what will soon become a crowded marketplace for smart car voice assistant features.

Send us a message to learn more about Kardome.
