The rapidly growing popularity of voice-controlled smart TVs and set-top boxes is driving market growth. However, poor speech recognition performance may inhibit further adoption if manufacturers and OEMs don’t invest heavily in superior VUI technology.

The Problems with Voice-Controlled Smart TVs

Laura Tate
CMO
Tech

While the use of voice assistants in smartphones and speakers has increased in recent years, the pandemic has spurred broader consumer adoption of voice-controlled smart TVs. More than half of Americans reported that voice control is an essential feature in smart devices, including home appliances and TVs/remotes, according to a survey by Syntiant.

According to Juniper Research, spending via smart TV voice control will reach $500 million by 2024. The growth of voice assistant use in smart TVs reflects how these devices are becoming context-dependent sales portals as well as information and control systems.

Another area where voice control has made significant gains is in set-top boxes (STBs). While they may seem outdated, many people still prefer these devices to stream broadcasts to their analog TVs. Research shows that global demand for STBs will reach 341.1 million units by 2027.

However, there are still some challenges facing manufacturers and OEMs of voice-controlled devices, including compatibility issues between apps, poor voice recognition due to background noise and reverberation, and confusion between multiple users in one room.

Problems with Smart TV Voice Recognition

An automatic speech recognition (ASR) engine, or speech recognition system, is software that recognizes what you say and transcribes it into text. Voice recognition, by contrast, personalizes the experience by identifying who is talking. Voice recognition is a major development for consumer electronics, but in many respects the technology is still in its infancy.

Smart TVs are still a relatively new product in the consumer electronics market and, as such, have several problems that make them less user-friendly than they could be. However, manufacturers and OEMs recognize these issues and are attempting to fix them in future product models.

To build a useful smart TV voice recognition system, TV manufacturers need to think about how they want to use voice recognition in the first place and what problems they're trying to solve. They also need to develop these solutions with the user experience (UX) in mind from the start.

Voice control will become ubiquitous over time, so it's important that users feel comfortable using it at home, whether they are new to voice interfaces or already power users.

A smart TV should be intuitive and conversational, reducing the friction between the moment a person sees something of interest on-screen and the moment they can interact with it by speaking into their remote or the TV.

Additionally, when a person gives a voice command, they shouldn't have to worry about whether the room is quiet enough or whether they are the only person talking. The TV's ASR system needs to identify who is talking and what they're saying, even in a crowded room full of chatter and laughter.

Despite their extraordinary potential, voice-controlled TVs aren't quite there yet.

The Impact of Poor Speech Recognition on the Set-Top Box Experience

Poor speech recognition is also a challenge for set-top box manufacturers. A set-top box enables a TV set to receive and decode broadcasts from digital television (DTV). To receive digital broadcasts, viewers with analog television sets must use an STB, such as Roku, Broadcom, or Pace.

For many people, their set-top box is their primary method for interacting with their television. With the addition of new features like streaming video and apps, users are spending more time interacting with these set-top boxes than ever before, which means they are also experiencing more frustration than ever with poor speech recognition.

In addition to the usual problems that face speech recognition in smart devices, such as background noise and interfering signals, reverberation also degrades STB performance. TV viewers often tuck set-top boxes out of sight: in a corner, inside a cabinet, or behind the TV. As a result, when a person issues a voice command, reverberation (repeated echoes of the signal) corrupts the speech that reaches the box.
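To make the idea of repeated echoes concrete, here is a minimal Python sketch of a toy reverberation model. It is an illustration only, not how any particular STB or VUI product models acoustics: real rooms are usually described by a measured impulse response, and the function name `add_reverb` and its parameters are assumptions made for this example.

```python
def add_reverb(dry, delay, decay, taps):
    """Toy reverb: mix the dry signal with `taps` delayed, attenuated copies.

    dry   -- list of audio samples (floats)
    delay -- spacing between echoes, in samples (hypothetical parameter)
    decay -- gain applied per echo, between 0 and 1 (hypothetical parameter)
    taps  -- number of echoes to add
    """
    # Output is longer than the input so the last echo has room to ring out.
    wet = list(dry) + [0.0] * (delay * taps)
    for k in range(1, taps + 1):
        gain = decay ** k      # each successive echo is quieter
        offset = k * delay     # and arrives later
        for n, sample in enumerate(dry):
            wet[offset + n] += gain * sample
    return wet


# A single click followed by silence turns into a trail of fading clicks.
click = [1.0]
smeared = add_reverb(click, delay=2, decay=0.5, taps=2)
```

In this toy model a single spoken sound is smeared across later moments in time, so it overlaps with the sounds that follow it. That overlap is what makes reverberant speech harder for an ASR engine to transcribe.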

What is the impact of poor speech recognition?

As the authors of the study "Yelling at Your TV: An Analysis of Speech Recognition Errors and Subsequent User Behavior on Entertainment Systems" found, "ASR systems frequently produce incorrect transcriptions and corrupt the original queries, forcing users to reformulate their requests or surrender entirely."
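One common way to put a number on "incorrect transcriptions" is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the ASR output into what the user actually said, divided by the number of words the user said. The Python sketch below is a minimal illustration of that metric. It is not taken from the study quoted above, and the example queries are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical voice query and a transcript with one wrong word out of four.
spoken = "play the next episode"
transcribed = "play the next episodes"
wer = word_error_rate(spoken, transcribed)  # 1 substitution / 4 words = 0.25
```

Even a single wrong word can change the meaning of a short TV command, which is one reason short queries are so sensitive to recognition errors.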

Solutions for Clear Speech Recognition in Smart TVs

Solving the major obstacles to accurate speech recognition in smart TVs requires voice user interface (VUI) technology that reduces background noise, eliminates reverberation, and correctly identifies the person speaking.

Additionally, the ideal voice recognition solution in smart TVs will eliminate multi-step processes, such as a user asking a smart speaker, "Hey Google, turn on my Samsung TV."

While some TV manufacturers have partnered with Apple, Google, and similar technology companies, these major voice companies have still not solved the problem of poor speech recognition.

Next-Generation Voice Control for Smart TVs

Voice control is going to change the way people use their smart TVs. Of course, it’s already possible to use voice control with some TVs, but we're talking about a generation of devices that will understand what you say and respond in a meaningful way.

Innovative TV manufacturers should keep this in mind as they build the next generation of interfaces for their devices. For example, future smart TVs will feature seamless speech recognition systems that eliminate the need for a separate remote control.


Find out how Kardome can improve the voice recognition capabilities of your existing TVs and set-top boxes, or integrate our VUI technology from the ground up.
