nScreenMedia OTT multiscreen media analysis

What voice technology is for and where it’s going

Speech and AI panel at NAB 2019

What is the goal of voice technology in video services today, and how close to achieving it will we be in 2021? Here’s what three experts on the subject had to say.

At NAB 2019 in Las Vegas, I moderated a panel discussion entitled “Voice Control and AI: Pushing the TV Experience Forward.” Experts from IBM, Comcast, and Gracenote joined me in the debate. I used two key questions to start and end the panel. First, I asked a deceptively simple question: what is the objective of the technology? Then I asked the panelists how far along the road toward that goal we would be by NAB 2021. Here’s what each had to say.

Flatten the UI, replace the remote

Amit Bagga, Vice President of Research and Development at Comcast, runs the company’s AI and Machine Learning Center for Excellence and the OS remote team. Comcast has made considerable progress with voice search and control over the last five years. He says that 20 million voice remotes are in use by Xfinity Video subscribers and licensees of the platform. Those voice-enabled customers issued 9 billion commands in 2018, which works out to about 1.2 requests per remote per day.

Mr. Bagga says one of the primary tasks of voice services is to save the customer from having to navigate the guide to find what they want:

“Our goal was to flatten the UI, get the users to the content with the least number of clicks possible.”

He sees voice technology continuing to evolve quickly and ultimately supplanting the remote altogether:

“I think we will have hands-free control of the TV. You will not have to pick up the remote if you choose not to.”

Simplify and make it a conversation

Simon Adams, Chief Product Officer of Gracenote, agrees with Mr. Bagga that simplifying the process of finding something to watch is a crucial goal:

“To make the user experience much simpler and easier to find stuff they want to watch and listen to.”

He took the idea of simplification much further when thinking about the next two years:

“I don’t think we’ll be in a remoteless world, but I think we’ll be in a much better conversational place.”

Voice recognition and the personalization of replies have become much more accurate over the last several years. However, refining and modifying results in a conversational manner remains a challenge. Mr. Adams believes the technology will make considerable progress in this area by 2021.

Understand intent, make the technology more accessible

Peter Guglielmino is Chief Technology Officer at IBM and focuses on media and entertainment products. He works closely with the Watson Media team. He sees AI technology being used to understand not only what is said but also the mood and intent of the speaker:

“The goal of speech technology, in general, is to get better insight into what they [the users] are looking to do.”

For example, Watson is being used in call centers to understand the emotional state of a caller and to route the call appropriately based on that information.

Looking ahead, Mr. Guglielmino sees voice technology becoming more broadly available and working in speech-unfriendly environments:

“In two years, we’ll have more accessibility and more languages. One of the key problems that we’re facing is that everything is in English. I also think that the signal processing that is involved in understanding overlapping speakers and separating speech from background noise will be much improved.”

Indeed, the adoption of speech technology for media search and control is much lower in non-English-speaking countries. Users will also welcome voice recognition systems that are better at separating a specific voice from background noise.

Why it matters

Simplifying the business of finding something to watch is a crucial goal of voice technology in media.

Soon the technology will:

  • Be capable of carrying on a conversation to narrow search results
  • Replace the remote completely – if that’s what a user wants to do
  • Work with more languages and be more robust in hostile environments
