CES 2012: The disruptive power of gesture and voice recognition

By Jon Healey, Editorial Writer

Jan. 13, 2012 4:26 PM PT

This article was originally on a blog post platform and may be missing photos, graphics or links. See About archive blog posts.

At a panel discussion at the Consumer Electronics Show this week, Mike Masnick of Techdirt noted that we typically don’t recognize disruptive technologies until after the fact. He’s probably right, but sometimes you really can see a technology rocking an industry in real time.

That’s the case today with gesture and voice recognition. These aren’t new technologies, but judging from CES, they are finally poised to metastasize. Microsoft’s Kinect motion sensors, of which more than 18 million have been sold, have prompted industries far removed from video games to rethink how people will use their products and services. Similarly, Apple’s Siri virtual assistant has taught manufacturers and software developers that voice recognition has moved beyond recognition and into comprehension.

Together, these developments reflect an accelerating shift from mechanical interfaces to natural ones -- from typing on a keypad or thumbing a remote to pointing, asking and telling. And that’s happening largely as a consequence of the rapid increase in microchip processing power, said Aviad Maizels, founder and president of PrimeSense, which designed the Kinect’s chips.

“We didn’t have a technology when we started. We had an idea,” Maizels said. It took a while for chips to have enough horsepower to perform the near-instantaneous analysis of moving images that even basic gesture recognition requires. They’ve since crossed that threshold, and continued improvements in processing power are enabling more sophisticated gesture recognition tools.

“Moore’s Law works for us,” said Adi Berenson, PrimeSense’s vice president of business development and marketing.

The increase in processing power has also helped improve speech-recognition software, said Richard H. Mack Jr. of Nuance Communications, which makes some of the technology behind Siri. Another factor, he said, has been assembling the vast amount of data needed to understand what the speech means, and then respond accordingly. Of course, the ability to analyze all that data is also a function of processing power. “It’s only going to get better,” Mack said.

TV remotes offer a good illustration of the improvement thus far. Wands that could recognize a spoken command -- say, “Channel 2” or “power on” -- have been around for more than a decade. The next wave, represented by Nuance’s Dragon TV and the forthcoming Vlingo TV app, will help people search through program guides, answer questions about shows and exchange messages with friends while they watch TV.

Berenson showed off PrimeSense’s next-generation product, which can recognize movement in three dimensions -- not just up and down and side to side, but forward and back. A prototype program guide let Berenson pick out a movie from an on-screen list by reaching toward it, making a grabbing motion to start an audio preview, then pulling back to start the video. The sensor was notably more responsive to subtle movements than the current products are.

PrimeSense, one of several companies developing the enabling technology for gesture recognition, is keeping its focus on the living room. The idea has already caught on with several major TV manufacturers, which showed gesture-sensing sets at the show. Across the exhibition halls at CES, though, many other applications of gesture recognition were on display, particularly in healthcare. To cite just two examples, the ng Connect booth included prototypes of cloud-based fitness and physical therapy services built on Kinect sensors. And at the PrimeSense booth, Bodymetrics (pictured above) showed how it’s using the technology in high-tech dressing rooms that scan shoppers’ bodies, then let them try on clothes virtually to check their fit.

PrimeSense and Nuance are encouraging the spread of gesture and voice recognition by helping developers apply the technologies to new uses. Nuance has about 7,000 developers using its tools, Mack said, and there are more than 3,000 developers in OpenNI, a PrimeSense initiative to promote interoperability among “natural interaction” software and devices.

Maizels gives Microsoft credit for the snowballing momentum behind natural interaction. “Microsoft did a tremendous job of telling the world that something has to change,” he said. Judging from CES, the world listened.

-- Jon Healey

Healey writes editorials for The Times’ Opinion Manufacturing Division. Follow him at @jcahealey.
