CES 2012: The disruptive power of gesture and voice recognition
At a panel discussion at the Consumer Electronics Show this week, Mike Masnick of TechDirt noted that we typically don't recognize disruptive technologies until after the fact. He's probably right, but sometimes you really can see a technology rocking an industry in real time.
That's the case today with gesture and voice recognition. These aren't new technologies, but judging from CES, they are finally poised to proliferate. Microsoft's Kinect motion sensors, of which more than 18 million have been sold, have prompted industries far removed from video games to rethink how people will use their products and services. Similarly, Apple's Siri virtual assistant has taught manufacturers and software developers that voice technology has moved beyond mere recognition into comprehension.
Together, these developments reflect an accelerating shift from mechanical interfaces to natural ones -- from typing on a keypad or thumbing a remote to pointing, asking and telling. And that's happening largely as a consequence of the rapid increase in microchip processing power, said Aviad Maizels, founder and president of PrimeSense, which designed the Kinect's chips.
"We didn't have a technology when we started. We had an idea," Maizels said. It took a while for chips to have enough horsepower to perform the near-instantaneous analysis of moving images that even basic gesture recognition requires. They've since crossed that threshold, and continued improvements in processing power are enabling more sophisticated gesture recognition tools.
"Moore's Law works for us," said Adi Berenson, PrimeSense's vice president of business development and marketing.
TV remotes offer a good illustration of the improvement thus far. Wands that could recognize a spoken command -- say, "Channel 2" or "power on" -- have been around for more than a decade. The next wave, represented by Nuance's Dragon TV and the forthcoming Vlingo TV app, will help people search through program guides, answer questions about shows and exchange messages with friends while they watch TV.
Berenson showed off PrimeSense's next-generation product, which can recognize movement in three dimensions -- not just up and down and side to side, but forward and back. A prototype program guide let Berenson pick out a movie from an on-screen list by reaching toward it, making a grabbing motion to start an audio preview, then pulling back to start the video. The sensor was notably more responsive to subtle movements than the current products are.
PrimeSense, one of several companies developing the enabling technology for gesture recognition, is keeping its focus on the living room. The idea has already caught on with several major TV manufacturers, which showed gesture-sensing sets at the show. Across the exhibition halls at CES, though, many other applications of gesture recognition were on display, in fields ranging from healthcare to retail. To cite just two examples, the ng Connect booth included prototypes of cloud-based fitness and physical therapy services built on Kinect sensors. And at the PrimeSense booth, Bodymetrics (pictured above) showed how it's using the technology in high-tech dressing rooms that scan shoppers' bodies, then let them try on clothes virtually to check their fit.
PrimeSense and Nuance are encouraging the spread of gesture and voice recognition by helping developers apply the technologies to new uses. Nuance has about 7,000 developers using its tools, according to the company, and there are more than 3,000 developers in OpenNI, a PrimeSense initiative to promote interoperability among "natural interaction" software and devices.
Maizels gives Microsoft credit for the snowballing momentum behind natural interaction. "Microsoft did a tremendous job of telling the world that something has to change," he said. Judging from CES, the world listened.
-- Jon Healey
Photo credit: Jon Healey / Los Angeles Times