Siri, Kinect and other input methods

Jan 16, 2012

Consumer electronics companies seem hell-bent on replacing the remote control, the mouse and the keyboard. Touch screens, speech, motion detection and other methods of inferring a user's desires are all the rage. Too bad most of them make life harder.

I get it that a mouse or a remote control is an artifice between the human and the device. They're not ideal, but any alternative must be better for it to be useful. And to be better, it must be easier, faster and more accurate. So, let's take a look:

Touch screens came into mainstream popularity with the iPhone. Although they existed before the iPhone's debut, they were still mostly a novelty. You might use one at an airport kiosk or an ATM, but that was likely it.

The benefit of a touch screen, of course, is that the screen can change according to the situation, dynamically presenting the user with the appropriate interface.

As a result, touch screens are generally considered easier. Speed and accuracy, though, are not a touch screen's strengths. By feeling physical keys, you get tactile feedback lacking in a touch screen. Such feedback is important for speed and accuracy.

Whatever your stance on the benefits of a touch screen, you still have to touch it to make it work. So, the smart people in consumer electronics decided to do away with that requirement and developed motion detection. Cameras can watch and monitor your movements, so why not interpret those movements to control a device? Microsoft's Xbox Kinect is the mainstream example of this technology. I have one and it works amazingly well.

But, it's not perfect. Nowhere close, in fact. It's great for the gross motion — that is, jumping, raising your hand, moving from side to side — that you find in its games. But, tiny, little motions are far less reliable. And while I'm sure computer power and camera resolution will improve those traits, there are two other issues that are harder to dismiss.

The first is motion interference. I have two dogs. Big dogs. When they walk between me and the Kinect, the Kinect gets confused. It's not just dogs, of course, but people, too. Point is, it can't discern context and that's a problem. It means constantly having to control the environment for the device to work perfectly. And that includes having the proper space for it to see and capture your movements.

All of that is hugely problematic. For a control system to make my life easier, it must work within my environment. If I have to adjust to its demands, then it's a failure.

The second issue is that, for casual interactions, waving to a sensor is a lot harder than just clicking a button on a remote.

These aren't problems for what the Kinect was designed to do: play games. But they are huge problems if the Kinect is used for other purposes, such as controlling a TV. With a remote, I can blindly click a button. With a sensor, I have to activate it and start waving my hands around. It's cool the first time. It's annoying and obnoxious (and, most importantly, slow) after that.

So, is motion detection easier? For some things, yes. For others, not at all. Faster? More accurate? I don't think so.

That leads me to voice recognition. Although voice recognition has been around for a while, it has historically required people to "train" the computer or software to recognize their voice. Likewise, it has usually required the use of a headset to capture the voice clearly and cleanly.

This, too, has changed. Android phones and, most notably, the iPhone have touted voice recognition as the interface of the future. Even the Kinect now understands voice commands.

And the theory is that a new Apple TV will adopt Siri as its controller.

I sure hope not. Voice recognition may be cool in concept, but it is terrible in practice. Again, consider the environment. One must control for room noise and other people talking. It has to hear you, which likely means talking loudly at the TV. Although this might be something the elderly are already familiar with when it comes to the news, it is an annoying requirement at best. And, it has to understand what you're saying. That includes both understanding what words you're saying and understanding what your words mean.

My experience with Siri is that this is still a major problem. I don't have a desire to learn Siri's commands, and Siri isn't smart enough to know what I mean. In the end, it takes more work to tell Siri what I want than to just use the keyboard myself. (The lone exception to that is dictation, where it simply has to capture my words, rather than my meaning.)

Another issue is room noise. Can a voice-activated TV even hear what a person is saying over the TV itself, or other noises, such as a washing machine or other people in conversation?

The best case scenario is that the voice recognition would be too good. This problem was famously spoofed on 30 Rock.

So is voice recognition easier? Maybe when you're otherwise occupied, such as driving. Most of the time, though, I find it frustrating, slower and less accurate.

And that leads me to Apple's possible TV. I think it would be a mistake to make Siri its key/primary feature. Instead, I'd like to see Apple continue to focus on access to content. But, that's another post.

Hatchomatic
