Monday, December 17, 2012

New Note. New Note. *sigh* New Note.

I'm interested in what you can do with audio if you're wearing a headset/microphone. As a baseline, I want to see what you can already do with off-the-shelf components and minimal frustration. The test task is making a note without taking the phone out of my pocket.

(tl;dr: this does not work at all. I am surprised at how much this doesn't work. Geez, I figured voice commands were solved.)

I've tried the Plantronics M50 and the LG Tone. The M50 is a standard bluetooth headset, the Tone is worn around the neck and has detachable magnetized earbuds that you can put in your ears. First impressions:
- I feel like a businessman and a tool wearing the M50. With the Tone I feel like a regular person. This is important.
- Music streaming to the ears works right out of the box on both. This is surprisingly great, especially when biking.
- Sound quality in my ears seems fine on both, although I've only used them a couple days each. The M50 is a little bit quiet even on max volume for listening to music, but okay for calls. Can't say much about voice recording quality.

- the Google app that includes Voice Actions and Google Now (on Jelly Bean, I think) is so cool and so flawed and buggy. The cool: saying "note ____" records your voice and sends it to yourself in an email: both the audio file and an attempted transcription. No confirmations or anything. Exactly correct. However, when I associate the "bluetooth button" action with the Google Search app and press the button, nothing happens. And when I open the app on my own, sometimes it jumps right into recording, and sometimes it says "initializing" forever and freezes. This is on my 2-year-old Nexus S, and the Galaxy S3 I'm working with doesn't have Jelly Bean. I'm guessing when they get the bugs worked out, this will be the reasonable way to talk to my phone.

- Voice Control (Full) works pretty well. Pressing the button on my bluetooth opens it and starts listening (IF you disable the Google app and then re-associate the "bluetooth button" action with Voice Control). I can say "Make a note" and it puts it in my Evernote account. The downside is that it's a 5-step process: "Make a note" What should the content be? "blah blah" Do you want to make a note with content 'blah blah'? "Yes" What should the title be? "title title" Do you want the title to be 'title title'? "Yes." Five steps is four steps too many (especially because all these steps can, and do, fail).

- Vlingo boasted at least somewhat-competent voice recognition but you can only do about 7 things, none of which I even want to do. (call people, text people, update your facebook status...)

- Utter looks like it's headed in the right direction, but right now doesn't accept the button on the bluetooth to trigger it.

- Samsung's S Voice: not so good. Pressing the button on my Bluetooth device starts it... sometimes... and sometimes it just opens the app so I can start it by pressing a button on the screen (which defeats the whole purpose of the bluetooth button). Also, when you're making a note, it asks you to confirm by saying "Save note"... and if it doesn't hear "save", it just hears "note", which throws away your old note and starts a new one. What!

- Skyvi ("Siri for Android") had poor voice recognition. Also, it doesn't support just taking notes. Am I the only one who never wants to call people by voice unless the service reliably calls the right person?

- Iris ("Siri backwards") just didn't work on my Nexus S, and on the Galaxy S3, it looks like it's a press-an-on-screen-button app. (with lots of annoying advertising for some other app too.)

Finally, a couple of Absolutely Correct Ideas about how voice commands should work, based on a day of futzing with them:
- start on bluetooth button press. If I take my phone out of my pocket, you've lost me. Ideally ideally, we'd be starting on a wake-up word, but I assume the battery life isn't quite there yet.
- don't rely on correct transcription, when possible. (taking notes shouldn't rely on correct transcription.)
- corollary: don't ask me to confirm stuff, unless it relies on correct transcription (like calling a person). I should say things once, maybe twice.
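
For the programmers in the audience, here's roughly how I picture the "confirm only when it matters" rule. This is just a sketch, not how any of the apps above actually work: the `Command` type, the "note"/"call" labels, and the `handle` function are all names I made up for illustration. The point is that a note keeps the audio no matter what the transcription says, so it never needs a confirmation, while a call depends on getting the name right, so it gets exactly one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    kind: str                  # hypothetical labels: "note" or "call"
    audio: bytes               # the raw recording, always kept
    transcript: Optional[str]  # best-effort text; may be wrong or missing


# Assumption: the set of actions that go wrong if the transcription is wrong.
NEEDS_CORRECT_TRANSCRIPT = {"call"}


def needs_confirmation(cmd: Command) -> bool:
    """Ask the user to confirm only if a bad transcription would do harm."""
    return cmd.kind in NEEDS_CORRECT_TRANSCRIPT


def handle(cmd: Command) -> dict:
    """Decide what to do with a spoken command, in one step where possible."""
    if needs_confirmation(cmd):
        # One confirmation, because calling the wrong person is costly.
        return {"action": "confirm", "prompt": f"Call {cmd.transcript}?"}
    # Notes are saved immediately: the audio is the source of truth,
    # and the transcript is just a nice-to-have attached to it.
    return {"action": "save", "audio": cmd.audio, "transcript": cmd.transcript}
```

With this shape, `handle(Command("note", recording, None))` saves right away even when transcription failed completely, which is the one-step flow I wanted all along.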