no blocking - Eva is designed to be interrupted. in general she only speaks short amounts of text, but a longer example is speaking the weather. you can speak another command while she's talking, she will stop mid-sentence, and then respond to the new command.
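the interruptible-speech behavior could be sketched roughly like this (a hypothetical sketch, not Eva's actual implementation - a real version would feed each chunk to a TTS engine instead of yielding words):

```python
import threading

class InterruptibleSpeaker:
    """speaks text in small chunks so a new command can cut it off
    mid-sentence. illustrative sketch only."""

    def __init__(self):
        self._cancel = threading.Event()

    def speak(self, text):
        """yield the response one word at a time, checking for interruption"""
        self._cancel.clear()
        for word in text.split():
            if self._cancel.is_set():
                return  # a new command arrived; stop mid-sentence
            yield word

    def interrupt(self):
        """called when a new command is recognized while speaking"""
        self._cancel.set()
```

the key design point is that speech output is chunked, so there is a natural place to check for cancellation between every chunk.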
avoid dialogue - for now, i'm avoiding dialogue entirely. Eva doesn't even ask for a confirmation, so everything is a single command. if that command is misrecognized, you can either speak it again or undo the previous action. my usage behavior has changed over time because of this, and because recognition is excellent: i pick up the mic, turn it on, speak the command, turn it off, and set the mic down before Eva even has a chance to respond. also, if Eva doesn't have a high confidence value, she repeats back what was recognized, which prompts the user to pay a little more attention to what happens next.
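the low-confidence echo could look something like this (the 0.85 threshold is an assumed value, not Eva's actual setting):

```python
def respond_to(recognized_text, confidence, threshold=0.85):
    """echo low-confidence recognitions back to the user instead of
    asking for confirmation. sketch; threshold is an assumption."""
    if confidence < threshold:
        # spoken back so the user watches what happens next
        return "heard: " + recognized_text
    return None  # high confidence: act without saying anything
```

this keeps the interaction a single command either way - the echo is informational, not a yes/no prompt.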
reduce spoken commands - it's all about speaking, but speaking the same thing over and over gets annoying. Eva started out with traditional music controls (next, previous, etc.), but it got ridiculously annoying saying 'next track' repeatedly. that spawned the commands 'start scanning music', 'play similar', and 'search tracks for', which reduce the amount of talking i have to do. now 'next track' is not used to hunt for music, only to skip past a track i don't want to hear.
notify the user - the goal is to avoid having the user say something and then just sit there not knowing what is happening. for example, Eva beeps before a potentially long-running operation like a music search or web call. misrecognized speech is signaled with 3 quick beeps. if recording is taking place, or a voice note is waiting, there are periodic beeps for that.
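the beep signaling reduces to a small event-to-pattern table (the counts here are assumptions for illustration, not Eva's actual values):

```python
# hypothetical beep codes; counts are assumed for the sketch
BEEPS = {
    "long_operation": 1,  # one beep before a slow music search or web call
    "rejected": 3,        # three quick beeps when speech is misrecognized
    "recording": 1,       # repeated periodically while a voice note records
}

def signal(event, beep):
    """play the beep pattern for an event via the supplied beep() callback"""
    for _ in range(BEEPS.get(event, 0)):
        beep()
```

keeping the patterns in one table makes it easy to keep them distinguishable from each other as more events get added.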
alternate / optional grammar elements - i start out with very strict grammar commands, but there are something like 125+ commands now, and a user can't really be expected to remember those word for word. when a command is misrecognized because i said it slightly differently, i go back and add those alternate or optional words to the grammar. that is where the voice commands i think up while coding have the potential to be slightly different from what i say naturally as a user.
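a toy version of alternates and optionals, compiled down to a regex (illustrative only - real grammar formats like W3C SRGS have much richer syntax, and this mini-syntax is made up for the sketch):

```python
import re

def grammar_to_regex(pattern):
    """compile a tiny grammar syntax into a regex:
    (a|b) marks alternate words, [word] marks an optional word.
    hypothetical sketch, not a real grammar engine."""
    out = []
    for token in pattern.split():
        if token.startswith("[") and token.endswith("]"):
            out.append("(?:" + token[1:-1] + " )?")   # optional word
        elif token.startswith("(") and token.endswith(")"):
            out.append("(?:" + token[1:-1] + ") ")    # alternates
        else:
            out.append(token + " ")                   # required word
    return re.compile("^" + "".join(out).rstrip() + "$")
```

so a command like `[please] (start|begin) scanning music` accepts 'start scanning music', 'please begin scanning music', and so on - one grammar entry covers the phrasings i actually say.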
find something - if i make a search request for music, then some music needs to start playing. Eva works hard to avoid the 'no tracks were found' scenario by building the search with tons of failover: multiple search requests are performed, starting with a strict search and then gradually getting less strict until a matching track has been found.
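the failover loop itself is simple - try progressively looser match modes until something hits (the library, the mode names, and the matching rules below are all made up for the sketch):

```python
# tiny stand-in library for the sketch
LIBRARY = ["daft punk - around the world", "the world is yours", "world party"]

def search(query, mode):
    """hypothetical backend: stricter to looser matching modes"""
    q = query.lower()
    if mode == "exact":
        return [t for t in LIBRARY if q == t]
    if mode == "all_words":
        return [t for t in LIBRARY if all(w in t for w in q.split())]
    return [t for t in LIBRARY if any(w in t for w in q.split())]  # any_word

def find_track(query):
    """run progressively less strict searches until a track is found"""
    for mode in ("exact", "all_words", "any_word"):
        hits = search(query, mode)
        if hits:
            return hits[0]
    return None
```

the ordering matters: the strict pass gets first chance to return the best match, and the loose passes only run when it would otherwise be 'no tracks were found'.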
avoid false positives - recognition is excellent, so Eva generally does what i say. occasionally she rejects what i say and signals with the 3 quick beeps. the more annoying scenario is a false positive, where something is misrecognized and Eva performs an unintended action. some of this is helped by disabling other grammars in certain scenarios, e.g. while a voice note is being recorded, only high-priority grammars are active and the rejected-audio beeps do not play. if a false positive keeps happening, i change the wording of one of the grammars to make the phrase more distinguishable.
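the context-dependent grammar filtering might look like this (the grammar names and priorities are invented for the sketch):

```python
# hypothetical grammar table: (phrase, priority)
GRAMMARS = [
    ("stop recording", "high"),
    ("next track", "normal"),
    ("what time is it", "normal"),
]

def active_grammars(recording):
    """while a voice note is recording, only high-priority grammars
    stay active, shrinking the space of possible false positives"""
    if recording:
        return [g for g, p in GRAMMARS if p == "high"]
    return [g for g, _ in GRAMMARS]
```

fewer active phrases means fewer things a stray utterance can accidentally match against.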
recognizer sticking - don't currently have a fix for this one. rarely, about once a week so far, the speech recognizer seems to get into a bad state where it misrecognizes something, and then keeps misrecognizing that phrase repeatedly. it's like the weighting gets stuck favoring that phrase no matter how clearly i articulate the command. it might stay stuck for 3 to 5 repetitions. highly annoying. i'm thinking about keeping a running list of the last N recognitions, and if the same phrase repeats 3 times, trying to do something to 'kick' the recognizer out of its bad state. i'd have to exempt some commands, like 'next track', which can legitimately be spoken repeatedly.
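the detection side of that idea could be sketched like this (the repeat threshold and the exempt list are assumptions, and the 'kick' itself is left out since i don't have a fix yet):

```python
from collections import deque

class StickDetector:
    """track the last few recognitions and flag when the same phrase
    repeats 3 times in a row, skipping commands that are legitimately
    repeatable. sketch only; threshold and exemptions are assumptions."""

    def __init__(self, threshold=3, exempt=("next track",)):
        self.history = deque(maxlen=threshold)
        self.threshold = threshold
        self.exempt = set(exempt)

    def record(self, phrase):
        """returns True when the recognizer looks stuck on this phrase"""
        self.history.append(phrase)
        if phrase in self.exempt:
            return False
        return (len(self.history) == self.threshold
                and len(set(self.history)) == 1)
```

when `record` returns True, that would be the point to attempt whatever 'kick' ends up working - restarting the recognizer, reloading grammars, or similar.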