/eva : Eva Virtual Assistant

Speech-controlled Multi-zone Home automation

http://www.brains-N-brawn.com/eva 1/27/2009 casey chesnut


Introduction

EVA is a speech-based virtual assistant for the home with multi-zone capabilities. that sentence needs to be broken down :

another way to describe EVA would be 'Ford Sync : Car : : EVA : Home'

video
intro video

Software

EVA is a C# .NET application that runs on Windows Vista. it uses System.Speech for speech recognition and speech synthesis. DirectShow is used for working with microphones and to handle multi-zone audio output. the GUI (graphical UI) is only used for setup and debugging. once EVA is set up and running, the GUI is not needed and you interact with EVA using your voice. this is called a VUI (voice UI).

System.Speech is for desktop use and can be trained to be speaker dependent. that said, i have not trained the speech recognizer on my dev machine or on Eva's host computer. i do not want to train Eva to be speaker dependent, so that other people can use Eva as well. since Eva only does command-and-control (and not dictation), this seems to be working fine.

one limitation of System.Speech is that the public API only lets you select the default microphone device. to work around this, i have to use reflection to invoke a private method to tie each zone to its own microphone device. the other major limitation comes up when creating n-grams for music. Microsoft does not provide a way to programmatically create n-grams, so i have to build them manually using Visual Studio. Eva nightly updates the library of ID3 tags used for searching, but i also need to update the n-grams nightly without having to deploy Visual Studio. hopefully Speech Server 2008 will provide this functionality. third, i also used the GRXML grammar builder and grammar library from Speech Server. the tools for Speech Server need to be broken out so System.Speech developers can use them without having to install Speech Server. finally, i ran into some bugs when building grammars that referenced external grammar libraries and used SemanticItems. since grammar development is so important, these bugs need to be fixed. at least System.Speech provides a number of ways to build grammars (GRXML as xml, GrammarBuilder object, SrgsDocument object), so it is possible to work around these bugs ... it just makes your code ugly.
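as a rough illustration of the 'GRXML as xml' route mentioned above, here is a minimal sketch that regenerates a simple list grammar from tag data. it is written in Python rather than EVA's C#, it produces a plain one-of list and not an n-gram, and the function name, rule id, and artist names are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"  # W3C SRGS namespace used by GRXML files

def build_artist_grammar(artists, rule_id="artist"):
    """Return GRXML text with one <item> per unique artist name (hypothetical helper)."""
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": "en-US",
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    # de-duplicate and sort so the nightly output is stable between runs
    for name in sorted(set(artists)):
        ET.SubElement(one_of, "item").text = name  # ElementTree escapes & and < for us
    return ET.tostring(grammar, encoding="unicode")

# made-up tag data standing in for the nightly ID3 scan
grxml = build_artist_grammar(["Daft Punk", "Simon & Garfunkel", "Daft Punk"])
```

a file like this could be rewritten by a scheduled job and reloaded by the recognizer, but a list grammar only matches whole names, so it is not a replacement for the n-gram behavior described above.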

this is what the GUI looks like (only used for setup and debugging)

Hardware

EVA runs on a single dedicated PC. this was a design goal, to avoid having a PC in each room. to support multi-zone audio, the PC has multiple USB sound cards. each sound card handles the audio input and output for a zone. in my setup, i have 3 sound cards for 3 zones (computer room, living room, bedroom). this allows EVA to know where you are speaking from, and what zone or zones to send audio output to.

for audio input, i am using wireless rechargeable microphones. each zone gets its own microphone. you can think of the mic as being a remote control. instead of picking up a remote and pressing a button, you pick up a mic and speak a command. the original plan was to use omni-directional mics in each zone, but background noise was problematic, plus music had to be muted before a new command could be spoken. the directional mics don't have problems with background noise and can be used while music is playing.

for audio output, i am using PC speakers in each room. these are not wireless, so i have 75ft audio cables running to each zone.

for home automation, i am currently using X10 devices. i'm also using X10 radio frequency remote controls. each zone has 1 or 2 remotes. the remotes support basic audio controls (pause, stop, next, previous, start/stop scanning, increase/decrease volume). the remotes are useful for situations when i do not need to speak. i've had problems with the wiring in my home, so powerline X10 signals will only work within a room. also, there is a 1 to 2 second delay when using the X10 remotes. just long enough to make you think it's not working ... and then it works. also, when turning an X10 adapter on or off, it makes this really annoying popping noise.

video
software and hardware video

Cost

the hardware costs were :

the big power drain is the PC. including the PC and 3 zones, i estimate it will cost $10 per month to run EVA. if X10 signals worked better in my house, the plan was to have EVA turn on/off the speakers as needed. regardless, using EVA cuts into scenarios where i had used my TV previously, so EVA will keep the TV off more, and the TV is a much bigger power drain than EVA.


Commands

the following sections detail the commands that you can speak to control EVA.

Clock

Timer

each zone has its own timer. so you can time how long it takes somebody to peel a potato while somebody else is timed for how long they last in the bedroom.

Count Down

i use it when boiling eggs and steeping tea. zone independent.

Voice Note

voice mail for your home. er, um ... i live alone, so i use it for my grocery list.

video
voice notes video

Music Selection

EVA has tons of ways to let you choose what music you want to listen to.

video
music video

Music List

some helper methods to let you 'browse' your music library. it just randomly selects a small list and speaks them back.

Music Controls

Audio Book

the music controls (above) also work for audio books, along with the following commands. multi-zone (below) works too, so you can have the audio book playing in multiple rooms.

video
audio books video

Multi Zone

by default, music plays to a single zone. this allows you to play different music in each zone. person A can be listening to techno in the bedroom and person B can listen to rock in the living room. they each control their own music in their own zone.

but EVA also supports multi zone audio ... on the cheap. have you seen how expensive consumer electronic multi-zone audio devices are?!?!

what about speech in different zones? first, EVA only talks to the zone that spoke to her. if somebody is in zone A, and you ask a question in zone B, then she responds to zone B. and 2 people can be in different zones asking EVA questions at the exact same time, and she will send the appropriate response to each zone. if music is playing in your zone, then EVA will pause the music, speak the result, and then unpause the music. if music is playing in a different zone, then that music will not be paused. there are some commands that speak over the music and do not pause it.
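the routing rules above can be sketched in a few lines. this is a minimal Python illustration, not EVA's C# code: the Zone class, the respond function, and the log list (which stands in for real audio output) are all hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    name: str
    music_playing: bool = False
    log: list = field(default_factory=list)  # stands in for the zone's speakers

def respond(zones, source_zone, text, speak_over_music=False):
    """Speak text only in the zone that asked, pausing that zone's music if needed."""
    zone = zones[source_zone]
    pause = zone.music_playing and not speak_over_music
    if pause:
        zone.music_playing = False
        zone.log.append("pause music")
    zone.log.append(f"say: {text}")
    if pause:
        zone.music_playing = True
        zone.log.append("resume music")

zones = {
    "living room": Zone("living room", music_playing=True),
    "bedroom": Zone("bedroom", music_playing=True),
}
respond(zones, "living room", "it is 7 am")
# the living room log now holds pause, say, resume; the bedroom log is still empty
```

keeping the music state on the zone itself is what lets two zones be answered independently, since nothing in respond touches any zone other than the one that spoke.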

video
multi zone video

Intercom

microphone and speaker in each zone ... intercom!

video
intercom video

Voices

EVA's speaking voice can be changed on the fly.

video
voices video

Home Automation

NOTE : i started out trying to get z-wave to work, but i had problems getting a z-wave SDK to receive an event directly from a remote. i need to revisit that code with the newer version of the SDK.

video
home automation video

Weather

weather info is provided by weather.com. weather.com provides a free API, so long as you provide links to their services. since EVA does not have a UI to provide links, EVA periodically speaks 'weather info provided by weather.com'.

Holidays

holidays are calculated using Jay Muntz's Dynamic Holiday Date Calculator.
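to show what 'dynamic' means here, this is a minimal Python sketch of the usual rule for floating holidays (the n-th weekday of a month). it is not the code of the calculator named above, and the nth_weekday helper is a made-up name for the example.

```python
import calendar
from datetime import date

def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday (0 = Monday) of a month; n = -1 means the last one."""
    if n == 0:
        raise ValueError("n must be a positive count or a negative count from the end")
    days = [d for d in calendar.Calendar().itermonthdates(year, month)
            if d.month == month and d.weekday() == weekday]
    return days[n - 1] if n > 0 else days[n]

# US Thanksgiving is the 4th Thursday of November
thanksgiving_2009 = nth_weekday(2009, 11, calendar.THURSDAY, 4)   # date(2009, 11, 26)
# US Memorial Day is the last Monday of May
memorial_day_2009 = nth_weekday(2009, 5, calendar.MONDAY, -1)     # date(2009, 5, 25)
```

fixed-date holidays need no calculation at all, so a lookup of (month, day) pairs plus a rule like this one covers most of a holiday calendar.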

Appointments

EVA will speak appointments from your gmail calendar. i enter my appointments into Outlook and then sync them to gmail. EVA can also speak a reminder before your appointment starts and / or at the actual start time of the appointment.

video
debug and appointments video

Alarms

EVA supports 2 types of alarms : custom and quick alarm. 'custom' alarms are specified in XML and are recurring alarms (e.g. every M-F at 7am). a custom alarm can have advanced actions mapped to it. 'quick' alarms are a 1-time alarm that can be set using your voice. a quick alarm plays a default beep for 1 minute, or until stopped.
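for a recurring alarm like 'every M-F at 7am', the core question is when it fires next. here is a minimal Python sketch of that calculation. the next_alarm function and its parameters are assumptions for illustration and say nothing about EVA's actual XML format or C# code.

```python
from datetime import datetime, time, timedelta

WEEKDAYS = {0, 1, 2, 3, 4}  # Monday through Friday, using datetime.weekday() numbering

def next_alarm(now, weekdays, at):
    """Return the first datetime after now that falls on an allowed weekday at the given time."""
    # offsets 0..7 cover today through the same weekday next week
    for offset in range(8):
        candidate = datetime.combine(now.date() + timedelta(days=offset), at)
        if candidate > now and candidate.weekday() in weekdays:
            return candidate
    raise ValueError("weekdays must contain at least one value from 0 to 6")

# Tuesday 8am has already missed today's 7am alarm, so the next one is Wednesday
upcoming = next_alarm(datetime(2009, 1, 27, 8, 0), WEEKDAYS, time(7, 0))
```

a quick alarm is the degenerate case: a single datetime with no weekday set, which is why it can be set by voice without touching the XML.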

my custom alarm was inspired by the movie 'Iron Man'. it starts out playing music. i can either stop the alarm by saying 'stop alarm', pressing a button on a remote keypad, or just letting it run for 5 minutes. after stopped, it speaks a greeting, the time, if the day is a holiday, upcoming holidays, the weather, appointments for the day, and then it plays an NPR news podcast.

video
alarms video

Volume

since EVA has alarm functionality, volume control is very important. on startup, EVA sets the levels on the sound cards so that the mic input and speaker output are at 90 and they are not muted. it also warns the user if one of the sound cards is the default audio device, in which case the main volume level or hardware volume setting will affect it. the goal is that you set the volume on your speakers once, and then you never change that volume. the initial volume should be at a level at which you can hear EVA speak. then music volume and audio book volume levels are adjusted in software. so if you are going to bed, you can turn the music volume down real low, but when an alarm sounds the music volume will be increased so that it can be heard.

Debug

since you don't have a UI, these commands can come in handy to let you know what just happened.

Math

basic math with support for integers, floating point, and fractions. EVA's response will speak the operators and operands so you can be sure that they were recognized correctly.
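the 'speak it back' idea can be sketched like this. it is a minimal Python illustration using exact fractions, not EVA's C# code, and the spoken operator words and the answer function are assumptions.

```python
from fractions import Fraction
import operator

# hypothetical spoken operator words mapped to arithmetic functions
OPS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "divided by": operator.truediv,
}

def answer(left, op_word, right):
    """Echo the recognized operands and operator, followed by the exact result."""
    a, b = Fraction(left), Fraction(right)  # accepts "3", "0.5" and "1/2"
    result = OPS[op_word](a, b)             # raises ZeroDivisionError when dividing by zero
    return f"{a} {op_word} {b} equals {result}"

reply = answer("1/2", "plus", "1/4")  # "1/2 plus 1/4 equals 3/4"
```

echoing the parsed operands rather than the raw recognized text means a misheard number shows up in the response itself.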

Unit Conversion

basic conversions for temperature, volume, mass, time, speed, length, etc...
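most of these conversions reduce to a table of factors against one base unit per category, with temperature as the exception because it needs an offset. a minimal Python sketch follows. the table contents and function names are assumptions for illustration, not EVA's unit list.

```python
# length units expressed in meters (exact by definition)
LENGTH_IN_METERS = {
    "inch": 0.0254,
    "foot": 0.3048,
    "mile": 1609.344,
    "meter": 1.0,
    "kilometer": 1000.0,
}

def convert_length(value, src, dst):
    """Convert via the base unit: source unit to meters, then meters to target unit."""
    return value * LENGTH_IN_METERS[src] / LENGTH_IN_METERS[dst]

def fahrenheit_to_celsius(f):
    """Temperature needs an offset as well as a scale, so it gets its own function."""
    return (f - 32) * 5 / 9

cable_in_meters = convert_length(75, "foot", "meter")  # roughly 22.86
```

going through a base unit keeps the table at one entry per unit instead of one entry per pair of units.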

Movie Quotes

just for fun, EVA can assume the role of your favorite human-killing computers. the response plays the appropriate recorded quote from the movie

Web

simple web searches

Random

more silliness

What Can I Say

i wrote them ... but there is no way i can remember all these commands. so you can ask EVA for what she is listening for.

video
math, what can i say, movie quotes, random video


Conclusion

the goal for the first version was to develop a starting framework and add enough features to compel me to deploy EVA throughout my house. that goal was met, because EVA is now my primary alarm clock and i use the music functionality a lot. Eva has kicked all the other electronics out of my bedroom. now i actually check the weather before i leave the house, plus i'm starting to listen to audio books again. i want the keypad remotes to work better, so i'll probably try swapping out X10 for z-wave next.

Source

sorry, this is still my pet project, so no source code or binaries

Updates

planned

Future

i'm going to keep working on this. there are a ton of features i could add. later