EVA is a speech-based virtual assistant for the home with multi-zone
capabilities. that sentence needs to be broken down :
- speech based - EVA is commanded through speech, and communicates
using speech and other audio
- virtual assistant for the home - provides features for a home environment. e.g.
music playback, alarms, home automation ...
- multi zone - it can be set up to work with multiple rooms (aka zones)
individually or together.
another way to describe EVA would be 'Ford Sync : Car : : EVA : Home'
EVA is a C# .NET application that runs on Windows Vista. it uses System.Speech
for speech recognition and speech synthesis. DirectShow is used for working with
microphones and to handle multi-zone audio output. the GUI (graphical UI) is only
used for setup and debugging. after EVA is set up and running, the GUI is not
needed and you interact with EVA using your voice. this is called a VUI (voice UI).
System.Speech is for desktop use and can be trained to be speaker dependent.
that said, i have not trained the speech recognizer on my dev machine or on
EVA's host computer. i do not want to train EVA to be speaker dependent, so that
other people can use EVA as well. since EVA only does command-and-control (and
not dictation), this seems to be working fine.
one limitation of System.Speech is that the public API only lets you select the
default microphone device. to work around this, i have to use reflection to
invoke a private method to tie each zone to its own microphone device. the other
major limitation comes when creating n-grams for music. Microsoft does not provide
a way to programmatically create n-grams, so i have to build them manually using
Visual Studio. EVA nightly updates the library of ID3 tags used for searching,
but i also need to update the n-grams nightly without having to deploy Visual
Studio. hopefully Speech Server 2008 will provide this functionality. third, i
also used the GRXML grammar builder and grammar library from Speech Server. the
tools for Speech Server need to be broken out so System.Speech developers can
use them without having to install Speech Server. finally, i ran into some
bugs when building grammars that referenced external grammar libraries and used
SemanticItems. since grammar development is so important, these bugs need to be
fixed. at least System.Speech provides a number of ways to build grammars (GRXML
as xml, GrammarBuilder object, SrgsDocument object), so it is possible to work
around these bugs ... it just makes your code ugly.
this is what the GUI looks like (only used for setup and debugging)
EVA runs on a single dedicated PC. this was a design goal to avoid having
multiple PCs in each room. to support multi-zone audio, the PC has multiple USB
sound cards. each sound card handles the audio input and output for a zone. in
my setup, i have 3 sound cards for 3 zones (computer room, living room, bed
room). this allows EVA to know where you are speaking from, and what zone or
zones to send audio output to.
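a minimal sketch of the zone lookup described above, written in python rather than EVA's C#. the `Zone` class and the device names are made up for illustration; the only thing taken from the text is that each zone owns one sound card for both input and output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    mic_device: str      # hypothetical name of the sound card's input
    speaker_device: str  # hypothetical name of the same card's output


# three zones as in the setup above; the device names are invented
ZONES = [
    Zone("computer room", "usb-audio-1-in", "usb-audio-1-out"),
    Zone("living room", "usb-audio-2-in", "usb-audio-2-out"),
    Zone("bed room", "usb-audio-3-in", "usb-audio-3-out"),
]

ZONE_BY_MIC = {zone.mic_device: zone for zone in ZONES}


def zone_for_mic(mic_device: str) -> Zone:
    """Return the zone a command came from, based on which mic heard it."""
    return ZONE_BY_MIC[mic_device]
```

a command heard on `usb-audio-2-in` resolves to the living room, so the spoken response would go to `usb-audio-2-out`.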
for audio input, i am using wireless rechargeable microphones. each zone gets
its own microphone. you can think of the mic as being a remote control. instead
of picking up a remote and pressing a button, you pick up a mic and speak a
command. the original plan was to use omni-directional mics in each zone, but
background noise was problematic, plus music had to be muted before a new
command could be spoken. the directional mics don't have problems with background
noise and can be used while music is playing.
for audio output, i am using PC speakers in each room. these are not wireless,
so i have 75ft audio cables running to each zone.
for home automation, i am currently using X10 devices. i'm also using X10
radio frequency remote controls. each zone has 1 or 2 remotes. the remotes
support basic audio controls (pause, stop, next, previous, start/stop scanning,
increase/decrease volume). the remotes are useful for situations when i do not
need to speak. i've had problems with the wiring in my home, so powerline X10
signals will only work within a room. also, there is a 1 to 2 second delay when
using the X10 remotes. just long enough to make you think it's not working ...
and then it works. also, when turning on/off an X10 adapter, it makes this
really annoying popping noise.
software and hardware video
the hardware costs were :
- old Toshiba P4 notebook $???
- wireless rechargeable microphones (4 channel) $340 (2 channel) $210
- X10 CM11A $19
- the cost for each zone is approximately $110, including :
  - PC speaker $50
  - USB sound card $20
  - 75ft speaker cable $7
  - XLR-1/8 mic cable $8
  - X10 receiver $16
  - X10 wireless remote $8
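as a quick check of the per-zone figure, the part prices listed above can be summed. this is only arithmetic on the numbers in the list; the notebook and the microphone set are left out because the text does not give a single price for them.

```python
# per-zone parts and prices in dollars, copied from the list above
ZONE_PARTS = {
    "PC speaker": 50,
    "USB sound card": 20,
    "75ft speaker cable": 7,
    "XLR-1/8 mic cable": 8,
    "X10 receiver": 16,
    "X10 wireless remote": 8,
}


def zone_cost(parts=ZONE_PARTS):
    """Total cost of the parts for one zone."""
    return sum(parts.values())


def zones_total(zone_count, parts=ZONE_PARTS):
    """Total cost of the per-zone parts for several zones."""
    return zone_count * zone_cost(parts)
```

the parts add up to $109 per zone, which matches the "approximately $110" above, or $327 for 3 zones.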
the big power drain is the PC. including the PC and 3 zones, i estimate it will
cost $10 per month to run EVA. if X10 signals worked better in my house, the
plan was to have EVA turn on/off the speakers as needed. regardless, using EVA
cuts into scenarios where i had used my TV previously, so EVA will keep the TV
off more, and the TV is a much bigger power drain than EVA.
the following sections detail the commands that you can speak to control EVA.
- What time is it? - my eyes are getting too bad to read some of the clocks in my
- What day is today? - i can never remember the date.
each zone has its own timer. so you can time how long it takes somebody to peel
a potato while somebody else is timed for how long they last in the bedroom.
- start timer
- [pause / unpause] timer
- read timer - speaks how long the timer has run
- stop timer
i use it when boiling eggs and steeping tea. zone independent.
- count down for # minutes / seconds (e.g. count down for 2 minutes)
- [pause / unpause] count down
- read count down - speaks how long the count down has left
- stop count down
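a minimal sketch of a count down that supports pause / unpause, in python. this is not EVA's code; the `CountDown` class and the injectable `clock` parameter are assumptions made so the example can be tested without waiting.

```python
import time


class CountDown:
    """Counts down from a number of seconds; can be paused and resumed."""

    def __init__(self, seconds, clock=time.monotonic):
        self._clock = clock
        self._remaining = float(seconds)
        self._started_at = clock()  # None while paused

    def pause(self):
        if self._started_at is not None:
            self._remaining = self.remaining()
            self._started_at = None

    def unpause(self):
        if self._started_at is None:
            self._started_at = self._clock()

    def remaining(self):
        """Seconds left, never below zero ('read count down')."""
        if self._started_at is None:
            return self._remaining
        elapsed = self._clock() - self._started_at
        return max(0.0, self._remaining - elapsed)
```

'count down for 2 minutes' would map to `CountDown(120)`, and 'read count down' would speak the value of `remaining()`.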
voice mail for your home. er, um ... i live alone, so i use it for my grocery
- start voice note - after you say 'start voice note' the microphone is recording
everything you say, until you say 'stop voice note'.
- stop voice note - stops recording
- check voice note - speaks how many voice notes there are
- play voice note - plays the recorded audio from the last voice note
- about voice note - speaks the time and date when the voice note was recorded
- [next / previous] voice note (e.g. next voice note) - lets you navigate through
multiple voice notes
- replay voice note - replays current voice note
- delete voice note
- notify voice note - this plays 2 beeps every 30 seconds, to notify somebody that
a new voice note is available. so you would record your voice note (e.g. 'start
voice note' 'i'm going to work out' 'stop voice note'), and then turn on the
notification by saying 'notify voice note' as you walk out the door. when they
arrive home, they will hear the beeps and know to say 'play voice note' to hear
voice notes video
EVA has tons of ways to let you choose what music you want to listen to.
- play music - this automatically discovers the top genres in my music library,
then it randomly chooses 1 or 2 of those genres and plays shuffled files from
those genres. so each time you say 'play music' the music selected will be
- play [artist / genre / song / album] X (e.g. play artist the prodigy) - these
searches are based off of ID3 tags.
- play year X (e.g. play year 2008) (e.g. play year eighties) - so it will play a
single year or a decade.
- play year last X years (e.g. play year last 3 years)
- play year [before / after] X (e.g. play year after 2000)
- play file X - searches based off of file name, for when your ID3 tags aren't
very good. this also helps if you don't know the entire name of the song. just
say the artist and a single word from the song title. (e.g. play file moby god
- play folder X (e.g. play folder fiona apple) - my music is sorted by folders of
artist names. it will play all the files within that folder.
- play playlist (e.g. play playlist favorites)
- play stream X (e.g. play stream C 89.5) - stream URLs are stored in an XML file
- play recorded streams from X (e.g. play recorded streams from C 89.5) - EVA can
record streams ... so this lets you play those
- play podcast X (e.g. play podcast hansel minutes) - it will check the RSS,
download the latest podcast (if not cached), and start playback
- play video X (e.g. play video destination calabria) - i have a lot of music
videos. this function will let me play the audio from those video files, as if
it were just another mp3 file.
- play keyword X - for hard to find files. this searches everything : ID3, file
name, folder name. this also searches both audio-only and music video files.
- play random - this randomly chooses 100 songs to play from your music library.
- play mixes - randomly chooses a 'mix' (based on a long duration). i listen to
a lot of DJ sets.
- play similar - based on the currently playing song, EVA will perform a series of
searches looking for similar music (artist and genre, artist ID3 tag, genre,
artist file name, folder).
- play similar [artist / genre / folder] (e.g. play similar artist) - this lets
you specify which play similar search you want to be performed.
- play loop - based on the currently playing song, it will play the song over and
over again. i like to listen to the same song over and over again while coding.
- play audio book X - (see audio book section below)
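the 'play year' variants above all boil down to a range of years. a minimal python sketch of that mapping, working on already recognized text. the `year_range` function and the `DECADES` table are hypothetical, and two choices are assumptions: 'last 3 years' includes the current year, and 'after 2000' starts at 2001.

```python
from typing import Tuple

# spoken decade names mapped to their first year (assumed 1900s)
DECADES = {
    "fifties": 1950, "sixties": 1960, "seventies": 1970,
    "eighties": 1980, "nineties": 1990,
}


def year_range(phrase: str, current_year: int) -> Tuple[int, int]:
    """Turn the X of 'play year X' into an inclusive (first, last) year range."""
    words = phrase.lower().split()
    if len(words) == 1 and words[0] in DECADES:
        first = DECADES[words[0]]
        return (first, first + 9)
    if len(words) == 1 and words[0].isdigit():
        year = int(words[0])
        return (year, year)
    if len(words) == 3 and words[0] == "last" and words[2] in ("year", "years"):
        count = int(words[1])
        return (current_year - count + 1, current_year)
    if len(words) == 2 and words[0] == "after":
        return (int(words[1]) + 1, current_year)
    if len(words) == 2 and words[0] == "before":
        # 0 stands for 'no lower bound'
        return (0, int(words[1]) - 1)
    raise ValueError("unrecognized year phrase: " + phrase)
```

the resulting range could then be compared against the year field of each file's ID3 tags.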
some helper methods to let you 'browse' your music library. it just randomly
selects a small list and speaks them back.
- list [artists / genres / albums / titles]
- list [files / folders]
- list [playlists / streams]
- list audio books
- list videos
- list podcasts
- list popular - based on the set of currently playing files, it tries to
determine the most 'popular' songs by how similar the titles are. it then speaks
the titles that show up more than once
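a rough python sketch of the 'list popular' idea: count titles that show up more than once in the current set of files. how EVA actually measures title similarity is not described, so the normalization here (lowercase, drop bracketed suffixes such as '(live)', drop punctuation) is an assumption.

```python
import re
from collections import Counter


def normalize(title):
    """Reduce a title to a comparable form (assumed similarity rule)."""
    title = re.sub(r"\(.*?\)|\[.*?\]", "", title.lower())
    title = re.sub(r"[^a-z0-9 ]", "", title)
    return " ".join(title.split())


def popular_titles(titles):
    """Return normalized titles that occur more than once, most frequent first."""
    counts = Counter(normalize(title) for title in titles)
    return [title for title, count in counts.most_common() if count > 1]
```

with several versions of the same song in the list (album cut, live version, remix), that song counts as 'popular'.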
- [next / previous / random] track (e.g. next track)
- replay track - restarts the currently playing song
- stop music
- [pause / unpause] music
- skip music to # percent (e.g. skip music to 50 percent) - all the skip features
are useful for long mix tracks and audio books
- skip music to # minutes (e.g. skip music to 3 minutes)
- skip music [forwards / backwards] # minutes (e.g. skip music forwards 1 minute)
- what track is this? - speaks the ID3 metadata about the track
- music shuffle [on / off]
- music repeat [on / off]
- [start / stop] scanning music - this will 'scan' through the current set of
files. each file will start at the 1 minute mark and play for 15 seconds. so a
scenario to browse music is to say 'play random' and then 'start scanning music'
and when you get to a song you like say 'play similar'
- [start / stop] recording stream - if a stream is playing, then you can record it
to file, which can then be played back later
- sort music tracks - this is similar to 'list popular'. it tries to find matching
song titles, and then those are resorted to the front of the currently playing
- search tracks for X - this is cool. it lets you search for a song based on your
currently playing files. e.g. you start out by saying 'play artist prodigy'.
then you can say 'search tracks for firestarter' and the song 'firestarter' will
be moved to the front of the list. now for the tech part, it starts out
trying to recognize your speech from your entire music library (say 10,000
songs). if that fails, it generates a dynamic grammar (what the speech
recognizer uses) from the list of currently selected songs (say 100 'prodigy'
tracks), and then it re-runs speech recognition on that smaller grammar. so the
speech recognizer has a much better chance of recognizing what you said from the
list of 100 songs vs the list of 10,000.
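the two-pass fallback behind 'search tracks for X' can be sketched in python. this does not use System.Speech: `recognize` is a stand-in parameter for the speech engine, and `toy_recognize` is a made-up text matcher that only succeeds when exactly one phrase fits, to imitate a recognizer that gives up when the grammar is too ambiguous.

```python
def search_tracks(heard, library_titles, current_titles, recognize):
    """Try the whole library first, then retry against the current tracks only."""
    match = recognize(heard, library_titles)
    if match is None:
        match = recognize(heard, current_titles)
    return match


def move_to_front(playlist, title):
    """Put the matched title first and keep the order of everything else."""
    return ([t for t in playlist if t == title]
            + [t for t in playlist if t != title])


def toy_recognize(heard, phrases):
    """Stand-in recognizer: succeeds only if exactly one phrase contains all words."""
    words = set(heard.lower().split())
    candidates = [p for p in phrases if words <= set(p.lower().split())]
    return candidates[0] if len(candidates) == 1 else None
```

in this toy setup the full library holds both 'firestarter' and 'firestarter remix', so the first pass is ambiguous and fails. the second pass against the smaller list of current tracks finds a single match.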
the music controls (above) also work for audio books, along with the following
commands. multi-zone (below) works too, so you can have the audio book playing
in multiple rooms.
- list audio books - lists random audio books in mp3 or wma format
- list authors
- play audio book X (e.g. play audio book freakonomics) - will start playing the
book. if you have already started listening to the book, then it will auto
resume from where you stopped listening.
- search audio books for X (e.g. search audio books for dune)
- [pause / unpause] book
- close book - auto records where you stopped listening
- restart audio book
- add bookmark - this will add a bookmark that you can return to later
- open bookmark # (e.g. open bookmark 1)
- play book [faster / slower / normal] - this adjusts the playback speed, in case
the narrator reads too slow or too fast
- how long is the book?
- how long have i listened?
- what page am i on? - this speaks the percentage of the book that you have
- what books are opened? - the most recent books that have been played
- skip audio book to # percent (e.g. skip audio book to 50 percent) - audio books
will be a set of files; so it has to determine which file to skip to and where
in that file to start.
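a minimal python sketch of 'skip audio book to # percent' for a book that is split across several files. the `locate_percent` function is hypothetical; it assumes the duration of each file (in seconds) is already known.

```python
def locate_percent(durations, percent):
    """Map a percentage of the whole book to (file index, offset in seconds)."""
    if not durations:
        raise ValueError("the book has no files")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    target = sum(durations) * percent / 100.0
    for index, duration in enumerate(durations):
        if target < duration:
            return index, target
        target -= duration
    # 100 percent lands at the very end of the last file
    return len(durations) - 1, durations[-1]
```

for a book made of three files lasting 600, 1200 and 600 seconds, 50 percent falls 600 seconds into the second file.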
audio books video
by default, music plays to a single zone. this allows you to play different
music in each zone. person A can be listening to techno in the bed room and
person B can listen to rock in the living room. they each control their own
music in their own zone.
but EVA also supports multi zone audio ... on the cheap. have you seen how
expensive consumer electronic multi-zone audio devices are?!?!
- transfer music here - this lets you move music that is already playing to your
current zone. e.g. you started music in the bed room by saying 'play music' then
you went to the living room and want to listen to the same music there by saying
'transfer music here'. the music will stop playing in the bed room and start
playing in the living room.
- transfer music here from [zone name] (e.g. transfer music here from bed room) -
if there is different music playing in more than one zone, then you will have to
specify which zone you want the music to be transferred from.
- music zone [on / off] - this lets you have multiple zones playing the same
music. say you are cleaning your house and keep walking back and forth between
rooms. you can start music playing in the bed room by saying 'play music' and
then each room you want the same music to play in, you just say 'music zone on'.
if you use the music controls (e.g. 'next track'), all zones stay in sync.
- music zone on from [zone name] - if more than one zone is playing different
music, then you have to specify which zone you want to join.
- music party mode [on / off] - EVA assumes you are lazy (because i'm lazy). if
you select different music to play, then all other zones are turned off by
default. to bypass this, turn music party mode on. then when you select new
music, the new music will be played to all the zones in the group.
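a rough python sketch of 'transfer music here' and 'music zone on'. it is a simplified model, not EVA's implementation: `playing` is a hypothetical dict of zone name to whatever is playing there, and a zone name is only required when the other zones are playing different music, as described above.

```python
def resolve_source(playing, here, source=None):
    """Pick the zone to copy music from; a name is needed if it is ambiguous."""
    if source is not None:
        return source
    others = {zone: music for zone, music in playing.items() if zone != here}
    if len(set(others.values())) != 1:
        raise ValueError("name the zone, e.g. 'transfer music here from bed room'")
    return next(iter(others))


def transfer_music_here(playing, here, source=None):
    """Move the music: it stops in the source zone and starts in this one."""
    source = resolve_source(playing, here, source)
    updated = dict(playing)
    updated[here] = updated.pop(source)
    return updated


def music_zone_on(playing, here, source=None):
    """Join the source zone: both zones now play the same music."""
    source = resolve_source(playing, here, source)
    updated = dict(playing)
    updated[here] = updated[source]
    return updated
```

a real version would also need to keep joined zones in sync and handle party mode, which this sketch leaves out.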
what about speech in different zones? first, EVA only talks to the zone that
spoke to her. if somebody is in zone A, and you ask a question in zone B, then
she responds to zone B. and 2 people can be in different zones and asking
questions to EVA at the exact same time, and she will send the appropriate
response to each zone. if music is playing in your zone, then EVA will pause the
music, speak the result, and then unpause the music. if music is playing in a
different zone, then that music will not be paused. there are some commands
that speak over the music and do not pause it.
multi zone video
microphone and speaker in each zone ... intercom!
- intercom mode [on / off] - ties your mic to every speaker (but your own). ties
every mic (but your own) to your speaker. the other rooms hear what you say, and
you hear what they say.
- god mode [on / off] - ties your mic to every speaker (but your own). you speak,
and everybody has to listen.
- spy mode [on / off] - ties all mics (but your own) to your speaker. you can hear
what is said from the other rooms, but they can't hear you. this would be more
interesting with always-on omni-directional mics.
- echo mode [on / off] - ties your mic to your speaker. this is only useful for
debugging, to check volume levels and for any noise on the line.
- X mode [on / off] to [zone name] (e.g. intercom mode on to bed room) - for each
command, you can optionally specify a zone name, so only your zone and the
specified zone will be involved.
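the four modes above are all just different sets of mic-to-speaker connections. a small python sketch of that idea, where `routes` is a hypothetical helper that returns (mic zone, speaker zone) pairs. the actual audio plumbing (DirectShow in EVA) is not modeled.

```python
def routes(mode, me, zones, target=None):
    """Return the set of (mic zone, speaker zone) links for a mode."""
    if target is not None:
        others = [target]
    else:
        others = [zone for zone in zones if zone != me]
    if mode == "god":        # my mic to every other speaker
        return {(me, zone) for zone in others}
    if mode == "spy":        # every other mic to my speaker
        return {(zone, me) for zone in others}
    if mode == "intercom":   # both directions
        return {(me, zone) for zone in others} | {(zone, me) for zone in others}
    if mode == "echo":       # my mic to my own speaker
        return {(me, me)}
    raise ValueError("unknown mode: " + mode)
```

'intercom mode on to bed room' spoken in the living room would correspond to `routes("intercom", "living room", zones, target="bed room")`.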
EVA's speaking voice can be changed on-the-fly
- what voices are installed?
- change voice to X (e.g. change voice to Microsoft Lili)
- demon mode [on / off] - yep, this makes her speak backwards. i couldn't resist
:) might be fun on halloween and to torment babysitters
- list X10 devices - lists all the named X10 devices
- turn [on / off] X (e.g. turn off lamp) - it will attempt to turn off the lamp in
your zone first, if your zone does not have a lamp, then it will turn off the
lamp in another zone
- turn [on / off] X in [zone name] (e.g. turn off lamp in bed room)
NOTE : i started out trying to get z-wave to work, but i had problems getting a
z-wave SDK to receive an event directly from a remote. i need to revisit that
code with the newer version of the SDK.
home automation video
weather info is provided by weather.com. weather.com provides a free API, so
long as you provide links to their services. since EVA does not have a UI to
provide links, EVA periodically speaks 'weather info provided by weather.com'.
- what is the current weather?
- what is the weather [today / tonight]?
- what is the weather in # days?
- what is the weather on [day of week]?
holidays are calculated using Jay Muntz's
Holiday Date Calculator.
- what is the next holiday?
- what are the next holidays?
- how long until X? (e.g. how long until new years day?)
EVA will speak appointments from your gmail calendar. i enter my appointments
into Outlook and then sync them to gmail. EVA can also speak a reminder before
your appointment starts and / or at the actual start time of the appointment.
- what are my appointments [today / tomorrow]?
- what are my appointments on [day of week]?
- what are my upcoming appointments?
debug and appointments video
EVA supports 2 types of alarms : custom and quick alarm. 'custom' alarms are
specified in XML and are recurring alarms (e.g. every M-F at 7am). a custom
alarm can have advanced actions mapped to it. 'quick' alarms are a 1-time alarm
that can be set using your voice. a quick alarm plays a default beep for 1
minute, or until stopped.
my custom alarm was inspired by the movie 'Iron Man'. it starts out playing
music. i can either stop the alarm by saying 'stop alarm', pressing a button on
a remote keypad, or just letting it run for 5 minutes. once stopped, it speaks
a greeting, the time, whether the day is a holiday, upcoming holidays, the
weather, appointments for the day, and then it plays an NPR news podcast.
- when is the next alarm?
- what alarms are set?
- set quick alarm at [time] [today / tomorrow / day of week] (e.g. set quick alarm
at 10 am saturday)
- set quick alarm in # [minutes / hours] (e.g. set quick alarm in 8 hours)
- disable quick alarm - there is only one quick alarm per zone, so if EVA
misrecognized what you said, you can just speak it again or disable it
- snooze for # minutes (e.g. snooze for 30 minutes) - this will stop the alarm and
reset it to trigger again later.
- stop alarm
- run morning alarm - this lets me kick off the 'daily update' from stopping a
morning alarm whenever i want. i use this whenever i wake up on the weekends,
without an alarm.
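a minimal python sketch of how the quick alarm commands could be turned into a trigger time. the function names and the `quick_alarms` dict are hypothetical. it assumes that a spoken day of week which has already passed today means the same day next week.

```python
from datetime import datetime, timedelta

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def alarm_in(now, amount, unit):
    """'set quick alarm in # [minutes / hours]'; also usable for snooze."""
    if unit not in ("minutes", "hours"):
        raise ValueError("unit must be 'minutes' or 'hours'")
    return now + timedelta(**{unit: amount})


def alarm_at(now, hour, minute, day="today"):
    """'set quick alarm at [time] [today / tomorrow / day of week]'."""
    if day == "today":
        offset = 0
    elif day == "tomorrow":
        offset = 1
    else:
        offset = (DAYS.index(day) - now.weekday()) % 7
    alarm = (now + timedelta(days=offset)).replace(
        hour=hour, minute=minute, second=0, microsecond=0)
    if day in DAYS and alarm <= now:
        alarm += timedelta(days=7)  # that time has passed, so use next week
    return alarm


# one quick alarm per zone: setting it again simply overwrites the old one
quick_alarms = {}
quick_alarms["bed room"] = alarm_in(datetime(2008, 8, 1, 22, 0), 8, "hours")
```

because the dict holds a single entry per zone, a misrecognized alarm is fixed by just speaking the command again.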
since EVA has alarm functionality, volume control is very important. on startup,
EVA sets the levels on the sound cards so that the mic input and speaker output
are 90 and they are not muted. it also warns the user if one of the sound cards
is the default audio device, in which case the main volume level or hardware
volume setting will affect it. the goal is that you set the volume on your
speakers once, and then you never change that volume. the initial volume should
be at a level at which you can hear EVA speak. then music volume and audio book
volume levels are adjusted in software. so if you are going to bed, you can turn
down the music volume really low, but when an alarm sounds the music volume will
be increased so that it can be heard.
- [start / stop] listening - this allows you to turn a zone on or off
- are you listening?
- what is the music volume? - speaks the volume level as a percentage
- music volume # (e.g. music volume 50) - to set the music volume
- [increase / decrease] volume - changes the music volume by 5% increments
- go to sleep in # minutes (e.g. go to sleep in 30 minutes) - for playing music at
night. the music will stop after the specified duration
- can you hear me now? - couldn't resist. this actually returns a numeric
value for the volume level sampled from the mic.
- mic check 1 2 1 2 - EVA will playback the audio so you can hear what the
microphone level is
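a small python sketch of the software volume rules: 5% steps for '[increase / decrease] volume' and a louder level when an alarm sounds. the text gives no number for the alarm level, so `ALARM_MIN_VOLUME` is an assumed value, and all names here are hypothetical.

```python
STEP = 5               # '[increase / decrease] volume' changes by 5%
ALARM_MIN_VOLUME = 60  # assumption: the text gives no number for this


def clamp(volume):
    """Keep a volume percentage inside 0..100."""
    return max(0, min(100, volume))


def increase(volume):
    return clamp(volume + STEP)


def decrease(volume):
    return clamp(volume - STEP)


def alarm_volume(music_volume, floor=ALARM_MIN_VOLUME):
    """Volume to use while an alarm sounds: never quieter than the floor."""
    return max(music_volume, floor)
```

with music turned down to 10 for the night, an alarm would still play at the floor value.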
since you don't have a UI, these commands can come in handy to let you know what
- repeat what you said - replays the audio that she just spoke.
- repeat what you heard - replays the audio that i just spoke
- what did you hear? - speaks the text that was recognized from what i just spoke
basic math with support for integers, floating point, and fractions. EVA's
response will speak the operators and operands so you can be sure that they were
- what is # [plus / minus / times / divided by / to the power of] # (e.g. what is
.2 times one fourth)
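a minimal python sketch of the math command, working on already recognized text. the `answer` function, the `SPOKEN_FRACTIONS` table (only three entries) and the wording of the spoken response are assumptions; the operators are the ones listed above.

```python
import operator
from fractions import Fraction

# a few spoken fractions; a real grammar would cover many more
SPOKEN_FRACTIONS = {
    "one half": Fraction(1, 2),
    "one third": Fraction(1, 3),
    "one fourth": Fraction(1, 4),
}

OPERATORS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "divided by": operator.truediv,
    "to the power of": operator.pow,
}


def parse_operand(text):
    """Turn '3', '.2' or 'one fourth' into an exact Fraction."""
    text = text.strip()
    if text in SPOKEN_FRACTIONS:
        return SPOKEN_FRACTIONS[text]
    if text.startswith("."):
        text = "0" + text
    return Fraction(text)


def answer(question):
    """Echo the operands and operator back, followed by the result."""
    body = question.lower().replace("what is", "", 1).strip(" ?")
    for name in sorted(OPERATORS, key=len, reverse=True):
        left, found, right = body.partition(" " + name + " ")
        if found:
            result = OPERATORS[name](parse_operand(left), parse_operand(right))
            return f"{left.strip()} {name} {right.strip()} is {float(result):g}"
    raise ValueError("no operator found in: " + question)
```

for the example above, `answer("what is .2 times one fourth")` gives `.2 times one fourth is 0.05`.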
basic conversions for temperature, volume, mass, time, speed, length, etc...
- how many [unit a] in # [unit b]? (e.g. how many cups in 1 gallon)
- convert # [unit a] to [unit b] (e.g. convert 200 pounds to kilograms)
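a minimal python sketch of the conversions: each unit is stored as a factor relative to one base unit per category, so any two units in the same category can be converted through the base. only a handful of units are shown, and the table layout and `convert` function are assumptions. volumes are US customary.

```python
# factor of each unit relative to the base unit of its category
TO_BASE = {
    "volume": {"cup": 1.0, "pint": 2.0, "quart": 4.0, "gallon": 16.0},
    "mass": {"kilogram": 1.0, "pound": 0.45359237, "ounce": 0.028349523125},
}


def singular(unit):
    """'cups' -> 'cup' (naive: only strips one trailing 's')."""
    unit = unit.lower().strip()
    return unit[:-1] if unit.endswith("s") else unit


def convert(amount, from_unit, to_unit):
    """Convert between two units of the same category."""
    from_unit, to_unit = singular(from_unit), singular(to_unit)
    for table in TO_BASE.values():
        if from_unit in table and to_unit in table:
            return amount * table[from_unit] / table[to_unit]
    raise ValueError("cannot convert " + from_unit + " to " + to_unit)
```

'how many cups in 1 gallon' becomes `convert(1, "gallon", "cups")`, and 'convert 200 pounds to kilograms' becomes `convert(200, "pounds", "kilograms")`.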
just for fun, EVA can assume the role of your favorite human-killing computers.
the response plays the appropriate recorded quote from the movie
- how about global thermonuclear war?
- open the pod bay doors
- ... i need to add a lot more of these
simple web searches
- what is today's woot?
- what is the word of the day?
- what is the quote of the day?
- flip a coin - in honor of The Dark Knight
- random number - responds with a number between 1 and 100
- magic 8 ball - had to be done
What Can I Say
i wrote them ... but there is no way i can remember all these commands. so you
can ask EVA for what she is listening for.
- what can i say? - EVA will respond with the names of the different feature sets
(e.g. Alarms, Music Selection, Music Controls, Timer). so this is the help
system's main menu.
- what can i say about [feature set]? (e.g. what can i say about music controls?)
- the response will be all the different commands for that feature set (e.g.
next / previous track, ...)
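the help system can be pictured as a two-level lookup. a tiny python sketch, where the `HELP` table is a hypothetical, shortened stand-in for the real command lists:

```python
# feature set -> commands; only a small, shortened sample
HELP = {
    "timer": ["start timer", "pause timer", "unpause timer",
              "read timer", "stop timer"],
    "music controls": ["next track", "previous track", "stop music"],
}


def what_can_i_say(feature_set=None):
    """No argument: list the feature sets. With one: list its commands."""
    if feature_set is None:
        return sorted(HELP)
    return HELP[feature_set.lower()]
```

in a real system the table would more likely be generated from the loaded grammars than written by hand, so the help cannot drift out of date.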
math, what can i say, movie quotes, random video
the goal for the first version was to develop a starting framework and add enough
features to compel me to deploy EVA throughout my house. that goal was met, because
EVA is now my primary alarm clock and i use the music functionality a lot. EVA
has kicked out all the other electronics from my bedroom. now i actually check
the weather before i leave the house, plus i'm starting to listen to audio books
again. i want the keypad remotes to work better, so i'll probably try swapping
out X10 for
sorry, this is still my pet project, so no source code or binaries
i'm going to keep working on this. there are a ton of features i could add.