Speaker Verification Activity for Speech Server 2007
This is a quick article about custom Speech Activities for Speech Server 2007. The end result is a set of activities that perform very basic speaker verification, plus some activities for performing dictation. A speech workflow using the speaker verification activities is pictured below.
MS Speech Server 2007 introduces a new workflow development model composed of Speech Activities. The core activities are Statement (for speech synthesis) and QuestionAnswer (for speech recognition). There are also higher-level activities that provide functionality such as support for menus. To learn more about the new speech workflow development model, see the /speechTextAdv article.
Speech Server 2007 also provides the ability to create custom speech activities. For example, you could create a custom speech activity for collecting credit card info and then reuse that same activity in many different applications. To create a custom speech activity, first create a 'Workflow Activity Library' project. That project will need to reference the Microsoft.SpeechServer public assembly. Finally, your custom activity must inherit from SpeechSequenceActivity.
To be consistent with the standard speech activities, your custom activity should fire a TurnStarting event. Any properties it exposes will need to be InstanceDependencyProperty instances in order to properly store state.
Biometrics are a way to add security to an application. For telephony applications, voice biometrics make a lot of sense. They are interesting in that they are influenced both physically and socially.
Voice biometrics are largely tied to the device with which a person enrolled, because audio can sound radically different between different phones. You might consider this hardware lock-in additional security ('something you have'). Also, your voice will change with age, so a person will have to re-enroll after some passage of time. Finally, a voice can be faked with recordings, so it is not a good idea to use a voice biometric as your only security measure. It could easily be paired with entering a secret PIN using DTMF ('something you know').
There are two types of speaker recognition (below). NOTE: do not confuse speaker recognition with speech recognition. Speech recognition determines what was said; speaker recognition determines who is speaking.
This article focuses on speaker verification.
There are three different types of speaker verification:
This simple implementation is text-dependent. It is made up of three activities:
SpeakerVerifyTextDependRegister is pictured below.
It is composed of a RecordAudio and a Code activity. When it starts up, it fires a TurnStarting event, which allows you to set a prompt and user ID. The prompt asks the user to speak a pass phrase, which the RecordAudio activity saves to a wav file.
A 'pass phrase' might be your name, telephone number, PIN, a random phrase, etc. The Code activity then does some basic audio processing to generate a voice print. First, it trims the silence.
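The silence-trimming step could look something like the following sketch. It assumes 16-bit mono PCM samples already read out of the recorded wav file; the function name and amplitude threshold are illustrative guesses, not taken from the actual implementation.

```python
# Hypothetical sketch of the silence-trimming step. Assumes `samples`
# is a list of 16-bit PCM amplitudes; the threshold is an assumption.

def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose absolute amplitude
    stays below the threshold (i.e. near-silence)."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

print(trim_silence([0, 3, -2, 600, -800, 700, 1, 0]))  # [600, -800, 700]
```

A more robust version would average energy over short windows instead of testing single samples, but the idea is the same: only the spoken portion of the recording should feed into the voice print.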
This implementation parses the wav file and then performs a Fourier transform to get frequency values.
The frequency values are divided into cells, averaged, and then saved as a series of numbers.
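The transform-and-average step might be sketched like this. The frame size, cell count, and use of a naive DFT are all assumptions for illustration (the article doesn't specify the actual parameters), and the samples are assumed to have already been parsed out of the wav file and trimmed.

```python
# A minimal sketch of the voice-print step: Fourier-transform the
# trimmed samples frame by frame, then average the magnitude spectra
# into a small fixed number of cells. Frame size, cell count, and the
# plain DFT are illustrative assumptions.
import cmath

def dft_magnitudes(frame):
    """Naive discrete Fourier transform; returns the magnitude of the
    first half of the bins (the upper half mirrors them)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        s = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        mags.append(abs(s))
    return mags

def voice_print(samples, frame_size=64, cells=25):
    """Collapse the averaged spectrum into `cells` numbers."""
    totals = [0.0] * (frame_size // 2)
    count = 0
    # accumulate magnitude spectra over all complete frames
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        for j, m in enumerate(dft_magnitudes(samples[i:i + frame_size])):
            totals[j] += m
        count += 1
    avg = [t / max(count, 1) for t in totals]
    # average groups of adjacent bins into cells, as a series of ints
    per_cell = max(len(avg) // cells, 1)
    return [round(sum(avg[c * per_cell:(c + 1) * per_cell]) / per_cell)
            for c in range(cells)]
```

A real implementation would use an FFT and likely a log/mel frequency scale, but this captures the shape of the output: a short series of numbers summarizing the frequency content of the utterance.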
133 142 121 134 134 110 125 136 139 154 130 155 119 115 133 148 150 121 143 141 112 122 134 136 134 ...
This series of numbers becomes the voice print, which is saved to a file associated with the user's ID.
SpeakerVerifyTextDependVerify is very similar, but runs when a user already has an existing voice print. This time it asks the user to repeat the pass phrase, then performs the same audio processing to generate a voice print for the current sample. Finally, it compares the new voice print against the master voice print for the user. If they are similar enough, it is considered a match and the speaker is verified.
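The comparison step can be sketched as below, assuming both voice prints are equal-length series of numbers like the one shown earlier. The distance metric (mean absolute difference) and the match threshold are illustrative guesses, not the article's actual values.

```python
# Hypothetical comparison of two voice prints. The metric and
# threshold are assumptions for illustration.

def is_match(master, sample, threshold=20.0):
    """Verify the speaker if the average cell-by-cell difference
    between the two voice prints is below the threshold."""
    diff = sum(abs(m - s) for m, s in zip(master, sample)) / len(master)
    return diff < threshold

master = [133, 142, 121, 134, 134]
print(is_match(master, [130, 145, 119, 136, 133]))  # True  (close sample)
print(is_match(master, [201, 95, 180, 60, 210]))    # False (very different)
```

Tuning the threshold is the usual biometric trade-off: too loose and impostors get through (false accepts), too tight and the legitimate user gets rejected (false rejects).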
Anyway, consider this a "poor man's" implementation; it is just a proof of concept. There are many things that could be done to make it more secure. Maybe MS Research will cook up a proper implementation?
The following are audio recordings of the speaker verification activities in use:
Just for kicks, I also cooked up some activities for performing dictation. They work by recording the user with a RecordAudio activity and then using SAPI to perform dictation on the recorded file. I implemented it two different ways: with an RCW wrapper over COM and SAPI 5.1 (roughly five years old), and with System.Speech (.NET 3.0), which works with SAPI 5.1 on XP and SAPI 5.3 on Vista (currently in beta).
I got this to work, but it doesn't work very well; it's actually really bad with SAPI 5.1. I don't currently have Vista installed to try it with SAPI 5.3. Speaker-independent dictation still needs some work.
It's actually very easy to create custom workflow activities, including custom speech activities for Speech Server 2007. I was initially disappointed that Speech Server 2007 did not provide a voice biometric out of the box, but I'm no longer concerned now that I see how easily a third party could implement that functionality.
Possibly more speech stuff later.