Using the Dictation Resource Kit
the Dictation Resource Kit (DRK) is a Microsoft tool for creating custom language models. a custom language model lets you add your own words to SAPI for use with speech recognition (dictation) and speech synthesis, e.g. medical, legal, Klingon, or other industry-specific terms. these language models can be used by the Speech Recognition feature of the OS (Vista and Windows 7) as well as by custom applications. this article provides an overview of working with the DRK.
there are 3 steps involved with creating a custom language model : Normalization, Generation, and Compilation. as a (poor) example, this article will show how to create a language model for music genres.
the DRK must be installed. for now it only supports a limited number of languages.
first, we need a txt file of the words that will be used as input.
NgramGenre.txt - this file was generated by reading the ID3 tags from a collection of mp3 files. notice that this list only has unique entries. if you want the end result to put more weight on the most popular genres, leave the duplicate entries in, so that each genre appears once for every file tagged with it.
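to make the duplicate-entry point concrete, here is a minimal sketch that computes how much of the corpus each genre occupies, with and without de-duplication. the GenreCorpus class and its Shares method are hypothetical names made up for this illustration, and this is only line counting; how the DRK itself turns counts into weights is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the DRK.
public static class GenreCorpus
{
    // Returns each distinct genre with its share of the corpus lines.
    // With dedupe = true every genre appears once, so all shares are equal.
    public static Dictionary<string, double> Shares(IEnumerable<string> tags, bool dedupe)
    {
        // Drop blank tags and surrounding whitespace.
        List<string> lines = tags
            .Select(t => t.Trim())
            .Where(t => t.Length > 0)
            .ToList();

        if (dedupe)
        {
            lines = lines.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
        }

        // Share = occurrences of the genre / total number of corpus lines.
        return lines
            .GroupBy(t => t, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(
                g => g.Key,
                g => (double)g.Count() / lines.Count,
                StringComparer.OrdinalIgnoreCase);
    }
}
```

with the tags Rock, Rock, Jazz, Rock, keeping duplicates gives Rock three quarters of the lines, while the de-duplicated list gives Rock and Jazz half each.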
the normalization process will now parse this txt file and replace abbreviations, symbols, numbers, etc. with their expanded word form. e.g. '0' will become 'zero' and '&' will become 'and'
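as a rough picture of what normalization means, here is a toy sketch that handles only the two cases mentioned above: '&' and single digits. ToyNormalizer is a hypothetical class written for this article; it is not the DRK's normalizer, which also covers abbreviations, multi-digit numbers and much more.

```csharp
using System;
using System.Text;

// Toy illustration only; the real DRK normalizer is far more complete.
public static class ToyNormalizer
{
    private static readonly string[] DigitWords =
    {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    public static string Normalize(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '&')
            {
                sb.Append(" and ");
            }
            else if (c >= '0' && c <= '9')
            {
                // Digits are expanded one at a time, e.g. "80" becomes "eight zero".
                sb.Append(' ').Append(DigitWords[c - '0']).Append(' ');
            }
            else
            {
                sb.Append(c);
            }
        }

        // Collapse the extra spaces introduced by the replacements.
        string[] parts = sb.ToString().Split(
            new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }
}
```

for example, ToyNormalizer.Normalize("Drum & Bass") returns "Drum and Bass" and ToyNormalizer.Normalize("0") returns "zero".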
the DRK is driven by an XML input file
Normalize.xml - options
the options file has to specify the 'corpus' input file. it can also specify the error and output log files.
NormCorpusIn.txt - corpus input file. it contains file pairs for the non-normalized input and the output path for normalized output. this example only has one input and one output, but you can list more input and output pairs.
running the normalization process will result in these 3 output files.
NgramGenreNormalized.txt - this is the output file with the normalized words.
NormLog.txt - empty
NormErrLog.txt - empty
NOTE normalization is optional. also, i wish the text normalization function was built into System.Speech and Microsoft.Speech so that my custom apps could call it directly. this would allow my apps to perform normalization when the underlying data is being gathered and stored directly in the database for searching against. instead, i have to build the database, run the normalization process, and then write more custom code to map the normalization result back to the dataset for searching.
e.g. a hypothetical call (no such API exists today) : string normalizedText = System.Speech.StringUtil.Normalize("non-normalized text");
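the "map the normalization result back to the dataset" step could look roughly like the sketch below. it assumes the normalized output file has exactly one line per input line, in the same order; that is an assumption of this sketch, not documented DRK behavior, so check it against your own output. NormalizationMap is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; assumes line i of the normalized file
// corresponds to line i of the raw input file.
public static class NormalizationMap
{
    // Maps each normalized string to the raw entries that produced it.
    public static Dictionary<string, List<string>> Build(
        IList<string> rawLines, IList<string> normalizedLines)
    {
        if (rawLines.Count != normalizedLines.Count)
        {
            throw new ArgumentException(
                "line counts differ; the files cannot be paired by position");
        }

        var map = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < rawLines.Count; i++)
        {
            string key = normalizedLines[i].Trim();
            if (!map.TryGetValue(key, out List<string> originals))
            {
                originals = new List<string>();
                map[key] = originals;
            }
            // Several raw entries may normalize to the same text.
            originals.Add(rawLines[i]);
        }
        return map;
    }
}
```

it would be called with something like NormalizationMap.Build(File.ReadAllLines("NgramGenre.txt"), File.ReadAllLines("NgramGenreNormalized.txt")), and the recognized text can then be looked up in the map to find the matching rows in the dataset.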
this step will generate the statistical language model from the list of normalized words. you will need another XML file to drive the process.
GenLM.xml - options
as with Normalization, the options file must specify the corpus input file. it must also specify an SLM output file (statistical language model in binary ARPA format). you can also specify an ArpaLM file (text-based statistical language model). NOTE there are other options that you can specify, such as a list of words that must be included in or excluded from the resulting vocabulary.
running the generation process with the config file above will result in 4 output files.
GenLMSlm.slm - binary ARPA format
GenLMArpalm.txt - text ARPA format
GenLMLog.txt - output log
GenLMErrLog.txt - empty
NOTE being able to generate a statistical language model in (binary and text) ARPA formats is cool. at this point, i wish the DRK allowed us to compile this result directly into an n-gram .cfg file which could be used as a System.Speech (desktop) or Microsoft.Speech (server) grammar. these grammar files could also be referenced by command-and-control grammars.
e.g. Grammar g = new Grammar("myN-Gram.cfg");
this step will compile the statistical language model into a format that is usable by System.Speech. so we need another XML options file.
the input is the .slm file produced by the Generation step. you can optionally specify a base language model (e.g. English), which combines the newly created industry-specific language model with the default language model. it can also take a dictionary file as input.
Dictionary.txt - optional Dictionary file for specifying word pronunciations and capitalization.
running the options file above will result in 5 output files.
CompLMDictInfo.txt - this file contains the pronunciations used by the language model. you will only get this output file if you are not using a base model.
CompLMLog.txt - output log
CompLMErrLog.txt - error log. this shows that some of the input words did not generate a pronunciation
Genre.dlm - binary language model format used by SAPI
Genre.ngr - binary n-gram format used by SAPI
the final step is to register the language model (.dlm and .ngr files).
Register.txt - adds registry keys to set up the language model as a dictation topic.
NOTE i wish registry keys were not needed because it requires admin privileges.
now that the language model is created and installed, we can use it.
if you are using 'Speech Recognition' for the OS, you can select your custom language from Speech Recognition - Dictation Topic - 'select your topic'. you would probably only do this if your custom language model also included the base language model. NOTE the pic below does not show any custom topics.
you can also use the compiled language model in your own custom applications.
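// requires: using System; using System.Speech.Recognition;
SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
sre.SetInputToDefaultAudioDevice();
sre.RecognizeCompleted += new EventHandler<RecognizeCompletedEventArgs>(sre_RecognizeCompleted);
// built-in dictation topics, for comparison
//string topic = "grammar:dictation";
//string topic = "grammar:dictation#spelling";
//string topic = "grammar:dictation#HowDoI";
//string topic = "grammar:dictation#URL";
//string topic = "grammar:dictation#Pronunciation";
// the custom topic registered in the previous step
string topic = "grammar:dictation#Genre";
DictationGrammar dg = new DictationGrammar(topic);
sre.LoadGrammar(dg);
sre.RecognizeAsync();
// minimal handler : print the recognized text, or the error if recognition failed
void sre_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
{
    if (e.Error != null) Console.WriteLine(e.Error.Message);
    else if (e.Result != null) Console.WriteLine(e.Result.Text);
}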
for my own purposes, creating the options XML files got annoying, so i created a C# library to make the DRK a little easier to use. this allows me to automate re-generation of the language models periodically as the underlying data changes. it's called like this :
DrkUtil.GenerateNgram("input_corpus_file_path.txt", "TokenId", normalize, register); // the first two arguments are strings; normalize and register are bools
the DRK is a useful tool for creating custom language models to be used by the OS or within our own custom applications. of course i would like to see it extended to be a little more user friendly (e.g. a .NET library), to support more languages, and to create n-gram .cfg files.
C# source code for the helper library. you will also need to install the DRK itself.
probably some UCMA articles.