VoiceXML and ASP.NET

http://www.brains-N-brawn.com/vxml 11/1/2001 casey chesnut

These are lessons learned from developing my 1st VoiceXML app. Had been wanting to develop a VXML app for a while. Got the idea for the app when a female dancer at a club club asked if I knew any jokes. Thus, the app tells knock knock jokes. The purpose is simple, but it demonstrates user input and dynamic content for output, and I picked up some tricks along the way developing in the .NET environment. Here is the scenario:

1) I dial a phone # and enter the appropriate info to access the app:

1.1) BeVocal Phone # 1.877.33VOCAL (86225)
1.2) PIN 1234 (I typically key this in)
1.3) Account ID 666.1234 (I typically speak this out loud)

2) MENU (say 'joke' or press '2') (saying 'time' or pressing '1' just reads off the system time of my server)
3) App says: Knock Knock
4) I say: Who's there?
5) App says: Boo
6) I say: Boo who?

6.1) BeVocal has some problems recognizing this line sometimes, so you can just say 'who' by itself, and it has no problems recognizing that
6.2) If the app just cannot recognize anything you are saying, you can say 'skip' to try a different joke
6.3) Also, you can say 'menu' to return to the main menu, and leave the joke section (although nothing else is there yet)

7) App says: Don't cry, it's only a joke!
8) * App Repeats at Step 2 (with a different joke) *


This is a recording of the steps above:

jokes.wma - 32 kbps - 500k - 2 min 10 sec

jokes.mp3 - 32 kbps - 500k - 2 min 10 sec

Voice Gateway

Chose BeVocal as my voice gateway. Signing up for a developer account gives free access to my application. Using their web-based tools I created a Hello World sample and was accessing it from my phone in less than 5 minutes. Every time I dial their toll free number, I have to enter my 4 digit PIN and speak my ID # (phone #) to get directed to my own application. Note: the DTD <!DOCTYPE vxml PUBLIC "-//BeVocal Inc//VoiceXML 1.0//EN" "http://cafe.bevocal.com/libraries/dtd/vxml1-0-bevocal.dtd"> is unnecessary, so I scrapped it from my apps to make them easier to port to other Voice Gateways in case I ended up being unhappy with BeVocal. http://www.bevocal.com

Web Server

Run my own .NET Web Server back in Dallas, so I FTP'd the Hello World sample over to it, and then pointed the BeVocal Gateway to the VoiceXML page (.vxml) on my own server ... so that I would have more control over the server for delivering dynamic content. Tested it on my phone, and that was set up in less than 10 minutes. Did not have to register any MIME types or anything on my IIS Server. http://www.mperfect.net/vxml/joke.vxml


My development framework of choice is C# and .NET, so the next step was tying that in. I threw together a simple Hello World .aspx page. Instead of HTML, all it contained was VoiceXML tags, so I worked with it in the HTML and CodeBehind views. Testing it out on BeVocal threw a parse error because there was a line feed before the XML declaration. This was fixed by commenting out the XML declaration <!-- ?xml version="1.0" ? --> in the .aspx page and adding a line to the .aspx.cs codebehind page to write the declaration out in the Response, which removes the line feed:

this.PreRender += new System.EventHandler(this.Page_PreRender); //Initialize

private void Page_PreRender(object sender, EventArgs e)
{
    //write the declaration ahead of the rendered page so nothing precedes it
    Response.Write("<?xml version=\"1.0\" ?>");
}

After that, I needed to be able to dynamically generate the VoiceXML content. My 1st attempt was to just add a Label server control and set it in the CodeBehind. This did not work because the Label rendered <span>text</span>, which is not a valid VoiceXML tag and was rejected by BeVocal. To get around this I bound method calls in the aspx page to the CodeBehind, and referenced the string result of the method call using this syntax in the aspx page: <%=MethodCall()%> http://www.mperfect.net/vxml/joke.aspx  joke.aspx is dynamically generated each time it is called
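
To make the <%= %> approach concrete, here is a minimal sketch of what such an .aspx page could look like. It is not the actual joke.aspx: the class name in the Page directive and the GetPunchline() method are hypothetical stand-ins, and the XML declaration is left out on the assumption that the codebehind writes it.

```xml
<%@ Page language="c#" Codebehind="joke.aspx.cs" AutoEventWireup="false" Inherits="Vxml.joke" %>
<!-- No XML declaration here; the codebehind is assumed to write it -->
<vxml version="1.0">
  <form id="joke">
    <block>
      <!-- GetPunchline() is a hypothetical codebehind method returning a string.
           Its result is written as bare text, with no <span> wrapper. -->
      <prompt><%=GetPunchline()%></prompt>
    </block>
  </form>
</vxml>
```

Because the expression emits only the string itself, the gateway sees plain VoiceXML, which is what the Label control could not deliver.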


Now that I felt comfortable with the environment, I needed to figure out some VoiceXML syntax. It is relatively simple; the only place I had problems was with the <grammar>s. Grammars are used to verify user input: if you <prompt> a user with a question, the grammar declares which responses are valid. If the user says nothing, you can reprompt with a <noinput>. If their response does not match a grammar, you can reprompt with a <nomatch>. If the user says 'Help', you can give additional info with <help>. If the user gives a valid response, that is caught with a <filled>.

A multiple-word response must be enclosed like this: (one answer). If there are multiple possible responses, they should be enclosed like this: [red blue orange]. And you can combine the two, [(one answer) red blue orange], so that 4 responses are valid. I tried this trick too: (?whose there) ... which means it will accept 'whose there' OR just 'there' ... except this would not work for me. Another tricky part was that you cannot use capital letters in the grammars. BeVocal gave a cryptic error when my grammar was (Whose there) as opposed to (whose there). So dealing with strings in .NET, I used .ToLower() on all strings that got rendered within grammars.

Also, after a joke was read, I set it up to direct to itself using the <goto> element. If I directed to the form itself, <goto next="#form">, the page was cached on the Voice Gateway so the same joke kept repeating. If I directed to the page instead, <goto next="joke.aspx">, then the page was retrieved from the server each time to deliver dynamic content. Finally, I added a Main Menu, and the ability to 'skip' out of a joke in case it got stuck or something. http://www.mperfect.net/vxml/menu.aspx  The menu.aspx page is static, and could just as easily have been called menu.vxml
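
The pieces above fit together roughly as follows. This is a minimal sketch of the second half of the joke (steps 5 to 7), not the real joke.aspx. The form and field names are made up for illustration, the prompts are hard-coded instead of generated, and whether a given gateway accepts an inline grammar written exactly like this is an assumption.

```xml
<?xml version="1.0" ?>
<vxml version="1.0">
  <!-- "joke" and "reply" are illustrative names, not taken from the real page -->
  <form id="joke">
    <field name="reply">
      <prompt>boo</prompt>
      <!-- ( ) groups a multi-word phrase, [ ] lists the alternatives; all lowercase -->
      <grammar type="application/x-gsl">
        [(boo who) who]
      </grammar>
      <noinput>
        <prompt>Sorry, I did not hear you.</prompt>
        <reprompt/>
      </noinput>
      <nomatch>
        <prompt>Sorry, I did not understand that.</prompt>
        <reprompt/>
      </nomatch>
      <help>
        <prompt>Say boo who.</prompt>
      </help>
      <filled>
        <prompt>Don't cry, it's only a joke!</prompt>
        <!-- next="#joke" would replay this same document; naming the page
             makes the gateway request it from the server again -->
        <goto next="joke.aspx"/>
      </filled>
    </field>
  </form>
</vxml>
```

In the dynamic version, the setup word in the prompt and grammar and the punchline in the <filled> block would come from <%= %> expressions, with the grammar text lowercased first.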


Thank you Motorola. BeVocal is GREAT for providing a free service, but their web-based tools just aren't sturdy enough for debugging. The best environment I found for this was the Mobile ADK from Motorola, Beta 3, which provides a simulated Voice Gateway you can run off your own development machine. After signing up for a developer registration, it can be downloaded for free. It also requires the Java 1.3 runtime and a Motorola IDE, which can be downloaded for free. It also installs MS Agent utilities, which I have used before at www.brains-N-brawn.com/agents . The link to download the Mobile ADK is: http://developers.motorola.com/developers/wireless/downloads/madk_download.html . It is great for debugging, since you don't have to use your phone, nor upload files to servers, etc ... so it saves a lot of time. Had to add type="application/x-gsl" to the <grammar>s, but BeVocal just ignores these.


Since I did not know any jokes for my dancer friend, I searched the net for some silly knock knock jokes. I then XML'd them in the form of <setup> and <punchline> elements. Then, I generated an XSD / XML Schema from this XML file, as well as a typed DataSet. The code to randomly read a joke from the XML file follows:

Vxml.jokes jds = new Vxml.jokes(); //create the dataset
string filePath = Server.MapPath(".") + @"\jokes.xml"; //set the filepath
jds.ReadXml(filePath,XmlReadMode.ReadSchema); //populate dataset
int count = jds.joke.Rows.Count; //determine # of jokes
Random r = new Random(); //seed random number generator from the system clock
int chosen = r.Next(count); //choose a lucky joke (upper bound is exclusive)
setup = jds.joke[chosen].setup.ToLower(); //set variable with setup part of joke
punchline = jds.joke[chosen].punchline; //set variable with punchline of joke

Setup and punchline are private members with public Getters that are called from the aspx page. Note the .ToLower() on the setup part. http://www.mperfect.net/vxml/jokes.xml
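
For reference, a file shaped the way the code above expects might look like the sketch below. The element names are inferred from the typed DataSet members used in the code (jokes, joke, setup, punchline), and the one joke shown is the one from the scenario. The real jokes.xml may differ, for example by carrying a namespace or an inline schema.

```xml
<?xml version="1.0" ?>
<!-- Assumed layout: one <joke> row per joke inside a <jokes> root -->
<jokes>
  <joke>
    <setup>Boo</setup>
    <punchline>Don't cry, it's only a joke!</punchline>
  </joke>
  <!-- further <joke> elements follow the same shape -->
</jokes>
```

The setup is stored with normal capitalization here, which is why the code lowercases it before it goes into a grammar.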


Went back to the club club for some quote-unquote testing in the real world. It turned out the club was basically too loud for voice recognition to work. The lesson is that I should use <dtmf> attributes when possible, which allow keypad entries to be used when voice is not robust enough ... although this makes no sense for knock knock jokes :(
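
The main menu is the one place in this app where keypad entry is a natural fit. Here is a minimal sketch of such a menu, using the dtmf attribute that VoiceXML 1.0 defines on <choice>. It is not the actual menu.aspx, and time.aspx is a hypothetical page name.

```xml
<?xml version="1.0" ?>
<vxml version="1.0">
  <menu id="main">
    <prompt>Say time or press 1. Say joke or press 2.</prompt>
    <!-- Each choice can be spoken or keyed in; time.aspx is a hypothetical page -->
    <choice dtmf="1" next="time.aspx">time</choice>
    <choice dtmf="2" next="joke.aspx">joke</choice>
    <noinput>
      <prompt>Sorry, I did not hear you.</prompt>
      <reprompt/>
    </noinput>
  </menu>
</vxml>
```

In a loud room the caller can at least get into the joke section by pressing 2, even if the replies inside the joke still have to be spoken.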


I hope VoiceXML applications become standard in .NET and that VS.NET supports their development in a friendlier manner. Speech .NET and Mobile .NET should be involved in this as well; they probably already are, and I just don't know it. Regardless, my next Voice app will probably involve this:

1) App says: What is your name?
2) You say: Steven
3) App says: Steven is a stupid name!