/tabletWeb


Tablet Web Apps with Ink and Speech

http://www.brains-N-brawn.com/tabletWeb 8/20/2004 casey chesnut

Introduction

the release of XP SP2 includes a FREE update to the Tablet PC OS, raising it to Tablet PC 2005 Edition. this update greatly improves the usability of Tablet PCs. shortly after, the Tablet SDK 1.7 was released to allow developers to create applications that take advantage of these improvements. of particular interest to me is the ability to do what is called 'Ink-in-IE', meaning you can ink within Internet Explorer. this is done by embedding a WinForm control with an <object/> tag that uses the Ink API. so that gives us the ability to ink on a web page ... to trick it up even more ... throw in Speech. you can do ink and speech on WinForms pretty easily. this is demonstrated in the /tabletStrator article. Ink is done with the Tablet API and Speech is done through a COM wrapper of SAPI. for the web, you cannot use SAPI, so Speech is done through the Speech SDK and an extension to IE called the Speech Add-in for Internet Explorer. to that end, this article will show how to do ink and speech on a web page! also to remind people that i used to be a Tablet MVP :) are there any Speech MVPs?

Out of the Box

you can use a Tablet PC to browse traditional HTML web pages. the experience is not too bad. you can click with the pen as you normally would, so most controls work great. the part that kind of sucks is when you need to enter data into a TextBox. it starts out by placing your pen in a TextBox, which brings up a little icon (shown below)

if you click that icon then it brings up the PIP (Pen Input Panel) to let you enter data as ink (or with the virtual keyboard). the data will be entered into the TextBox and then you can run your google search.

that is fine for a single TextBox search, but it gets cumbersome for a long form. for a long form you click on the TextBox, then the icon to bring up the PIP, enter your data, then click on the web page to close the PIP. you then repeat that scenario for every TextBox on the form. the sections below demonstrate how to provide a better user experience

Tablet

i actually did the 1st public implementation of ink-in-IE almost 2 years ago for the /tabletInk article. the main drawback of that implementation was that Microsoft.Ink.dll did not have the AllowPartiallyTrustedCallers attribute, so you had to increase security permissions to make it run at all. that has been reworked, and now you can embed WinForm controls in IE 6.0 (and above) that use portions of the Ink API without requiring increased permissions. but before you can ink, you need to make sure your web page is being called by a Tablet PC.

first, Tablet PCs with SP2 will pass a User-Agent string that looks like the following. on the server you can parse this string to make sure the client is a Tablet

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Tablet PC 1.7)
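
a minimal sketch of that check, assuming an ASP.NET page (the 'Tablet PC' token is the part to look for; the variable names are just for illustration):

// in an ASP.NET page; Request.UserAgent is the raw User-Agent string from the client
string userAgent = Request.UserAgent;
bool isTablet = (userAgent != null && userAgent.IndexOf("Tablet PC") != -1);
if (!isTablet)
{
    // not a Tablet PC, so render the plain HTML version of the form
}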

second, you need to make sure that .NET 1.1 is installed. .NET 1.0 is no good because it will not run embedded WinForm controls without the user having to increase permissions. this check is also done on the server

Version verNeed = new Version("1.1.4322");
if (Environment.Version < verNeed)
{
    // they dont have .NET 1.1 or greater, so fall back to plain HTML
}
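
the client's installed CLR versions also show up in the User-Agent string shown earlier (the '.NET CLR x.x.xxxx' tokens), so another way to perform the server-side check is to parse them out. a rough sketch, assuming ASP.NET; the regex and variable names are only illustrative:

// using System.Text.RegularExpressions;
bool hasNet11 = false;
Regex clrToken = new Regex(@"\.NET CLR (\d+\.\d+\.\d+)");
foreach (Match match in clrToken.Matches(Request.UserAgent))
{
    if (new Version(match.Groups[1].Value) >= verNeed)
    {
        hasNet11 = true;
    }
}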

at this point you know they have a Tablet and .NET 1.1 so you can render HTML that will host embedded WinForm controls with <object/> tags. but you still need to do some more checking within the WinForm control itself

the third thing to check is that the computer has the Ink controls that are used by the Control. e.g. if an InkCollector is used, then try to instantiate one within a try-catch block. if it cannot be instantiated, either show the Exception or render a normal WinForm TextBox instead. they won't be able to ink, but they will still be able to use the Popup Input Panel or keyboard if they have one. another way to check this is by parsing the AssemblyName
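
a sketch of that check inside the control, assuming a panel (inkPanel) to attach the InkCollector to and a hypothetical fallback TextBox:

InkCollector inkCollector = null;
try
{
    // throws if the Ink runtime is not on the machine
    inkCollector = new InkCollector(inkPanel.Handle);
    inkCollector.Enabled = true;
}
catch (Exception)
{
    // no Ink support; either show the Exception or fall back to a plain TextBox
    // (the PIP / keyboard will still work against it)
    inkPanel.Visible = false;
    fallbackTextBox.Visible = true;
}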

the fourth check is to make sure that they have ink Recognizers installed. do this by instantiating the Recognizers object and checking that its Count property is greater than zero. this is for the case where a user might not have a Tablet PC, but is using a 3rd party digitizer (e.g. Wacom) and has the Tablet SDK installed. this would allow them to ink on the control using their mouse or digitizer, but the control would not be able to do any ink recognition
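
and a sketch of the fourth check, also inside the control:

bool canRecognize = false;
try
{
    Recognizers recognizers = new Recognizers();
    canRecognize = (recognizers.Count > 0);
}
catch (Exception)
{
    // the Recognizers collection could not even be created
}
// if canRecognize is false the user can still ink, but the control skips recognition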

so that is a lot of different situations to handle. at one extreme you have a normal PC running any web browser ... at the other extreme is a Tablet PC 2005 running IE (Firefox will not host WinForm controls or the Speech add-in). the lowest common denominator is vanilla HTML, but you can provide an improved (and even richer) experience for the Tablet users

Ink

at this point we can begin inking on the WinForm control embedded in the page. the <object/> tag to embed the control might look like this:

<OBJECT id="inkPicture" height="175" width="300" border="1" classid="tabletWebControl.dll#tabletWebControl.InkPictureWeb" name="inkPicture" VIEWASTEXT>
	<PARAM NAME="Picture" VALUE="tabletWeb/casey3.jpg">
	<PARAM NAME="PenColor" VALUE="red">
</OBJECT>

the 'classid' is the control that will be loaded to run within IE. the PARAM elements are for passing parameters to the control through public properties the control exposes. the web page can also call public methods on the control using JavaScript. NOTE the control cannot easily communicate back to the page. this can sometimes be done by the web page hooking events from the control, but that requires Full trust ... and i'm not certain it still works with SP2. the only way to do this without additional security is to have the page keep polling a public property on the control for changes, and then have the page update itself with the new value from the property. also, i have not figured out a way for controls on the same page to communicate. e.g. a State control notifying a City control what city names it should limit itself to. the page would have to kick this off somehow, reading the value from the State control and then setting it on the City control. the controls can also communicate back to the server that served them. the controls i have built have made WebClient requests to the server for image files, as well as making web service calls for data. there are other things the control can do, such as isolated storage; and many more things that it cannot, such as accessing the general file system or using P/Invoke (due to security limitations)
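
as a rough sketch of the control side of that contract (the ResultText property and ClearInk method are hypothetical names, not the actual tabletWebControl code): the PARAM names map straight onto public properties, JavaScript can call the public methods, and the page can poll a public property for the latest recognition result

public class InkPictureWeb : System.Windows.Forms.UserControl
{
    private string picture = "";
    private string penColor = "";
    private string resultText = "";

    // set by <PARAM NAME="Picture" ...> ... e.g. kicks off a WebClient request back to the server for the image
    public string Picture
    {
        get { return picture; }
        set { picture = value; }
    }

    // set by <PARAM NAME="PenColor" ...>
    public string PenColor
    {
        get { return penColor; }
        set { penColor = value; }
    }

    // the page polls this for changes (or sets it from the speech side),
    // since the control cannot raise events to the page without extra security
    public string ResultText
    {
        get { return resultText; }
        set { resultText = value; }
    }

    // callable from JavaScript, e.g. document.all.inkPicture.ClearInk();
    public void ClearInk()
    {
        // erase the current ink and recognition result
    }
}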

at this point you might want to take a look at some ink-enabled web pages. NOTE for any WinForm control to load in the web page you will need IE 6.0 and .NET 1.1 installed. for ink-enabled controls to load you will need the Tablet SDK 1.7. it will install on non-Tablet PCs; in fact i develop on a regular XP box, and only test on a Tablet PC. this will let you interact with some of the ink-enabled controls using your mouse. the main thing missing is that it will not be able to do ink recognition. finally, for the full user experience you will need a Tablet PC with SP2. if you have a Tablet with SP2, then you do NOT need to install the Tablet SDK. if you do not meet these requirements, there are videos of it in use below

the above are all samples from the Tablet SDK 1.7. below is some custom stuff i've put together

hopefully the links above will have given you some ideas about the capabilities of ink-in-IE. but i still have not tackled the problem of the long form. to do this i created an ink control that is configurable as to what words it will recognize. you do this with the Ink API by specifying a Factoid. prior to SDK 1.7, there were about 30 Factoids for recognizing numbers, currency, dates, etc... you can also supply WordLists which either steer the recognizer toward one of those words, or force (coerce) the recognizer to only return one of those words as a match. SDK 1.7 extends the Factoid concept with Input Scopes. Input Scopes are the same concept as Factoids. SDK 1.7 comes with about 50, some of them duplicates of the original 30 Factoids. the real power of Input Scopes is that they support a subset of regular expressions. this allows you to specify your own patterns for ink recognition!
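
a sketch of what that configuration looks like against the Ink API, assuming the control owns a RecognizerContext (loading the 10,000 city names from XML is left out):

RecognizerContext recoContext = new RecognizerContext();

// option 1: a built-in Factoid, e.g. for a date field
recoContext.Factoid = Factoid.Date;

// option 2: a WordList, e.g. the US city names
WordList cities = new WordList();
cities.Add("seattle");
cities.Add("new orleans");
// ... add the rest of the city names read from the XML file ...
recoContext.WordList = cities;
recoContext.Factoid = Factoid.WordList;
// Coerce forces the recognizer to only return one of the words in the list
recoContext.RecognitionFlags = RecognitionModes.Coerce;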

so i made my ink control configurable by WordLists and Factoids. when a user inks on the control, it waits for 1 second after they are done, and then does ink recognition according to that WordList or Factoid. it then displays the result within the control. if there was no recognition, then the control is blank. if the user sees that the recognition was wrong, they can go back and start inking on the control again. the previous result will be erased and replaced with the new result. this will be demonstrated later ...
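
the wait-then-recognize flow inside the control is roughly this, assuming the recoContext configured above, a System.Windows.Forms.Timer named recoTimer, and the hypothetical resultText field from the earlier sketch:

// wired up in the control's constructor:
//   inkCollector.Stroke += new InkCollectorStrokeEventHandler(OnStroke);
//   recoTimer.Interval = 1000;
//   recoTimer.Tick += new EventHandler(OnRecoTimer);

private void OnStroke(object sender, InkCollectorStrokeEventArgs e)
{
    // restart the 1 second countdown every time a stroke is completed
    recoTimer.Stop();
    recoTimer.Start();
}

private void OnRecoTimer(object sender, EventArgs e)
{
    recoTimer.Stop();
    recoContext.Strokes = inkCollector.Ink.Strokes;
    RecognitionStatus status;
    RecognitionResult result = recoContext.Recognize(out status);
    // blank when nothing was recognized, otherwise display the best match
    resultText = (result == null) ? "" : result.TopString;
}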

the images below show the ink control in action. it starts out with no input

yes, my handwriting is really this bad

yes, ink recognition really did figure out what i wrote (it used a WordList of 10,000 US city names)

click the microphone to begin speech recognition (used a Grammar with the same 10K US cities)

Speech

err, umm ... i created one of the 1st public implementations of a Speech-enabled web page using the Speech SDK too: /noHands. back to the topic ... inking on the web page is great, but we can also use Speech. this is accomplished through the Speech SDK and the Speech Add-in for IE. this creates what is called a multimodal application, meaning you can interact with it in many ways. either your keyboard, speech ... or ink. for the basics of creating a multimodal application, please see /speechMulti, which shows how to create a speech enabled web app for the Pocket PC (the concepts carry over to the desktop, minus the need for Speech Server)

to speech enable the web page, the page is primary and the ink control is only used to display the result. all i did was tie the onclick() event of a microphone image to a Listen control from the Speech SDK. when the image is clicked, the Speech Add-in for IE will use the Tablet PC's microphone to hear what the user speaks and do speech recognition on it. in the same way that the Ink API uses Factoids and Input Scopes to help improve recognition, the Speech SDK uses what are called Grammars. the Speech SDK uses the standard grammar format SRGS. it comes with about 130 grammars, with some of them building upon others. it also has a powerful grammar design tool to let you specify your own. it is interesting that the Tablet Ink API has reserved an Input Scope to accept SRGS in the future! this is important because it will eventually allow Ink and Speech recognition to use exactly the same rules for recognition. oh yeah, once the text is recognized, it sets a property on the ink control to render the result for the user to check. the controls could be extended to do some validation on their own, or to be other types of controls (e.g. a ListBox or DropDownList would be handy)
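
stripped way down (and leaving out the SALT namespace declarations and add-in plumbing the Speech Add-in needs on the page), the wiring looks something like this; the ids, grammar file name, and ResultText property are the hypothetical ones from the sketches above:

<img src="mic.gif" onclick="cityListen.Start()">

<salt:listen id="cityListen" onreco="OnCityReco()">
    <salt:grammar src="cities.grxml" />
</salt:listen>

<script language="javascript">
function OnCityReco()
{
    // push the recognized text into the embedded ink control for the user to check
    document.all.inkPicture.ResultText = cityListen.text;
}
</script>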

each ink control on the web page already had its own Factoid tied to it. so then i added a Listen control for each ink control, and set its Grammar to the corresponding rule on the speech side. so if the ink control was using the DATE factoid, then i used the DATE grammar that the Speech SDK provided. i also made sure that they used the same WordLists. the City ink control reads an XML file containing the names of 10,000 US cities into its WordList. the City speech control uses a Grammar that has been transformed from that same XML file. so the City control will recognize the same cities by either ink or speech. this lets a user enter data however they feel most comfortable: by speaking or writing. either way is faster than a traditional HTML web page. what is great is that one type of recognition is sometimes better than the other. if it does not recognize your handwriting, then instead of trying to rewrite it and getting the same error again ... try speech the 2nd time and see if it has better luck recognizing your input. sometimes ink works better ... sometimes speech
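
for reference, the transformed city grammar ends up as a plain SRGS file, roughly like the following (the city names shown are just a few examples out of the 10,000):

<grammar version="1.0" xml:lang="en-US" root="Cities"
         xmlns="http://www.w3.org/2001/06/grammar">
    <rule id="Cities" scope="public">
        <one-of>
            <item> seattle </item>
            <item> new orleans </item>
            <item> saint paul </item>
            <!-- ... the other city names transformed from the same XML file ... -->
        </one-of>
    </rule>
</grammar>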

the following link is to the ink and speech enabled web page. it has a long form of controls that you can interact with either by using the pen or your voice. NOTE the software requirements above. for the addition of Speech, you will need the Speech Add-in for IE installed on your machine. if you do not want to mess with all that, then there are some videos of it in use below

i got this error when installing the Speech Add-in on my Tablet PC with SP2. it said that i did not have the latest version of the speech recognition engine and asked if i would like to install the latest. i said yes, but then it failed to install because of the missing file below. if i started the install again, and told it not to update my speech reco engine, then it installed fine. after training it seems to work great

Video

the requirements to run this on your own are steep. i ran it on my 1st generation Fujitsu Tablet PC with SP2, IE6, and .NET 1.1. it also has the Speech SDK installed and trained. if you don't have that hardware or software, then you can see videos of it working here

Browsers

so that should convince you that ink and speech on the web are good. it just needs to become more accessible. 1st off is the Browser war. IE is the only browser that can host WinForm controls. i doubt any other browsers will start hosting them until / if the whole Eolas junk clears up. then we've got the whole cross platform issue ... does Mono support embedded WinForm controls? Mono definitely does not do Ink. as other Tablet PCs come out with non-MS OS's, it will make ink on the web (or anywhere for that matter) more of an issue. Speech on the web will become an issue too. right now MS is backing the SALT standard, while IBM likes VoiceXml. i'm just waiting for FireFox to add VoiceXml support for voice browsing. hopefully, the update to IE that is happening now will get the Speech Add-in out to more people. right now you can only get the Speech Add-in by downloading a developer SDK ... stupid. hopefully the IE update will give WinForm controls in IE a little more power too. being able to send events to the page without extra security, and cross control communication, would be great

Conclusion

i conclude that ink and speech are just plain cool. the simple page created here ends up being easy to work with by using ink or speech. being able to do both is just gravy. the technology for building the apps is golden ... but the ability for people to consume these apps is crap. SP2 is pushing ink-in-IE out to Tablet users, but Speech could be pushed out to a much larger number of users. so you can only deploy this stuff in a controlled environment right now

in the near future i see search and travel sites becoming ink-enabled. an ink-enabled google and expedia would be pretty slick

further out i see ink and speech recognition becoming seamless. the grammars / factoids will end up being the same for both. it would be neat if i could feed the Speech SDK a regular expression and have it generate an SRGS. hopefully Longhorn is working to make this easier

my ultimate vision is that these different modes of interacting with an app become one. the scenario is i am talking to the automated teller of my bank on my Smartphone as i walk back to my office. i dock the phone and it brings up a web browser of recent transactions, which i can interact with using the keyboard. the session is transferred from the voice-only call to the web browser, along with the security context. as i pay a bill online, it might prompt me for my signature; i can tilt back my docked tablet and sign, with a signature biometric verifying that i really signed for a greater level of security. some day ...

Source

not giving away the code with this one

Updates

no planned updates for this article. i do plan on continuing to ink-enable my web site, and speech enable it when that time comes

Future

had started to play with some MapPoint stuff ... but got stuck. waiting for an article to get me unstuck. supposed to be looking for work too