Tablet PC SDK (Digital Ink) + SAPI Speech SDK 5.1 + WindowsMedia Player

http://www.brains-N-brawn.com/tabletStrator 11/13/2002 casey chesnut

People ask "why do you write articles and give away code ... for FREE!" I think for a good long while, and then say "I don't know, occasional insomnia and a complete and utter hatred for television". But then this comes to a head whenever my server dies, I lose my internet connection, I have to buy some hardware to develop something new, etc ... and so it happened the weekend after the Tablet PC launch ... my beloved DEVBOX crashed and burned. It is non-recoverable (finishing this on my notebook) and I am really sad ... sniff, sniff. I cannot tell you how many tech previews (alphas), betas, and release candidates that box chewed up without having to be reformatted every time like the docs always say (VoiceXml Beta IDEs, .NET Tech Preview, B1, B2, RC, Mobile .NET B1, B2, .NET MyServices Tech Preview, MapPoint .NET B1, CF .NET Tech Preview, B1, XQuery Tech Preview, Speech .NET B1, B2, SQL Notification Services B1, WSDK Tech Preview, Tablet PC B1, ... and those are just the ones I remember because they made it to article form; many more did not make the cut). It even feels colder in my apartment without the heat produced by the ever-crunching CPU, not to mention the silence without the extra cooling fans and hard drives spinning. Pittsburgh has been h3ll on my hardware. Came up here with a dual P3-933, a dual P3-866, and a 1 gig notebook. Leaving with a single P3-933, a 1 gig notebook, and a 233 Pocket PC. Minus 2 gigs of power :( Luckily Dallas is a candyland for computer parts stores ... so think DEVBOX2


Whenever a new API or SDK comes out, the scenario is: download, read the docs, run the samples, and then come up with an idea. I have to code to something to learn it; I can't just RTFM. Coming up with a compelling idea to code is the hardest part. My idea methodology is brute force: saturate my conscious thought by trying to apply the new tech to common tasks throughout the day. For example: /vxml (I didn't know any jokes to tell a stripper), /myVoices (lost my phone with contact #s on it), /clubMap (new to Pittsburgh and lonely), /strongerTds (used DataSets extensively on contract), /noMadMap (lost in Pittsburgh), /noHands (just for kicks), /noSink (more kicks), /gutenberg (not enough ebooks for my PPC), /bioSign (lost my phone again), /wsdk (to reenter the web service namespace). For the Tablet PC, I was rattling ideas off, but nothing that cool. Until I was at the gym, glancing at a football game on the TV, and saw them go to work describing a play by drawing lines all over a paused frame (Madden style). Digital Ink! Got back home and called a friend who owns a TV to find out that it is called the TeleStrator. So this article details how I created the TabletStrator for annotating video on the Tablet PC using Ink, Speech, and Windows Media Player

i want one (or two)

Tablet PC Preview

The PGH MS offices had a sneak preview of Tablet PCs a couple weeks before the November 7th party. Went there specifically to find out about Speech support and ended up leaving as a believer in Ink. The following is a slightly cleaned up version of the notes that I took at that event:
There are 2 flavors: Slate and Convertible. Difference being convertible has a keyboard.
More expensive, but more flexible for travelers.
Acer has 2 convertibles, and they are shipping now.
They played up digital Ink and did not play up Speech capabilities.
The Ink part does not require training as Speech does.
It does recognition on your handwriting and the file format keeps both, so you can do file searches across your handwritten notes (powerful).
The pen has some magnet or digitizer or something on it, so if you lose your pen(s), you can't operate it with your finger like you can a Pocket PC.
So saturate your environments with pens.
The sample rate from the pen is real high too, and it even detects pressure.
Think enhanced signature biometric (/bioSign)
There is a Ctrl-Alt-Del hardware button to log in (since you might not have a keyboard)
When you log in it pops up a virtual keyboard for you to enter your password. This is WEAK, and you should have to do a password signature biometric.
Could probably scan your password off pecking on a virtual keyboard.
Supposedly the Tablet SDK has some managed controls to integrate Ink into your .NET apps.
Also, the OS is Win XP Tablet Edition with .NET already on it, so if you want to release a .NET WinForm app ... that is a perfect target platform.
Be observant of RAM and CPU though.
Also, I think this is where eBooks and ClearType will finally take off.
In slate mode it should make for a very satisfying reading device, with pen annotation.
Hate reading at my desktop, and PPC screens are small.
Fast enough CPU to do Text To Speech, unlike MS Reader on the PPC (/gutenberg)
Some of my other ideas are ... with wireless and .NET support ... I am thinking .NET controls hosted in IE, with Ink aware controls.
Didn't see how Ink interacts with regular web pages and IE?
Also, think Speech .NET (/noHands), especially for the Slate only models.
Can't think of a reason why Speech .NET IE extensions would not work on Tablet PC
Of course these notes were taken on my Pocket PC and then I had to manually transfer them to text ... perfect scenario for a Tablet PC :)

Tablet PC SDK

This is a nice SDK. It has good documentation covering the object model and new design aspects. There are 2 VS.NET controls for working with Ink, a managed code interface, and an automation layer for older stuff. Lots of samples, many in multiple languages including C++, VB, C#, and VB.NET. 9 in C#, so that made me happy. And you don't actually need a Tablet PC to develop Tablet PC apps. It's basically developing WinForms, but taking a pen interface into consideration. So you can do this in your WinForm development environment (e.g. WinXP Pro with a mouse acting as the pen) and then test it on a Tablet later. Alternately you can get a Wacom tablet (they provide the digitizers for a lot of the Tablet PCs) and get a driver from them to work with the SDK. I picked up the smallest Intuos2 to test out this app with. It detects 1024 different pressure levels as well as the tilt of the pen, which could be used to make an even stronger signature biometric. That sounds great, but you can't do everything with just the SDK. You can collect Ink, manipulate it, serialize it, etc ... but you cannot do recognition. That is, you cannot lay down ink that spells a word, have it recognized as text, and use that text to fill in some form. But keep reading ...

Windows XP Tablet PC Edition

Was thumbing through MSDN and came across a DVD that says Win XP Tablet PC Edition on it. Throw it in the drive and it has 2 CDs. The 1st CD looks just like Win XP Pro. The 2nd has a couple of CAB files on it, 1 for Media Center Edition and 1 for Tablet PC Edition. Immediately try doing an 'upgrade' install on my notebook, which already has Win XP Pro and SP1 on it. It asks for an activation key, which MSDN Subscribers can get from the Product ID manager online. It did an upgrade install, asked for the 2nd CD to install the Tablet components, and then asked for the SP1 CD. After fumbling around looking for what it wanted and trying to create said CD, the SP1 CD ended up being the 1st CD from the install. Restart, and my notebook comes up with the 'Windows XP Tablet PC Edition' splash screen. Excellent! Crack open a C# sample for Ink recognition, draw 'Hello World', and it comes back with the same as text. Quickly checked that it installed Windows Journal and such, and then got to work. Actually ... I tried repeating this on my DEVBOX mentioned above, and this is what killed it ... i.e. don't try this at home. Regardless, my notebook is now a really big Tablet PC with an external USB digitizer pad

WindowsMedia Player (Series 9)

Crack open VS.NET and go looking for a WindowsMedia Player control. I already have the older MediaPlayer control (msdxm.ocx) added to my Toolbox. Drag that onto a WinForm, tie a video file to it, and then drag the InkPicture control over it. Run the WinForm and I am inking over the video! This stuff is way too easy. I did have to spend a little time keeping the ink rendered above the video with some efficiency (minus some bulletproofing), because if the WinForm was resized or moved then the ink was lost

this shows ink overlaid on a video with the old media player control.
for some reason the video does not show up in a screen capture

Pushing the limits, I go and get the Beta WindowsMedia Player 9 SDK (I already have the WindowsMedia Player 9 Release Candidate). Install that and thumb through the docs. Turns out it has a C# sample in which the control is not visible, but the sample uses its object model. The docs also show how to set up the environment to add the new control (wmp.ocx) to a VS.NET project: you have to regasm and gacutil a Primary Interop Assembly that comes with the SDK, then add the control to the Toolbox. Created another form and followed the steps above. Ran that form and the video does not render through the InkPicture control :( H3ll!

this shows a small ink area (gray box) overlaid on the new media player control.
note that the video does show up in this screen capture, but it is not visible through the ink box

Messed with it a little while until I gave up and rolled back to the old control ... check out the AboutDialog that it displays:

this has got to be the only thing that survived Y2K

Digital Ink

The main objects for collecting ink are InkCollector and InkOverlay. There are some differences between the two, such as InkOverlay supporting Cut/Paste operations. These are not controls you can drag and drop onto a form, but they can be bound to regular WinForm controls to let ink be collected on those controls, and even do recognition and such. Below, I bind an InkCollector to a GroupBox to make an inkable area for Gestures.

There are also 2 controls that do render in VS.NET design view: InkEdit and InkPicture. InkEdit is a text box that lets you ink in place, has that ink recognized as text, and then displays the text immediately within itself. InkPicture takes an image as a property and then lets you annotate that image. I don't believe it supports recognition, so it is only good for ink collection.

In this example I use the InkPicture box overlaid on the WindowsMedia Player. It does not have a Picture set, and its background color is set to Transparent to let the video play through. The InkPicture has an associated InkCollector, DefaultDrawingAttributes, a Strokes collection, etc ... On Form events such as Moving and Resizing, I have to force the strokes to be redrawn. DefaultDrawingAttributes is used to set the size and color of the ink that is collected. Also, during Resizing the Renderer object is used to scale the ink strokes and pen size to the new size of the Form. Finally, the control's EditingMode can be set to Ink/Select/Delete. In Delete mode it renders the eraser as the mouse pointer and lets you delete individual strokes by clicking on them
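The Renderer does the scaling for you, but the arithmetic behind "scale the ink strokes and pen size to the new size of the Form" is worth seeing once. The following is a minimal, language-neutral sketch in Python, not Tablet SDK code: `Stroke` and `scale_strokes` are hypothetical stand-ins, and scaling the pen width by the geometric mean of the two axis factors is an assumption about how to keep line weight proportional when the Form is stretched unevenly.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Stroke:
    # hypothetical stand-in for an ink stroke: points in form coordinates plus a pen width
    points: list  # list of (x, y) tuples
    width: float


def scale_strokes(strokes, old_size, new_size):
    """Return new strokes rescaled from old_size (w, h) to new_size (w, h)."""
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    # pen width uses the geometric mean of the axis factors, so a form that is
    # stretched wide but squashed short keeps roughly the same line weight
    sw = sqrt(sx * sy)
    return [
        Stroke([(x * sx, y * sy) for x, y in s.points], s.width * sw)
        for s in strokes
    ]
```

Returning new strokes instead of mutating the originals means repeated resizes can always be computed from the strokes as they were first collected, which avoids accumulating rounding error.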

shows color and size differences. the video does not screen capture but was playing.
my main test video was christina aguilera's dirrty video ... so i tested frequently


Got JIT-learning about gestures ... err, umm ... so I'm a bit of an early adopter (understatement), and I hope to eventually be an innovator. When some near-future movie comes out, everybody always wants me to see it to get my opinion about what will and won't happen, OR what is likely to happen 1st. This has been happening repeatedly with the movie Minority Report. Have not seen it because that would entail getting a date, which would take away from development time, not to mention I can't get a date because I'm always developing ... it's a vicious cycle. But supposedly it has a sweet gesture interface, so I will have to catch it. Back to my reality: I was skimming the Tablet SDK CHM file and Gestures kept coming up. The next day, I gave a little presentation to the CMU Speech group and afterwards got to see demos of stuff their grad students were working on. One of them was an in-car radio and navigation system controlled by gestures. Previously the only in-car gestures I could think of were the finger and waving.

Regardless, you can do gestures in Ink, and there are 2 flavors: System and Application-specific. System gestures are things like a tap being a click and a double tap being a double click. Every application gets those for free, and there are about 10 of them. Application-specific gestures are what you can add to an app to make usability more natural with a pen. A good example is repeatedly making back-and-forth strokes over an area to scratch it out. You better believe heavy use of gestures will make their way into Microsoft Word and other text editors. With that in mind, the Tablet SDK defines about 40 app-specific gestures that are already recognized by its Ink Recognizers. For some of these it defines expected behavior, so that the same gestures will be consistent across applications. They have also laid out about 100 other gestures that they plan to support in the future.
Currently, if the 40 gestures are not what you need, you can implement your own Recognizer and create your own gestures. In this app I made a little GroupBox in the upper right of the Form that only collects Gestures. I tried to map those gestures to the basic controls of the MediaPlayer: ChevronRight is Play, UpDown and DownUp are Pause, Square is Stop, SemiCircleLeft is Mute On, and SemiCircleRight is Mute Off (like you are turning a volume knob). All you do is draw one of these symbols in the gesture box; it recognizes the gesture and then performs the associated action on the MediaPlayer. NOTE: the square is hard to draw on an external digitizer pad, but would be easier on a Tablet PC because the digitizer and display are the same
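The gesture box boils down to a lookup table from recognized gesture name to player action. Here is a minimal Python sketch of that mapping, assuming the recognizer hands back the gesture name as a string (in the real C# app it would arrive through the collector's Gesture event). The gesture names come from the list above; `Player` and `on_gesture` are hypothetical stand-ins, not the WindowsMedia Player or Tablet SDK API.

```python
class Player:
    """Hypothetical stand-in for the media player; it just tracks state."""

    def __init__(self):
        self.state = "stopped"
        self.muted = False

    def play(self):
        self.state = "playing"

    def pause(self):
        self.state = "paused"

    def stop(self):
        self.state = "stopped"

    def mute(self, on):
        self.muted = on


# gesture name -> action, mirroring the mapping described above
GESTURE_ACTIONS = {
    "ChevronRight": lambda p: p.play(),
    "UpDown": lambda p: p.pause(),
    "DownUp": lambda p: p.pause(),
    "Square": lambda p: p.stop(),
    "SemiCircleLeft": lambda p: p.mute(True),   # knob turned down
    "SemiCircleRight": lambda p: p.mute(False),  # knob turned back up
}


def on_gesture(player, gesture_name):
    """Run the action mapped to gesture_name; return False for unmapped gestures."""
    action = GESTURE_ACTIONS.get(gesture_name)
    if action is None:
        return False  # recognized, but not one this app cares about
    action(player)
    return True
```

Keeping the mapping in a table rather than a chain of if/else makes it trivial to rebind a gesture if, say, Square proves too hard to draw on an external pad.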

app specific gestures for play, pause (2), stop, mute on, mute off

gesture.wmv - video of gestures being used to control MediaPlayer


One of the samples shows how Ink can be serialized to file. It serializes to ISF (Ink Serialized Format), which is binary; to XML with Base64-encoded ISF; and to an HTML file that refers to a GIF image depicting the ink. The 1st 2 (ISF and XML) can be deserialized as well. I thought about implementing that feature in this app to let people annotate a video and then save that annotation to a similarly named file. Later they could open the video with this app, and it would open the associated annotation file if it existed and re-ink the video accordingly, the same way MS Reader does on the Pocket PC. Decided not to out of laziness, but here are some sample serialization files from the sample app
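The unbuilt "similarly named annotation file" feature is simple enough to sketch. This Python version treats the ISF bytes as an opaque blob (in the real app they would come from saving the Ink object as ISF) and wraps them in XML as Base64, like the sample's XML format. The `.ink.xml` suffix, the `<Ink>` element name, and all three function names are hypothetical choices, not the format the SDK sample writes.

```python
import base64
import xml.etree.ElementTree as ET
from pathlib import Path


def annotation_path(video_path):
    """Sidecar file next to the video, e.g. clip.wmv -> clip.ink.xml (hypothetical naming)."""
    p = Path(video_path)
    return p.with_name(p.stem + ".ink.xml")


def save_annotation(video_path, isf_bytes):
    """Write opaque ISF bytes as Base64 text inside a single XML element."""
    root = ET.Element("Ink")  # element name is an assumption
    root.text = base64.b64encode(isf_bytes).decode("ascii")
    ET.ElementTree(root).write(
        str(annotation_path(video_path)), encoding="utf-8", xml_declaration=True
    )


def load_annotation(video_path):
    """Return the saved ISF bytes, or None if the video has no annotation yet."""
    path = annotation_path(video_path)
    if not path.exists():
        return None
    root = ET.parse(str(path)).getroot()
    return base64.b64decode(root.text or "")
```

On open, the app would call `load_annotation` for the chosen video and, if it gets bytes back, hand them to the ink control to re-ink the video.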

ISF / XML / HTM+GIF - serialized formats


Early on, when Tablet PCs were just an idea, Speech input seemed to be talked about as much as Ink. Now that they are here, it is hard to get any info about Speech. It all went downhill after the announcement that they would not initially support the Spanish language. Was really amazed by that one. In the SDK, as well as in the newsgroups, there is very little mention of speech whatsoever. Regardless, the Tablet PC does have built-in speech capabilities through the virtual keyboard to support dictation as well as command and control scenarios. It has accessibility support as well, but I don't really understand any of this yet. To integrate Speech directly into our apps, we can use Speech .NET Beta 2 for the web and the SAPI Speech SDK 5.1 for fat client applications

virtual keyboard showing speech controls on top

Speech .NET

The Speech .NET Beta 2 docs have a one-liner about being updated to work with Tablet PCs. I have no clue why any updates were necessary, since Tablet PC Edition is a superset of Windows XP. When Speech .NET is released, there will be an Internet Explorer Speech add-in install to support SALT, the XML-based language for speech-enabling the web. This is all detailed in my previous Speech .NET series of articles: /noHands. Just now getting a chance to look at Beta 2, and my early impression is good. MultiModal web apps make a lot of sense on Tablet devices, especially the slate styles that do not have keyboards

SAPI Speech SDK 5.1

Was hoping that the Tablet SDK was going to come with some new Speech SDK with managed code and such. Not so lucky; to develop apps with speech capabilities we have to stick with SAPI, currently Speech SDK 5.1. To use SAPI from .NET you have to generate an RCW (Runtime Callable Wrapper), which is easily done in VS .NET. The Speech SDK comes with 2 samples in C#. One is a ListBox control that recognizes words contained in its list, and the other is a TTS sample that can record to a WAV file or speak out loud. Heard a rumor that there is a Speech SDK 6 in limited beta testing ... my guess is that the Speech engines and such are new for the Tablet, but the new APIs are not ready for developers yet. I'm advanced with Speech .NET but a pure novice in SAPI, and I'm waiting until managed libs come to really dig into it, but the following shows some of my attempts at speech enabling the client above. NOTE: there is cr4p for books about SAPI

Closed Captioning

My 1st attempt (failed) was to automate closed captioning of the audio from the videos being played. Wrote a couple lines of code to make the app start listening for dictation, and then tried to muck with my volume controls so that it would listen to the sound being played by the video instead of a microphone. That way, if the video was of somebody talking, their words would be recognized by the SAPI dictation engine and presented somehow in the application, to be saved to file or something. Once a transcript had been captured, the video could be played again with a word-by-word step through of the transcript, or you could click on a word and go to that place in the video, etc ... The tricky part was adjusting the volume recording and playback properties correctly, to mute and select the appropriate devices. Got it to work a couple of times, but it would only work for a short period of time (~10 seconds) before getting into some bad state, and I would have to start over. When it did work, the speech recognition results were pretty bad. Out of my league, so I gave up on this real quick
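The playback half of that idea (step through the transcript word by word, or click a word to jump to that spot in the video) only needs each word paired with its start time. The Python sketch below assumes the dictation engine could report a start offset in seconds for every recognized word; the `Transcript` class and its methods are hypothetical and were never part of the app.

```python
from bisect import bisect_right


class Transcript:
    """Hypothetical timestamped transcript built from (start_seconds, word) pairs."""

    def __init__(self, words):
        self.words = sorted(words)  # keep in time order
        self.starts = [t for t, _ in self.words]

    def word_at(self, seconds):
        """Word being spoken at a playback position, or None before the first word."""
        i = bisect_right(self.starts, seconds) - 1
        return self.words[i][1] if i >= 0 else None

    def seek_time(self, index):
        """Playback position to jump to when the user clicks word number index."""
        return self.starts[index]
```

A binary search keeps `word_at` cheap enough to call on every position update from the player, which is what highlighting the current word during playback would need.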


This was just for kicks: instead of Text-To-Speech it is Ink-To-Text-To-Speech (ITTTS). It could be useful as a way to verify by audio that the recognizer correctly translated the Ink to Text, without making the user read what they just wrote. All I did was create another Form and add an InkEdit control to it. Then, on the Recognition event for the control, I have it get the recognized text and use SAPI's TTS engine to read it out loud. Only a couple lines of code. Now if I could just come up with a use and an acronym for Text-To-Ink-To-Speech :)
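The whole ITTTS pipeline is one event handler. Here is the shape of it as a Python sketch, with the TTS engine injected as a plain `speak` function so the wiring can be tested without a speech engine. `make_recognition_handler` is a hypothetical name, and the handler assumes the recognition event delivers the recognized text as a string.

```python
from typing import Callable


def make_recognition_handler(speak: Callable[[str], None]) -> Callable[[str], None]:
    """Build a handler to hook to the ink control's recognition event."""

    def on_recognition(recognized_text: str) -> None:
        text = recognized_text.strip()
        if text:  # skip empty results so the TTS engine is not asked to say nothing
            speak(text)

    return on_recognition
```

On Windows with pywin32 installed, `win32com.client.Dispatch("SAPI.SpVoice").Speak` could serve as `speak`, since that is SAPI's COM voice object; passing a list's `append` instead is handy for testing.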

inkSpeech.wmv - video of ink being converted to text and then speech

Command & Control

This example is a more real-world scenario. Pens are great, but they require a lot of movement compared to a mouse and keyboard interface; they slow you down just because you have to move your whole arm the length and width of the screen. Speech is great for solving this problem. Dictation does not work that well without a lot of training or in harsh environments, but command and control is much more accurate. Command and control means there is only a limited set of words that the speech recognizer is listening for. In the case of this app it just listens for the Inking commands as well as the MediaPlayer commands. All I did was translate the code from a VB sample in the SDK, extend the grammar, and then tie the recognition of those phrases to button pushes on the Form. Now instead of using the pen to go and hit a button, users can keep inking and just speak the command, which will improve their productivity
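The "tie recognized phrases to button pushes" step can be pictured as a small dispatcher: one fixed vocabulary, each phrase bound to the same handler its button uses, and anything outside the vocabulary ignored. This is a Python sketch under stated assumptions: `CommandDispatcher` is hypothetical, and the example phrases in the test are made up rather than taken from gram.xml.

```python
class CommandDispatcher:
    """Map a small, fixed set of spoken phrases to button handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, phrase, handler):
        # store phrases lowercased so matching is case-insensitive
        self._handlers[phrase.lower()] = handler

    def vocabulary(self):
        """The only phrases a command-and-control grammar would need to contain."""
        return sorted(self._handlers)

    def on_phrase(self, phrase):
        """Invoke the handler for a recognized phrase; ignore out-of-grammar input."""
        handler = self._handlers.get(phrase.strip().lower())
        if handler is None:
            return False  # not in the grammar: do nothing rather than guess
        handler()
        return True
```

Binding each phrase to the existing button handler means speech and pen always trigger exactly the same code path, and `vocabulary()` is the list of phrases the grammar file has to cover.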

gram.xml - SAPI XML grammar file

commands.wmv - video of speech and ink in action


~3 days of code. NOTE: gesture recognition might only work on actual Tablet PCs. The Tablet SDK might need to be installed as well

TStrator.cs / MyInkTag.cs / TtsForm.cs


A lot of features could be added to this to make it more useful. I have some more Tablet PC ideas but will hold off til I actually get my hands on one, because usage and testing on actual devices always brings up ideas and issues. Pocket PC Phone Edition is next on my purchase list though, for the CF .NET Everett release, and then DEVBOX upgrades. I am moving before Thanksgiving, at which point bNb will be offline for an indefinite period of time (days, weeks, or months?). This will be my last article before then. I want to do another biometric article. Also, I've just begun an article which will likely be posted at LearnMobile.net sometime after the move. Later