
Tablet PC Digital Ink: Advanced Recognition and Custom Recognizer

http://www.brains-N-brawn.com/tabletReco 12/13/2002 casey chesnut

"Save the trees, Kill paper ... except toilet paper" quoth kc# circa 2K2

This article outlines Advanced Ink Recognition and shows how to implement a Custom Recognizer DLL.

I am slowly going paperless: email replaced letter writing in college; the web replaced newspapers, magazines, etc.; online checking and a debit card replaced my checkbook; plastic grocery bags, please; the Pocket PC begrudgingly replaced post-it notes; ebooks did jack squat. Looking around, the paper I still have left is cash (for strippers), receipts (for taxes), yellow engineer notepads, and lots and lots of books. I said LOTS and LOTS of books. That is where the Tablet PC comes in. Stacks of yellow engineer notepads hold all my brainstorming, analysis, design, pseudo code, and random ideas from the last couple of years. They are sometimes fun to flip through to jog my brain, except when I am actually tearing through them looking for something important. Now just imagine if they were digital ink. Then I could just type in a keyword and get back all the relevant notes. This is what Windows Journal (aka the Tablet PC Killer App) can do. Windows Journal does this by storing the ink as ink, as well as recognizing the ink and storing it as text. This gives you the best of both worlds: you can see the notes as you wrote them and also search them.

Recognizers

Handwriting recognition is not new; it has become commonplace on PDA and Pocket PC devices. What is new is the convergence of years of MS research and dropping hardware costs, which lets devices produce really accurate results without training. Seriously. You can just start chicken scratching away and the Tablet PC will do a surprisingly good job of understanding what you wrote. This is accomplished through what are called Recognizers. Recognizers are C-style DLLs that the Tablet Runtime calls to recognize ink. My English-language version of Win XP Tablet PC Edition installed 3 recognizers (C:\Program Files\Common Files\Microsoft Shared\Ink):

These are also added to the registry (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\TPG\System Recognizers). The following shows the 3 System Recognizers and my Custom Recognizer settings:

This should be of interest to developers. As explained in my earlier /tabletStrator article, you can develop Tablet PC applications without a Tablet PC. The problem is that you do not get the Recognizers unless you are running the Tablet PC Edition OS, so you can collect and manage ink but you cannot get it recognized as text. The Tablet PC Edition OS does come with an MSDN Universal subscription, and I have described how to install it on a regular PC (I developed this on a notebook running Tablet PC Edition). It might also be possible to just copy the DLLs listed above from an installed copy of Tablet PC Edition to a Win XP Pro machine with the Tablet SDK installed, and then set the appropriate registry settings. I have not tried this, but it seems feasible.

Ink Architecture

This image was pilfered straight from the Tablet PC SDK documentation

A layered architecture ... go figure. The top layer is the app layer. The Tablet PC SDK provides managed and automation APIs to developers so you can make your own lifestyle choice, as MS puts it ... how politically correct. If I ever refer to anything I do as a lifestyle choice ... remind me to beat myself up. All the samples (except 1) and books deal with working with ink at this layer. The middle layer is the Tablet Runtime. Written by MS, it is just a black box. The bottom layer is where the Recognizer DLLs mentioned above fall. Regarding ink recognition, you can tweak the results by making certain calls at the app layer, or you can be a masochist and write your own custom reco DLL at the bottom layer. For this article, I will 1st talk a little about advanced ink recognition at the app layer using the Tablet SDK, and then show in detail how to write a custom recognizer DLL at the bottom layer.
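
To make the layering concrete, here is a minimal C++ sketch of the three layers, assuming a drastically simplified recognizer interface. IRecognizer, InkRuntime, and StrokeCounter are hypothetical names for illustration only; the method names merely echo recognizer functions discussed in this article and are not the real RecApis.h signatures. The point is the direction of the calls: the app only talks to the runtime, and only the runtime talks to the recognizer.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Bottom layer: a simplified, hypothetical recognizer interface.
// The real contract is the set of C-style functions declared in RecApis.h.
struct IRecognizer {
    virtual ~IRecognizer() = default;
    virtual void AddStroke(const std::vector<int>& packets) = 0;
    virtual void Process() = 0;
    virtual std::string GetBestResultString() const = 0;
};

// Middle layer: the "black box" runtime. The app never calls the recognizer directly.
class InkRuntime {
public:
    explicit InkRuntime(std::unique_ptr<IRecognizer> recognizer)
        : reco_(std::move(recognizer)) {}

    // Feeds every stroke to the recognizer, lets it process, then asks for the best result.
    std::string Recognize(const std::vector<std::vector<int>>& strokes) {
        for (const auto& stroke : strokes) reco_->AddStroke(stroke);
        reco_->Process();
        return reco_->GetBestResultString();
    }

private:
    std::unique_ptr<IRecognizer> reco_;
};

// A toy recognizer for illustration: it only reports how many strokes it saw.
class StrokeCounter : public IRecognizer {
public:
    void AddStroke(const std::vector<int>&) override { ++count_; }
    void Process() override {}
    std::string GetBestResultString() const override {
        return std::to_string(count_) + " strokes";
    }

private:
    int count_ = 0;
};
```

Swapping in a different IRecognizer changes the results without touching the app layer, which is the same bargain the Tablet Runtime offers a custom recognizer DLL.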

Advanced Recognition

The Tablet SDK allows applications to customize the ink recognition that occurs. First, an InkCollector can choose to collect ink only, ink and gestures, or gestures only by setting its CollectionMode property. For the gesture recognizer, you can specify which gestures to look for and which to ignore. Next, the default text recognizers (in this case English) can recognize individual characters or words. The English recognizer uses a wordlist to help with recognition, while more symbolic languages might be character based. To support the jargon of specific industries (e.g. medical shorthand), words can be added to the wordlist. For command and control situations, where only certain words or characters make sense, the wordlist can be replaced and the RecognizerContext's RecognitionFlags set to Coerce. Then, in even more specific instances, Factoids can be set. Factoids give the Recognizer a hint about what the ink will be. For example, you can set a factoid called EMAIL on an ink-aware text box in which a user will write their email address. This hint helps the recognizer return better results, and you will have a better chance of the @ symbol being returned as @ instead of the letter 'a'. You can also choose to recognize ink synchronously or asynchronously, through the Recognize() and BackgroundRecognize() methods along with hooking the Recognition event on a RecognizerContext object. Next, you can add a RecognizerGuide when collecting ink. This lets you specify boundaries and midlines, like the lines on paper, e.g. to help the recognizer differentiate between a lower case 'c' and an upper case 'C' based on a designated height. Finally, you can choose to have the Tablet Runtime return only the TopString (best recognition result) or also return Alternates and a ConfidenceLevel. Alternates are the other possible matches for that ink.
This allows for scenarios such as letting a user select a recognized word, see a list of alternates, and choose one of them to replace the selected word. I have played around with most of this and it works great. The only problem I ran into was with case. I was trying to make an app that recognizes single alphabetic characters, both upper and lower case. When I set a factoid to look for UPPER_CHAR, it would accept lower case characters and coerce them to upper case. My expected result was that it would recognize 'A' and not 'a'; instead it took 'a' and forced it to 'A'. This might be a bug, or I just read the documentation wrong. This and other code is in the /tabletInk article. Also, there seem to be some enumerations defined in the API that are not implemented by any of the Recognizers, which is slightly annoying. The MS Press Tablet PC book has a great chapter and numerous C# code samples that show off most of these features.
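
As a mental model of how a wordlist, Coerce, TopString, and Alternates relate, here is a small C++ sketch. It is an assumption-laden illustration, not the Tablet SDK API: Alternate and PickTopString are hypothetical, the integer confidence is a stand-in for the SDK's confidence levels, and it assumes that under coercion any candidate outside the wordlist is simply ineligible.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one alternate: NOT a Tablet SDK type.
struct Alternate {
    std::string text;
    int confidence;  // higher means more likely (assumption for this sketch)
};

// Picks the "TopString" from a list of alternates. When coerce is true, only
// words in the wordlist are eligible, so the result is either a wordlist entry
// or nothing at all.
std::optional<std::string> PickTopString(const std::vector<Alternate>& alternates,
                                         const std::set<std::string>& wordlist,
                                         bool coerce) {
    const Alternate* best = nullptr;
    for (const Alternate& a : alternates) {
        if (coerce && wordlist.count(a.text) == 0) continue;  // not allowed under coercion
        if (best == nullptr || a.confidence > best->confidence) best = &a;
    }
    if (best == nullptr) return std::nullopt;
    return best->text;
}
```

Without coercion the highest-confidence alternate wins even if it is not a real word; with coercion the same ink can only come back as a wordlist entry.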

Custom Recognizer DLL

For 95% of ink-enabled applications, tweaking the recognition settings outlined above will produce more than adequate results. The other 5% of apps (+ or - 1%) might be better served by a custom recognizer, which would most likely be used to recognize symbols and drawn objects. Text is pretty much nailed by MS, so just reuse and tweak that. The gesture recognizer now supports 40-odd gestures that will become de facto standards across applications, and MS has outlined 100 gestures it plans to support in the future. Scenarios the above will not support might be industry-specific symbols (e.g. math and music symbols) and drawings (e.g. UML diagrams ... think ink-enabled Visio 2003?). For the rest of this article I will write a simple custom recognizer DLL to recognize written morse code (dot and dash symbols). NOTE the gesture recognizer supports recognizing a dot and horizontal lines, so this could have been written simply by customizing gesture recognition, although that would be no fun. To get started, the Tablet SDK comes with some thin documentation on how to write a custom reco DLL, and a C++ sample (RecoDLL) that provides a basic wrapper of the interfaces.
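
The last step of a morse recognizer is a plain lookup from a dot/dash sequence to a character. Here is a minimal C++ sketch using the International Morse Code alphabet (letters and digits only). DecodeMorseSymbol and DecodeMorseMessage are hypothetical names, not from the article's code, and the whitespace-separated message format is an assumption.

```cpp
#include <map>
#include <sstream>
#include <string>

// International Morse Code for letters and digits.
// Returns '?' for a sequence that is not in the table.
char DecodeMorseSymbol(const std::string& symbol) {
    static const std::map<std::string, char> table = {
        {".-", 'A'},   {"-...", 'B'}, {"-.-.", 'C'}, {"-..", 'D'},  {".", 'E'},
        {"..-.", 'F'}, {"--.", 'G'},  {"....", 'H'}, {"..", 'I'},   {".---", 'J'},
        {"-.-", 'K'},  {".-..", 'L'}, {"--", 'M'},   {"-.", 'N'},   {"---", 'O'},
        {".--.", 'P'}, {"--.-", 'Q'}, {".-.", 'R'},  {"...", 'S'},  {"-", 'T'},
        {"..-", 'U'},  {"...-", 'V'}, {".--", 'W'},  {"-..-", 'X'}, {"-.--", 'Y'},
        {"--..", 'Z'},
        {"-----", '0'}, {".----", '1'}, {"..---", '2'}, {"...--", '3'}, {"....-", '4'},
        {".....", '5'}, {"-....", '6'}, {"--...", '7'}, {"---..", '8'}, {"----.", '9'},
    };
    const auto it = table.find(symbol);
    return it == table.end() ? '?' : it->second;
}

// Decodes a message whose symbols are separated by whitespace, e.g. "... --- ...".
std::string DecodeMorseMessage(const std::string& message) {
    std::istringstream in(message);
    std::string symbol;
    std::string text;
    while (in >> symbol) {
        text += DecodeMorseSymbol(symbol);
    }
    return text;
}
```

For example, DecodeMorseMessage("... --- ...") yields "SOS".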

RecoDll Sample

The rest of the SDK is language agnostic; you can code the ink app layer in VB, VB.NET, C#, or C++. But when it comes to a Custom Recognizer DLL, you must use some C++. I find C++ extremely powerful, but with a syntax overly polluted by legacy typedefs and keywords. So I extended the sample to just be a wrapper that marshals calls between the Tablet Runtime and a C# library. The 1st thing to do was change the project settings to support the Managed Extensions for C++ (Configuration Properties-General-Use Managed Extensions = Yes), and add #using <mscorlib.dll> and #using <system.dll> to the cpp file. The project uses a .def file to export the C-style functions. Comparing it to the system DLLs using dumpbin /exports <file.dll>, I came up with a spreadsheet showing the differences in which functions each DLL exports. Since I wanted to see everything, I went to the RecApis.h file and declared the rest of those functions in RecoDll.cpp, as well as in the .def file so that they are exported (something like 40 functions in all). NOTE TwisterReco in the gesture DLL ... I don't think that one was supposed to be exported? Only 3 of the functions are implemented; the rest just return E_NOTIMPL. To get some debug info I added a Trace line to every function: System::Diagnostics::Debug::WriteLine("AddStroke"); NOTE the 'const' on an AddStroke parameter; just remove that to make your life easier.
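
The stub pattern described above (trace every entry point, implement a few, return E_NOTIMPL for the rest) looks roughly like this. This is a portable C++ sketch: HRESULT, S_OK, and E_NOTIMPL are redefined here as stand-ins for the <windows.h> definitions (0x80004001 is the Windows value of E_NOTIMPL), while Trace, StubCreateRecognizer, StubGetLatticePtr, and Succeeded are hypothetical names. The real exports use the RecApis.h signatures and the WINAPI calling convention.

```cpp
#include <cstdint>
#include <iostream>

// Portable stand-ins for the Windows definitions (on Windows these come from <windows.h>).
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOTIMPL = static_cast<HRESULT>(0x80004001u);

// Hypothetical trace helper; the article uses System::Diagnostics::Debug::WriteLine instead.
inline void Trace(const char* functionName) {
    std::clog << "[reco] " << functionName << '\n';
}

// An entry point that has been implemented: trace, do the work, report success.
HRESULT StubCreateRecognizer() {
    Trace("CreateRecognizer");
    return S_OK;
}

// An entry point that has not been implemented yet: trace, then report E_NOTIMPL.
HRESULT StubGetLatticePtr() {
    Trace("GetLatticePtr");
    return E_NOTIMPL;
}

// Mirrors the SUCCEEDED macro: HRESULTs with the high (sign) bit set are failures.
constexpr bool Succeeded(HRESULT hr) { return hr >= 0; }
```

Succeeded() is worth keeping in mind: any non-negative HRESULT, including S_FALSE (1), counts as success, so an error path must return a real failure code such as E_NOTIMPL or E_FAIL.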

Registry

2 of the implemented functions were for registering and unregistering the DLL. These would normally be called by your app's installer. Since I was not going to the trouble of making an installation, I made a quick WinForm app to call them directly, using PInvoke since they are C-style functions. NOTE I built recoDll.dll to my app's bin directory so that it could be found.

using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

[DllImport("RecoDll.dll")]
public static extern int DllRegisterServer();

[DllImport("RecoDll.dll")]
public static extern int DllUnregisterServer();

// AppendLine() is this test form's own helper for writing a line to an output text box
private void regBut_Click(object sender, System.EventArgs e)
{
	try
	{
		int hresult = DllRegisterServer();
		if(hresult == 0)
			AppendLine("registered");
		else
			AppendLine("register failed: 0x" + hresult.ToString("X8"));
	}
	catch(Exception ex)
	{
		MessageBox.Show(ex.ToString());
	}
}

private void unregBut_Click(object sender, System.EventArgs e)
{
	try
	{
		int hresult = DllUnregisterServer();
		if(hresult == 0)
			AppendLine("unregistered");
		else
			AppendLine("unregister failed: 0x" + hresult.ToString("X8"));
	}
	catch(Exception ex)
	{
		MessageBox.Show(ex.ToString());
	}
}

NOTE DllRegisterServer() calls the 3rd function that was already implemented: GetRecoAttributes(). It calls GetRecoAttributes to get, among other things, the bit flag value of the RecognizerCapabilities enumeration. This is set to 0 by default ... but more on this later. As for languages, a single Recognizer DLL could support multiple languages, although MS has them divided up by language. The Gesture Recognizer is language independent, so its language list is 0'd out. Now, to test that my recognizer was there, I had the app iterate through the available Recognizers and dump out their Name and Vendor properties:

Recognizers rs = new Recognizers();
foreach(Recognizer r in rs)
{
	AppendLine(r.Name + " | " + r.Vendor);
}

This failed on my recognizer, saying get_Name() was not implemented. Reviewing the Trace info being dumped to the Output window from RecoDll.cpp, I could see that CreateRecognizer() was being called. I changed its return from E_NOTIMPL to S_OK, and it dumped out my recognizer's name like so:

Murder, Ink. ... I couldn't resist :) So it is alive. The app layer made a call to the Tablet Runtime, which made a call to my custom recognizer, and the result was marshalled back to the app layer through the Runtime. Cool!

Cross-Language

So that was all C++, which is still better than VB .NET. Now the real trick was to get to C#. I made another project, RecoDllLib, as a C# class library. This library sits below RecoDll.dll and implements the actual logic, while RecoDll.dll just marshals data between the C# lib and the Tablet Runtime. I added #using <recoDllLib.dll> to RecoDll.cpp and built recoDllLib.dll to the same directory so that it could be found. I had already written C# code to read and write the registry, so I left that alone. That left GetRecoAttributes(), which was already working and marshals a structure around, so I decided to tackle it with C# instead. In the Tablet SDK, the RECO_ATTRS structure is only defined in C++, so I had to define it in C#. I set some values on it, had the C++ DLL call it, and ran the app again ... nothing. Checked the event viewer ... nothing. Looked for any sort of info dump ... nothing. Tried hooking the ink process so that I could debug ... nothing. I could not find any way to put the Tablet Runtime into some sort of debug mode to give me some info. This really sucks! MS needs to provide some way for Custom Reco developers to get debug info back. Right now the Tablet Runtime is entirely a black box: when you make your call from the app layer you can Trace out some info in your DLL, but if something breaks on the way back ... you have no way of knowing what happened. UGLY! This is when I got creative. I decided to make the C# lib PInvoke the GetRecoAttributes() method on the MS-implemented text recognizer, so I could get those results back and look at them:

[DllImport("mshwusa.dll")] 
public static extern int GetRecoAttributes(IntPtr hrec, IntPtr pRecoAttrs); 

public int LibGetRecoAttributes(IntPtr hrec, IntPtr pRecoAttrs) 
{ 
	return GetRecoAttributes(hrec, pRecoAttrs);
}

Believe it or not ... it worked. My app called the Tablet Runtime, which called the C++ RecoDll.dll, which called the C# RecoDllLib.dll, which called the C++ mshwusa.dll, and then it all returned back to the app. Out of pure madness, I tried to do this for an entire recognition scenario and got a pointer error at AddStroke(). With a little more effort it might be possible to get it to work through a full recognition cycle, but since it's not practical, other than for helping developers, I tabled that effort. Regardless, I did a bunch of sizeof() operations until I got the structure just right ... and it worked:

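//C++ RecoDll.dll (Managed Extensions): forwards the call to the C# library
HRESULT WINAPI GetRecoAttributes(HRECOGNIZER hrec, RECO_ATTRS* pRecoAttrs)
{
	if(pRecoAttrs == NULL)
		return E_POINTER;

	recoDllLib::MyRecoLib* mrl = new recoDllLib::MyRecoLib();
	mrl->WriteTrace("GetRecoAttributes");

	// wrap the native handle and pointer explicitly for the managed IntPtr parameters
	int retVal = mrl->LibGetRecoAttributes(System::IntPtr(hrec), System::IntPtr(pRecoAttrs));
	return retVal;
}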

//C# RecoDllLib.dll
using System;
using System.Runtime.InteropServices;
using Microsoft.Ink; // RecognizerCapabilities

[StructLayout(LayoutKind.Sequential)]
public class RECO_ATTRS
{
	public int dwRecoCapabilityFlags;
	[MarshalAs(UnmanagedType.ByValArray, SizeConst=64)]
	public char [] awcVendorName;
	[MarshalAs(UnmanagedType.ByValArray, SizeConst=128)]
	public char [] awcFriendlyName;
	[MarshalAs(UnmanagedType.ByValArray, SizeConst=64)]
	public ushort [] awLanguageId;
}

public int LibGetRecoAttributes(IntPtr hrec, IntPtr pRecoAttrs)
{
	const int E_FAIL = unchecked((int)0x80004005);
	int retVal = E_FAIL; // not 1: that is S_FALSE, which callers treat as success
	try
	{
		this.WriteTrace("LibGetRecoAttributes");
		RECO_ATTRS ra = new RECO_ATTRS();
		ra.dwRecoCapabilityFlags = (int) RecognizerCapabilities.Object;

		// Util.StringIntoCharArray is this library's own helper that copies a string into a fixed-size array
		ra.awcVendorName = new char[64];
		string vendor = "brains-N-brawn.com";
		ra.awcVendorName = Util.StringIntoCharArray(vendor, ra.awcVendorName);
		ra.awcFriendlyName = new char[128];
		string friendly = "Murder, Ink. Recognizer";
		ra.awcFriendlyName = Util.StringIntoCharArray(friendly, ra.awcFriendlyName);

		ra.awLanguageId = new ushort[64];
		ra.awLanguageId[0] = 0;
		int sra = Marshal.SizeOf(ra);
		this.WriteTrace(sra.ToString());
		// fDeleteOld = false: the runtime owns this buffer, and RECO_ATTRS has only inline fields, so there is nothing old to free
		Marshal.StructureToPtr(ra, pRecoAttrs, false);
		retVal = 0; // S_OK
	}
	catch(Exception ex)
	{
		WriteTrace(ex.ToString());
	}
	return retVal;
}

Implementation

I revisited the app layer to do more than reflect against the recognizer for its Name and Vendor properties, and coded it to do a full recognition using sample code from the SDK documentation. With that code, the Trace dump from the custom recognizer showed which methods were getting called and in what order: CreateRecognizer(), CreateContext(), AddStroke() N times, Process(), GetBestResultString(). So now I knew which methods to implement next in the C# RecoDllLib. For CreateRecognizer() and CreateContext(), I just returned 0 (S_OK). For AddStroke(), all I needed to know was whether the stroke was a dot or a dash. I did this by looking at the 'cbPacket' parameter, which is the size in bytes of the stroke's packet data, so it grows with the number of packets sampled for the stroke. If the buffer was small, the stroke was a dot; if it was large, the stroke was a dash. NOTE the size differs depending on whether the mouse or an external tablet was used for inking. I saved that info off to a static member called MyStrokes. Process() reports whether it is finished or not, so I returned TRUE. Finally, for GetBestResultString(), I converted the morse code dots and dashes to text and marshalled the string back. Doing this made one more method get called: GetLatticePtr(). I tried to just have that return S_OK. So now I had the ink recognized, and I just needed to marshal it back to the app layer through the black box of the Tablet Runtime.
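
The dot-versus-dash decision can be sketched in portable C++. This assumes, hypothetically, that the packet count is cbPacket divided by a fixed bytes-per-packet size (which a real recognizer would derive from the stroke's packet description), and that a single threshold separates dots from dashes. Since the sizes differ between mouse and tablet input, both values are parameters rather than constants. MorseMark, ClassifyStroke, and StrokeBuffer are illustrative names; StrokeBuffer plays the role of the MyStrokes member.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class MorseMark { Dot, Dash };

// Hypothetical classifier: a stroke with few packets is a dot, a long one is a dash.
// bytesPerPacket and dashThreshold are assumptions supplied by the caller.
MorseMark ClassifyStroke(std::size_t cbPacket, std::size_t bytesPerPacket,
                         std::size_t dashThreshold) {
    const std::size_t packetCount = bytesPerPacket != 0 ? cbPacket / bytesPerPacket : 0;
    return packetCount < dashThreshold ? MorseMark::Dot : MorseMark::Dash;
}

// Collects one mark per AddStroke call, then renders the marks as a
// dot/dash string when the best result is requested.
class StrokeBuffer {
public:
    void Add(MorseMark mark) { marks_.push_back(mark); }

    std::string ToMorse() const {
        std::string s;
        for (MorseMark m : marks_) s += (m == MorseMark::Dot) ? '.' : '-';
        return s;
    }

    void Clear() { marks_.clear(); }

private:
    std::vector<MorseMark> marks_;
};
```

Keeping the threshold as a parameter is the main design choice here: it is the one number that has to be tuned per input device.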

Lattice

At the app layer, if you have a RecognizerContext instance with strokes, you can just call recoResult = recoContext.Recognize(out recoStatus); The out parameter tells you whether recognition succeeded or hit an error. The return value is a RecognitionResult, whose TopString property is the most likely reco result from the recognizer DLL. My guess is that TopString maps directly to GetBestResultString() of the custom recognizer. A RecognitionResult can also contain a list of Alternates; as described above, these are the other result strings with their corresponding confidence levels. My guess is that Alternates map directly to GetLatticePtr() of the recognizer DLL. Basically 2 methods for returning output, and all I wanted was the 1st. Sadly, my RecognitionResult object was always null, so I had no way of calling for the TopString. To get around this I tried to implement a simple Lattice and marshal it from C# to C++, the same way I marshalled the RECO_ATTRS structure. Unfortunately, the RECO_LATTICE structure is much more complicated. Here it is defined in C#:

using System;
using System.Runtime.InteropServices;

// Every native pointer member is an IntPtr that must point at unmanaged memory
// (e.g. from Marshal.AllocCoTaskMem) that stays valid after the call returns.
// Nested structures that the header embeds by value are declared as C# structs.
[StructLayout(LayoutKind.Sequential)]
public struct RECO_LATTICE
{
	public uint ulColumnCount; //ULONG
	public IntPtr pLatticeColumns; //RECO_LATTICE_COLUMN*
	public uint ulPropertyCount; //ULONG
	public IntPtr pGuidProperties; //GUID*
	public uint ulBestResultColumnCount; //ULONG
	public IntPtr pulBestResultColumns; //ULONG*
	public IntPtr pulBestResultIndexes; //ULONG*
}

[StructLayout(LayoutKind.Sequential)]
public struct RECO_LATTICE_COLUMN
{
	public uint key; //ULONG
	public RECO_LATTICE_PROPERTIES cpProp; //RECO_LATTICE_PROPERTIES (embedded by value)
	public uint cStrokes; //ULONG
	public IntPtr pStrokes; //ULONG*
	public uint cLatticeElements; //ULONG
	public IntPtr pLatticeElements; //RECO_LATTICE_ELEMENT*
}

[StructLayout(LayoutKind.Sequential)]
public struct RECO_LATTICE_ELEMENT
{
	public int score; //RECO_SCORE
	public ushort type; //WORD
	public IntPtr pData; //BYTE* (payload whose format depends on type)
	public uint ulNextColumn; //ULONG
	public uint ulStrokeNumber; //ULONG
	public RECO_LATTICE_PROPERTIES epProp; //RECO_LATTICE_PROPERTIES (embedded by value)
}

[StructLayout(LayoutKind.Sequential)]
public struct RECO_LATTICE_PROPERTIES
{
	public uint cProperties; //ULONG
	public IntPtr apProps; //RECO_LATTICE_PROPERTY** [size_is][unique]
}

[StructLayout(LayoutKind.Sequential)]
public struct RECO_LATTICE_PROPERTY
{
	public Guid guidProperty; //GUID
	public ushort cbPropertyValue; //USHORT
	public IntPtr pPropertyValue; //BYTE* [size_is][unique]
}

From the docs, the simplest one you can build is a RECO_LATTICE containing a RECO_LATTICE_COLUMN containing a RECO_LATTICE_ELEMENT. I did all that in the C# lib and then marshalled it to the C++ lib to be returned to the Tablet Runtime. You guessed it, the RecognitionResult was null. Then I tried to build one in C++ and pass it to the Tablet Runtime without the extra C# hop ... null.

Bypass

At this point I had no other option but to try to avoid the RecognitionResult object and the GetLatticePtr() call. I went to the app layer 1st to see if a different calling sequence would avoid it. If you hook the recognition events on a RecognizerContext, you can subscribe to an event with Alternates (RecognitionWithAlternates) or without them (Recognition). Obviously I implemented the delegate without Alternates, in hopes that only GetBestResultString would be called. It ends up GetLatticePtr() was still called and the Recognition event was not fired, even though, looking at the properties of that event's arguments, Alternates are not accessible. That did not work, so I took another look at the Custom Reco DLL. Searching around the docs I found the RecognizerCapabilities enum with Lattice defined as 2048. I remembered that the MS-implemented DLLs had capabilities in the >2048 range, and they did export and support GetLatticePtr(). Great: my DLL's capability was currently set to 0 (undefined), so I set it to a valid value without the Lattice flag and re-registered it. Although its capability now said it did not support Lattices, GetLatticePtr() was still called and I could not access TopString. This might be a bug? Come to think of it, I don't know if I tried hard enough with GetLatticePtr() returning E_NOTIMPL, and I definitely did not try it with that method not exported at all ... because it was not originally defined in the RecoDll sample (although from the docs, it looks like it is required).
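
One detail worth spelling out: the capability value is a combination of bit flags, so "in the >2048 range" does not by itself mean the Lattice bit is set (4096, for instance, is larger but does not include 2048). A minimal C++ sketch; only the Lattice value of 2048 comes from the text above, and SupportsLattice and WithoutLattice are hypothetical helpers.

```cpp
#include <cstdint>

// Lattice value of the RecognizerCapabilities enumeration, as quoted above.
constexpr std::uint32_t kCapLattice = 2048;

// Capability flags are a bit field: test the bit, not the numeric range.
constexpr bool SupportsLattice(std::uint32_t flags) {
    return (flags & kCapLattice) != 0;
}

// Clears the Lattice bit while keeping every other capability bit intact.
constexpr std::uint32_t WithoutLattice(std::uint32_t flags) {
    return flags & ~kCapLattice;
}
```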

Hack

So I cheated. I had worked really hard to get this far, reading a .NET C++, a Managed C++, and an Interop book in parallel for the stuff I didn't know that much about. I had successfully returned the RecoCapabilities and everything else was working as expected ... but I had run out of ideas on how to get the recognized text returned up through the Tablet Runtime to the app. So I skipped the Tablet Runtime: I exported another function on the Recognizer DLL so the app could PInvoke it to get the result string :) The result is that I can now ink out morse code in the app and have my C++/C# Custom Recognizer DLL return the corresponding letter. Here is a pic of the obligatory SOS. It's the Hello World of the Morse Code world. Start tapping it if you fall in a well ... unless you were pushed, in which case play dead until they walk off.
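
The extra export can follow the usual convention for returning a string across a C boundary: the caller supplies the buffer, and the function reports the length it needs. Here is a portable C++ sketch; g_lastResult, SetLastResult, and GetLastResultString are hypothetical names, not the function the article actually exported, and a real Windows DLL would also list the function in the .def file (and would typically use WINAPI and wide characters). On the C# side, a StringBuilder parameter with enough capacity is the usual match for the char* buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical: the last decoded result, stored by the recognizer after processing.
static std::string g_lastResult;

void SetLastResult(const std::string& text) { g_lastResult = text; }

// Hypothetical extra export for the app to P/Invoke. The caller owns the buffer.
// Copies as much of the result as fits (always null-terminating) and returns the
// full length needed, excluding the terminator, so a caller can pass a null
// buffer first to learn the size.
extern "C" std::size_t GetLastResultString(char* buffer, std::size_t bufferSize) {
    const std::size_t needed = g_lastResult.size();
    if (buffer != nullptr && bufferSize > 0) {
        const std::size_t n = needed < bufferSize - 1 ? needed : bufferSize - 1;
        std::memcpy(buffer, g_lastResult.data(), n);
        buffer[n] = '\0';
    }
    return needed;
}
```

If the returned length is not smaller than the buffer size, the caller knows the copy was truncated and can retry with a bigger buffer.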

recoDll.wmv | sos.wmv - videos of it in use

Conclusion

The MS Press Tablet PC book has a 1-liner about Custom Recognizers, something to the effect of 'it's difficult and requires specialized skills'. Difficult ... definitely. Specialized skills ... if you are recognizing more than dots vs. dashes, oh yeah. At least in most cases, a Custom Recognizer DLL is likely not needed in the 1st place, and tweaking the recognition from the app layer with the Tablet SDK will be sufficient. I hope that MS will provide a fully implemented Custom Recognizer DLL sample (preferably for gestures), along with some way to get debug info from the Tablet Runtime.

Future

That is 3 Tablet PC articles in a row ... and I still do not own one :( I've only got a couple of cool article ideas left for the TPC, but those will have to wait. The Web Service Extensions (WSE) were just released, so I have to reprove my dominance in that domain. Not to mention Speech .NET could stand a revisit for its Beta 2, and the latest CF .NET bits, etc... Later