Web Cam Optical Flow on a Tablet

http://www.brains-N-brawn.com/cameraFlow 8/15/2005 casey chesnut


i have a fetish for non standard user input devices (dont take that out of context). so when i saw that somebody had created a Camera Driven Table Tennis for Mobile Phones, i got really excited. it works by using the camera of the phone to control the table tennis paddle on the screen. this is done by having the camera track some significant feature in each captured image, and then panning accordingly. thought that was really cool and wanted to try something similar, so this article will attempt to create a labyrinth game running on a Tablet PC with an attached web cam for tilt control.


Optical Flow

the main problems are how to acquire and track significant features of an image (i.e. image processing). my first idea was that motion detection could handle it, because i had already done a little motion detection for the /aiGesture (Gesture Recognition) article. also, i had seen this sweet article on codeproject (Motion Detection Algorithms), which uses the best image processing library i've seen for C#. but a little more thought showed that the problems were different. motion detection usually involves a stable camera with a dynamic world environment: the camera does not move, but elements of the image do. this new problem is the opposite: the camera moves while the world environment remains static, similar to how an optical mouse works. NOTE this is sometimes referred to as Optical Flow.

the next place i looked was Sluggish Software. this guy is constantly amazing me with the machine vision projects he cranks out. he's got a number of projects including face recognition, gesture recognition, etc ... none of them mapped directly, but his site did remind me of Robotics4.net. it links a number of libraries useful for building robots, including SharperCV. SharperCV is a C# wrapper (source code not provided) that wraps Intel's Open Source Computer Vision Library. SharperCV's web site was not all that promising, as it explained that the project had been ended due to the difficulties involved with wrapping a flat C style native API with an OO managed wrapper. the authors recommend porting the code to C# as it's needed. i had never used Intel's OpenCV either, so i couldn't really think about a port, and this was looking like a dead end. for kicks, i went ahead and installed SharperCV and tried its samples. by dumb luck, the SampleLKTracker was exactly what i was looking for. from a web cam feed, it finds candidate features to track. when the user presses the spacebar it then locks on those features. as the web cam is tilted, it continues to track those features. yes! thus i did end up using the SharperCV wrapper over OpenCV. NOTE the SharperCV effort was funded by Microsoft Research


behind the scenes, the LKTracker sample is using the Lucas-Kanade image processing technique. if you search for it you'll see that its also used to stitch together panoramic images. it finds the same significant feature in neighboring images, and then that feature can be used to determine where the images overlap. for acquiring features to track, the CvImage class has a method called GoodFeaturesToTrack() ... how convenient is that. if i remember correctly, i think this method is using eigenvalues behind the scenes?
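on the eigenvalue question: OpenCV's docs describe GoodFeaturesToTrack as ranking corners by the smaller eigenvalue of a little 2x2 gradient matrix, and Lucas-Kanade solves a 2x2 system built from that same matrix. below is a rough sketch of that textbook math in plain python (used here just for brevity). it is not the SharperCV or OpenCV code. the image is assumed to be a list of rows of grayscale floats, and the names gradients, window, min_eigenvalue and lucas_kanade_step are made up for illustration. the real tracker also uses image pyramids and iteration, which are skipped here.

```python
import math


def gradients(img, x, y):
    """Central-difference brightness gradients at pixel (x, y)."""
    ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
    iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def window(cx, cy, half):
    """Pixel coordinates of the square window centered on (cx, cy)."""
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            yield x, y


def min_eigenvalue(img, cx, cy, half=2):
    """Smaller eigenvalue of the 2x2 gradient matrix; bigger = better feature."""
    sxx = sxy = syy = 0.0
    for x, y in window(cx, cy, half):
        ix, iy = gradients(img, x, y)
        sxx += ix * ix
        sxy += ix * iy
        syy += iy * iy
    return 0.5 * ((sxx + syy) - math.sqrt((sxx - syy) ** 2 + 4.0 * sxy * sxy))


def lucas_kanade_step(prev, curr, cx, cy, half=2):
    """Estimate how far the patch around (cx, cy) moved from prev to curr."""
    sxx = sxy = syy = bx = by = 0.0
    for x, y in window(cx, cy, half):
        ix, iy = gradients(prev, x, y)
        it = curr[y][x] - prev[y][x]  # brightness change between frames
        sxx += ix * ix
        sxy += ix * iy
        syy += iy * iy
        bx -= ix * it
        by -= iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # flat or edge-only patch: the motion is ambiguous
    # Solve the 2x2 system [[sxx, sxy], [sxy, syy]] * (u, v) = (bx, by).
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)
```

a patch with a near-zero minimum eigenvalue (like an evenly lit white wall) makes the 2x2 system unsolvable, which is why the feature picker and the tracker care about the same quantity.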


now that i had the web cam input and image processing pieces, it was just a matter of incorporating it into a WinForms application. the goal is for the web cam to track a single point. when you start the app, it automatically acquires that feature, and then starts tracking it. as you tilt the web cam, it reports back the x and y coordinate changes from the previous recorded point. it does this through an eventing model. it also handles reacquiring a feature in case the feature you were tracking goes out of view from the web cam. NOTE this has the same problems as optical mice, in that it will not work in all environments. e.g. if you pointed this at a nondescript white wall that was evenly lit, then it might have problems finding a significant feature.
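the acquire / track / reacquire loop with delta events described above could be shaped roughly like this. it is a python sketch, not the actual WinForms code. FlowTracker, on_moved and process are hypothetical names, and the acquire and track callables stand in for whatever the vision library provides.

```python
class FlowTracker:
    """Tracks one feature and raises (dx, dy) events as it moves."""

    def __init__(self, acquire, track):
        self._acquire = acquire  # frame -> (x, y), or None if nothing found
        self._track = track      # (prev_frame, frame, point) -> (x, y) or None
        self._handlers = []
        self._point = None
        self._prev_frame = None

    def on_moved(self, handler):
        """Subscribe a callback that receives dx, dy for each tracked frame."""
        self._handlers.append(handler)

    def process(self, frame):
        """Feed one camera frame; fires events or (re)acquires as needed."""
        if self._point is None:
            # No feature yet: acquire one, but report no movement.
            self._point = self._acquire(frame)
        else:
            new_point = self._track(self._prev_frame, frame, self._point)
            if new_point is None:
                # Feature left the view: pick a new one instead of reporting.
                self._point = self._acquire(frame)
            else:
                dx = new_point[0] - self._point[0]
                dy = new_point[1] - self._point[1]
                self._point = new_point
                for handler in self._handlers:
                    handler(dx, dy)
        self._prev_frame = frame
```

because only deltas are reported, swapping to a freshly acquired feature does not cause a jump in the output, which would explain why the drawn lines stay straight across reacquisitions.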


the screen shot above shows an example of its tracking capabilities. the lines in the PictureBox were drawn with GDI+ using coordinates returned by tilting the web cam up, down, left, and right. the significant feature had to be reacquired about 3 times during this sequence, yet the lines are relatively straight. NOTE the tracking is pretty responsive too. on my computer, i'd estimate that i get 10 events a second.

Panoramic Images

now that we're getting user input through the web cam, we can start making interesting applications with it. previously, i mentioned how the LK algorithm can be used to create panoramic images. on the other side, that brings up the problem of how to view really large images, especially on mobile devices. typically, we have to scroll using our keypad, stylus, or mouse. well, what if you had an image viewing program on your smartphone, pocket pc, or slate tablet that scrolled a high resolution image as you tilted or panned the device? this would allow you to rapidly scan a very large image and focus in on the area you were interested in. of course, i'm a pervert, so i used it to bring back the centerfold experience


the screenshot above shows an app that can run on my slate tablet for viewing large high resolution images. the web cam is clipped to the side of the tablet and as you tilt or pan the device, it scrolls to the corresponding part of the image. you could also use it to view maps, reports, and such ... zzz ... but think about this. you could stand in one area, and as you spun around in a circle, it could display an entire panoramic image.
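the scrolling itself is just the tracker's deltas applied to a viewport offset and clamped to the image bounds. a minimal python sketch, where the function name scroll, the gain parameter and the sign convention are all assumptions (the real app may pan the opposite way):

```python
def scroll(offset, delta, image_size, view_size, gain=1.0):
    """Move the viewport's top-left corner by a camera delta, staying in bounds."""
    max_x = max(0, image_size[0] - view_size[0])
    max_y = max(0, image_size[1] - view_size[1])
    x = min(max(offset[0] + gain * delta[0], 0), max_x)
    y = min(max(offset[1] + gain * delta[1], 0), max_y)
    return (x, y)
```

a gain above 1 would let a small tilt cover a big image quickly.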

you could also extend this paradigm to handle zooming in and out. this would work by acquiring 2 significant points in an image. then as you moved closer to or further away from that scene, you could measure the change in distance between those 2 points and zoom accordingly. did not implement this because i figured i'd mostly just be tilting the device.
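since this was never implemented, here is only a sketch of how the zoom idea could work: compare the distance between the 2 tracked points before and after, and use the ratio as a zoom factor. python, with the hypothetical name zoom_factor.

```python
import math


def zoom_factor(prev_a, prev_b, curr_a, curr_b):
    """Ratio of the new to the old distance between two tracked points."""
    before = math.hypot(prev_b[0] - prev_a[0], prev_b[1] - prev_a[1])
    after = math.hypot(curr_b[0] - curr_a[0], curr_b[1] - curr_a[1])
    if before == 0.0:
        return 1.0  # points coincide: no usable scale information
    return after / before
```

a result above 1 means the points spread apart (camera moved closer), below 1 means they pulled together.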

Wooden Labyrinth

and now we get to my real idea ... a game. not a gamer, but i used to be. one of the pre-video games i used to like was the wooden labyrinth. you use the dials on the side to maneuver a steel ball through the maze. the hard part is there are a number of holes in the board that the ball can fall through ... so it takes a little hand eye coordination. this game is just begging to be played on my slate tablet, and controlled by optical flow.


also, i was looking for an excuse to do some more Managed Direct3D ... the only other Direct3D programming i've done was for /cfWorldWind. that was just taking the existing Managed DirectX (MDX) codebase from NASA, and then porting the bare minimum amount of code to run on Managed Direct3D Mobile (MD3DM) on a PocketPC. so the real challenge there was the port. how were the interfaces different, and how to get it to run on a resource constrained device. thought about writing this on a handheld too, but didn't have a Windows Mobile 5 device with a camera. instead i chose to write it to run on a slate Tablet PC with an attached web cam. with this setup i could rely on OpenCV for image processing and use the full Managed DirectX. so this will be my 1st ever (written from scratch) application for Direct3D. it's also my first time to ever write a game ... although its probably more like a simulation. the initial challenges involved planning. just had not looked at enough MDX applications to have any sort of style or know any best practices. so this codebase is really just me shooting from the hip. it is quite likely that i'm doing some very idiotic things in the code base ... and right now, i don't know any better ... but you've been warned

first, i started out just trying to get some initial world to render. for the initial starting project i used one of the MSDN samples. this gave me the device creation, reset, and render loop. then i started to create my game world using VertexBuffers. this got old real quick. the main problem was that i wouldn't always choose the right VertexFormat, or the vertices would be in the wrong winding order (clockwise vs counter-clockwise), or the normals would be wrong, etc... so the scene would not be rendered either because i did lighting wrong, or my camera was in world space while the object was in screen space, or i messed up culling, etc... gave this up and decided to use a 3D modeler instead, and to create meshes to load into the world. this took more time than expected because i couldn't find a tool that did everything i wanted. the main problems were price, support for the .x format, and usability. Maya, Lightwave, and 3DS Max are too expensive. GMAX is free, but doesn't support .x. Blender is free and Milkshape is low cost (~$25) but i had problems using them. ended up using TrueSpace. it's something like $500, so it's still too expensive for what i need (i'm not a graphics guy), but i could figure out how to use it, it supports .x, and it has a free trial. i also really like TrueSpace because the app itself is 3D ... well done. in the future, if i have a lot more modeling to do, then i'd definitely revisit this problem and try to look at the other low cost tools when i have more time. started out just creating a flat plane with some walls for the board and a sphere mesh for the steel ball. loaded the mesh into world coordinates, set up the camera and some lighting, and it rendered!


second step was to implement the physics for the world. er, um ... so i'd never tried to simulate physics before either. crap, i don't know anything! but my friends do. optionsScalper threw together a quick app to show bouncing. he and Marty were adamant that i not use fake physics. Chad threw together an app using this C# physics engine. Andy came up with some ideas for bugs i was having. ZMan (of The ZBuffer!) figured out why my collision detection wasn't working as expected. and Jason pointed me to his managed game engine. also looked at 2 different books: O'Reilly's Physics for Game Developers and Apress' Physics for Game Programmers. didn't actually read the books, because i only needed a little out of each, but i did prefer the Apress book because the samples can be downloaded in C#, Java, and C++. the physics actually ended up being pretty easy. the first thing needed was rolling. as the board tilts, gravity kicks in and the ball starts to roll. over time its velocity increases. there is also the force of the board that slows down the ball as it's rolling uphill. plus a little friction when the board is flat. the other process was bouncing, for when the ball hits a wall. all it does is redirect the velocity and slightly decrease it because of friction. NOTE i only handle bouncing off walls, and not the floor ... so you cannot move it real quick and jump over a hole :)
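a minimal sketch of the rolling step, in python rather than the game's C#. the assumptions are loud ones: the board tilt is given as two angles in radians, gravity along each axis is g * sin(tilt), and friction is a simple velocity-proportional damping that only stands in for real rolling resistance. roll_step and its parameters are hypothetical names, not the game's code.

```python
import math

GRAVITY = 9.81  # m/s^2


def roll_step(pos, vel, tilt, dt, friction=0.5):
    """Advance the ball one time step on a board tilted by (tilt_x, tilt_y)."""
    # Component of gravity pulling the ball along each board axis.
    ax = GRAVITY * math.sin(tilt[0])
    ay = GRAVITY * math.sin(tilt[1])
    vx = vel[0] + ax * dt
    vy = vel[1] + ay * dt
    # Assumed damping model: lose a fraction of speed every step.
    damping = max(0.0, 1.0 - friction * dt)
    vx *= damping
    vy *= damping
    return (pos[0] + vx * dt, pos[1] + vy * dt), (vx, vy)
```

tilting against the direction of travel makes the acceleration term negative, which gives the "slows down as it's rolling uphill" behavior without any special case.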

bouncing requires collision detection. i do this by using Mesh.Intersect. what you do is specify a ray (point of origin, and direction) using Vectors, and then apply that to the Mesh. to determine if the ball is hitting a wall, the ray's origin is the position of the ball and its direction is the path that the ball is traveling. the ball is surrounded by walls, and the ray goes on infinitely, so it always reports that a collision is occurring unless the ball is not moving. that's not very helpful. what you have to do is then check a Distance property from the IntersectionInformation class. if the Distance is less than the radius of the ball, then a collision really has occurred. the IntersectionInformation class also returns the index of the face that it collided with. with this info and the mesh, you can generate a Plane and determine the normal of the face, if you don't know this already. i.e. you can derive the direction you need to reflect off the wall from the mesh itself. collision detection is also required to tell when the ball has gone over a hole. this is done by cutting holes in the plane mesh used for the floor. so as the ball is traveling over the mesh, it has a ray pointing down from its current position. Mesh.Intersect() returns true when the ball is over the mesh, and false when it has rolled over a hole. when this is caught, then the app can determine that the game is over. anyway, this is far from the most efficient way to do collision detection, but i purposefully chose these methods for ease of understanding and because the vertex count is low. the main advantage, though, is that a lot of game data can be derived from the mesh itself and not duplicated in application logic. this also allows me to create different levels without having to change any of the app code
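the distance check and the reflection can be sketched without Direct3D at all. the python below swaps Mesh.Intersect for a plain ray vs infinite plane test (an assumption, since the real code intersects mesh faces), then reflects the velocity about the wall normal and scales it down a bit. all the function names and the restitution parameter are made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ray_plane_distance(origin, direction, plane_point, normal):
    """Distance along a unit-length ray to the plane, or None if it misses."""
    denom = dot(direction, normal)
    if abs(denom) < 1e-9:
        return None  # ray runs parallel to the wall
    t = dot([p - o for p, o in zip(plane_point, origin)], normal) / denom
    return t if t >= 0.0 else None  # hits behind the ball don't count


def reflect(velocity, normal, restitution=0.9):
    """Mirror velocity about a unit normal and lose a little speed."""
    d = dot(velocity, normal)
    return tuple(restitution * (v - 2.0 * d * n) for v, n in zip(velocity, normal))


def bounce_if_hit(pos, vel, plane_point, normal, radius, restitution=0.9):
    """Reflect only when the wall is closer than the ball's radius."""
    speed = math.sqrt(dot(vel, vel))
    if speed == 0.0:
        return tuple(vel)  # a ball at rest has no travel direction to test
    direction = [v / speed for v in vel]
    dist = ray_plane_distance(pos, direction, plane_point, normal)
    if dist is not None and dist < radius:
        return reflect(vel, normal, restitution)
    return tuple(vel)
```

the hole check is the same test pointed straight down: if the downward ray finds no floor, the ball is over a hole.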


with the physics working, i went on to create the levels. added in more cube meshes for the walls and subtracted spheres from the plane to make holes in the floor. the only problem the 3D modeling tool gave me here was with the plane. once i subtracted about 10 holes from the plane, then it seemed like the integrity of the mesh was broken. i could tell this because the preview in TrueSpace was all mucked up (see image below). it looked like the plane had folded back upon itself? i just assumed this was a problem with the preview, so i did a quick render, and it looked fine. but when i exported the plane to a .x mesh file and looked at it with the viewer, then it really was all messed up. to get around this i had to divide the plane up into 4 connected planes and make sure none of them had more than 10 holes in them.


now that the game boards were made, i had to create the support code. made a config screen that would let you tweak the difficulty levels and what control(s) you wanted to use. there are 3 game boards : easy, medium, and hard. friction is set higher to begin with to make it easier, if you slide it all the way to the left then friction is off ... hard. you can control it with the web cam, mouse, keyboard, or pen. the mouse is the easiest. if you want to use a pen on a tablet, you have to really turn down the sensitivity, because its in HIMETRIC units. DirectInput should handle this better ... if its not already deprecated? you can use the keyboard but its not recommended, because its hard to hit the arrow keys fast enough. when using the web cam on a tablet, then i typically turn the mouse off, because if the pen is hovering over the screen it will screw up your game. for testing, i used a web cam on my notebook and just moved it by hand. to make this easier you can invert the web cam controls, so when you angle the camera up, then the ball will go up. the checkboxes for viewing velocity and force draw lines on the screen that give you a real good idea of which way the ball is rolling and which way the board is tilted respectively. practice mode lets you roll over holes without ending the game. and the tilting of the board can be turned off as well if you always want to see a flat board. 
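on the pen sensitivity: a HIMETRIC unit is 0.01 mm, so there are 2540 of them per inch, far more than there are pixels. a tiny python sketch of the conversion shows the scale difference. the 96 dpi default is an assumption about the screen, and himetric_to_pixels is a hypothetical helper, not a Tablet PC API.

```python
HIMETRIC_PER_INCH = 2540.0  # 1 HIMETRIC unit = 0.01 mm, 25.4 mm per inch


def himetric_to_pixels(value, dpi=96.0):
    """Convert a pen coordinate or delta from HIMETRIC units to pixels."""
    return value * dpi / HIMETRIC_PER_INCH
```

at 96 dpi a pen move of 1000 units is under 38 pixels, so raw pen deltas fed into the game as if they were pixels would be about 26 times too big.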


Shareware Starter Kit

entered this into the Code'n my way to PDC Contest. the requirements of the competition were to use VS 2005 Beta 2 and the Shareware Starter Kit (SSK). the SSK is pretty cool. it provides a lot of source code (yes, i said source!) to support your shareware applications. the functionality it includes covers registration, feedback, exception reporting, and payment and license processing.

its available in C# and VB and comes with a sample client and web service. you can host the web service yourself or there will be 3rd party hosters in the future. that way you can just worry about coding the client.

never wanted this to be a pay product to begin with. so this project is freeware (not shareware), and i integrated the Shareware Starter Kit to do registration, feedback and exception reporting of the render loop. freeware needs that functionality too, just not payment and license processing. i'd like to see the SSK be used by the freeware market as well.

Mobile Devices

really like this on my slate tablet, but finding a web cam to attach was a pain. first, i tried the Creative Labs WebCam for Notebooks. it's got a 14mm clip .. the problem is my tablet is about 15mm thick. to attach it i had to only go part way and clip it within the PCMCIA slot. if i actually had a PCMCIA card that i used, then this would not work. the other problem with this web cam is that i seemed to have driver problems. you could use it once, but the second time you tried to start it up, it would say it was already in use ... then i would have to reboot the computer to go again. needless to say, i returned this camera. the next candidate i found was the Creative WebCam Live Ultra for Notebooks. this clip also had the 14mm problem. the driver worked better on my notebook, but when i tried it on my tablet, it complained about the USB port not being full USB 2.0. my tablet is a couple years old 1st generation Fujitsu, and it is only USB 1.0, so the camera would not work at all. had to return it as well. finally, i got the Logitech QuickCam for Notebooks Deluxe. this had the 14mm problem too, but it was able to run on my Tablet and the driver is pretty good. the only problem i've had is that sometimes the OpenCV lib cannot start it. to get around this i just go into some other program (e.g. Windows Movie Maker) to get a preview window, then i exit that program and restart the game.

in the future, i'd like to see manufacturers produce more cameras that can be clipped onto tablets. how about cameras that can attach to multiple bases or clips depending on where you want to use them. or just build a camera into the tablet. finally, some Tablet PCs have gyroscopes built into them. not sure how sensitive they are, if there is an API to get access to it, or if it provides enough events to be used in a game like this ... but its an idea.

next, i considered the Compact Framework v2 on a Windows Mobile 5 Pocket PC. these devices can have Managed Direct3D Mobile, VGA screens, built-in cameras, and even a graphics processor. the code i've written only uses the basics of Managed DirectX on the desktop, and i'm confident it could easily be ported to work on a Pocket PC. the only problem would be accessing the camera. specifically, CF provides a managed API to get at the camera, but its tied to the PicturePicker UI. they really should separate the low level classes from the Form UI, because i can envision many applications that want to use the camera with their own API ... and pInvoke sucks. the other thing to consider is performance of the optical flow. although image processing can be done much faster in CFv2 with the ability to pin a bitmap, there is a good chance that the image processing code would have to be written in native code to get adequate speed. if i did an initial port, it probably would not include camera support, and would just be controlled by the directional pad or stylus


the first video shows the game running on a Tablet with an attached web cam. if i tilt the tablet left (the camera moves right), then the game board tilts to the left and the marble rolls that way.

the second video shows a test app that demonstrates optical flow tracking. the red dot shows the feature that is being tracked. the panel to the left shows the x, y coordinates being translated and drawn using GDI+.

the third video shows the game running on my notebook controlled by a tethered web cam. the control is reversed in this scenario, so when the camera moves left then the board tilts to the left and the ball goes left.


this article showed how you can use Optical Flow in your programs to provide a different kind of user interface. as more mobile devices get cameras (and gyroscopes) built-in, this type of interface becomes more accessible. for the labyrinth game, it just makes sense. you can use the mouse, but it really makes the game too easy. the web cam control is more natural and really challenges your hand eye coordination to a greater degree.

funny story. i used to own the real world version of this game growing up. loved it. used to try and play it in the car ... didn't understand momentum at the time :) anyway, i went to go buy it from a local ToysRUs to see if i had the physics even sort of close. couldn't find it, so went and asked a sales person. this was the conversation. me "do you have the wooden labyrinth game?". him "i know what you're talking about, but if its not in the board games or the wooden section, then we dont have it". me "i already tried wal mart, any other idea where i could get it?" him [thinks for a moment] "how about a museum?". serious! that made me feel really really old.

now for technology. i'd like to see more image processing and computer vision libraries make their way to the .NET platform. also libraries for working with web cams and video processing. DirectShow had some of the basics brought over, but i'd like to see a lot more. we know the need is there because of all the 3rd party wrappers that people have been creating. for Direct3D, i'd like MS to get better .x support out in some of the low cost modeling tools. either write the plugins or provide converters. ultimately i'd like there to be some sort of WYSIWYG designer for creating DirectX applications by placing cameras and dragging and dropping meshes into a world view. and put it into VS.NET of course ... i really dig the single IDE. the Vista UI is going to have a lot of 3D elements, so for other developers to make similarly sexy apps to go along with it i think that MS is going to have to work on making the 3D dev experience even easier. MDX is certainly a great step in that direction


worked on this for a month. got the camera tracking working in about a week. the other 3 weeks were me struggling with 3D modeling tools and making my first Direct3D application.


the zip below contains an executable as well as an XCOPY directory. you will need to have .NET 2.0 Beta 2 installed and the August 05 DirectX Release. install DirectX after you install .NET, because only then will it install the Managed DirectX components.


might port this to the Compact Framework to run on Managed Direct3D Mobile (MD3DM)


probably going to try and do something simple with genetic algorithms. later