picture CAPTCHAs
Thu, 21 Jul 2005 19:19:00 GMT

since there are some CAPTCHA links springing up (again) ... i went ahead and poked around to see what has been happening. one of the posts i found was by Jason Mauer. his CAPTCHA displays some text and 6 pictures, and you have to associate one of the pictures with that text. e.g. from the picture below you have to match the text 'hat and cane' to the appropriate image. this is obvious for humans, but is really hard for computers (currently). so that is better than text based CAPTCHAs right now, because computers are good at OCR. a drawback is that the picture database is small compared to the number of different character combinations that can be made with a text based CAPTCHA. of course you can just add more pictures ... but the problem is that it really doesn't matter ... because an attacking program can train itself at its leisure.

here is how i would beat it. the program would request the page and get the associated images. it would parse out the challenge text and generate some fingerprint for each image. for different ways of fingerprinting an image, see /aiSomPic. then it would check a database to see if it had already solved that text. if so, it would pick the image with the closest signature. if the text had not been solved before, then it would make a random guess among the images, skipping any that it already knew to be an incorrect answer for that text. finally, the program would check the web response to see if the guess was correct or incorrect, and record that in the database. then it would just keep doing this over and over again ... remember, it's a bot. it would start out with a lot of failures, but over time it would start getting answers correct, and it would ultimately end up being perfect. adding more images doesn't matter, it will just learn those on its own, with at least 1 out of 6 odds of being correct on each try. of course the CAPTCHA can be hardened by adding noise, rotation, and such to the images. for that, i also refer you to the /aiSomPic article (linked above) to see how this can be handled using a self organizing map. actually, you've always got a 1 out of 6 chance without doing any processing ... so for a bot ... those are pretty good odds just by doing random guessing.
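the loop described above can be sketched roughly like this. this is only a minimal simulation under my own assumptions, not code that talks to a real site: the class `PictureCaptchaSolver`, the function `simulate`, the 3-number fingerprints, and the 20-label picture database are all hypothetical stand-ins. a real bot would fetch the page, fingerprint the downloaded images, and would need a nearness threshold instead of the exact fingerprint match used here for known-wrong answers.

```python
import random


class PictureCaptchaSolver:
    """Learns, by trial and error, which image answers each challenge text."""

    def __init__(self, rng=None):
        self.solved = {}  # challenge text -> fingerprint of the correct image
        self.wrong = {}   # challenge text -> fingerprints known to be incorrect
        self.rng = rng or random.Random()

    @staticmethod
    def distance(a, b):
        # Squared Euclidean distance between two fingerprints (assumed metric).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def choose(self, text, fingerprints):
        """Return the index of the image to submit for this challenge text."""
        if text in self.solved:
            target = self.solved[text]
            return min(range(len(fingerprints)),
                       key=lambda i: self.distance(fingerprints[i], target))
        # Unsolved text: guess randomly, skipping images already known wrong.
        known_wrong = self.wrong.get(text, [])
        candidates = [i for i, fp in enumerate(fingerprints)
                      if fp not in known_wrong]
        return self.rng.choice(candidates or list(range(len(fingerprints))))

    def record(self, text, fingerprint, correct):
        """Store what the web response said about the submitted guess."""
        if correct:
            self.solved[text] = fingerprint
            self.wrong.pop(text, None)
        else:
            self.wrong.setdefault(text, []).append(fingerprint)


def simulate(rounds=2000, labels=20, seed=0):
    """Run the solver against a fake CAPTCHA; return a list of hit/miss bools."""
    rng = random.Random(seed)
    # Hypothetical picture database: one fixed fingerprint per label.
    database = {f"label {n}": (rng.random(), rng.random(), rng.random())
                for n in range(labels)}
    solver = PictureCaptchaSolver(rng)
    outcomes = []
    for _ in range(rounds):
        shown = rng.sample(sorted(database), 6)  # the 6 pictures on the page
        text = rng.choice(shown)                 # the challenge text
        fingerprints = [database[name] for name in shown]
        pick = solver.choose(text, fingerprints)
        correct = shown[pick] == text            # stands in for the web response
        solver.record(text, fingerprints[pick], correct)
        outcomes.append(correct)
    return outcomes
```

calling `simulate()` returns one boolean per attempt. in this toy setup the early attempts should be mostly misses, and once every label has been challenged and solved at least once the later attempts should all be hits, which is the "start out with a lot of failures, end up perfect" behavior described above.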

the text based CAPTCHA did not have this self training capability, because the answer (or some hash of it) was never represented in either the request or the response. so i had to manually teach the text based bot what the letters and numbers were. it would take a guess, and then i would correct it if necessary.

anyway, jason is in great shape because he implemented his own CAPTCHA ... making himself a smaller target. it's less likely that someone would write custom code to attack his solution vs one of the packaged solutions that are used by thousands of people ... which makes those a large target. he also reduced his attack surface area by only allowing comments for 2 weeks.