garbage in-put
Sun, 26 Sep 2004 20:23:47 GMT

another tricky thing about neural networks (and AI in general) is getting the data into a format that will be useful. typically, the input data needs to be an array of values between -1 and 1. so how do you get that array of values? well ... it varies for each thing that you are trying to do:

for barcodes, it is easy, because everything is already black (1) or white (0). if the image is of lower quality, you get grey values between 0 and 1 instead. transforming that data into a sine wave helped too. for the EAN format, the array length is 95 (one value per module of an EAN-13 code). granted ... i have not been able to successfully read low-quality barcode images ... yet ...
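a minimal sketch of the barcode case, assuming the scanline is a single row of grayscale pixels (0 = black, 255 = white) that has already been cropped to the edges of the barcode. the name `scanline_to_modules` and the equal-width split are assumptions for illustration, not a complete reader:

```python
def scanline_to_modules(row, modules=95):
    """Average a cropped grayscale scanline down to one value per module.

    row: pixel values from 0 (black) to 255 (white).
    Returns `modules` floats where 1.0 is black and 0.0 is white.
    """
    n = len(row)
    if n < modules:
        raise ValueError("need at least one pixel per module")
    values = []
    for m in range(modules):
        # integer arithmetic spreads any leftover pixels evenly across modules
        start = m * n // modules
        end = (m + 1) * n // modules
        chunk = row[start:end]
        mean = sum(chunk) / len(chunk)
        # invert so that dark bars come out near 1 and white gaps near 0
        values.append(1.0 - mean / 255.0)
    return values
```

on a clean scan every value lands on 1.0 or 0.0; a blurry scan produces the in-between greys.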

optical character recognition is simple too. you just divide the image up into areas of pixels, and each area's value is that area's grayscale (say, the average of its pixels). next, you just normalize the grayscale between -1 and 1. the array length is then the number of areas you divide the image into, e.g. 100 for a 10x10 grid.
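the grid idea can be sketched like this, assuming the image is a list of rows of grayscale pixels (0 to 255) and that each area is summarized by its mean. `image_to_grid` is a hypothetical helper name:

```python
def image_to_grid(image, rows=10, cols=10):
    """Reduce a grayscale image to rows*cols values in [-1, 1].

    image: list of equal-length rows, each pixel from 0 (black) to 255 (white).
    """
    height, width = len(image), len(image[0])
    if height < rows or width < cols:
        raise ValueError("image is smaller than the requested grid")
    features = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            total = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            mean = total / ((y1 - y0) * (x1 - x0))
            # map 0..255 onto -1..1
            features.append(mean / 127.5 - 1.0)
    return features
```

the output is in row-major order, so a 10x10 grid always gives 100 inputs no matter how large the source image is.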

speech recognition was entirely different. the WAV format gives you an array of sample values which you can immediately plot on a graph, but in that form it was basically meaningless. if you run an FFT, though, you can divide the resulting spectrum up into segments and produce the array of values by summing each segment (similar to OCR above). then those values just have to be normalized between -1 and 1.
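a rough sketch of the transform-then-sum step. to keep it self-contained it uses a naive O(n^2) discrete Fourier transform in place of a real FFT (same result, much slower), so it only suits short clips. the names `dft_magnitudes` and `band_features` and the default of 8 bands are assumptions:

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each frequency bin below the Nyquist limit (naive DFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def band_features(samples, bands=8):
    """Sum the spectrum into `bands` segments and scale the sums to [-1, 1]."""
    mags = dft_magnitudes(samples)
    if len(mags) < bands:
        raise ValueError("not enough samples for that many bands")
    sums = []
    for b in range(bands):
        start = b * len(mags) // bands
        end = (b + 1) * len(mags) // bands
        sums.append(sum(mags[start:end]))
    lo, hi = min(sums), max(sums)
    if hi == lo:
        # flat spectrum: nothing to scale, so return the midpoint everywhere
        return [0.0] * bands
    return [2.0 * (s - lo) / (hi - lo) - 1.0 for s in sums]
```

for a pure tone, the band containing that frequency comes out at 1.0 and the quietest band at -1.0.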

now i'm attempting handwriting recognition (like the Tablet PC). first, i tried OCR with minimal success. now i'm trying to work with the actual stroke data. plotting out the stroke's movement as a wave looks promising. i also tried FFT, but it looks less promising, and fails altogether on some inputs. there are also some OCR-esque techniques that take the strokes into consideration that i might try later.
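here is a minimal sketch of one way a stroke could be turned into a wave. it assumes the stroke is a list of (x, y) pen samples in drawing order, and it encodes the direction of travel between consecutive samples. that encoding and the name `stroke_to_wave` are purely illustrative assumptions, just one of several possible readings of "movement as a wave":

```python
import math

def stroke_to_wave(points):
    """Turn a pen stroke into a 1-D signal of travel directions in [-1, 1].

    points: list of (x, y) tuples in the order they were drawn.
    Each output value is the angle of one pen movement divided by pi.
    """
    wave = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # pen did not move, so there is no direction to record
        wave.append(math.atan2(y1 - y0, x1 - x0) / math.pi)
    return wave
```

the values are already between -1 and 1, but the length of the wave still depends on how many samples the stroke had.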

anyways, this was not obvious to me before starting AI ... but it takes A LOT of massage work to get the input data into a format where you can even apply the AI technique. so now i'm constantly having to look into image processing techniques and mathematical transforms. normalizing between -1 and 1 is trivial, but the thing i still don't understand is what to do with variable-length inputs, e.g. if i'm trying to recognize a short sound vs a long sound. their input arrays will be of different lengths ... and i don't know how to deal with that ... or is the NN able to handle that automagically for you?
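normalizing really is only a few lines. for the length problem, a plain feed-forward network has a fixed number of inputs, so one candidate workaround (an assumption, not something tested here) is to stretch or squeeze every array to the same length by linear interpolation before normalizing. `resample` and `normalize` are hypothetical helper names:

```python
def resample(values, length):
    """Stretch or squeeze `values` to exactly `length` points (linear interpolation)."""
    if not values:
        raise ValueError("cannot resample an empty array")
    if length < 1:
        raise ValueError("length must be at least 1")
    if len(values) == 1 or length == 1:
        return [float(values[0])] * length
    step = (len(values) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(values) - 1)
        frac = pos - left
        out.append(values[left] * (1.0 - frac) + values[right] * frac)
    return out

def normalize(values, lo=-1.0, hi=1.0):
    """Linearly rescale so the smallest value maps to lo and the largest to hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # constant input: no spread to scale, so use the midpoint
        return [(lo + hi) / 2.0 for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]

# a short and a long input both end up as 5 values between -1 and 1
short_input = normalize(resample([0, 10], 5))
long_input = normalize(resample([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], 5))
```

the catch is that stretching a sound also changes its apparent speed and pitch, so whether this is good enough depends on the data.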