[R-sig-Geo] Map digitization and classification

Sat Mar 19 14:18:55 CET 2011

On Sat, Mar 19, 2011 at 12:44 AM, Michael Sumner <mdsumner at gmail.com> wrote:
> I had another look and the georegistration should be pretty accurate
> since there are so many grid lines.

There are at least two obstacles to successful classification:

1. Finding the data grid - tricky because the warping of each image is
different. There's a lot of curvature and I think you'd need a lot of
control points. Alternatively you could look for things that looked
like + signs and infer the best grid from them, but that could be
hard.

There might even be a patent on it: http://www.freepatentsonline.com/7479969.html

2. Detecting the feature. Also tricky. Some bits of the coastline will
look exactly like vertical bars. Where symbols partly clash with the
coastline they'll look different too.
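
For point 1, here is a rough sketch of the "look for + signs" idea in
plain Python. It is only an illustration, not a tested solution for
these scans: it assumes the page is already a binary image held as a
list of rows (1 = ink, 0 = paper), and the function name find_crosses
and the arm parameter are made up for the example. A pixel counts as a
crossing if ink runs arm pixels up, down, left and right of it while
the four diagonal corners at that distance are blank.

```python
def find_crosses(img, arm=3):
    """Return (row, col) centres of '+' shapes in a binary image.

    img is a list of rows, 1 = ink, 0 = paper (an assumed format).
    arm is the assumed half-length of the '+' in pixels.
    """
    rows, cols = len(img), len(img[0])
    hits = []
    for r in range(arm, rows - arm):
        for c in range(arm, cols - arm):
            # Ink must run the full arm length vertically and horizontally.
            arms_inked = all(
                img[r + d][c] and img[r][c + d] for d in range(-arm, arm + 1)
            )
            # The diagonal corners must be paper, otherwise solid blobs match too.
            corners_blank = not any(
                img[r + dr][c + dc] for dr in (-arm, arm) for dc in (-arm, arm)
            )
            if arms_inked and corners_blank:
                hits.append((r, c))
    return hits
```

With pen lines several pixels wide, each real crossing would give a
small cluster of hits rather than one, so you would still have to merge
clusters and then fit the warped grid through the centres. The arm
length needs to be larger than the line width for the corner test to
mean anything.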

Zooming right in on the image (1400% or so) shows each pen line to be
about four pixels wide, and every pixel either black or white (was it
scanned in mono?), so despeckling and thresholding might help shape
detection. Scanning in grayscale might be better.
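
To make "thresholding and despeckling" concrete, here is a minimal
sketch in plain Python. It assumes a grayscale scan held as a list of
rows of 0-255 values. The cutoff of 128 and the rule "drop ink pixels
with no inked neighbour" are arbitrary choices for illustration, not
what ImageJ or any other tool actually does.

```python
def threshold(gray, cutoff=128):
    """Turn a grayscale image into binary: dark pixels become ink (1)."""
    return [[1 if v < cutoff else 0 for v in row] for row in gray]


def despeckle(img, min_neighbours=1):
    """Clear ink pixels with fewer than min_neighbours inked 8-neighbours."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue
            # Count ink in the 3x3 window, clipped at the image edge,
            # then subtract the centre pixel itself.
            n = sum(
                img[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ) - 1
            if n < min_neighbours:
                out[r][c] = 0
    return out
```

Something like despeckle(threshold(scan)) would then feed the shape
detection. On a four-pixel pen line a stricter min_neighbours is
probably safe, but that is a guess you would want to check against a
real page.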

I still think ImageJ might be a handy tool to start working on this. I
believe it has feature detection algorithms.

Another idea would be to chop it up into the 10x10 grids and create a
job on Amazon's Mechanical Turk system, so that real live human beings
get paid to do the classification.
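
The chopping itself is the easy part. A sketch, assuming the image has
already been straightened so that each 10x10 block of grid cells is a
plain pixel rectangle (which, given the warping, is a big assumption).
The name tiles and the tile sizes are placeholders:

```python
def tiles(img, tile_h, tile_w):
    """Yield ((row0, col0), tile) pairs covering the image.

    img is a list of rows; tiles on the right and bottom edges may be
    smaller than tile_h x tile_w.
    """
    for r in range(0, len(img), tile_h):
        for c in range(0, len(img[0]), tile_w):
            yield (r, c), [row[c:c + tile_w] for row in img[r:r + tile_h]]
```

Keeping the (row0, col0) origin with each tile means whatever labels
come back can be placed on the full page again.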

How many pages have you got? You might have to ask yourself whether
coding something to do this would take more effort than typing it all
in manually.

I assume you've already tried to find the original authors so that
they can dig out the punch cards that this data was probably stored
on...

Barry