[R] From two colors to 01 sequences

Zeljko Vrba zvrba at ifi.uio.no
Tue May 12 13:38:31 CEST 2009


On Tue, May 12, 2009 at 12:20:56PM +0100, Paul Smith wrote:
> 
> I have got several pdf files with rows of colored rectangles: red
> rectangles should be read as 0; green rectangles as 1. No other color
> exists. Is there some way to have R reading the colored rectangles to
> a matrix or data frame converting the color of the rectangles to
> sequences of 01?
> 
I would not do it with R, but here is the general approach:

  1. Convert the PDF to a raster format at a high enough resolution (DPI),
     without lossy compression or anti-aliasing
  2. Use some image manipulation program to replace red/green with black/white
  3. Save the resulting picture in ASCII PBM format
  4. Parse the resulting PBM and find 0->1 and 1->0 transitions which will give
     you rectangle boundaries.
  5. You did not specify the kind of rectangles, nor whether the rows are of
     uniform height, so I assume a uniform grid.  Otherwise, position and size
     might also be relevant[1]; the interpretation is completely up to you.
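
Steps 4 and 5 could be sketched roughly as follows, in Python rather than R.
This is only an illustration under several assumptions that are not in the
original question: the image is cropped to the grid, the rectangles touch
without gaps or borders, the grid is uniform, at least one rectangle differs
from its neighbour in each direction (so the shortest run of equal pixels is
one cell wide/high), and step 2 mapped red to white (0 in PBM) and green to
black (1 in PBM).  The function names are made up for this example.

```python
from itertools import groupby


def parse_ascii_pbm(text):
    """Parse a plain (P1) PBM string into a list of rows of 0/1 ints."""
    # Drop comments: everything from '#' to the end of the line.
    lines = [line.split("#", 1)[0] for line in text.splitlines()]
    tokens = " ".join(lines).split()
    if not tokens or tokens[0] != "P1":
        raise ValueError("not an ASCII PBM (P1) file")
    width, height = int(tokens[1]), int(tokens[2])
    # Bits may be written with or without whitespace between them.
    bits = [int(ch) for ch in "".join(tokens[3:])]
    if len(bits) != width * height:
        raise ValueError("pixel count does not match the header")
    return [bits[r * width:(r + 1) * width] for r in range(height)]


def run_lengths(seq):
    """Run-length encode a sequence; each 0->1 or 1->0 transition ends a run."""
    return [(value, len(list(group))) for value, group in groupby(seq)]


def shortest_run(lines):
    """Length of the shortest run found in any of the given pixel lines."""
    return min(length for line in lines for _, length in run_lengths(line))


def decode_grid(pixels):
    """Recover the 0/1 matrix of cells from the pixel matrix.

    Assumes a uniform grid whose cell size equals the shortest run
    (see the assumptions stated above the code).
    """
    height, width = len(pixels), len(pixels[0])
    cell_w = shortest_run(pixels)        # scan along rows
    cell_h = shortest_run(zip(*pixels))  # scan along columns
    n_rows = round(height / cell_h)
    n_cols = round(width / cell_w)
    # Sample the pixel at the centre of every cell.
    return [
        [pixels[int((r + 0.5) * height / n_rows)][int((c + 0.5) * width / n_cols)]
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


if __name__ == "__main__":
    # Tiny hand-made example: 2 x 3 cells, each 2 x 2 pixels.
    sample = "P1\n# demo\n6 4\n001111\n001111\n110000\n110000\n"
    for row in decode_grid(parse_ascii_pbm(sample)):
        print("".join(map(str, row)))
```

If the rectangles are separated by gaps or drawn with borders, or the grid is
not uniform, the shortest-run heuristic no longer applies and the run lengths
would have to be interpreted directly.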

[1] As in, for example, http://educ.queensu.ca/~fmc/october2001/GoldenArt3.gif
(Imagine that there are only two colors instead of 4 + black lines)



