[R] scanning a pdf scan

Fri Oct 27 18:52:51 CEST 2006

I don't have specific experience with this, but strapply in the
gsubfn package can extract information from a string by content
rather than by delimiters.  For example, this pulls out every run
of digits:

> library(gsubfn)
> strapply("abc34def56xyz", "[0-9]+", c)[[1]]
[1] "34" "56"
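
The same extract-by-content idea might carry over to lines of text
saved from the pdf.  What follows is only a rough sketch, not
something I have tried on the actual file: the sample lines, the
column names and the assumption of four numbers per data row are
all made up for illustration, and it uses base R's gregexpr and
regmatches in place of strapply so that it runs without any extra
package.

```r
# Hypothetical lines standing in for text exported from the pdf.
lines <- c("Day 1   | 12  0.134  7",
           "TABLE II (continued)",
           "Day 2   | 15  0.127  9")

# Extract every number (integer or decimal) found on each line.
pattern <- "[0-9]+(\\.[0-9]+)?"
nums <- regmatches(lines, gregexpr(pattern, lines))

# Assumption: a real data row holds exactly 4 numbers, so headings
# and other clutter are dropped by this filter.
rows <- nums[lengths(nums) == 4]

# Stack the rows into a data frame with made-up column names.
dat <- as.data.frame(do.call(rbind, lapply(rows, as.numeric)))
names(dat) <- c("day", "n", "value", "count")
```

Filtering on the number of matches per line is a crude way to skip
headings; a real table would probably need a stricter pattern.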

On 10/27/06, roger koenker <rkoenker at uiuc.edu> wrote:
> I have a pdf scan of several pages of data from a quite famous old paper
> by C.S. Peirce (1873).  I would like (what else?) to convert it into an R
> dataframe.  Somewhat to my surprise the pdf seems to already be in a
> character-recognized form, since I can search for numerical strings and
> they are nicely found.  Of course, as is usual with such tables, there are
> also headings, column lines, etc. that are less interesting than the
> numbers themselves.  I've tried saving the pdf in various formats, some of
> which look vaguely tractable, but I'm hoping that there is something that
> is more automatic.
>
> Does anyone have experience that they could share toward this objective?
>
>
> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    rkoenker at uiuc.edu            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Champaign, IL 61820