[R] scanning a pdf scan
roger koenker
rkoenker at uiuc.edu
Fri Oct 27 21:42:33 CEST 2006
Thanks for your suggestions. Trial-and-error experimentation
with Adobe Acrobat produced the following method: it looks like
it is possible to highlight the numerical part of the table in
Acrobat and then copy/paste it into a text file, with about
98 percent accuracy. Wonders never cease.
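For anyone following this route later: once the numbers are in a text file, a minimal sketch like the one below could turn them into a data frame. It uses base R's regmatches() and gregexpr(), swapped in here as an alternative to strapply, to pull out numeric tokens by content. It assumes every data row carries a fixed, known number of plain decimal fields and that headings and column rules do not. The helper numbers_to_df(), the file name peirce.txt and the toy input lines are all hypothetical (the toy lines are made up, not Peirce's data).

```r
# Hypothetical helper: turn text lines pasted out of a PDF table into a
# numeric data frame. Assumes plain decimal numbers (optional leading minus,
# no thousands separators) and that each data row has exactly `ncol` of them.
numbers_to_df <- function(txt, ncol) {
  # Extract every number by content, not by delimiter
  tokens <- regmatches(txt, gregexpr("-?[0-9]+(\\.[0-9]+)?", txt))
  # Headings and rule lines usually have no numbers (or the wrong count),
  # so keep only lines with the expected number of fields
  keep <- lengths(tokens) == ncol
  vals <- as.numeric(unlist(tokens[keep]))
  # One row per surviving line; columns are named V1, V2, ...
  as.data.frame(matrix(vals, ncol = ncol, byrow = TRUE))
}

# Toy input, made up for illustration only
toy <- c("Table I.  Times of response",
         "---------------------------",
         "  1   0.25   0.31",
         "  2   0.28   0.30")
tab <- numbers_to_df(toy, ncol = 3)

# With a real pasted file (hypothetical name):
# tab <- numbers_to_df(readLines("peirce.txt"), ncol = 3)
```

Lines that the character recognition mangled (the remaining couple of percent) will usually have the wrong field count and be silently dropped, so it is worth comparing nrow(tab) against the number of rows expected from the printed table.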
Roger Koenker
Department of Economics
University of Illinois
Champaign, IL 61820
url: www.econ.uiuc.edu/~roger
email: rkoenker at uiuc.edu
vox: 217-333-4558
fax: 217-244-6678
On Oct 27, 2006, at 11:52 AM, Gabor Grothendieck wrote:
> I don't have specific experience with this, but strapply in
> package gsubfn can extract information from a string by content
> rather than by delimiters, e.g.
>
>> library(gsubfn)
>> strapply("abc34def56xyz", "[0-9]+", c)[[1]]
> [1] "34" "56"
>
> On 10/27/06, roger koenker <rkoenker at uiuc.edu> wrote:
>> I have a PDF scan of several pages of data from a quite famous old
>> paper by C.S. Peirce (1873). I would like (what else?) to convert it
>> into an R data frame. Somewhat to my surprise, the PDF seems to be
>> already in character-recognized form, since I can search for
>> numerical strings and they are nicely found. Of course, as is usual
>> with such tables, there are also headings, column lines, etc. that
>> are less interesting than the numbers themselves. I've tried saving
>> the PDF in various formats, some of which look vaguely tractable,
>> but I'm hoping that there is something more automatic.
>>
>> Does anyone have experience that they could share toward this objective?
>>