[R] reading data from a pdf

Jean Eid jeaneid at chass.utoronto.ca
Mon Oct 24 17:04:07 CEST 2005


Hi,

In my experience pdftotext does not do a very good job at this because it 
garbles the formatting of tables, although this depends on the program 
the PDF document was originally created with. What I found works best is 
to cut and paste the table into xemacs or emacs and then run 
M-x canonically-space-region on it, which removes any extra spaces 
between words. However, if the PDF document was produced by scanning and 
one has to use a character recognition program, then all is up in the 
air and the tables have to be formatted by hand.
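
For anyone who would rather script the clean-up step than do it in 
emacs, here is a minimal sketch in Python of roughly what the spacing 
function does to a pasted table. The names canonical_space and 
parse_table and the sample data are made up for illustration, and it 
assumes no field contains a space of its own.

```python
import re


def canonical_space(line):
    """Collapse each run of spaces or tabs to one space and trim the ends."""
    return re.sub(r"[ \t]+", " ", line).strip()


def parse_table(text):
    """Split pasted text into rows of fields, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        cleaned = canonical_space(line)
        if cleaned:
            rows.append(cleaned.split(" "))
    return rows


# Hypothetical table text as it might look after cut and paste from a PDF.
pasted = """Year    Sales     Cost
2003     120.5      80
2004   131.0     85.25
"""

rows = parse_table(pasted)
for row in rows:
    print("\t".join(row))
```

The tab-separated output can be saved to a file and read into R in the 
usual way. Labels made of several words would be split into separate 
fields, so those still need fixing by hand.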

Jean

rambam at bigpond.net.au wrote:

>>Hi, I'm trying to read data from a PDF file. Is it possible to do it
>>with R? Thanks,  Marco
>
>If cut and paste to a text file fails, try this:
>
>pdftotext (from the xpdf project)
>
>or
>
>http://pdftohtml.sourceforge.net
>pdftohtml is a utility which converts PDF files into HTML and
>XML formats
>
>In addition, pdftk, the command line pdf toolkit, may be useful
>http://www.accesspdf.com/pdftk/
>
>  
>
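
If one still wants to try the pdftotext route quoted above, its -layout 
option asks it to keep the physical layout of the page, which may help 
with tables. Below is a minimal sketch in Python of calling it from a 
script. The function names and file names are made up for illustration, 
and I have not checked how well it does on any particular document.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for a pdftotext call."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout tries to preserve the original physical layout of the text
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def convert(pdf_path, txt_path):
    """Run pdftotext if it is installed, raising an error otherwise."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on the PATH")
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)


# Hypothetical file names; call convert("report.pdf", "report.txt") to run it.
print(" ".join(pdftotext_command("report.pdf", "report.txt")))
```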
