[R-SIG-Finance] PDF Reader

Brian G. Peterson brian at braverock.com
Fri Jul 10 19:57:24 CEST 2009


Ben,

I wouldn't really consider this the appropriate forum for your query, 
but I'll answer it anyway, with emphasis on the finance-specific bits.

There has existed for many years a utility called "pdf2txt".  Note that 
this will extract text from a PDF, but may not do a great job of 
maintaining the column structure.  In the past, I have had to resort to 
Perl, PHP, or Python and use regular expression matching to put the data 
into a tabular format suitable for analysis in R or some other 
processing environment.
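
As a rough illustration of that regular-expression step, here is a 
minimal Python sketch.  The sample text, the assumed row layout (date, 
description, amount), and the names parse_statement and rows_to_csv are 
all made up for illustration; a real statement will need its own 
pattern, and the extracted text will likely be messier than this.

```python
import csv
import io
import re

# Hypothetical sample of what a PDF-to-text tool might emit: columns
# separated by runs of spaces, with header and total lines mixed in.
SAMPLE_TEXT = """\
Account Statement          Page 1
Date        Description             Amount
2009-06-30  Management fee           -125.00
2009-06-30  Dividend ACME Corp      1,250.50
Total                               1,125.50
"""

# Assumed row layout: ISO date, free-text description, signed amount.
# The description is separated from the amount by two or more spaces.
ROW_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<description>.+?)\s{2,}"
    r"(?P<amount>-?[\d,]+\.\d{2})\s*$"
)


def parse_statement(text):
    """Return one dict per line that matches the assumed row layout."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # skip headers, totals, and anything unrecognized
        rows.append({
            "date": match.group("date"),
            "description": match.group("description").strip(),
            # Drop thousands separators before converting to a number.
            "amount": float(match.group("amount").replace(",", "")),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the parsed rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "description", "amount"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(parse_statement(SAMPLE_TEXT)), end="")
```

The resulting CSV can then be loaded in R with read.csv().  Lines that 
do not match the pattern are silently skipped here, so in practice it 
is worth logging them to catch rows the pattern misses.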

Also, most fund managers, trustees, administrators, markets, brokerages, 
etc. do have better data formats available for their investors/clients.  
Call them up and tell them that you need the data in machine-readable 
form, whether CSV, fixed width, Excel, whatever.  Almost all of your 
sources should be able to provide this, though it may take some work.  
You may not get to choose the format, but any machine-readable format 
should be coercible into R or other analysis environments.

Regards,

  - Brian

Chiquoine, Ben wrote:
> Hi,
>
>  
>
> First let me apologize if this is the wrong venue for this question...
>
> I work for a small financial company and we often receive statements
> that are in PDF form.  Pulling the data from these can be quite time
> consuming and I'm wondering if anyone on the list knows of a way to read
> a PDF in as text in R.  I know that Google has come out with a few tools
> that allow you to search the text of PDFs which has given me hope that
> something along these lines may be possible but I've been unable to find
> any R documentation on inputting data from PDFs.  Any
> thoughts/suggestions would be much appreciated.
>   
-- 
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock


