[R-SIG-Finance] PDF Reader

Chiquoine, Ben BChiquoine at tiff.org
Fri Jul 10 20:14:32 CEST 2009

Thanks Brian and Adrian for your helpful suggestions.  pdf2txt looks
like it might do the trick (especially with that great wrapper you put
on it, Adrian).  I've found many hedge fund managers reluctant to give
data out in forms other than PDF because they feel PDFs help them to
prevent redistribution... maybe I should be pushing harder.

Thanks again,


-----Original Message-----
From: Brian G. Peterson [mailto:brian at braverock.com] 
Sent: Friday, July 10, 2009 1:57 PM
To: Chiquoine, Ben
Cc: r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-SIG-Finance] PDF Reader


I wouldn't really consider this the appropriate forum for your query, 
but I'll answer it anyway, with emphasis on the finance-specific bits.

There has existed for many years a utility called "pdf2txt".  Note that 
this will extract text from a pdf, but may not do a great job with 
maintaining the column structure.  In the past, I have had to resort to 
perl, php, or python to use regular expression matching to put the data
into a tabular format that would be suitable for analysis in R or some 
other processing environment.
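To make the regular-expression step concrete, here is a minimal Python sketch. It assumes the extracted text keeps each transaction on one line as a date (MM/DD/YYYY), a description and an amount, separated by runs of two or more spaces. That layout, the sample text, and the names parse_statement() and to_csv() are hypothetical; a real statement would need its own pattern.

```python
import csv
import io
import re

# Hypothetical statement layout (an assumption): a date, free-text
# description and amount, separated by two or more spaces, as text
# extraction often produces when column alignment survives.
ROW = re.compile(
    r"^\s*(?P<date>\d{2}/\d{2}/\d{4})\s{2,}"
    r"(?P<description>.+?)\s{2,}"
    r"(?P<amount>-?[\d,]+\.\d{2})\s*$"
)


def parse_statement(text):
    """Return (date, description, amount) tuples for lines matching ROW."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # headers, footers and page numbers simply don't match
            amount = float(m.group("amount").replace(",", ""))
            rows.append((m.group("date"), m.group("description"), amount))
    return rows


def to_csv(rows):
    """Serialise parsed rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "description", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical extracted text from a single statement page.
sample = """ACME Partners LP   Monthly Statement
06/30/2009    Opening balance        1,250,000.00
06/30/2009    Management fee            -3,125.00
Page 1 of 2
"""

print(to_csv(parse_statement(sample)))
```

The resulting CSV text can be saved to a file and loaded in R with read.csv(). Lines that do not match the pattern are skipped rather than guessed at, which is usually the safer failure mode for financial data.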

Also, most fund managers, trustees, administrators, markets, brokerages,
etc. do have better data formats available for their investors/clients.  
Call them up and tell them that you need the data in machine-readable 
form, whether CSV, fixed width, Excel, whatever.  Almost all of your 
sources should be able to provide this, though it may take some work.  
You may not get to choose the format, but any machine-readable format 
should be coercible into R or other analysis environments.
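As an illustration of how little work a fixed-width file needs, the following Python sketch slices each record by character position and rewrites it as CSV. The column layout (fund, date and nav at the given offsets), the sample records, and the function names are hypothetical; a real file would come with its own layout specification.

```python
import csv
import io

# Hypothetical column layout (an assumption, not any provider's real spec):
# (field name, start offset, end offset), 0-based and end-exclusive.
LAYOUT = [("fund", 0, 10), ("date", 10, 20), ("nav", 20, 32)]


def parse_fixed_width(text, layout=LAYOUT):
    """Split each non-blank line into stripped fields by character position."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        records.append({name: line[start:end].strip() for name, start, end in layout})
    return records


def records_to_csv(records, layout=LAYOUT):
    """Write the parsed records as CSV, columns in layout order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=[name for name, _, _ in layout], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Build hypothetical fixed-width sample lines (10 + 10 + 12 characters).
sample = "\n".join(
    f"{fund:<10}{date:<10}{nav:>12}"
    for fund, date, nav in [
        ("FUNDA", "2009-06-30", "101.2345"),
        ("FUNDB", "2009-06-30", "98.7600"),
    ]
)

print(records_to_csv(parse_fixed_width(sample)))
```

Values are left as strings here; R's read.csv() would assign column types when the CSV is loaded.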


  - Brian

Chiquoine, Ben wrote:
> Hi,
> First let me apologize if this is the wrong venue for this.
> I work for a small financial company and we often receive statements
> that are in pdf form.  Pulling the data from these can be quite time
> consuming and I'm wondering if anyone on the list knows of a way to read
> a pdf in as text in R.  I know that google has come out with a few tools
> that allow you to search the text of pdfs which has given me hope that
> something along these lines may be possible but I've been unable to find
> any R documentation on inputting data from PDFs.  Any
> thoughts/suggestions would be much appreciated.
Brian G. Peterson
Ph: 773-459-4973
IM: bgpbraverock

