[R-SIG-Finance] PDF Reader
BChiquoine at tiff.org
Fri Jul 10 20:14:32 CEST 2009
Thanks Brian and Adrian for your helpful suggestions. pdf2txt looks
like it might do the trick (especially with that great wrapper you put
on it, Adrian). I've found many hedge fund managers reluctant to give
data out in forms other than PDF because they feel PDFs help them
prevent redistribution... maybe I should be pushing harder.
From: Brian G. Peterson [mailto:brian at braverock.com]
Sent: Friday, July 10, 2009 1:57 PM
To: Chiquoine, Ben
Cc: r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-SIG-Finance] PDF Reader
I wouldn't really consider this the appropriate forum for your query,
but I'll answer it anyway, with emphasis on the finance-specific bits.
A utility called "pdf2txt" has existed for many years. Note that
it will extract the text from a PDF, but may not do a great job of
maintaining the column structure. In the past, I have had to resort to
Perl, PHP, or Python and use regular expression matching to put the data
into a tabular format suitable for analysis in R or some
other processing environment.
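
As a rough illustration of that regular-expression step, here is a minimal Python sketch. It assumes the PDF has already been converted to plain text by some extraction tool. The sample text, the column layout (date, description, amount), and the function names are all made up for illustration; a real statement will almost certainly need its own pattern.

```python
import csv
import io
import re

# Hypothetical text as a PDF-to-text tool might emit it: columns separated
# by runs of spaces, with title, header, and footer lines mixed in.
SAMPLE_TEXT = """\
Example Fund LP            Statement of Holdings
Date         Description              Amount
2009-06-30   Opening balance        1,000,000.00
2009-06-30   Management fee            (1,250.00)
2009-06-30   Net gain                  12,345.67
Page 1 of 1
"""

# A data row: ISO date, free-text description, then a number that may use
# thousands separators and accounting-style parentheses for negatives.
ROW = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<description>.+?)\s{2,}"
    r"(?P<amount>\(?-?[\d,]+\.\d{2}\)?)\s*$"
)


def parse_amount(text):
    """Convert '1,250.00' or '(1,250.00)' to a float; parentheses mean negative."""
    negative = text.startswith("(") and text.endswith(")")
    value = float(text.strip("()").replace(",", ""))
    return -value if negative else value


def extract_rows(text):
    """Keep only the lines that look like data rows; skip headers and footers."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(
                (match["date"], match["description"].strip(),
                 parse_amount(match["amount"]))
            )
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "description", "amount"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE_TEXT)), end="")
```

The CSV text this produces can be written to a file and loaded in R with read.csv(). Lines that do not match the pattern are dropped silently, so it is worth checking the row count against the original statement.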
Also, most fund managers, trustees, administrators, markets, brokerages,
etc. do have better data formats available for their investors/clients.
Call them up and tell them that you need the data in machine-readable
form, whether CSV, fixed width, Excel, whatever. Almost all of your
sources should be able to provide this, though it may take some work.
You may not get to choose the format, but any machine-readable format
should be coercible into R or other analysis environments.
Chiquoine, Ben wrote:
> First let me apologize if this is the wrong venue for this.
> I work for a small financial company and we often receive statements
> that are in PDF form. Pulling the data from these can be quite time
> consuming and I'm wondering if anyone on the list knows of a way to read
> a PDF in as text in R. I know that Google has come out with a few
> that allow you to search the text of PDFs which has given me hope that
> something along these lines may be possible but I've been unable to find
> any R documentation on inputting data from PDFs. Any
> thoughts/suggestions would be much appreciated.
Brian G. Peterson