[R-SIG-Finance] PDF Reader
Brian G. Peterson
brian at braverock.com
Fri Jul 10 19:57:24 CEST 2009
Ben,
I wouldn't really consider this the appropriate forum for your query,
but I'll answer it anyway, with emphasis on the finance-specific bits.
There has existed for many years a utility called "pdf2txt". Note that
this will extract text from a PDF, but may not do a great job of
maintaining the column structure. In the past, I have had to resort to
Perl, PHP, or Python, using regular expression matching to put the data
into a tabular format that would be suitable for analysis in R or some
other processing environment.
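
As a rough illustration of the regular-expression step, here is a minimal
Python sketch. Everything specific in it is an assumption made for the
example: the sample statement text, the three-column layout (date,
description, amount), the use of parentheses for negative amounts, and the
helper names extract_rows, parse_amount, and to_csv. A real statement will
need its own pattern.

```python
import csv
import io
import re

# Hypothetical text as it might come out of a PDF-to-text tool: columns are
# separated by runs of spaces, and non-data lines are mixed in.
SAMPLE_TEXT = """\
ACME Fund LP    Statement for June 2009
Date        Description             Amount
2009-06-01  Opening balance       10,000.00
2009-06-15  Management fee          (125.50)
2009-06-30  Closing balance        9,874.50
Page 1 of 1
"""

# One data row: ISO date, free-text description, at least two spaces, then an
# amount that may be wrapped in parentheses to mark a negative value.
ROW = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<desc>.+?)\s{2,}"
    r"(?P<amount>\(?[\d,]+\.\d{2}\)?)\s*$"
)


def parse_amount(raw):
    """Convert '10,000.00' or '(125.50)' to a float; parentheses mean negative."""
    negative = raw.startswith("(") and raw.endswith(")")
    value = float(raw.strip("()").replace(",", ""))
    return -value if negative else value


def extract_rows(text):
    """Keep only lines that look like data rows; skip headers and footers."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group("date"), m.group("desc"),
                         parse_amount(m.group("amount"))))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "description", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE_TEXT)), end="")
```

The CSV text this produces can be written to a file and loaded in R with
read.csv. Lines that do not match the pattern are silently dropped here, so
with real statements it is worth counting the rejected lines to catch rows
the pattern misses.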
Also, most fund managers, trustees, administrators, markets, brokerages,
etc. do have better data formats available for their investors/clients.
Call them up and tell them that you need the data in machine-readable
form, whether CSV, fixed width, Excel, whatever. Almost all of your
sources should be able to provide this, though it may take some work.
You may not get to choose the format, but any machine-readable format
should be coercible into R or other analysis environments.
Regards,
- Brian
Chiquoine, Ben wrote:
> Hi,
>
>
>
> First let me apologize if this is the wrong venue for this question...
>
> I work for a small financial company and we often receive statements
> that are in PDF form. Pulling the data from these can be quite time
> consuming and I'm wondering if anyone on the list knows of a way to read
> a PDF in as text in R. I know that Google has come out with a few tools
> that allow you to search the text of PDFs, which has given me hope that
> something along these lines may be possible, but I've been unable to find
> any R documentation on inputting data from PDFs. Any
> thoughts/suggestions would be much appreciated.
>
--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock