[R-SIG-Finance] PDF Reader
Adrian Dragulescu
adrian_d at eskimo.com
Fri Jul 10 19:51:22 CEST 2009
You can use pdftotext to convert the PDFs to text files and then parse
the text in R. http://en.wikipedia.org/wiki/Pdftotext
It works well in simple cases.
I have a wrapper that looks something like this:
read.pdf <- function(filein, fileout=NULL, layout=TRUE, first=NULL,
                     last=NULL, eol=NULL, opw=NULL, upw=NULL)
{
  # Build the pdftotext command line one option at a time
  cmd <- "the/path/to/pdftotext"
  if (layout)          cmd <- paste(cmd, "-layout")             # keep physical layout
  if (!is.null(first)) cmd <- paste(cmd, "-f", first)           # first page
  if (!is.null(last))  cmd <- paste(cmd, "-l", last)            # last page
  if (!is.null(eol))   cmd <- paste(cmd, "-eol", eol)           # unix, dos or mac
  if (!is.null(opw))   cmd <- paste(cmd, "-opw", shQuote(opw))  # owner password
  if (!is.null(upw))   cmd <- paste(cmd, "-upw", shQuote(upw))  # user password

  if (!file.exists(filein)){
    stop(paste("Cannot find file:", filein))
  } else {
    cmd <- paste(cmd, shQuote(filein))
  }

  if (is.null(fileout)){
    # "-" sends the text to stdout; return it as a character vector of lines
    cmd <- paste(cmd, "-")
    res <- system(cmd, intern=TRUE)
  } else {
    # Write to fileout; system() then returns the exit status (0 on success)
    cmd <- paste(cmd, shQuote(fileout))
    res <- system(cmd)
    if (res != 0)
      stop("Pdf conversion to txt failed.\n")
  }
  return(res)
}
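Once the text is in R, the parsing step depends entirely on how the statement is laid out. As a rough sketch only, assuming the -layout output keeps table columns separated by at least two spaces: the sample lines and column names below are made up for illustration and stand in for what a call such as read.pdf("statement.pdf") might return.

```r
# Hypothetical lines, standing in for the character vector returned by
# read.pdf() on a simple one-table statement.
lines <- c(
  "Account Statement",
  "",
  "Date          Description           Amount",
  "2009-07-01    Wire transfer        1,250.00",
  "2009-07-03    Management fee         -75.50",
  "2009-07-08    Dividend               310.25"
)

# Keep only the rows that start with an ISO date
rows <- grep("^[0-9]{4}-[0-9]{2}-[0-9]{2}", lines, value = TRUE)

# Assumption: columns are separated by runs of two or more spaces
fields <- strsplit(trimws(rows), " {2,}")
tab <- do.call(rbind, fields)

statement <- data.frame(
  date        = as.Date(tab[, 1]),
  description = tab[, 2],
  amount      = as.numeric(gsub(",", "", tab[, 3])),  # drop thousands separators
  stringsAsFactors = FALSE
)
print(statement)
```

Splitting on two or more spaces lets a description such as "Wire transfer" keep its single internal space, but it will break on statements whose columns are only one space apart, so check the result against the PDF.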
Adrian
On Fri, 10 Jul 2009, Chiquoine, Ben wrote:
> Hi,
>
> First, let me apologize if this is the wrong venue for this question.
>
> I work for a small financial company and we often receive statements
> in PDF form. Pulling the data from these can be quite time-consuming,
> and I'm wondering if anyone on the list knows of a way to read a PDF
> in as text in R. I know that Google has released a few tools that let
> you search the text of PDFs, which has given me hope that something
> along these lines may be possible, but I've been unable to find any R
> documentation on inputting data from PDFs. Any thoughts/suggestions
> would be much appreciated.
>
> Thanks,
>
> Ben