[R-SIG-Finance] PDF Reader

Adrian Dragulescu adrian_d at eskimo.com
Fri Jul 10 19:51:22 CEST 2009


You can use pdftotext to convert to text files and then parse your text 
files. http://en.wikipedia.org/wiki/Pdftotext

It works well in simple cases.
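
As a rough sketch of the parsing step, assuming the text has already been extracted with the -layout option into a character vector: the sample lines, column names and regular expressions below are made up for illustration, and a real statement will need its own patterns.

```r
# Hypothetical lines, shaped like what `pdftotext -layout` might produce for
# a statement; the securities and amounts are invented for illustration.
lines <- c(
  "Statement of Holdings",
  "",
  "Security            Quantity        Value",
  "ABC Corp                 100     1,234.50",
  "XYZ Fund                  25       987.00"
)

# Keep only the rows that end in an amount with two decimals.
rows <- grep("[0-9,]+\\.[0-9]{2}\\s*$", lines, value = TRUE)

# With -layout, columns are separated by runs of two or more spaces.
fields <- strsplit(trimws(rows), "\\s{2,}")

# Assemble a data frame, stripping thousands separators from the amounts.
holdings <- data.frame(
  security = sapply(fields, `[`, 1),
  quantity = as.numeric(sapply(fields, `[`, 2)),
  value    = as.numeric(gsub(",", "", sapply(fields, `[`, 3))),
  stringsAsFactors = FALSE
)
```

Splitting on two or more spaces keeps names like "ABC Corp" in one field, but it breaks down when a column is empty or the columns sit close together.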

I have a wrapper something like this:

read.pdf <- function(filein, fileout=NULL, layout=TRUE, first=NULL,
                      last=NULL, eol=NULL, opw=NULL, upw=NULL)
{
   cmd <- "the/path/to/pdftotext "

   if (layout) cmd <- paste(cmd, "-layout")
   if (!is.null(first)) cmd <- paste(cmd, "-f", first)
   if (!is.null(last))  cmd <- paste(cmd, "-l", last)
   if (!is.null(eol))   cmd <- paste(cmd, "-eol", eol)           # "unix", "dos" or "mac"
   if (!is.null(opw))   cmd <- paste(cmd, "-opw", shQuote(opw))  # owner password
   if (!is.null(upw))   cmd <- paste(cmd, "-upw", shQuote(upw))  # user password

   if (!file.exists(filein)){
     stop(paste("Cannot find file: ", filein))
   } else {
     cmd <- paste(cmd, shQuote(filein))
   }

   if (is.null(fileout)){
     cmd <- paste(cmd, "-")
     res <- system(cmd, intern=TRUE)
   } else {
     cmd <- paste(cmd, shQuote(fileout))
     res <- system(cmd)
     if (res != 0)
       stop("Pdf conversion to txt failed.\n")
   }

   return(res)
}
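
A shorter variant of the same idea, as a sketch only: it assumes pdftotext is on the PATH (looked up with Sys.which) and passes the arguments through system2 instead of pasting a command string. The function name pdftotext_lines is hypothetical.

```r
# Return the text of a PDF as a character vector, one element per line.
# Assumes the pdftotext executable can be found on the PATH.
pdftotext_lines <- function(pdf, layout = TRUE, exe = Sys.which("pdftotext")) {
  if (!nzchar(exe)) stop("pdftotext not found on the PATH")
  if (!file.exists(pdf)) stop("Cannot find file: ", pdf)

  # "-" as the output file makes pdftotext write to stdout.
  args <- c(if (layout) "-layout", shQuote(pdf), "-")
  system2(exe, args, stdout = TRUE)
}
```

Calling pdftotext_lines("statement.pdf"), where statement.pdf stands for any local file, would give the lines to parse.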

Adrian



On Fri, 10 Jul 2009, Chiquoine, Ben wrote:

> Hi,
>
>
>
> First let me apologize if this is the wrong venue for this question...
>
> I work for a small financial company and we often receive statements
> that are in pdf form.  Pulling the data from these can be quite time
> consuming and I'm wondering if anyone on the list knows of a way to read
> a pdf in as text in R.  I know that Google has come out with a few tools
> that allow you to search the text of pdfs which has given me hope that
> something along these lines may be possible but I've been unable to find
> any R documentation on inputting data from PDFs.  Any
> thoughts/suggestions would be much appreciated.
>
>
>
> Thanks,
>
>
>
> Ben


