[R] Getting data from a PDF-file into R
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Mon Jan 26 16:40:09 CET 2009
joe1985 wrote:
> Hello
>
> I have around 200 PDF documents containing data I want organized in R as a
> data frame. The PDF documents look like this:
>
> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver.jpeg
>
> or like this:
>
> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver%2B2.jpeg
>
> So I want to pull out the data in the coloured boxes so that it becomes
> organized like this (just in R instead of Excel):
>
>
> http://www.nabble.com/file/p21667074/PRRS-billede%2Bexcel.jpeg
>
> The 0s and 1s work as follows: "PRRS-neg" on a particular date is
> represented by a 0 in both of the columns PRRS-VAC and PRRS-DK. Likewise,
> "PRRS-pos VAC" or "Vac" is represented by a 1 in the column PRRS-VAC, and
> "PRRS-pos DK" or "DK" by a 1 in the column PRRS-DK. With "sanVAC" there
> should be a 1 in the column VACsan, and with "sanDK" there should be a 1
> in the column DKsan. The first date for each "CHR-nr" should be either the
> earliest date in the red box (as in the first picture) or the date
> preceded by the word "før" (as in the second picture). All 200 PDF
> documents look like the ones in the pictures, each representing a
> different "CHR-nr".
>
>
> Hope you can help me
Not on the basis of .jpeg files, I think. We'd need some indication of
what the PDF looks like inside. There's a tool called pdftotext, which
might do something for you, IF you can figure out reliably where your
data begin and end.
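
As a rough illustration of the pdftotext route, here is a minimal sketch in R. Everything about the text layout is an assumption, since the thread only shows screenshots: it supposes that pdftotext is installed and on the PATH, that each relevant line of its output contains one date written as dd-mm-yyyy, and that the status label appears on the same line. The function names pdf_to_lines and parse_status_lines, the underscore column names and the sample lines are all hypothetical.

```r
# Hypothetical helper: convert one PDF to text with the external pdftotext
# program and return the text as a character vector of lines.
# Assumes pdftotext is installed and on the PATH.
pdf_to_lines <- function(pdf_file) {
  txt_file <- tempfile(fileext = ".txt")
  status <- system2("pdftotext",
                    c("-layout", shQuote(pdf_file), shQuote(txt_file)))
  if (status != 0) stop("pdftotext failed for ", pdf_file)
  readLines(txt_file, warn = FALSE)
}

# Hypothetical parser: keep the lines that hold a date plus a known label
# and code the labels as 0/1 columns, following the rules in the question.
# The dd-mm-yyyy date format is an assumption about the PDF contents.
parse_status_lines <- function(lines, chr_nr) {
  date_pat  <- "[0-9]{2}-[0-9]{2}-[0-9]{4}"
  label_pat <- "PRRS-neg|\\b(VAC|Vac|DK)\\b|san(VAC|DK)"
  keep <- grepl(date_pat, lines) & grepl(label_pat, lines, perl = TRUE)
  x <- lines[keep]
  dates <- regmatches(x, regexpr(date_pat, x))
  data.frame(
    CHR_nr   = rep(chr_nr, length(x)),
    date     = as.Date(dates, format = "%d-%m-%Y"),
    # \\b keeps "sanVAC" and "sanDK" from also counting as "VAC" and "DK";
    # a "PRRS-neg" line matches none of these and so gets 0 everywhere.
    PRRS_VAC = as.integer(grepl("\\b(VAC|Vac)\\b", x, perl = TRUE)),
    PRRS_DK  = as.integer(grepl("\\bDK\\b", x, perl = TRUE)),
    VACsan   = as.integer(grepl("sanVAC", x, fixed = TRUE)),
    DKsan    = as.integer(grepl("sanDK", x, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}

# Made-up lines standing in for pdftotext output from one document.
example_lines <- c(
  "CHR-nr 12345",
  "01-03-2005   PRRS-neg",
  "15-06-2006   PRRS-pos VAC",
  "20-09-2007   sanVAC",
  "02-02-2008   DK"
)
herd <- parse_status_lines(example_lines, chr_nr = "12345")
print(herd)
```

With real files, one would call parse_status_lines(pdf_to_lines(f), chr_nr) for each of the 200 PDFs and combine the results with do.call(rbind, ...). Whether that works depends entirely on the extracted text actually having one date and one label per line, which has to be checked on a real file first.
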
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907