[R] Reading a web page in pdf format
jim holtman
jholtman at gmail.com
Wed May 9 17:42:45 CEST 2007
You can do it with base R alone; no extra packages are needed. Just
read the PDF file in as text and then extract the data:
> # read in PDF file as text
> x.in <- readLines("http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf")
> # find Industriale
> Ind <- grep("Industriale", x.in, value=TRUE)
> # find Termoelettrico
> Ter <- grep("Termoelettrico", x.in, value=TRUE)
> # extract the data: keep only what is inside the parentheses (spaces, digits, commas)
> Ind.data <- sub(".*\\(([\\s0-9,]*)\\).*", "\\1", Ind, perl=TRUE)
> Ter.data <- sub(".*\\(([\\s0-9,]*)\\).*", "\\1", Ter, perl=TRUE)
> Ind.data
[1] " 46,6"
> Ter.data
[1] " 99,3"
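
Since the goal was to end up with a table, here is a minimal sketch of how the same steps might be wrapped in a function that returns numbers and builds a data frame. The helper name extract_figure, the column names, and the two sample lines are all made up for illustration; the sample lines only mimic the "label ... ( value)" shape the pattern above relies on and are not the actual content of the PDF. It also assumes the figures use a decimal comma, as in the output above.

```r
# Hypothetical helper (not part of base R): find the first line containing
# `label` and return the figure inside the parentheses as a number.
extract_figure <- function(lines, label) {
  pattern <- ".*\\(([\\s0-9,]*)\\).*"
  hit <- grep(label, lines, value = TRUE, fixed = TRUE)
  # no matching line, or no "( digits )" group on it: report a missing value
  if (length(hit) == 0 || !grepl(pattern, hit[1], perl = TRUE)) {
    return(NA_real_)
  }
  raw <- sub(pattern, "\\1", hit[1], perl = TRUE)
  raw <- gsub("[[:space:]]", "", raw)            # drop the padding blanks
  as.numeric(sub(",", ".", raw, fixed = TRUE))   # decimal comma -> decimal point
}

# Invented sample input standing in for the lines returned by readLines()
sample_lines <- c("Industriale ( 46,6)", "Termoelettrico ( 99,3)")

labels <- c("Industriale", "Termoelettrico")
balance <- data.frame(
  date   = Sys.Date(),
  sector = labels,
  value  = sapply(labels, extract_figure, lines = sample_lines,
                  USE.NAMES = FALSE)
)
print(balance)
```

For a daily job you would pass the result of readLines() on the PDF URL instead of sample_lines, and could append each day's rows to a file with write.table(..., append = TRUE). Note that this whole approach only works while the PDF stores its text uncompressed; if the file were written with compressed streams, grep would find nothing and the function above would return NA.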
On 5/9/07, Vittorio <vdemart1 at tin.it> wrote:
> Each day the daily balance in the following link
>
> http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf
>
> is updated.
>
> I would like to set up an R procedure, to be run daily on a
> server, that reads the figures in just two lines
> ("Industriale" and "Termoelettrico", towards the end of the balance)
> and puts the data in a table.
>
> Is that possible? If so, which R packages should I use?
>
> Ciao
> Vittorio
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?