[R] Reading a web page in pdf format

Gabor Csardi csardi at rmki.kfki.hu
Wed May 9 17:16:05 CEST 2007


Vittorio,

this isn't really an R problem; you need a tool to extract text from a 
PDF document. I've tried pdftotext from the xpdf bundle, and it worked 
fine for the file you linked. On my Ubuntu Linux it is in the
xpdf-utils package; search for xpdf to find out whether it is available 
on Windows, if that is what you use. 

If you want to call it from R you can use the 'system' function. 
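
A rough sketch of how that could look, assuming the pdftotext binary is 
on the PATH. The helper names pdf_to_lines and pick_rows are made up 
for illustration, and since I have not checked how the converted text 
is laid out, the second helper only returns the raw matching lines 
and leaves parsing the numbers to you:

```r
# Hypothetical helper: download a PDF and convert it to text with pdftotext.
# Assumes the pdftotext command-line tool (xpdf bundle) is installed.
pdf_to_lines <- function(url, pdf_file = tempfile(fileext = ".pdf")) {
  txt_file <- sub("\\.pdf$", ".txt", pdf_file)
  download.file(url, pdf_file, mode = "wb")  # "wb" keeps the binary intact
  # -layout asks pdftotext to preserve the physical layout of the page,
  # which tends to keep each table row on a single line
  cmd <- paste("pdftotext -layout", shQuote(pdf_file), shQuote(txt_file))
  status <- system(cmd)
  if (status != 0) stop("pdftotext failed with exit status ", status)
  readLines(txt_file)
}

# Hypothetical helper: keep only the lines that mention the wanted row labels.
pick_rows <- function(lines, labels = c("Industriale", "Termoelettrico")) {
  hits <- lapply(labels, function(lab) {
    grep(lab, lines, fixed = TRUE, value = TRUE)
  })
  names(hits) <- labels
  hits
}

# Usage (not run here, needs network access and pdftotext):
# lines <- pdf_to_lines(
#   "http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf")
# pick_rows(lines)
```

From there you would split the matching lines into fields and append 
them to your table; the exact splitting depends on what the text 
output looks like for this file.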

There may be other, better methods I'm unaware of, of course.

Best,
Gabor

On Wed, May 09, 2007 at 03:47:59PM +0100, Vittorio wrote:
> Each day the daily balance in the following link
> 
> http://www.snamretegas.it/italiano/business/gas/bilancio/pdf/bilancio.pdf
> 
> is updated.
> 
> I would like to set up an R procedure, to be run daily on a 
> server, that reads the figures in just a couple of lines 
> ("Industriale" and "Termoelettrico", towards the end of the balance) 
> and puts the data in a table.
> 
> Is that possible? If yes, what R packages should I use?
> 
> Ciao
> Vittorio
> 

-- 
Csardi Gabor <csardi at rmki.kfki.hu>    MTA RMKI, ELTE TTK


