[R] readHTMLTable (XML package)

Lopez, Dan lopez235 at llnl.gov
Wed Jan 16 00:04:02 CET 2013


David,

Because some webpages I use periodically have data that would be convenient to read directly into R.
Copying and pasting is messy, and getting direct database access to the data behind some of these pages is not an option for me (i.e., it won't get approved, but I can use what is already published).

Dan


-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Tuesday, January 15, 2013 3:00 PM
To: Lopez, Dan
Cc: Ista Zahn; R help (r-help at r-project.org)
Subject: Re: [R] readHTMLTable (XML package)


On Jan 15, 2013, at 2:31 PM, Lopez, Dan wrote:

> Hi Ista,
> 
> It does exist. It's a page on our company intranet.
> 
> It is https, so it looks like I can't use RCurl either. I tried RCurl, by the way, and got the error below.
> 
> Do you have experience pulling a table from an https site? If so, how do I do that?

Why not use a browser and save it locally?
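A minimal sketch of the save-and-read approach, assuming the page has already been saved from the browser as an HTML file. The file and its small stand-in table are hypothetical; the example writes its own file so it runs without network access. Once the page is on disk, readHTMLTable() parses it directly and https/SSL no longer matters.

```r
library(XML)

# Hypothetical stand-in for the page saved from the browser (File > Save Page As...).
# It is written here only so the example runs on its own.
local_file <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body><table>",
  "<thead><tr><th>Group</th><th>Count</th></tr></thead>",
  "<tbody>",
  "<tr><td>A</td><td>10</td></tr>",
  "<tr><td>B</td><td>20</td></tr>",
  "</tbody>",
  "</table></body></html>"
), local_file)

# Parse the local copy; no https or SSL is involved at this point.
# which = 1 returns only the first table on the page, as a data frame.
tab <- readHTMLTable(local_file, which = 1, stringsAsFactors = FALSE)
print(tab)
```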

--
David.
> 
> 
>> tabs <- readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html"))
> Error in readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html")) : 
>  error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in function (type, msg, asError = TRUE)  : 
>  SSL certificate problem, verify that the CA cert is OK. Details:
> error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
> 
> 
> Thanks.
> Dan
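The SSL error above means libcurl could not verify the server's certificate against the CA certificates it trusts. This is common for intranet servers whose certificates are signed by an internal certificate authority. The sketch below assumes RCurl and XML are installed and that the company's root CA certificate has been exported from the browser to a PEM file. The helper name `read_https_tables` and the file name `company-root-ca.pem` are hypothetical. The default `ca_file` is the public CA bundle shipped with RCurl, which will not cover an internal CA.

```r
library(XML)
library(RCurl)

# Hypothetical helper: fetch a page with RCurl, telling libcurl which CA
# certificate(s) to trust, then hand the HTML text to readHTMLTable().
read_https_tables <- function(url,
                              ca_file = system.file("CurlSSL", "cacert.pem",
                                                    package = "RCurl"),
                              which = integer()) {
  # cainfo: path to the CA certificate file used to verify the server
  html <- getURL(url, cainfo = ca_file)
  # asText = TRUE: treat the string as HTML content, not as a file name
  doc <- htmlParse(html, asText = TRUE)
  # which = integer() (the default) returns a list of all tables on the page
  readHTMLTable(doc, which = which, stringsAsFactors = FALSE)
}

# Hypothetical usage; the CA file name is an assumption:
# tabs <- read_https_tables("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html",
#                           ca_file = "company-root-ca.pem")
```

Passing `ssl.verifypeer = FALSE` to getURL() also makes the error go away, but it turns off certificate checking entirely. Pointing `cainfo` at the correct CA file is the safer choice.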
> 
> -----Original Message-----
> From: Ista Zahn [mailto:istazahn at gmail.com]
> Sent: Tuesday, January 15, 2013 12:22 PM
> To: Lopez, Dan
> Cc: R help (r-help at r-project.org)
> Subject: Re: [R] readHTMLTable (XML package)
> 
> Hi Dan,
> 
> A couple of things. First, I think that file really does not exist (at
> least I can't open it in my web browser). Second, even if it did,
> url() cannot download from https, according to the Details section of
> ?url, which points you to RCurl. So, once you verify that your URL
> actually exists, you can do something like
> 
> library(XML)
> library(RCurl)
> tabs <- readHTMLTable(getURL("http://en.wikipedia.org/wiki/List_of_countries_by_population"))
> 
> Best,
> Ista
> 
> On Tue, Jan 15, 2013 at 2:59 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
>> Hi,
>> 
>> I am using XML::readHTMLTable and getting the error below. Does anyone know why? Does this function not work with https? I didn't see anything about that in the help page.
>> 
>>> library(XML)
>>> wampage <- readHTMLTable('https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html', 1)
>> Error in htmlParse(doc) :
>>  File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does not exist
>> 
>> Dan
>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


