[R] readHTMLTable help

Wed Mar 28 07:51:47 CEST 2012

Hi Lucas

The HTML page is laid out with nested tables: each cell of the
top-most table contains a table of its own. As a result, what looks like
a simple table is much more complex in the markup. readHTMLTable() is
intended for quick and easy tables. For tables such as this one, you have
to write a more customized processor.

library(XML)

doc = htmlParse("http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980")

tb = getNodeSet(doc, "//table")[[1]]

This gives the top-most table.

xmlSize(tb) tells us the number of rows. We want to skip the first 3 to get to the data.
Then you can process each of the remaining rows and pull out the cells that hold the data.
And the details go on...
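
As a rough sketch of the row-by-row processing described above, the following
uses a small made-up HTML string in place of the real page, since the actual
markup is not reproduced here. The inline HTML, the column names "day" and
"max", and the XPath used to pick the innermost cells are all assumptions for
illustration; the real page will likely need adjustments.

```r
library(XML)

# Hypothetical stand-in for the page: an outer table with 3 header rows,
# whose data cells each wrap their value in an inner table.
html <- paste0(
  "<html><body><table>",
  "<tr><td>Title</td></tr>",
  "<tr><td>Station</td></tr>",
  "<tr><td>Day</td><td>Max</td></tr>",
  "<tr><td><table><tr><td>1</td></tr></table></td>",
  "<td><table><tr><td>25</td></tr></table></td></tr>",
  "<tr><td><table><tr><td>2</td></tr></table></td>",
  "<td><table><tr><td>31</td></tr></table></td></tr>",
  "</table></body></html>")

doc <- htmlParse(html, asText = TRUE)
tb <- getNodeSet(doc, "//table")[[1]]

# Direct rows of the top-most table only, not rows of the nested tables.
rows <- getNodeSet(tb, "./tr | ./tbody/tr")

# Skip the first 3 rows to get to the data.
dataRows <- rows[-(1:3)]

# For each row, take the text of the innermost cells
# (the td elements that do not themselves contain a table).
cells <- lapply(dataRows, function(r)
  xpathSApply(r, ".//td[not(.//table)]", xmlValue))

# Assemble one data frame row per table row.
result <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(result) <- c("day", "max")   # illustrative column names
result$max <- as.numeric(result$max)
```

Selecting rows with an XPath relative to tb, rather than indexing children by
position, avoids picking up stray text nodes or rows that belong to the inner
tables.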

  D.

On 3/27/12 10:57 AM, Lucas wrote:
> Hello everyone.
> I'm using this function to download some information from a website.
> This is the URL:
> http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980
> If you go to that website you'll find a table with meteorological
> information. One column is called "Intesidad Máxima Diaria", and that is
> the one I need.
> I've been trying to extract that column, but I'm unable to do it.
> First I tried simply downloading the complete table and then applying some
> kind of filter to extract the column but, for some reason, when I call
> a <- readHTMLTable(url), the table is downloaded in an unfriendly format
> and I cannot tell the columns apart.
> 
> If anyone could help me I'd appreciate it.
> Thank you.
> 
> Lucas.