[R] reading tables from multiple HTML pages

s1oliver s1oliver at ucsd.edu
Mon Aug 29 18:04:43 CEST 2011


Hi, I am a beginner with R and am having some problems scraping data from
HTML tables using the XML package. I have included my code below.

I am trying to loop through a series of HTML pages, each of which contains a
single table from which I want to scrape data. However, some of the pages
are blank, so htmlParse() throws an error when it reaches one of them. The
loop then stops and I get the error message below:

Error in htmlParse(url) : 
  error in creating parser for http://www.szrd.gov.cn/viewcommondbfc.do?id=728

What is the best way to keep the loop running so that I can parse the rest
of the pages?

****************************************************

library(XML)

url_root<-"http://www.szrd.gov.cn/viewcommondbfc.do?id="

for(i in 700:750){
	url = paste(url_root, i, sep="")
	doc = htmlParse(url)
	
	tableNodes = getNodeSet(doc, "//table")
	tbl = readHTMLTable(tableNodes[[3]])
}
****************************************************
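
One direction that might work, offered as a minimal sketch rather than a
tested solution: wrap the parsing step in tryCatch() so that a page which
cannot be parsed yields NULL instead of stopping the loop. The helper names
read_page_table() and scrape_tables() and the table_index argument are
hypothetical, introduced only for illustration. The sketch also assumes that
pages with fewer than three tables should simply be skipped.

```r
library(XML)

# Hypothetical helper: return the table at position `table_index` on one
# page, or NULL if the page cannot be parsed or has too few tables.
read_page_table <- function(url, table_index = 3) {
  tryCatch({
    doc <- htmlParse(url)
    tableNodes <- getNodeSet(doc, "//table")
    if (length(tableNodes) >= table_index) {
      readHTMLTable(tableNodes[[table_index]])
    } else {
      NULL
    }
  }, error = function(e) {
    # Report the failure and carry on with the next page.
    message("Skipping ", url, ": ", conditionMessage(e))
    NULL
  })
}

# Hypothetical helper: apply read_page_table() to every URL and keep only
# the pages that produced a table. lapply() is used instead of assigning
# into a list inside a for loop, because assigning NULL to a list element
# would delete that element.
scrape_tables <- function(urls) {
  tables <- lapply(urls, read_page_table)
  names(tables) <- urls
  Filter(Negate(is.null), tables)
}
```

With the URLs from the loop above, this could be called as
scrape_tables(paste0(url_root, 700:750)), which would return a named list
holding one data frame per page that parsed successfully.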

Steve Oliver
Department of Political Science
University of California at San Diego
9500 Gilman Dr.
La Jolla, CA 92092
