[R] R hangs after htmlTreeParse
Simon Kiss
sjkiss at gmail.com
Thu Aug 25 17:41:24 CEST 2011
Dear colleagues,
I'm trying to parse the html content from this webpage:
http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011
using the following code:
library(RCurl)
library(XML)
myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011")
.x<-getURL(myurl)
htmlTreeParse(.x, asText=TRUE)
This prints approximately 15 lines of output from the HTML document and then stops. The command prompt does not reappear, and force-quitting R is the only option.
I'm running R 2.13 on Mac OS X 10.6, and the latest versions of the XML and RCurl packages are installed.
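One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the htmlTreeParse() call is not assigned to anything, so R auto-prints the whole parsed tree at the console, and printing a large tree can take a very long time. The example below assumes that is what is happening. The `page` string is a hypothetical stand-in for the text returned by getURL(), and `useInternalNodes = TRUE` is a choice made for this sketch, not part of the original call.

```r
library(XML)

# Hypothetical stand-in for the HTML text; in the real case this would be
# the string returned by getURL(myurl).
page <- paste0(
  "<html><head><title>Search results</title></head>",
  "<body><a href='/a1'>First</a><a href='/a2'>Second</a></body></html>"
)

# Assign the result so the console does not auto-print the entire tree.
# useInternalNodes = TRUE keeps the tree as a C-level document, which
# allows XPath queries on it.
doc <- htmlTreeParse(page, asText = TRUE, useInternalNodes = TRUE)

# Inspect only small pieces of the tree instead of printing all of it.
page_title <- xpathSApply(doc, "//title", xmlValue)
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")

print(page_title)
print(links)
```

If the assigned call returns promptly while the unassigned one does not, the time is going into printing, not into parsing.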
Yours, Simon Kiss