[R] R hangs after htmlTreeParse

Simon Kiss sjkiss at gmail.com
Thu Aug 25 17:41:24 CEST 2011


Dear colleagues,
I'm trying to parse the HTML content of this webpage:
http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011

I'm using the following code:
library(RCurl)
library(XML)
myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011")

.x <- getURL(myurl)
htmlTreeParse(.x, asText = TRUE)

This prints approximately 15 lines of output from the HTML document and then stops. The command-line prompt does not reappear, and force-quitting R is the only option.
I'm running R 2.13 on Mac OS X 10.6, and the latest versions of the XML and RCurl packages are installed.
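
One variant that may be worth ruling out is sketched below. It rests on an assumption, not a confirmed diagnosis: that the stall comes from auto-printing the whole parsed tree at the prompt rather than from the parse itself. The sketch assigns the result instead of printing it, and uses htmlParse() from the XML package so the document stays as internal nodes. The string page_text and its contents are hypothetical stand-ins for the text returned by getURL().

```r
library(XML)

# Hypothetical stand-in for the page text returned by getURL(myurl)
page_text <- "<html><head><title>Search results</title></head><body><a href='a.html'>First</a><a href='b.html'>Second</a></body></html>"

# Assign the parsed document so the full tree is not auto-printed at the prompt
doc <- htmlParse(page_text, asText = TRUE)

# Extract only the pieces of interest with XPath instead of printing everything
page_title <- xpathSApply(doc, "//title", xmlValue)
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")
```

If this returns promptly on the real page, the parse is probably fine and the printing step is the more likely culprit; if it still hangs, the problem lies elsewhere.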
Yours, Simon Kiss


