[R] htmlParse hangs or crashes

Simon Kiss sjkiss at gmail.com
Mon Sep 5 23:48:57 CEST 2011


Dear colleagues,
each time I use htmlParse, R crashes or hangs.  The URL I'd like to parse is included below, as are the results of a series of basic commands that show what I'm experiencing.  The output of sessionInfo() is included at the bottom of the message.
The thing is, htmlTreeParse appears to work just fine, although its result doesn't appear to contain the information I need (the URLs of the articles linked to on this search page).  Regardless, I'd still like to understand why htmlParse doesn't work.
Thank you for any insight.
Yours, 
Simon Kiss


myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011")

.x<-htmlParse(myurl)

class(.x)
#returns "HTMLInternalDocument" "XMLInternalDocument" 

.x
#returns
*** caught segfault ***
address 0x1398754, cause 'memory not mapped'

Traceback:
 1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent), as.character(encoding),     as.logical(indent), PACKAGE = "XML")
 2: saveXML(from)
 3: saveXML(from)
 4: asMethod(object)
 5: as(x, "character")
 6: cat(as(x, "character"), "\n")
 7: print.XMLInternalDocument(<pointer: 0x11656d3e0>)
 8: print(<pointer: 0x11656d3e0>)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
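
Note that class(.x) returned normally and the traceback runs through print() and saveXML(), so the segfault appears to happen when the document is printed rather than when it is parsed. A minimal sketch of extracting link URLs without ever printing the document, assuming the parsed object is otherwise usable. The inline HTML string and its paths are made up for illustration and stand in for the real search page, so this does not show that the same approach succeeds on that page:

```r
library(XML)

# Hypothetical stand-in for the search page; the real markup is not reproduced here.
html <- paste0(
  "<html><body>",
  "<a href=\"/articles/1.cms\">First</a>",
  "<a href=\"/articles/2.cms\">Second</a>",
  "</body></html>"
)

# asText = TRUE parses the string itself instead of treating it as a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# Pull the href attribute of every <a> node. The result is assigned, and
# `doc` is never typed at the prompt, so print.XMLInternalDocument is not called.
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")
print(links)

# Release the C-level document once the values have been copied into R.
free(doc)
```

The point of the sketch is only that XPath queries on an internal document do not go through saveXML(), which is where the traceback above ends.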

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] XML_3.4-0      RCurl_1.5-0    bitops_1.0-4.1
*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606


