[R] XML package example code?

Tony B tony.breyal at googlemail.com
Wed Nov 25 18:03:20 CET 2009


It's been a long time since I read the tutorials, but I think the
reason you get those messages is that the HTML is malformed: some of
the opening tags, such as <dd>, have no corresponding end tag (</dd>),
which is what "Opening and ending tag mismatch" is reporting.
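
To illustrate the difference, here is a minimal sketch (not taken from
the tutorial). The snippet in bad_html is a made-up example of
unclosed tags, not the actual Omegahat page. As far as I know, a strict
XML parse should stop with an error on it, while the HTML parser
repairs the tree as it reads.

```r
library(XML)

# Hypothetical malformed snippet: <dt> and <dd> are never closed.
bad_html <- "<html><body><dl><dt>Term<dd>Definition</dl></body></html>"

# A strict XML parse is expected to fail on the unclosed tags,
# so trap the error and return NULL instead of halting the script.
strict <- tryCatch(xmlParse(bad_html, asText = TRUE),
                   error = function(e) NULL)

# The HTML parser tolerates the missing end tags.
lenient <- htmlParse(bad_html, asText = TRUE)

# The <dd> content is still reachable with XPath.
dd_text <- xpathSApply(lenient, "//dd", xmlValue)
```

Here strict should end up NULL, and dd_text should hold the text of
the one <dd> element.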

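The XML package's HTML parser seems rather good at working with
malformed code, so I parse the page with htmlTreeParse() rather than
xmlInternalTreeParse(), which expects well-formed XML, and usually
just send those messages to an empty error-handler function.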


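library(RCurl)  # provides getURL()
library(XML)    # provides htmlTreeParse()
# Download the page as a single string.
html <- getURL("http://www.omegahat.org/RSXML/index.html")
# Parse as HTML; the empty handler silently discards parser messages.
html.tree <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE,
                           error = function(...) {})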
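
If you would rather keep the messages than throw them away, something
like the following sketch might work. The page string, the msgs vector
and the collect() handler are made-up names for illustration, and I am
assuming the handler receives the message text as its first argument.

```r
library(XML)

# Hypothetical malformed page: the <li> items and the <p> are never closed.
page <- paste0("<html><body><ul>",
               "<li><a href='a.html'>A</a>",
               "<li><a href='b.html'>B</a>",
               "</ul><p>done</body></html>")

# Collect parser messages in a vector instead of discarding them.
msgs <- character()
collect <- function(msg, ...) msgs <<- c(msgs, msg)

doc <- htmlTreeParse(page, asText = TRUE, useInternalNodes = TRUE,
                     error = collect)

# Query the repaired tree: pull the href attribute of every link.
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")
```

After this, msgs holds whatever the parser reported (possibly nothing),
and links holds the link targets found in the tree.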

HTH,
Tony Breyal


On 25 Nov, 16:21, Peng Yu <pengyu... at gmail.com> wrote:
> On Wed, Nov 25, 2009 at 12:19 AM, cls59 <ch... at sharpsteen.net> wrote:
>
> > Peng Yu wrote:
>
> >> I'm interested in parsing an HTML page. I should use XML, right? Could
> >> somebody show me some example code? Is there a tutorial for this
> >> package?
>
> > Did you try looking through the help pages for the XML package or browsing
> > the Omegahat website?
>
> > Look at:
>
> >  library(XML)
> >  ?htmlTreeParse
>
> > And the relevant web page for documentation and examples is:
>
> >  http://www.omegahat.org/RSXML/
>
> http://www.omegahat.org/RSXML/shortIntro.html
>
> I'm trying the example on the above webpage. But I'm not sure why I
> got the following error. Would you help to take a look?
>
> $ Rscript main.R
> > library(XML)
>
> > download.file('http://www.omegahat.org/RSXML/index.html','index.html')
>
> trying URL 'http://www.omegahat.org/RSXML/index.html'
> Content type 'text/html; charset=ISO-8859-1' length 3021 bytes
> opened URL
> ==================================================
> downloaded 3021 bytes
>
>
>
> > doc = xmlInternalTreeParse("index.html")
>
> Opening and ending tag mismatch: dd line 68 and dl
> Opening and ending tag mismatch: li line 67 and body
> Opening and ending tag mismatch: dt line 66 and html
> Premature end of data in tag dd line 64
> Premature end of data in tag li line 63
> Premature end of data in tag dt line 62
> Premature end of data in tag dl line 61
> Premature end of data in tag body line 5
> Premature end of data in tag html line 1
> Error: 1: Opening and ending tag mismatch: dd line 68 and dl
> 2: Opening and ending tag mismatch: li line 67 and body
> 3: Opening and ending tag mismatch: dt line 66 and html
> 4: Premature end of data in tag dd line 64
> 5: Premature end of data in tag li line 63
> 6: Premature end of data in tag dt line 62
> 7: Premature end of data in tag dl line 61
> 8: Premature end of data in tag body line 5
> 9: Premature end of data in tag html line 1
> Execution halted



