[R] parse an HTML page with verbose error message (using XML)

Yihui Xie xie at yihui.name
Thu Mar 11 23:35:49 CET 2010


I'm using the function htmlParse() in the XML package, and I need a
little help with error handling while parsing an HTML page. So far I
can use either the default way:

# error = xmlErrorCumulator(), by default
library(XML)
doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/")
# the error message is:
# htmlParseStartTag: invalid element name

or the tryCatch() approach:

# error = NULL, errors to be caught by tryCatch()
tryCatch({
    doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/",
        error = NULL)
}, XMLError = function(e) {
    cat("There was an error in the XML at line", e$line, "column",
        e$col, "\n", e$message, "\n")
})
# the verbose error message is:
# There was an error in the XML at line 90 column 2
# htmlParseStartTag: invalid element name

I would like to get the verbose error messages (with line and column
numbers) without stopping the parsing process. The first approach does
not report those details, while the second one aborts the parse at the
first error.
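
One direction that might work is to pass my own function as the error
argument, so each problem is recorded and reported without raising an R
error. The following is only a sketch under assumptions:
makeErrorCollector() is a hypothetical helper, not part of XML; the
handler is assumed to be called with the same arguments that
xmlStructuredStop() takes (msg, code, domain, line, col, level,
filename), and possibly once more with an empty msg when parsing ends;
the HTML string is a made-up malformed input used in place of the real
page.

```r
library(XML)

# Hypothetical helper (not part of the XML package): builds an error
# handler that records each parser error instead of stopping.
makeErrorCollector = function() {
    errors = list()
    handler = function(msg, code, domain, line, col, level, filename, ...) {
        # assumption: the parser may make a final call with an empty msg
        if (length(msg) == 0) return(invisible(NULL))
        msg = sub("\\s+$", "", msg)  # drop trailing newline/whitespace
        errors[[length(errors) + 1]] <<- list(line = line, col = col,
            message = msg)
        cat("line", line, "column", col, ":", msg, "\n")
        invisible(NULL)  # no stop(), so the parser can carry on
    }
    list(handler = handler, errors = function() errors)
}

collector = makeErrorCollector()
# made-up malformed input; the stray "<>" is meant to provoke an error
html = "<html><body><p>fine</p><>broken</body></html>"
doc = htmlParse(html, asText = TRUE, error = collector$handler)
str(collector$errors())  # whatever was collected, with line and column
```

If the assumptions hold, doc is still returned, and collector$errors()
holds the line, column and message of every problem the parser
reported.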

Thanks!

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-6609 Web: http://yihui.name
Department of Statistics, Iowa State University
3211 Snedecor Hall, Ames, IA


