[R] Parsing of HTML files in R
Douglas Bates
bates at stat.wisc.edu
Thu Oct 25 16:38:55 CEST 2001
Duncan Temple Lang <duncan at research.bell-labs.com> writes:
> If my memory serves me correctly, I believe that Dan Veillard's libxml
> library provides an adaptation of the XML parser that handles HTML. In
> that case, I can add something to the XML package that allows us to
> access the HTML parser and use the same interface for both XML and
> HTML from within R. I'll take a look and see if this is relatively
> easy to do.
Alternatively, try transforming your HTML to XHTML, which can then be
parsed as XML. See the documentation on the "tidy" utility at
http://www.w3.org/People/Raggett/tidy/
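
To illustrate why the conversion helps, here is a minimal sketch in
Python (not R, and not part of the XML package) using only the standard
library. The two sample strings and the helper names parses_as_xml and
paragraph_texts are made up for this illustration. The conversion step
itself (for example, by running tidy) is not shown; the second string
simply stands in for what a well-formed XHTML version of the first
might look like.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample inputs, invented for this illustration.
# Typical "tag soup" HTML: <br> and <p> are never closed.
TAG_SOUP = "<html><body><p>First<br><p>Second</body></html>"

# A well-formed XHTML rendering of the same content (assumed, hand-written).
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>First<br/></p><p>Second</p>"
    "</body></html>"
)

# Prefix mapping for the XHTML namespace, used in element lookups.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def paragraph_texts(xhtml):
    """Collect the leading text of every <p> element in an XHTML string."""
    root = ET.fromstring(xhtml)
    return [p.text for p in root.findall(".//x:p", XHTML_NS)]


if __name__ == "__main__":
    print(parses_as_xml(TAG_SOUP))   # the strict parser rejects tag soup
    print(parses_as_xml(XHTML))      # the XHTML version is accepted
    print(paragraph_texts(XHTML))
```

The same idea should carry over to R: once the file is well-formed
XHTML, an ordinary XML parser can read it without any HTML-specific
handling.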