[R] Do colClasses in readHTMLTable (XML Package) work?

Duncan Temple Lang duncan at wald.ucdavis.edu
Sat Mar 20 14:04:28 CET 2010




On 3/17/10 6:52 PM, Marshall Feldman wrote:
> Hi,
> 
> I can't get the colClasses option to work in the readHTMLTable function 
> of the XML package. Here's a code fragment:
> 
>     require("XML")
>     doc <- "http://www.nber.org/cycles/cyclesmain.html"
>     # The main table is the second one because it's embedded in the page table.
>     table <- getNodeSet(htmlParse(doc), "//table")[[2]]
>     xt <- readHTMLTable(
>              table,
>              header = c("peak","trough","contraction","expansion",
>                         "trough2trough","peak2peak"),
>              colClasses = c("character","character","character",
>                             "character","character","character"),
>              trim = TRUE
>              )
> 
> Does anyone know what's wrong?

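The coercion of the table columns to the classes given in colClasses
is done before the call to as.data.frame(), which by default then
converts the character columns to factors. You can add

  stringsAsFactors = FALSE

in the call to readHTMLTable() and you'll get what you expect,
I believe.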
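
To illustrate the mechanism, here is a minimal base-R sketch (not the
XML package's own code) of what a final as.data.frame() step does to
character columns. The list `cols` and its sample values are made up
for illustration, and stringsAsFactors = TRUE is passed explicitly to
mimic the default that applied in R before version 4.0.0.

```r
# Hypothetical stand-in for table columns already coerced to character.
cols <- list(
  peak   = c("June 1857", "October 1860"),
  trough = c("December 1858", "June 1861")
)

# Old default behaviour (R < 4.0.0): character columns become factors.
as_factor <- as.data.frame(cols, stringsAsFactors = TRUE)

# With stringsAsFactors = FALSE the character class survives.
as_char <- as.data.frame(cols, stringsAsFactors = FALSE)

# Show the resulting column classes for both variants.
print(sapply(as_factor, class))
print(sapply(as_char, class))
```

Assuming readHTMLTable() forwards extra arguments to as.data.frame(),
as the advice above implies, passing stringsAsFactors = FALSE there
should have the same effect as in the second call.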

   D.

> 
>      Marsh Feldman
> 


