[R] XML and RCurl: problem with encoding (htmlTreeParse)

Lauri Nikkinen lauri.nikkinen at iki.fi
Fri Jan 1 15:14:19 CET 2010


Thanks. Interestingly, your code works on my Mac (OS X 10.6.1) but not on
my Windows XP machine. See the sessionInfo() output from both below.

R on the Mac:
> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1

locale:
fi_FI.UTF-8/fi_FI.UTF-8/C/C/fi_FI.UTF-8/fi_FI.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_2.6-0
>

R on Windows XP:
> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=Finnish_Finland.1252;LC_CTYPE=Finnish_Finland.1252;LC_MONETARY=Finnish_Finland.1252;LC_NUMERIC=C;LC_TIME=Finnish_Finland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_2.6-0      RCurl_1.2-1    bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] tools_2.9.2
>

-L

2009/12/31 Eduardo Leoni <leoniedu at msu.edu>:
> In the meantime, try this.
> library(XML)
> theurl <- "http://www.aarresaari.net/jobboard/jobs.html"
> download.file(theurl, "tmp.html")
> txt <- readLines("tmp.html")
> txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
> g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
> head(grep(" ", g, value = TRUE))
> It works for me:
>
>