[R] XML and RCurl: problem with encoding (htmlTreeParse)
Lauri Nikkinen
lauri.nikkinen at iki.fi
Fri Jan 1 15:14:19 CET 2010
Thanks. Interestingly, your code works on my Mac (OS X 10.6.1) but not on my
Windows XP machine. See the sessionInfo() output for both below.
Mac R:
> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1
locale:
fi_FI.UTF-8/fi_FI.UTF-8/C/C/fi_FI.UTF-8/fi_FI.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_2.6-0
>
WinXP:
> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32
locale:
LC_COLLATE=Finnish_Finland.1252;LC_CTYPE=Finnish_Finland.1252;LC_MONETARY=Finnish_Finland.1252;LC_NUMERIC=C;LC_TIME=Finnish_Finland.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_2.6-0 RCurl_1.2-1 bitops_1.0-4.1
loaded via a namespace (and not attached):
[1] tools_2.9.2
>
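The one visible difference between the two sessions is the locale: UTF-8 on the Mac, code page 1252 on Windows. If that is what breaks the parsing (an assumption, nothing in this thread confirms it), declaring the encoding when the file is read might help. A minimal base-R sketch; the HTML fragment and the variable names are made up for illustration and do not come from the page in question:

```r
# Write a small HTML fragment as raw UTF-8 bytes, so the example does not
# depend on the encoding of this script or on the current locale.
tmp <- tempfile(fileext = ".html")
utf8_bytes <- c(charToRaw("<p>P"), as.raw(c(0xc3, 0xa4)),
                charToRaw("iv"),   as.raw(c(0xc3, 0xa4)),
                charToRaw("</p>"))
writeBin(utf8_bytes, tmp)

# Show which character set the current locale uses.
print(l10n_info())

# Mark the lines as UTF-8 on read instead of letting R assume
# the native encoding of the session.
txt <- readLines(tmp, encoding = "UTF-8", warn = FALSE)
print(Encoding(txt))

# Count characters rather than bytes; a-umlaut is two bytes in UTF-8.
n_chars <- nchar(txt, type = "chars")
n_bytes <- nchar(txt, type = "bytes")
print(c(chars = n_chars, bytes = n_bytes))

# Search with a \u escape so the pattern itself is locale-independent.
found <- grepl("\u00e4", txt, fixed = TRUE)
print(found)

unlink(tmp)
```

Comparing the l10n_info() output on both machines should at least confirm whether the session locale is the relevant difference.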
-L
2009/12/31 Eduardo Leoni <leoniedu at msu.edu>:
> In the meantime, try this.
> library(XML)
> theurl <- "http://www.aarresaari.net/jobboard/jobs.html"
> download.file(theurl, "tmp.html")
> txt <- readLines("tmp.html")
> txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
> g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
> head(grep(" ", g, value=T))
> It works for me:
>
>