[R] trouble for parsing HTML files

Milan Bouchet-Valat nalimilan at club.fr
Fri Mar 23 18:51:49 CET 2012


On Friday 23 March 2012 at 08:10 +0100, Julien Velcin wrote:
> Here it is:
> 
> R version 2.14.2 (2012-02-29)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
I guess the OS uses a French locale? If so, the mismatch between R's
locale and the OS's may be the problem. Can you try running R with a
French locale? It would be strange, since UTF-8 should be the same in
both settings, but it is still worth a try...
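
A minimal sketch of how one might try this from within R, assuming the
locale is called "fr_FR.UTF-8" on your system (that name is an assumption;
locale names vary by platform, and `locale -a` in a terminal lists the
ones available):

```r
# Remember the current character-type locale so it can be restored later.
old_ctype <- Sys.getlocale("LC_CTYPE")

# "fr_FR.UTF-8" is an assumed locale name. Sys.setlocale() returns ""
# (with a warning) when the system cannot honour the request.
new_ctype <- suppressWarnings(Sys.setlocale("LC_CTYPE", "fr_FR.UTF-8"))
if (identical(new_ctype, "")) {
  message("fr_FR.UTF-8 is not available here; locale left unchanged")
} else {
  message("LC_CTYPE is now: ", new_ctype)
}

# ... re-run the HTML parsing code here and compare the result ...

# Put the original setting back.
invisible(Sys.setlocale("LC_CTYPE", old_ctype))
```

Only LC_CTYPE is changed here because that is the category governing
character encoding; the other categories are left alone.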

Otherwise, please run this and post the output, just in case:
url <- "http://www.huffingtonpost.com/social/GraniteSkyline?action=fans"
lines <- readLines(url)
head(lines)
library(tools)
showNonASCII(head(lines))
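
For reference, here is a small offline sketch of what this kind of check
reports. The two sample strings are made up for illustration and are not
taken from the page above:

```r
library(tools)

# Hypothetical sample lines. "\u00e9" is e-acute, written as an escape so
# the example does not depend on the encoding of this message.
x <- c("plain ASCII line", "caf\u00e9 au lait")

# Prints the elements that contain non-ASCII characters, prefixed by
# their position, with the offending bytes shown as <xx> escapes.
showNonASCII(x)

# The same information as a logical vector: iconv() gives NA for any
# element that cannot be represented in ASCII.
has_non_ascii <- is.na(iconv(x, "UTF-8", "ASCII"))

# The encoding R has recorded for each string.
enc <- Encoding(x)
```

If the lines read from the URL show unexpected byte sequences here, that
would point to an encoding problem rather than a parsing problem.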


Hope this helps


