[R] Please guide -- UTF-8 locale setting fails on Windows on writing
Milan Bouchet-Valat
nalimilan at club.fr
Mon Mar 28 18:28:38 CEST 2016
On Monday 28 March 2016 at 20:12 +0530, Sunny Singha wrote:
> Milan,
> OK, let me take the case of Facebook. I used the Rfacebook package
> to get posts (getPost()), which returns a list() of data frames (post,
> comments, likes).
>
> Let me demonstrate two cases of reading and writing, just as you suggested.
> Case 1:::::::::
> Let's say one of the Facebook comments has the string value below, in
> Chinese:
> "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
>
> In the R console I assign the above string to a variable:
> x <- "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
> and write it as follows:
> write.csv(x, file='x.csv', row.names=FALSE, fileEncoding='UTF-8')
> I get this string in the file:
> "" -
> "
But how do you read back the contents of the file? You need to specify
the encoding when reading it too.
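For example, a minimal round trip might look like the sketch below. It assumes a session whose native encoding can represent the characters (for instance a UTF-8 locale), and the sample text (the first two characters of the string above) is written with \u escapes so the script itself stays ASCII-only. fileEncoding re-encodes between the file and the session's native encoding, whereas read.csv()'s encoding = "UTF-8" argument only marks the strings as UTF-8 without re-encoding them. On a Windows session using code page 1252, which cannot represent Chinese, the re-encoding step is a plausible source of warnings like the "invalid input found" one quoted below, so the second form may be the one to try there.

```r
# Sample data: "\u4e16\u754c" is the first two characters of the string
# quoted above; \u escapes give a string marked as UTF-8.
x <- data.frame(text = "\u4e16\u754c", stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")

# Write the file as UTF-8 on disk.
write.csv(x, file = path, row.names = FALSE, fileEncoding = "UTF-8")

# Read option 1: re-encode from UTF-8 to the session's native encoding.
y1 <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Read option 2: keep the bytes as they are, only mark them as UTF-8.
y2 <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Both should give back the original text in a UTF-8 session.
print(identical(y1$text, x$text))
print(identical(y2$text, x$text))
```

The point is symmetry: whatever encoding is declared when writing has to be declared again when reading, otherwise R decodes the bytes with the native encoding.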
> Case 2::::::::::::::
> I create a file 'x.txt' in Notepad, save the Chinese string "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
> in it, and read it as follows:
> read.table('x.txt', fileEncoding='UTF-8')
> I get the output below:
>
> V1
> 1 ?
> Warning messages:
> 1: In read.table("x.txt", fileEncoding = "UTF-8") :
> invalid input found on input connection 'x.txt'
> 2: In read.table("x.txt", fileEncoding = "UTF-8") :
> incomplete final line found by readTableHeader on 'x.txt'
Are you sure Notepad saved the text as UTF-8?
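One way to check is to look at the file's raw bytes. The sketch below uses a hypothetical helper, inspect_utf8(), and two made-up test files. It assumes that Notepad's "UTF-8" option writes a byte-order mark (bytes EF BB BF) at the start of the file, which is how Notepad on Windows 7 behaves as far as I know; R accepts fileEncoding = "UTF-8-BOM" for reading such files and drops the mark. The validity check relies on iconv() returning NA when its input is not valid in the "from" encoding.

```r
# Hypothetical helper: report whether a file starts with a UTF-8
# byte-order mark and whether its bytes decode as valid UTF-8.
inspect_utf8 <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  has_bom <- length(bytes) >= 3 && identical(bytes[1:3], bom)
  # iconv() returns NA for input that is invalid in the 'from' encoding.
  valid_utf8 <- !is.na(iconv(rawToChar(bytes), from = "UTF-8", to = "UTF-8"))
  list(has_bom = has_bom, valid_utf8 = valid_utf8)
}

# Made-up file 1: BOM followed by UTF-8 text, as Notepad's UTF-8 option
# is assumed to save it.
utf8_file <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(enc2utf8("\u4e16\u754c"))),
         utf8_file)

# Made-up file 2: "caf\xe9" plus newline in Windows-1252; the lone 0xE9
# byte is not valid UTF-8.
ansi_file <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xE9, 0x0A)), ansi_file)

r1 <- inspect_utf8(utf8_file)
r2 <- inspect_utf8(ansi_file)
str(r1)
str(r2)
```

If has_bom is TRUE, read.table('x.txt', fileEncoding = 'UTF-8-BOM') is the natural variant to try; if valid_utf8 is FALSE, the bytes are not UTF-8 at all and the file was saved in some other encoding.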
> The above was only a demonstration; in fact I'm reading extracted social
> media data, which ultimately goes through the httr package somewhere and
> is returned as data frames.
> I'm not sure how I should handle this on Windows, as I don't observe
> this behavior on my Mac, where the system locale is set to 'en_US.UTF-8'.
>
> Regards,
> Sunny
>
>
>
>
> On Mon, Mar 28, 2016 at 7:39 PM, Milan Bouchet-Valat wrote:
> >
> > On Monday 28 March 2016 at 19:16 +0530, Sunny Singha wrote:
> > >
> > > Hi,
> > > I think I'm experiencing an issue with the system locale. I have
> > > exported data frames gathered from various social media platforms
> > > (Facebook, Twitter, G+, etc.) as .csv files.
> > >
> > > I observe that many variables/columns consist of strings formatted like the ones below:
> > > "
> > > "
> > >
> > > As expected (and as I confirmed), in social media data these are
> > > strings in different languages.
> > > Platform details are provided at the end of this mail. The OS locale
> > > is set to English (United States), hence the R locale is
> > > 'English_United States.1252'.
> > >
> > > I have attempted to change it to UTF-8 but receive the warning message below:
> > >
> > > Warning message:
> > > In Sys.setlocale("LC_ALL", "UTF-8") :
> > > OS reports request to set locale to "UTF-8" cannot be honored
> > You don't need to set the locale. Just pass an appropriate value (e.g.
> > "UTF-8") to read.csv() or write.csv()'s fileEncoding argument.
> >
> > You also didn't tell us what program you used to read these files. Some
> > might guess the encoding incorrectly, or require you to choose it
> > manually.
> >
> >
> > Regards
> >
> > >
> > > I have gone through the threads below, but have found no resolution so far:
> > > --- http://stackoverflow.com/questions/20571147/how-to-set-unicode-locale-in-r
> > > --- https://stat.ethz.ch/pipermail/r-devel/2013-November/067940.html
> > > --- http://stackoverflow.com/questions/19877676/write-utf-8-files-from-r
> > > --- https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
> > > --- http://withr.me/configure-character-encoding-for-r-under-linux-and-windows/
> > >
> > > I'm not sure whether the issue arises while reading/extracting the data
> > > from the media platforms or while writing/exporting it to a Windows
> > > directory, but I don't experience a similar issue on my personal Mac.
> > > I need some clarification here.
> > >
> > > How can I export the data just as I see it on the web? Please advise.
> > >
> > > Regards,
> > > Sunny
> > >
> > > Platform I'm using::::::::::::::::::::::::::::
> > > Operating System : Windows 7 Professional SP1
> > > R version details:
> > > platform x86_64-w64-mingw32
> > > arch x86_64
> > > os mingw32
> > > system x86_64, mingw32
> > > status
> > > major 3
> > > minor 2.3
> > > year 2015
> > > month 12
> > > day 10
> > > svn rev 69752
> > > language R
> > > version.string R version 3.2.3 (2015-12-10)
> > > nickname Wooden Christmas-Tree
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.