[R] Please guide -- UTF-8 locale setting fails on Windows on writing

Sunny Singha sunnysingha.analytics at gmail.com
Tue Mar 29 08:46:04 CEST 2016


Milan,
Answers to your queries:
-- But how do you read back the contents of the file? You need to specify
the encoding when reading it too.
Answer: I read it back as described in 'Case 2'.
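
For reference, a minimal sketch of a write/read round trip that keeps the text in UTF-8 at both ends. It assumes the garbling comes from conversion through the native code page (1252 here), which is a common cause on Windows but is not confirmed in this thread. The temporary file is a hypothetical stand-in for 'x.csv', and the sample is the first six characters of the string from Case 1, written with \u escapes so that the script itself stays ASCII.

```r
# First six characters of the sample string, as \u escapes (marked UTF-8 by R).
x <- "\u4e16\u754c\u9910\u798f\u4e8b\u5de5"

# Hypothetical temporary file standing in for 'x.csv'.
path <- tempfile(fileext = ".csv")

# Write: open the connection in binary mode with no encoding and pass
# useBytes = TRUE, so the UTF-8 bytes are written as they are instead of
# being converted through the native code page.
con <- file(path, open = "wb")
writeLines(enc2utf8(x), con, useBytes = TRUE)
close(con)

# Read: encoding = "UTF-8" marks the strings as UTF-8 rather than
# re-encoding them into the native encoding.
y <- readLines(path, encoding = "UTF-8")

# Compare code points, so the check does not depend on console display.
round_trip_ok <- identical(utf8ToInt(y), utf8ToInt(x))
```

If round_trip_ok is TRUE but the console still shows '?' or escapes, the file itself is likely fine and only the display in the 1252 locale is affected.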

-- Are you sure the notepad saved the text as UTF-8?
Answer: Yes.
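
One way to double-check this from R, sketched under the assumption that Notepad on this Windows version prefixes UTF-8 files with a byte order mark (EF BB BF). The helper name has_utf8_bom and the temporary files are hypothetical and only serve the demonstration.

```r
# Return TRUE if the file starts with the UTF-8 byte order mark EF BB BF.
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Demonstration on two hypothetical temporary files with the same text,
# one written with a BOM and one without.
text_bytes  <- charToRaw(enc2utf8("\u4e16\u754c"))
with_bom    <- tempfile(fileext = ".txt")
without_bom <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), text_bytes), with_bom)
writeBin(text_bytes, without_bom)

bom_found   <- has_utf8_bom(with_bom)
bom_missing <- !has_utf8_bom(without_bom)
```

If has_utf8_bom('x.txt') returns TRUE, passing fileEncoding='UTF-8-BOM' instead of 'UTF-8' to read.table() asks R to skip the mark. Whether that alone removes the 'invalid input' warning in a 1252 locale is not something this sketch establishes.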

Regards,
Sunny

On Mon, Mar 28, 2016 at 9:58 PM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> On Monday, 28 March 2016 at 20:12 +0530, Sunny Singha wrote:
>> Milan,
>> OK, let me take the case of Facebook. I used the Rfacebook package
>> to get posts (getPost()), which returns a list() of data frames (post,
>> comments, likes).
>>
>> Let me demonstrate two cases of reading and writing, as you suggested.
>> Case 1:::::::::
>> Let's say one of the Facebook comments has the string value below, in
>> Japanese -->
>> "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
>>
>> On the R console I now assign the above string to a variable:
>> x <- "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
>> and write it as below:
>> write.csv(x, file='x.csv', row.names=FALSE, fileEncoding='UTF-8')
>> I get this string in the file:
>> "" -
>>  "
> But how do you read back the contents of the file? You need to specify
> the encoding when reading it too.
>
>> Case 2::::::::::::::
>> I create a file 'x.txt' in Notepad, save the Japanese string
>> "世界餐福事工 - 餐廳員工沒精打采 老是打盤子" in it, and read it as below:
>> read.table('x.txt', fileEncoding='UTF-8')
>> I get the output below:
>>
>>   V1
>> 1  ?
>> Warning messages:
>> 1: In read.table("x.txt", fileEncoding = "UTF-8") :
>>   invalid input found on input connection 'x.txt'
>> 2: In read.table("x.txt", fileEncoding = "UTF-8") :
>>   incomplete final line found by readTableHeader on 'x.txt'
> Are you sure the notepad saved the text as UTF-8?
>
>> The above was for demonstration. I'm in fact reading extracted social
>> media data, which ultimately comes through the httr package and is
>> returned as data frames.
>> I'm not sure how I should handle this on Windows, as I don't observe
>> this behavior on Mac, where the system locale is set to 'en_US.UTF-8'.
>>
>> Regards,
>> Sunny
>>
>>
>>
>>
>> On Mon, Mar 28, 2016 at 7:39 PM, Milan Bouchet-Valat  wrote:
>> >
>> > On Monday, 28 March 2016 at 19:16 +0530, Sunny Singha wrote:
>> > >
>> > > Hi,
>> > > I think I'm experiencing an issue regarding system Locale. I have
>> > > exported '.csv' formatted data frames gathered from various social
>> > > media platforms like facebook/twitter/G+, etc.
>> > >
>> > > I observe that many variables/columns consist of strings formatted similar to the below:
>> > > "
>> > > "
>> > >
>> > > As expected for social media data, and as I confirmed, they are
>> > > strings in different languages.
>> > > Platform details are provided at the end of this mail. The OS locale
>> > > is set to English (United States), hence the R locale is
>> > > 'English_United States.1252'.
>> > >
>> > > I have attempted to change it to UTF-8 but received the warning message below:
>> > >
>> > > Warning message:
>> > > In Sys.setlocale("LC_ALL", "UTF-8") :
>> > >   OS reports request to set locale to "UTF-8" cannot be honored
>> > You don't need to set the locale. Just pass an appropriate value (e.g.
>> > "UTF-8") to read.csv() or write.csv()'s fileEncoding argument.
>> >
>> > You also didn't tell us what program you used to read these files. Some
>> > might guess the encoding incorrectly, or require you to choose it
>> > manually.
>> >
>> >
>> > Regards
>> >
>> > >
>> > > I have gone through the forums below, but found no resolution so far:
>> > > --- http://stackoverflow.com/questions/20571147/how-to-set-unicode-locale-in-r
>> > > --- https://stat.ethz.ch/pipermail/r-devel/2013-November/067940.html
>> > > --- http://stackoverflow.com/questions/19877676/write-utf-8-files-from-r
>> > > --- https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
>> > > --- http://withr.me/configure-character-encoding-for-r-under-linux-and-windows/
>> > >
>> > > I'm not sure whether the issue occurs while reading/extracting the
>> > > data from the media or while writing/exporting to a Windows directory,
>> > > but I don't experience a similar issue on my personal Mac machine. I
>> > > need some clarification here.
>> > >
>> > > How could I export the data just as I see it on the web? Please guide.
>> > >
>> > > Regards,
>> > > Sunny
>> > >
>> > > Platform I'm using::::::::::::::::::::::::::::
>> > > Operating System : Windows 7 Professional SP1
>> > > R version details:
>> > > platform       x86_64-w64-mingw32
>> > > arch           x86_64
>> > > os             mingw32
>> > > system         x86_64, mingw32
>> > > status
>> > > major          3
>> > > minor          2.3
>> > > year           2015
>> > > month          12
>> > > day            10
>> > > svn rev        69752
>> > > language       R
>> > > version.string R version 3.2.3 (2015-12-10)
>> > > nickname       Wooden Christmas-Tree
>> > >
>> > > ______________________________________________
>> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.


