[R-SIG-Mac] problem with character encoding in file names in R 3.2

Duncan Murdoch murdoch.duncan at gmail.com
Mon May 4 00:14:30 CEST 2015


On 03/05/2015 4:15 PM, Denis Chabot wrote:
> OK, good idea Duncan. 
> 
> All folders are in the working directory and I added one thing to your suggestion: I created 2 folders from outside R, with the OS, one with and one without accented vowels (été and ete). With the following script, I also create a second folder with accents, as you suggested. 

Thanks, I'll see if I can reproduce this.

Duncan Murdoch

> ######
> dir.create("bébé") # folder created within working directory
> # in addition, 2 folders already exist in this working directory, one with accents, "été", and one without, "ete"
> 
> a=1:10
> 
> # saving to 3 locations using a file name containing accents
> save(a, file="bébé/bé.Rda") #1 path with accented vowel created from within R
> save(a, file="été/bé.Rda") #2 path with accented vowel created with the OS
> save(a, file="ete/bé.Rda") #3 path without accented vowel created with the OS
> 
> 
> # saving to 3 locations using a file name containing no accent
> save(a, file="bébé/be.Rda") #1 path with accented vowel created from within R
> save(a, file="été/be.Rda") #2 path with accented vowel created with the OS
> save(a, file="ete/be.Rda") #3 path without accented vowel created with the OS
> ######
> 
> Results: folders "bébé" and "ete" both contain 2 files:
> bé.Rda
> be.Rda
> 
> But the folder "été" contains these 2 files:
> bé (Conflit lié au codage Unicode).Rda
> be.Rda
> 
> Note the extraneous string in the name of the file with the accented vowel. The only situation where the corruption of the file name occurs is when using a file name that has accents and a path created with the OS, not R, that also has accents.
> 
> Denis
> 
>> Le 2015-05-03 à 15:54, Duncan Murdoch <murdoch.duncan at gmail.com> a écrit :
>>
>> On 03/05/2015 3:49 PM, Denis Chabot wrote:
>>> Hi Duncan,
>>>
>>> Sorry, I did not realize that Mail had changed quotes on me. In R, all double quotes were just normal double quotes, not guillemets.
>>>
>>> And to summarize the issue, trying to save a file with a name that includes accented vowels corrupts the file name if the complete path leading to the folder where I want to save the file also contains accented vowels. 
>>>
>>> Corruption is probably a strong word, as everything remains readable: the string " (Conflit lié au codage Unicode)" is added after the name I wanted and before the .Rda extension. (I used straight double quotes here, I hope it is what Mail will send! I am not in rich text mode).
>>>
>>> And you are correct that it has nothing to do with the paste command, which works properly. If I type the complete path in the save command instead of using the paste command, the unwanted string is still added to the name. This happens on 2 computers, one running Mavericks and another running Yosemite, both with French set as the language for the OS.  
>>
>> Could you please simplify?  If paste isn't needed, don't use it, just
>> post one command, and copy the result you get.  If you can do it in a
>> directory like "~" then I'll be able to just paste your command into
>> my R and see if I get the same problem.  (You might need a dir.create()
>> beforehand, so 2 lines.)
>>
>> Duncan
>>
>>> Denis
>>>> Le 2015-05-03 à 15:34, Duncan Murdoch <murdoch.duncan at gmail.com> a écrit :
>>>>
>>>> On 03/05/2015 3:11 PM, Denis Chabot wrote:
>>>>> Hi,
>>>>>
>>>>> I don’t quite know how to produce a repeatable example for you because the problem I have seems to be caused by folder names on my computer.
>>>>>
>>>>> Yesterday I was still using R 3.1.2 and did not have this problem.
>>>>>
>>>>> Today, with R 3.2, saving no longer works as expected.
>>>>>
>>>>> Input = "../data/"
>>>>> juvcodData = paste0(Input, "Données respirométrie SDA morues juv/« )  # the name of this folder contains accented vowels
>>>>>
>>>>> a = 1:10
>>>>> folder1 = paste0(Input, "Données respirométrie SDA morues juv/")
>>>>> folder2 = paste0(Input, "Donnees respirometrie SDA morues juv/")
>>>>>
>>>>> a = 1:10
>>>>> save(a, file="bé.Rda ») #1
>>>>
>>>> What you posted here doesn't have regular quotation marks.  I'm not sure
>>>> about the opening one, but the closing one has been converted to a
>>>> guillemet.  R would never have accepted this as input, so what you're
>>>> showing is probably not what you actually did.
>>>>
>>>> You also don't say exactly what happened, just that it didn't work.
>>>>
>>>> Could you please post again, simplifying to just one string that fails,
>>>> and explain exactly how it fails?  I am sure that having save() call
>>>> paste() will not result in any difference from just calling paste() on
>>>> its own, and likewise manually typing the full string should produce the
>>>> same result as constructing it via paste().  (But I am not sure that
>>>> paste() is producing what you want.)
>>>>
>>>> Thanks.
>>>>
>>>> Duncan Murdoch
>>>>> save(a, file=paste(Input, "bé.Rda »)) #2
>>>>> save(a, file=paste(folder1, "bé.Rda »)) #3
>>>>> save(a, file=paste(folder1, "be.Rda »)) #4
>>>>> save(a, file=paste(folder2, "bé.Rda »)) #5
>>>>> save(a, file=paste(folder2, "be.Rda »)) #6
>>>>>
>>>>>
>>>>> All files were saved with the name I expected, except for # 3: 
>>>>> bé (Conflit lié au codage Unicode).Rda
>>>>> where the text in parentheses loosely translates to « Conflict related to Unicode encoding »
>>>>>
>>>>> So if there is an accented vowel both somewhere along the path already defined in R and in the name I want to give the file, the file name is altered in this way. If there is an accent in only one of the two, there is no problem. As I said, there was no such problem with R 3.1.2.
>>>>>
>>>>> The problem is probably not due to R itself, as this works:
>>>>>> paste0(folder1, "bé.Rda")
>>>>> [1] "../data/Données respirométrie SDA morues juv/bé.Rda"
>>>>>
>>>>> But when R uses this string as a path name when saving a file, I get the problem above.
>>>>>
>>>>> Thanks for any help or suggestion,
>>>>>
>>>>> Denis
>>>>>
>>>>>> sessionInfo()
>>>>> R version 3.2.0 (2015-04-16)
>>>>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>>>> Running under: OS X 10.10.3 (Yosemite)
>>>>>
>>>>> locale:
>>>>> [1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base     
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] tools_3.2.0
>>>>>
>>>>> _______________________________________________
>>>>> R-SIG-Mac mailing list
>>>>> R-SIG-Mac at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>>>>
>>>>
>>>
>>
>
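
A possible diagnostic for the report above, offered as a sketch rather than a confirmed explanation: the same visible name "bé.Rda" can be spelled with two different byte sequences in UTF-8, one using a precomposed é (U+00E9) and one using a plain e followed by a combining acute accent (U+0301). Whether that difference is what triggers the renamed file here is an assumption, not something established in this exchange. The helper name_bytes() below is hypothetical; it uses only base R and shows the bytes of a name so the two spellings can be told apart.

```r
# Hypothetical helper: hex bytes of a string after conversion to UTF-8.
name_bytes <- function(x) {
  paste(as.character(charToRaw(enc2utf8(x))), collapse = " ")
}

composed   <- "b\u00e9.Rda"   # precomposed é (U+00E9)
decomposed <- "be\u0301.Rda"  # e followed by combining acute accent (U+0301)

# The two strings display alike but are different strings to R.
same_string <- identical(enc2utf8(composed), enc2utf8(decomposed))

composed_bytes   <- name_bytes(composed)
decomposed_bytes <- name_bytes(decomposed)

print(same_string)
print(composed_bytes)
print(decomposed_bytes)

# Applied to a real folder, for example the "été" folder from the script
# (assumes that folder exists in the working directory):
# vapply(list.files("été"), name_bytes, character(1))
```

Running the commented-out line on both the folder created with the OS and the one created from within R would show whether the stored file names differ in form, which may help narrow down where the extra string comes from.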