[R-SIG-Mac] problem with character encoding in file names in R 3.2

Denis Chabot chabot.denis at gmail.com
Sun May 3 22:15:13 CEST 2015


OK, good idea Duncan. 

All folders are in the working directory, and I added one thing to your suggestion: I created 2 folders outside R, with the OS, one with accented vowels and one without (été and ete). The following script also creates a second folder with accents from within R, as you suggested. 

######
dir.create("bébé") # folder created within working directory
# in addition, 2 folders already exist in this working directory, one with accents, "été", and one without, "ete"

a=1:10

# saving to 3 locations using a file name containing accents
save(a, file="bébé/bé.Rda") #1 path with accented vowel created from within R
save(a, file="été/bé.Rda") #2 path with accented vowel created with the OS
save(a, file="ete/bé.Rda") #3 path without accented vowel created with the OS


# saving to 3 locations using a file name containing no accent
save(a, file="bébé/be.Rda") #1 path with accented vowel created from within R
save(a, file="été/be.Rda") #2 path with accented vowel created with the OS
save(a, file="ete/be.Rda") #3 path without accented vowel created with the OS
######

Results: folders "bébé" and "ete" both contain 2 files:
bé.Rda
be.Rda

But the folder "été" contains these 2 files:
bé (Conflit lié au codage Unicode).Rda
be.Rda

Note the extraneous string in the name of the file with the accented vowel. The file name is corrupted only when the file name has accents and the path also has accents and was created with the OS, not R.
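
One way to dig further would be to look at the raw bytes of the names involved. This is only a sketch, and it rests on an assumption that nothing above confirms: that the two "é" characters may be encoded differently (precomposed, U+00E9, versus decomposed, "e" followed by the combining accent U+0301). The helper show_bytes() is a hypothetical name, not part of R.

```r
# Hypothetical helper: report the declared encoding and raw bytes of each string.
show_bytes <- function(x) {
  data.frame(
    name = x,
    encoding = Encoding(x),
    bytes = vapply(
      x,
      function(s) paste(as.character(charToRaw(s)), collapse = " "),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Two strings that look the same on screen but differ in their bytes
# (assumption: this is the kind of difference at play here).
precomposed <- "b\u00e9.Rda"   # "é" as a single code point
decomposed  <- "be\u0301.Rda"  # "e" plus combining acute accent

print(show_bytes(c(precomposed, decomposed)))
identical(precomposed, decomposed)  # FALSE
```

Running show_bytes(list.files("été")) and show_bytes(list.files("bébé")) in the working directory would then show whether the names R gets back from the two folders differ at the byte level.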

Denis

> On 2015-05-03 at 15:54, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> 
> On 03/05/2015 3:49 PM, Denis Chabot wrote:
>> Hi Duncan,
>> 
>> Sorry, I did not realize that Mail had changed quotes on me. In R, all double quotes were just normal double quotes, not guillemets.
>> 
>> And to summarize the issue, trying to save a file with a name that includes accented vowels corrupts the file name if the complete path leading to the folder where I want to save the file also contains accented vowels. 
>> 
>> Corruption is probably a strong word, as everything remains readable: the string " (Conflit lié au codage Unicode)" is added after the name I wanted and before the .Rda extension. (I used straight double quotes here, I hope it is what Mail will send! I am not in rich text mode).
>> 
>> And you are correct that it has nothing to do with the paste command, which works properly. If I type the complete path in the save command instead of using the paste command, the unwanted string is still added to the name. This happens on 2 computers, one running Mavericks and another running Yosemite, both with French set as the language for the OS.  
> 
> Could you please simplify?  If paste isn't needed, don't use it, just
> post one command, and copy the result you get.  If you can do it in a
> directory like "~" then I'll be able to just paste your command into
> my R and see if I get the same problem.  (You might need a dir.create()
> beforehand, so 2 lines.)
> 
> Duncan
> 
>> Denis
>>> On 2015-05-03 at 15:34, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>> 
>>> On 03/05/2015 3:11 PM, Denis Chabot wrote:
>>>> Hi,
>>>> 
>>>> I don’t quite know how to produce a repeatable example for you because the problem I have seems to be caused by folder names on my computer.
>>>> 
>>>> Yesterday I was still using R 3.1.2 and there was no problem with this issue. 
>>>> 
>>>> Today, with R 3.2, the problem appears.
>>>> 
>>>> Input = "../data/"
>>>> juvcodData = paste0(Input, "Données respirométrie SDA morues juv/« )  # the name of this folder contains accented vowels
>>>> 
>>>> folder1 = paste0(Input, "Données respirométrie SDA morues juv/")
>>>> folder2 = paste0(Input, "Donnees respirometrie SDA morues juv/")
>>>> 
>>>> a = 1:10
>>>> save(a, file="bé.Rda ») #1
>>> 
>>> What you posted here doesn't have regular quotation marks.  I'm not sure
>>> about the opening one, but the closing one has been converted to a
>>> guillemet.  R would never have accepted this as input, so what you're
>>> showing is probably not what you actually did.
>>> 
>>> You also don't say exactly what happened, just that it didn't work.
>>> 
>>> Could you please post again, simplifying to just one string that fails,
>>> and explain exactly how it fails?  I am sure that having save() call
>>> paste() will not result in any difference from just calling paste() on
>>> its own, and likewise manually typing the full string should produce the
>>> same result as constructing it via paste().  (But I am not sure that
>>> paste() is producing what you want.)
>>> 
>>> Thanks.
>>> 
>>> Duncan Murdoch
>>>> save(a, file=paste(Input, "bé.Rda »)) #2
>>>> save(a, file=paste(folder1, "bé.Rda »)) #3
>>>> save(a, file=paste(folder1, "be.Rda »)) #4
>>>> save(a, file=paste(folder2, "bé.Rda »)) #5
>>>> save(a, file=paste(folder2, "be.Rda »)) #6
>>>> 
>>>> 
>>>> All files were saved with the name I expected, except for # 3: 
>>>> bé (Conflit lié au codage Unicode).Rda
>>>> with the text in parentheses loosely translating to "Conflict related to Unicode encoding"
>>>> 
>>>> So if there is an accented vowel both somewhere along the path already defined in R and in the name I want to give the file, the file name is altered in this way. If there is an accent in only one of the two, no problem. As I said, there was no such problem with R 3.1.2.
>>>> 
>>>> The problem is probably not due to R itself, as this works:
>>>>> paste0(folder1, "bé.Rda")
>>>> [1] "../data/Données respirométrie SDA morues juv/bé.Rda"
>>>> 
>>>> But when R uses this string as a path name when saving a file, I get the problem above.
>>>> 
>>>> Thanks for any help or suggestion,
>>>> 
>>>> Denis
>>>> 
>>>>> sessionInfo()
>>>> R version 3.2.0 (2015-04-16)
>>>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>>> Running under: OS X 10.10.3 (Yosemite)
>>>> 
>>>> locale:
>>>> [1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8
>>>> 
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base     
>>>> 
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_3.2.0
>>>> 
>>>> _______________________________________________
>>>> R-SIG-Mac mailing list
>>>> R-SIG-Mac at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mac
>>>> 
>>> 
>> 
> 
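
On the doubt raised in the quoted messages about whether paste() produces the intended path: paste() joins its arguments with a space by default, whereas paste0() uses no separator. The sketch below reuses the folder string from the original message; the variable names with_paste and with_paste0 are made up for illustration.

```r
# Folder string taken from the original message.
folder1 <- "../data/Données respirométrie SDA morues juv/"

# paste() uses sep = " " by default, so a space ends up after the slash.
with_paste <- paste(folder1, "bé.Rda")

# paste0() concatenates with no separator.
with_paste0 <- paste0(folder1, "bé.Rda")

print(with_paste)
print(with_paste0)

# The two results differ by exactly one character (the inserted space).
nchar(with_paste) - nchar(with_paste0)
```

So the save() calls written with paste() would target a file whose name starts with a space, which is a separate matter from the string that gets appended to the name.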


