[R] trouble with character \u00e2

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 8 19:09:38 CEST 2008


Can you please try a 2.8.0 beta build?  I have a suspicion as to what 
might be going on, and it cannot happen there.

If my guess is correct,

nfile <- paste("diagnostic â vs a ", file.label, ".jpg", sep = "")
savePlot(path.expand(nfile), type="jpg")

may work for you in 2.7.2 (but as I said, I wasn't able to reproduce this 
there).  The crucial bit is to use path.expand() on the final file name: 
it will do nothing except ensure that the encoding is correct.
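
To illustrate what "the encoding is correct" can mean at the byte level,
here is a minimal sketch (not part of the original exchange).  It assumes a
current version of R, and the object names a_utf8 and a_latin1 are made up
for the illustration.  The string is built from raw bytes so that the result
does not depend on the locale of the session.

```r
# "â" (U+00E2) as UTF-8 bytes, marked explicitly as UTF-8.
a_utf8 <- rawToChar(as.raw(c(0xc3, 0xa2)))
Encoding(a_utf8) <- "UTF-8"

# The same character re-encoded as latin1 (a single byte).
a_latin1 <- iconv(a_utf8, from = "UTF-8", to = "latin1")

charToRaw(a_utf8)               # c3 a2
charToRaw(a_latin1)             # e2
Encoding(c(a_utf8, a_latin1))   # "UTF-8" "latin1"

# R regards the two as the same string although the bytes differ.
identical(a_utf8, a_latin1)     # TRUE
```

So a name can print identically on screen and still reach the file system as
different bytes, which is consistent with (though not proof of) the suspicion
mentioned above.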

On Wed, 8 Oct 2008, Charles Annis, P.E. wrote:

> Thank you Professor:
>
> After reading in the file this is what I see:
>> file.label
> [1] "EXAMPLE 1 â vs a.xls"
>
>> charToRaw(file.label)
> [1] 45 58 41 4d 50 4c 45 20 31 20 c3 a2 20 76 73 20 61 2e 78 6c 73
>
>> Encoding(file.label)
> [1] "UTF-8"
>
>> Encoding(paste("diagnostic â vs a ", file.label, ".jpg", sep = ""))
> [1] "UTF-8"
>
> But look what happens after I run your example:
>> charToRaw(file.label)
> [1] 45 58 41 4d 50 4c 45 20 31 20 e2 20 76 73 20 61 2e 78 6c 73     (after)
> [1] 45 58 41 4d 50 4c 45 20 31 20 c3 a2 20 76 73 20 61 2e 78 6c 73 (before)
>
> The file label appears on the screen as it does above both times, but
> clearly charToRaw() shows that the coding for â has changed from the
> unexpected c3 a2, to the desired e2.
>
> After running your example I now observe
>> Encoding(file.label)
> [1] "latin1"
>
> Again, thank you for your help.
>
> Charles Annis, P.E.
>
> Charles.Annis at StatisticalEngineering.com
> phone: 561-352-9699
> eFax:  614-455-3265
> http://www.StatisticalEngineering.com
>
>
> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> Sent: Wednesday, October 08, 2008 10:32 AM
> To: Charles Annis, P.E.
> Cc: r-help at r-project.org
> Subject: RE: [R] trouble with character \u00e2
>
> That also works without a hitch on my box, even in vanilla 2.7.2.  What
> exactly is in file.label as given by
>
> charToRaw(file.label)
> Encoding(file.label)
>
> ?  It should be in UTF-8, and so should
>
> paste("diagnostic â vs a ", file.label, ".jpg", sep = "")
>
> It looks like the latter is not being treated as UTF-8 on your system (see
> what Encoding() says on its value).
>
> On Wed, 8 Oct 2008, Charles Annis, P.E. wrote:
>
>> Thank you, Professor Ripley:
>>
>> Your example works for me too.
>>
>> plot(1:10, xlab = "a", ylab = "â")
>> file.label <- "EXAMPLE 1 â vs a.xls"
>> savePlot(paste("diagnostic â vs a ", file.label, ".jpg",
>>          sep = ""), type = "jpg")
>>
>>
>> But, if I read-in the file name using file.choose() I get the same
>> corrupted output filename ( "diagnostic â vs a EXAMPLE 1 â vs a.xls.jpg" )
>> from my R routines.  However, if I paste that same file.label as it is
>> printed to the screen with my input routine, replacing your "foo" (as
>> above) things work as they should
>> ( "diagnostic â vs a EXAMPLE 1 â vs a.xls.jpg" ).  Furthermore, if I again
>> run my plotting routines after your example (like that here, above), my
>> routines no longer produce corrupted filenames for the saved plots.
>>
>> The trouble seems to be caused by how I read in the file name.  Here is a
>> simple example that produces a corrupted file name for the saved plot:
>>
>> plot(1:10, xlab = "a", ylab = "â")
>> file.name <<- file.choose()
>>    print(file.name)
>>    file.label <<- basename(file.name)
>> savePlot(paste("diagnostic â vs a ", file.label, ".jpg",
>>          sep = ""), type = "jpg")
>>
>>
>> The name of my input Excel file is "EXAMPLE 1 â vs a.xls"
>> The problem does not occur on R < R2.7.0
>>
>> I am running R2.7.2 on a 5 year old DELL box (2 Gig RAM, 3GHz Pentium 4)
>> with Windows XP, and have also experienced the problem on my Thinkpad
>> laptop (2 Gig, Intel Core2 Duo, 1.6GHz) running Vista.
>>
>> Thank you for your counsel.
>>
>> Charles Annis, P.E.
>>
>>
>>
>> -----Original Message-----
>> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
>> Sent: Wednesday, October 08, 2008 4:39 AM
>> To: Charles Annis, P.E.
>> Cc: r-help at r-project.org
>> Subject: Re: [R] trouble with character \u00e2
>>
>> You haven't given any of the information asked for in the posting guide.
>> But, assuming this is Windows in CP1252 (as I believe that has been your
>> locale before), it works for me in current R.
>>
>> plot(1:10)
>> file.label <- "foo"
>> savePlot(paste("diagnostic â vs a ", file.label, ".jpg",
>>          sep = ""), type = "jpg")
>>
>> If you are not using 2.8.0 beta or 2.7.2 patched, please check those.
>> This might be related to
>>
>>     o	file.path() did not work correctly in 2.7.0 if the components
>> 	had different encodings.
>>
>> (NEWS for 2.7.1).
>>
>> On Sun, 5 Oct 2008, Charles Annis, P.E. wrote:
>>
>>> Greetings R-wizards:
>>>
>>> For historical reasons I have filenames with the character "â" and have
>>> successfully used "\u00e2" in its place, with the hoped-for result on all
>>> my on-screen plots.
>>>
>>> However since R2.7.0 I have trouble with savePlot() when the file name
>>> includes that character as it does in this example:
>>>
>>> savePlot(paste("diagnostic â vs a ", file.label, ".jpg",
>>>        sep = ""), type = "jpg")
>>>
>>> In R2.6.0 and earlier, R would ignore a dot ('.') in the file name and
>>> supply the extension.  Since R2.7.0, if the filename does include a dot,
>>> savePlot() will not add the file type as an extension.  Thus my apparent
>>> redundancy in the file name.
>>>
>>> The problem I have is that the example command will substitute an
>>> unwanted character for â, yet if I use "File, save as, jpg ... " and type
>>> in a name containing the troublesome character, R saves the on-screen
>>> plot with that character in the name with no complaints.
>>>
>>> I have tried using iconv() with no success, as can be seen with the
>>> following code:
>>>
>>> file.name <- paste("diagnostic â vs a ", file.label, ".jpg", sep = "")
>>>
>>> iconv.List <- iconvlist()
>>>
>>> for(encoding in iconv.List) {
>>>   print(iconv(file.name, "", encoding, ""))
>>> }
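
The loop above passes from = "", which tells iconv() to assume the encoding
of the current locale.  As a hedged alternative (a sketch only, assuming a
current version of R), one can convert explicitly from UTF-8 and only for
strings that are marked as UTF-8.  The helper name to_latin1 is hypothetical,
and the sample bytes are those that charToRaw(file.label) reports elsewhere
in this thread.  Whether this avoids the savePlot() problem in 2.7.x is not
confirmed by the thread.

```r
# Hypothetical helper: re-encode UTF-8-marked strings as latin1 and
# leave everything else untouched.  iconv() gives NA for any string
# that it cannot convert.
to_latin1 <- function(x) {
  is_utf8 <- Encoding(x) == "UTF-8"
  x[is_utf8] <- iconv(x[is_utf8], from = "UTF-8", to = "latin1")
  x
}

# "EXAMPLE 1 â vs a.xls" as UTF-8 bytes, built from raw so that the
# example does not depend on the session's locale.
file.label <- rawToChar(as.raw(c(
  0x45, 0x58, 0x41, 0x4d, 0x50, 0x4c, 0x45, 0x20, 0x31, 0x20,
  0xc3, 0xa2, 0x20, 0x76, 0x73, 0x20, 0x61, 0x2e, 0x78, 0x6c, 0x73)))
Encoding(file.label) <- "UTF-8"

fixed <- to_latin1(file.label)
Encoding(fixed)    # "latin1"
charToRaw(fixed)   # the c3 a2 pair has become a single e2
```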
>>>
>>> So, here's the question:  How can I save, with a non-interactive R
>>> command, an existing plot with the troublesome character in the file name?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> Charles Annis, P.E.
>>>
>>>  
>>>
>>>
>>
>> --
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>
>>
>
>
>


