[R] UTF-8 or Unicode on Windows PC

Hans-Joerg Bibiko bibiko at eva.mpg.de
Tue Apr 22 16:49:05 CEST 2008


On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:

>> Is it possible to download a compiled snapshot of 2.7.0 for Windows  
>> XP?
> Yes, http://cran.r-project.org/bin/windows/base/rtest.html
> And it is due for release tomorrow.

I played with 2.7.0 on Windows XP. I can do things which couldn't be  
done with 2.6.x. Many many thanks for the effort!!!

But I always reached a point where I couldn't find a solution, because  
Windows has no UTF-8 locale(s).
Does Windows Vista have UTF-8 locales?
If I'm dealing with known languages I'm able to work around a lot of  
things.

But my/our problem is that we have to deal with different languages at  
the same time [in a data.frame]. Furthermore, I/we have to deal with  
IPA symbols, which have no locale; and grep, strsplit, etc. are built  
on top of the chosen locale. Thus I'm not able to use strsplit on a  
string which contains German, Russian, and IPA symbols, because all  
glyphs which are not part of the chosen locale are displayed [e.g. as  
output of strsplit()] literally as <U+XXXX>.

That's why the only solution is to use a UTF-8 environment (OS) or,  
for hard-liners, to transform each glyph into a number and to do  
research on those numbers (which is really annoying ;).
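
A minimal sketch of the "glyphs as numbers" route, written in Python  
only for illustration (in R, utf8ToInt() and intToUtf8() play the same  
role). The helper names and the sample string are hypothetical.

```python
# Hypothetical helpers: map a string to Unicode code points and back.
# Nothing here depends on the locale of the operating system.

def to_codepoints(text):
    """Return the Unicode code point of every character in text."""
    return [ord(ch) for ch in text]


def from_codepoints(codepoints):
    """Rebuild the string from a list of code points."""
    return "".join(chr(cp) for cp in codepoints)


# Made-up sample: German u-umlaut, Russian zhe, IPA esh.
sample = "\u00fc\u0436\u0283"
codes = to_codepoints(sample)
print(codes)
print(from_codepoints(codes) == sample)
```

Working on the integer vector avoids any locale conversion, at the  
price of losing readable output until the numbers are mapped back.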

Unfortunately, at this point I have to give up. Maybe there is someone  
who can give me further advice for Windows.
The only thing I have in mind, maybe, is to use Perl, Python, etc. to  
manipulate the data before they are analyzed using R.
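
What I mean by preprocessing could look roughly like the sketch below,  
assuming Python 3. It splits a mixed-script string into glyphs before R  
ever sees it, keeping combining diacritics (common in IPA) attached to  
their base character. The function name split_glyphs is hypothetical,  
and this is only an approximation of proper grapheme segmentation.

```python
import unicodedata


def split_glyphs(text):
    """Split text into glyphs, keeping combining marks with their base."""
    glyphs = []
    # NFC composes base + mark into one code point where one exists.
    for ch in unicodedata.normalize("NFC", text):
        if glyphs and unicodedata.combining(ch):
            # Non-zero combining class: attach the mark to the previous glyph.
            glyphs[-1] += ch
        else:
            glyphs.append(ch)
    return glyphs


# Made-up sample: Russian zhe, then IPA esh carrying a combining tilde.
word = "\u0436\u0283\u0303"
print(len(split_glyphs(word)))
```

The resulting glyphs could then be written out one per field, so that  
R only has to read the fields and does not need to split the strings  
itself.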


--Hans


