[R] Encoding() and strsplit()
Heinz Tuechler
tuechler at gmx.at
Fri Nov 7 08:23:48 CET 2008
Dear All,
I do not fully understand Encoding(); see the example
below. From the help page for Encoding() I would
expect strsplit() to preserve the encoding of each
resulting element, but for plain ASCII letters it is lost.
It also seems that an encoding cannot be declared
for plain ASCII letters: they remain "unknown" in any
case. In paste(), "latin1" seems to dominate "unknown".
What kind of property of an object is the
encoding? It does not show up as an attribute, and
str() gives me no hint either.
Where can I find an explanation of encodings?
Thanks
Heinz
### Encoding() and strsplit()
u <- 'abcäöü'
Encoding(u)
# [1] "latin1"
Encoding(u) <- 'latin1' # to be sure about the encoding
us <- strsplit(u, '')[[1]] # split into single characters
Encoding(us)
# [1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
Encoding(us) <- rep('latin1', length(us)) # try to declare every element as latin1
Encoding(us)
# [1] "unknown" "unknown" "unknown" "latin1" "latin1" "latin1"
pus <- paste(us[1], us[5], sep='') # ASCII 'a' combined with latin1 'ö'
Encoding(pus)
# [1] "latin1"
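A minimal sketch of what ?Encoding describes may help with the questions above. As far as the help page states, a declared encoding is a mark stored with each individual string element rather than an attribute of the vector, and pure ASCII strings are never marked because their bytes are the same in every supported encoding. The variable names x and y are illustrative only, and the \x escapes are an assumption made here so that the example does not depend on the console or file encoding.

```r
# Illustrative sketch; the names x and y are not from the original post.
# \x escapes keep the example independent of the console or file encoding.
x <- c("a", "\xe4")            # "a" is pure ASCII; \xe4 is the latin1 byte for a-umlaut
Encoding(x) <- "latin1"        # declare both elements as latin1

Encoding(x)                    # the ASCII element stays "unknown"
attributes(x)                  # NULL: the encoding mark is not an attribute
charToRaw(x[2])                # the underlying byte is unchanged (e4)

# Converting, as opposed to re-declaring, changes the bytes.
y <- iconv(x[2], from = "latin1", to = "UTF-8")
Encoding(y)                    # "UTF-8"
charToRaw(y)                   # c3 a4
```

How strsplit() and paste() mark their results can depend on the R version and the locale, so the output shown in the example above is best read as specific to R 2.8.0 in a Windows-1252 locale.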
Version:
platform = i386-pc-mingw32
arch = i386
os = mingw32
system = i386, mingw32
status = Patched
major = 2
minor = 8.0
year = 2008
month = 11
day = 04
svn rev = 46830
language = R
version.string = R version 2.8.0 Patched (2008-11-04 r46830)
Windows XP (build 2600) Service Pack 2
Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252
Search Path:
.GlobalEnv, package:stats, package:graphics,
package:grDevices, package:utils,
package:datasets, package:methods, Autoloads, package:base