[R] Issues with factors with duplicate (empty) levels
Frederik Elwert
frederik.elwert at rub.de
Wed Aug 26 11:55:50 CEST 2009
Hello!
I imported a DJI survey[1] from an SPSS file. When looking at some of
the variables, I noticed problems with the `table` function and similar
functions. The cause seems to be duplicate factor levels generated from
the value labels: not all values have labels, and those that don't get
an empty string as their level, so several levels end up identical.
I hope the code and output below illustrate the problem. Is it possible
to prevent this? I’d still like to use the labels, so using numeric
vectors instead of factors is not the best solution.
Regards,
Frederik
> library(foreign)
> Data <- read.spss("js2003_16_29_db.sav", to.data.frame=TRUE,
+     reencode="latin1")
> table(Data$J203_A)
überhaupt nicht wichtig
                     35                    2256                       0

                      0                       0                       0
           sehr wichtig         Mehrfachnennung
                   4660                       0
> table(as.numeric(Data$J203_A))
   1    2    3    4    5    6    7
  35   39   84  227  626 1280 4660
> is.factor(Data$J203_A)
[1] TRUE
> levels(Data$J203_A)
[1] "überhaupt nicht wichtig" " "
[3] " " " "
[5] " " " "
[7] "sehr wichtig" "Mehrfachnennung"
[1] http://213.133.108.158/surveys/index.php?m=msw,0&sID=54
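One possible workaround, sketched in base R and not tested against the survey file: keep the labels that exist and fall back to the numeric code for codes whose label is missing or blank, so no two levels collide. The helper `label_codes()` is hypothetical (it is not part of `foreign`), and it assumes the value labels are available as a named numeric vector with the label texts as names and the codes as values. The example data in the test are illustrative only.

```r
# Hypothetical helper (not part of the foreign package): turn numeric SPSS
# codes into a factor with one unique level per code. `value_labels` is
# assumed to be a named numeric vector: names are the label texts, values
# are the codes they belong to.
label_codes <- function(x, value_labels) {
  # every code that is either labelled or actually observed, in numeric order
  codes <- sort(unique(c(unname(value_labels), x[!is.na(x)])))
  # default level text: the code itself, e.g. "2"
  labs <- as.character(codes)
  # position of each code in `value_labels` (NA if the code has no label)
  idx <- match(codes, value_labels)
  # keep a label only if it exists and is not empty or whitespace-only
  has_label <- !is.na(idx) & nzchar(trimws(names(value_labels)[idx]))
  labs[has_label] <- names(value_labels)[idx[has_label]]
  factor(x, levels = codes, labels = labs)
}
```

With the survey file, one could read the data with `use.value.labels=FALSE` and then call `label_codes(Data$J203_A, attr(Data$J203_A, "value.labels"))`. That `read.spss` keeps the labels in a `"value.labels"` attribute on each column is an assumption worth checking with `attributes(Data$J203_A)` first. Codes 2 to 6 would then show up as levels "2" to "6" instead of five identical blank levels.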
More information about the R-help mailing list