[R] Issues with factors with duplicate (empty) levels
Frederik Elwert
frederik.elwert at rub.de
Thu Aug 27 14:44:36 CEST 2009
Hello again,
Just for your information, I think I found a way to work around the
problem described below. I don’t know if it’s the most elegant way, but
it seems to work.
On Wednesday, 26.08.2009, at 11:55 +0200, Frederik Elwert wrote:
> Hello!
>
> I imported a DJI survey[1] from an SPSS file. When looking at some of
> the variables, I noticed problems with the `table` function and similar
> functions. They seem to be caused by duplicate levels that are generated
> from the value labels. Not all values have labels, so those without one
> get an empty string as the level, which leads to duplicates.
>
> I hope the code and output below illustrate the problem. Is it possible
> to prevent this? I’d still like to use the labels, so using numeric
> vectors instead of factors is not the best solution.
>
> Regards,
> Frederik
>
>
> > library(foreign)
> > Data <- read.spss("js2003_16_29_db.sav", to.data.frame=TRUE,
> reencode="latin1")
> > table(Data$J203_A)
>
> überhaupt nicht wichtig
>                      35                    2256                       0
>
>                       0                       0                       0
>            sehr wichtig         Mehrfachnennung
>                    4660                       0
> > table(as.numeric(Data$J203_A))
>
>    1    2    3    4    5    6    7
>   35   39   84  227  626 1280 4660
> > is.factor(Data$J203_A)
> [1] TRUE
> > levels(Data$J203_A)
> [1] "überhaupt nicht wichtig" " "
> [3] " " " "
> [5] " " " "
> [7] "sehr wichtig" "Mehrfachnennung"
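# Replace each blank level (" ") with its position among the levels,
# so that every level of every factor column is unique.
for (i in seq_len(ncol(Data))) {
  if (is.factor(Data[, i])) {
    lvl <- levels(Data[, i])
    if (" " %in% lvl) {
      empty <- lvl == " "
      lvl[empty] <- seq_along(lvl)[empty]
      levels(Data[, i]) <- lvl
    }
  }
}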
> table(Data$J203_A)
überhaupt nicht wichtig                       2                       3
                     35                      39                      84
                      4                       5                       6
                    227                     626                    1280
           sehr wichtig         Mehrfachnennung
                   4660                       0
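
The same idea can probably be written without an explicit loop. The following is only a sketch, not tested against the SPSS file above: `label_blank_levels` is a hypothetical helper name, and `toy` is a made-up data frame standing in for `Data`. It treats any empty or whitespace-only level as blank, which is an assumption that goes slightly beyond the exact `" "` comparison used in the loop. The two blank labels in the toy data differ in length (one space and two spaces) so that they remain distinct levels when the factor is created.

```r
# Hypothetical helper: replace blank factor levels with their position.
label_blank_levels <- function(f) {
  lvl <- levels(f)
  blank <- grepl("^\\s*$", lvl)          # TRUE for "" or whitespace-only labels
  lvl[blank] <- as.character(seq_along(lvl))[blank]
  levels(f) <- lvl
  f
}

# Made-up stand-in for the imported survey data.
toy <- data.frame(
  id = 1:6,
  answer = factor(c(1, 2, 3, 4, 4, 2), levels = 1:4,
                  labels = c("not important", " ", "  ", "very important"))
)

# Apply the helper to every factor column and leave other columns alone.
is_fac <- vapply(toy, is.factor, logical(1))
toy[is_fac] <- lapply(toy[is_fac], label_blank_levels)

levels(toy$answer)
table(toy$answer)
```

Keeping the relabelling in a small function makes it easy to try on a single variable first, e.g. `table(label_blank_levels(Data$J203_A))`, before changing the whole data frame.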