[R] Losing factor levels when moving variables from one context to another
Marc Schwartz
marc_schwartz at comcast.net
Thu Feb 1 18:51:10 CET 2007
On Thu, 2007-02-01 at 12:13 -0500, Michael Rennie wrote:
> Hi, there
>
> I'm currently trying to figure out how to keep my "factor" levels for a
> variable when moving it from one data frame or matrix to another.
>
> Example below:
>
> vec1<-(rep("10",5))
> vec2<-(rep("30",5))
> vec3<-(rep("80",5))
> vecs<-c(vec1, vec2, vec3)
>
> resp<-rnorm(15,2)
>
> dat<-as.data.frame(cbind(resp, vecs))
> dat$vecs<-factor(dat$vecs)
> dat
>
> R returns:
> resp vecs
> 1 1.57606068767956 10
> 2 2.30271782269308 10
> 3 2.39874788444542 10
> 4 0.963987738423353 10
> 5 2.03620782454740 10
> 6 -0.0706713324725649 30
> 7 1.49001721222926 30
> 8 2.00587718501980 30
> 9 0.450576585429981 30
> 10 2.87120375367357 30
> 11 2.25575058079324 80
> 12 2.03471288724508 80
> 13 2.67432066972984 80
> 14 1.74102136279177 80
> 15 2.29827581276955 80
>
> and now:
>
> newvar<-(rnorm(15,4))
> newdat<-as.data.frame(cbind(newvar, dat$vecs))
> newdat
>
> R returns:
>
> newvar V2
> 1 4.300788 1
> 2 5.295951 1
> 3 5.099849 1
> 4 3.211045 1
> 5 3.703554 1
> 6 3.693826 2
> 7 5.314679 2
> 8 4.222270 2
> 9 3.534515 2
> 10 4.037401 2
> 11 4.476808 3
> 12 4.842449 3
> 13 3.109677 3
> 14 4.752961 3
> 15 4.445216 3
> >
>
> I seem to have lost everything I once had associated with "vecs", and it's
> turned my actual values into arbitrary groupings.
>
> I assume this has something to do with the behaviour of factors? Does
> anyone have any suggestions on how to get my original levels, etc., back?
>
> Cheers,
>
> Mike
Mike,
The problem (specific to your example) is that you are combining
as.data.frame() with cbind(). cbind() first coerces the columns to a
common data type and creates a matrix; as.data.frame() then converts
that matrix to a data frame.
Thus, in the second case, your factor dat$vecs is first coerced to its
underlying integer codes (1, 2, 3) rather than being retained as a
factor, since a matrix can contain only one data type and the first
column is numeric.
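
To see the coercion in isolation, here is a minimal sketch. The names f, x, m and m2 are illustrative only and do not come from the original example:

```r
# A factor stores integer codes plus a character vector of levels.
f <- factor(c("10", "30", "80", "30"))
x <- c(1.5, 2.5, 3.5, 4.5)

# cbind() on atomic vectors builds a matrix, which holds a single type,
# so the factor is reduced to its integer codes and the levels are dropped.
m <- cbind(x, f)
is.matrix(m)   # TRUE
typeof(m)      # "double"
m[, "f"]       # 1 2 3 2 (the codes, not "10", "30", "80")

# Mixing a numeric vector with a character vector coerces everything
# to character instead.
m2 <- cbind(x, c("10", "30", "80", "30"))
typeof(m2)     # "character"
```
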
Try this instead:
vec1<-(rep("10", 5))
vec2<-(rep("30", 5))
vec3<-(rep("80", 5))
vecs<-c(vec1, vec2, vec3)
set.seed(1)
resp<-rnorm(15, 2)
dat <- data.frame(resp, vecs)
> str(dat)
'data.frame': 15 obs. of 2 variables:
$ resp: num 1.37 2.18 1.16 3.60 2.33 ...
$ vecs: Factor w/ 3 levels "10","30","80": 1 1 1 1 1 2 2 2 2 2 ...
set.seed(2)
newvar <- rnorm(15, 4)
newdat <- data.frame(newvar, dat$vecs)
> str(newdat)
'data.frame': 15 obs. of 2 variables:
$ newvar : num 3.10 4.18 5.59 2.87 3.92 ...
$ dat.vecs: Factor w/ 3 levels "10","30","80": 1 1 1 1 1 2 2 2 2 2 ...
> all(levels(newdat$dat.vecs) == levels(dat$vecs))
[1] TRUE
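
A note for readers on current R: since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so a character vector such as vecs stays character unless it is converted explicitly. The sketch below creates the factor with factor() so the result does not depend on that default. It also shows one way to recover the original labels if a factor has already been reduced to its codes. The names codes, recovered and values are illustrative, not part of the original post.

```r
vecs <- rep(c("10", "30", "80"), each = 5)
set.seed(1)
resp <- rnorm(15, 2)

# Create the factor explicitly rather than relying on stringsAsFactors.
dat <- data.frame(resp, vecs = factor(vecs))

set.seed(2)
newvar <- rnorm(15, 4)
# Naming the column avoids the auto-generated name "dat.vecs".
newdat <- data.frame(newvar, vecs = dat$vecs)
levels(newdat$vecs)

# If only the integer codes are left (as after cbind()), the labels can
# be recovered by indexing the original levels with those codes.
codes <- as.integer(dat$vecs)
recovered <- levels(dat$vecs)[codes]

# To get the numbers 10, 30, 80, convert the labels, not the codes.
values <- as.numeric(as.character(dat$vecs))
```

Calling as.numeric() directly on the factor would return the codes 1, 2, 3 rather than 10, 30, 80, which is why the conversion goes through as.character() first.
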
BTW, there may very well be times when you are combining two factors
and need to check whether their levels are intentionally different or
whether the combined factor should be "releveled" to a common set of
levels. See the Warning and other information in ?factor. This is
critical, for example, if you are combining data sets and then running
modeling functions on the combined data.
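
As a rough illustration of releveling to a common set of levels, the sketch below combines two small factors by going through character vectors. The names a, b and combined are hypothetical. Applying c() directly to factors returned bare integer codes in older versions of R, so the character route is used here because it behaves the same across versions.

```r
a <- factor(c("10", "30"))
b <- factor(c("30", "80"))

# Convert each factor to its labels, combine them, and rebuild a single
# factor whose levels are the union of both sets of levels.
combined <- factor(c(as.character(a), as.character(b)),
                   levels = union(levels(a), levels(b)))

levels(combined)      # "10" "30" "80"
as.integer(combined)  # 1 2 2 3
```
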
HTH,
Marc Schwartz