[R] Keep value labels with data frame manipulation
Marc Schwartz (via MN)
mschwartz at mn.rr.com
Wed Jul 12 20:14:22 CEST 2006
On Wed, 2006-07-12 at 17:41 +0100, Jol, Arne wrote:
> Dear R,
>
> I import data from spss into a R data.frame. On this rawdata I do some
> data processing (selection of observations, normalization, recoding of
> variables etc..). The result is stored in a new data.frame, however, in
> this new data.frame the value labels are lost.
>
> Example of what I do in code:
>
> # read raw data from spss
> rawdata <- read.spss("./data/T50937.SAV",
> use.value.labels=FALSE,to.data.frame=TRUE)
>
> # select the observations that we need
> diarydata <- rawdata[rawdata$D22==2 | rawdata$D22==3 | rawdata$D22==17 |
> rawdata$D22==18 | rawdata$D22==20 | rawdata$D22==22 |
> rawdata$D22==24 | rawdata$D22==33,]
>
> The result is that rawdata$D22 has value labels and that diarydata$D22
> is numeric without value labels.
>
> Question: How can I prevent this from happening?
>
> Thanks in advance!
> Groeten,
> Arne
Two things:
1. With respect to your subsetting, your lengthy code can be replaced
with the following:
diarydata <- subset(rawdata, D22 %in% c(2, 3, 17, 18, 20, 22, 24, 33))
See ?subset and ?"%in%" for more information.
2. With respect to keeping the label-related attributes, the
'value.labels' and 'variable.labels' attributes will not, by default,
survive subsetting with "[.data.frame" in R (see ?Extract
and ?"[.data.frame").
On the other hand, based upon my review of ?read.spss, the SPSS value
labels should be converted to the factor levels of the respective
columns when 'use.value.labels = TRUE', and factor levels do survive
subsetting.
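To make both points concrete, here is a small sketch on made-up data. The
data frame, the label names and the factor labels are invented for
illustration; they only stand in for what read.spss() might return, so
treat this as a demonstration of the behaviour rather than of your file.

```r
# Toy data standing in for the SPSS import; all labels are invented.
rawdata <- data.frame(id = 1:6, D22 = c(2, 3, 5, 17, 18, 40))
attr(rawdata$D22, "value.labels") <- c(low = 2, mid = 3, high = 17)

keep <- c(2, 3, 17, 18)

# Chained comparisons and %in% select the same rows.
by_chain <- rawdata[rawdata$D22 == 2 | rawdata$D22 == 3 |
                    rawdata$D22 == 17 | rawdata$D22 == 18, ]
by_in <- subset(rawdata, D22 %in% keep)
isTRUE(all.equal(by_chain, by_in))

# The column attribute is there before subsetting and gone afterwards.
attr(rawdata$D22, "value.labels")
attr(by_in$D22, "value.labels")   # NULL

# A factor column, by contrast, keeps all of its levels when subsetted.
rawdata$D22f <- factor(rawdata$D22,
                       levels = c(2, 3, 5, 17, 18, 40),
                       labels = c("two", "three", "five",
                                  "seventeen", "eighteen", "forty"))
sub_f <- subset(rawdata, D22 %in% keep)
levels(sub_f$D22f)
```

Note that the factor keeps unused levels as well; droplevels() would
remove them if that is not wanted.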
If you want to consider a solution to the attribute subsetting issue,
you might want to review the following post by Gabor Grothendieck in
May, which provides a possible solution:
https://stat.ethz.ch/pipermail/r-help/2006-May/106308.html
and this post by me, for an explanation of what is happening in Gabor's
solution:
https://stat.ethz.ch/pipermail/r-help/2006-May/106351.html
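If you would rather keep the numeric codes, one simple workaround is to
copy the attributes back by hand after subsetting. The sketch below is
not the approach from the posts linked above, just a minimal
alternative; restore_attrs() is a hypothetical helper name, not a
function from base R or from any package, and the toy data and labels
are invented.

```r
# Hypothetical helper: copy the named column attributes from 'old' to the
# columns of the same name in 'new'. Not part of base R or any package.
restore_attrs <- function(new, old, attr_names = "value.labels") {
  for (nm in intersect(names(new), names(old))) {
    for (a in attr_names) {
      val <- attr(old[[nm]], a, exact = TRUE)
      if (!is.null(val)) attr(new[[nm]], a) <- val
    }
  }
  new
}

# Toy data standing in for the SPSS import; the labels are invented.
rawdata <- data.frame(id = 1:6, D22 = c(2, 3, 5, 17, 18, 40))
attr(rawdata$D22, "value.labels") <- c(low = 2, mid = 3, high = 17)

diarydata <- subset(rawdata, D22 %in% c(2, 3, 17, 18))
diarydata <- restore_attrs(diarydata, rawdata)
attr(diarydata$D22, "value.labels")
```

The copy has to be repeated after every subsetting step, since each use
of "[" drops the attribute again.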
HTH,
Marc Schwartz