[R] Keep value labels with data frame manipulation

Marc Schwartz (via MN) mschwartz at mn.rr.com
Wed Jul 12 20:14:22 CEST 2006

On Wed, 2006-07-12 at 17:41 +0100, Jol, Arne wrote:
> Dear R,
> I import data from spss into a R data.frame. On this rawdata I do some
> data processing (selection of observations, normalization, recoding of
> variables etc..). The result is stored in a new data.frame, however, in
> this new data.frame the value labels are lost.
> Example of what I do in code:
> # read raw data from spss
> rawdata <- read.spss("./data/T50937.SAV",
> 	use.value.labels=FALSE,to.data.frame=TRUE)
> # select the observations that we need
> diarydata <- rawdata[rawdata$D22==2 | rawdata$D22==3 | rawdata$D22==17 |
> rawdata$D22==18 | rawdata$D22==20 | rawdata$D22==22 |
>  			rawdata$D22==24 | rawdata$D22==33,]
> The result is that rawdata$D22 has value labels and that diarydata$D22
> is numeric without value labels.
> Question: How can I prevent this from happening?
> Thanks in advance!
> Groeten,
> Arne

Two things:

1. With respect to your subsetting, your lengthy code can be replaced
with the following:

  diarydata <- subset(rawdata, D22 %in% c(2, 3, 17, 18, 20, 22, 24, 33))

See ?subset and ?"%in%" for more information.
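A quick self-contained check that the two forms select the same rows. The data frame is made up and only borrows the column name D22 from the question; it does not come from the original .sav file:

```r
# Toy stand-in for the imported data; values are invented for illustration.
rawdata <- data.frame(id = 1:6, D22 = c(2, 5, 17, 33, 40, 24))
codes <- c(2, 3, 17, 18, 20, 22, 24, 33)

# Original approach: one == comparison per code, joined with |
long <- rawdata[rawdata$D22 == 2 | rawdata$D22 == 3 | rawdata$D22 == 17 |
                rawdata$D22 == 18 | rawdata$D22 == 20 | rawdata$D22 == 22 |
                rawdata$D22 == 24 | rawdata$D22 == 33, ]

# Suggested approach: a single membership test via %in%
short <- subset(rawdata, D22 %in% codes)

# Both keep the same rows (ids 1, 3, 4 and 6)
stopifnot(identical(long$id, short$id))
short$id
```

One difference worth knowing: if D22 contains NA, the chained == version produces rows filled with NA, whereas %in% treats NA as "no match" (and subset() drops rows where the condition is NA anyway).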

2. With respect to keeping the label-related attributes, the
'value.labels' attribute and the 'variable.labels' attribute will not, by
default, survive the use of "[.data.frame" in R (see ?Extract
and ?"[.data.frame").
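A minimal reproduction of the column-level case, assuming a named vector attached as a 'value.labels' attribute in the spirit of what read.spss() does (the exact format read.spss() uses may differ; the labels here are made up). ?Extract documents that subsetting drops all attributes except names, dim and dimnames, and "[.data.frame" subsets each column with "[":

```r
# Toy data standing in for the read.spss() result; labels are hypothetical.
rawdata <- data.frame(id = 1:5, D22 = c(2, 5, 17, 18, 40))
attr(rawdata$D22, "value.labels") <- c(foo = 2, bar = 17)

# Row selection goes through "[.data.frame", which applies "[" to each column
diarydata <- rawdata[rawdata$D22 %in% c(2, 17, 18), ]

attr(rawdata$D22, "value.labels")    # still present on the original column
attr(diarydata$D22, "value.labels")  # NULL: dropped by the row subset
```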

On the other hand, based upon my review of ?read.spss, the SPSS value
labels should be converted to the factor levels of the respective
columns when 'use.value.labels = TRUE', and these would survive a
subsetting operation.
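A small illustration of the factor case. The labels "work", "travel" and "other" are invented stand-ins for SPSS value labels; no .sav file is read here:

```r
# Factor column whose levels play the role of the SPSS value labels
rawdata <- data.frame(
  id  = 1:4,
  D22 = factor(c(2, 17, 2, 40), levels = c(2, 17, 40),
               labels = c("work", "travel", "other"))
)

# Selection is now done on the labels rather than the numeric codes
diarydata <- subset(rawdata, D22 %in% c("work", "travel"))

levels(diarydata$D22)        # all three levels are kept
as.character(diarydata$D22) # "work" "travel" "work"
```

Unused levels (here "other") are retained after subsetting; droplevels(), available in later versions of R, removes them if that is not wanted.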
If you want to consider a solution to the attribute subsetting issue,
you might want to review the following post by Gabor Grothendieck in
May, which provides a possible solution:


and this post by me, for an explanation of what is happening in Gabor's solution:

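The posts referred to above are not included here, so the following is only a minimal sketch of one general technique, not necessarily the one in Gabor's post: give the column a small S3 class whose "[" method re-attaches the attributes after the default subsetting. The class name 'keeplabels' and the helper keep_labels() are hypothetical names chosen for this sketch:

```r
# Hypothetical helper: tag a column with the made-up S3 class "keeplabels"
keep_labels <- function(x) {
  class(x) <- unique(c("keeplabels", oldClass(x)))
  x
}

# "[" method for that class: subset as usual via NextMethod(), then put back
# every attribute except the structural ones that "[" manages itself
"[.keeplabels" <- function(x, ...) {
  saved <- attributes(x)
  saved <- saved[setdiff(names(saved), c("names", "dim", "dimnames"))]
  y <- NextMethod("[")
  attributes(y) <- c(attributes(y), saved)
  y
}

rawdata <- data.frame(id = 1:5, D22 = c(2, 5, 17, 18, 40))
attr(rawdata$D22, "value.labels") <- c(foo = 2, bar = 17)  # made-up labels
rawdata$D22 <- keep_labels(rawdata$D22)

# "[.data.frame" applies "[" to each column, which now dispatches to the
# method above, so the attribute travels with the selected rows
diarydata <- rawdata[c(1, 3, 4), ]
attr(diarydata$D22, "value.labels")
```

NextMethod() lets the ordinary subsetting do the work, and the method only restores what the default drops. Since subset() also goes through "[.data.frame", it benefits in the same way. The sketch restores attributes as-is, so an attribute whose length must match the vector would need extra handling.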


Marc Schwartz
