[Rd] Wishlist: merge and subset to keep attributes (PR#8658)

Ulrike Grömping groemping at tfh-berlin.de
Sun Mar 12 17:51:13 CET 2006


> When importing data from SPSS, it is a nice feature of the package foreign
> that it allows (option use.value.labels=F) to work with the original SPSS
> codes while keeping the value labels as information in an attribute.
> Unfortunately, after merging or subsetting, these attributes disappear.
> The code below illustrates the problem: Variable time originally has value
> labels that are gone after merging or subsetting.
> 
> It would be very helpful, if this could be changed. 
> 
> With kind regards, Ulrike 
> ------------------------------- 
> 
> Ulrike - see the spss.get, label, contents, and describe functions in 
> the Hmisc package. 
> 
> -- 
> Frank E Harrell Jr   Professor and Chair           School of Medicine 
>                       Department of Biostatistics   Vanderbilt University 
------- End of Original Message -------

For the sake of completeness of the thread in R-devel:
After a longer offline exchange, Frank and I have agreed that Hmisc spss.get 
currently offers no more than read.spss from package foreign when it comes to 
using both the original codes and the value labels from SPSS files. Having 
both is desirable when working with large datasets from well-documented 
studies: these often require filtering rules based on the original codes, 
while one still wants to preserve the annotation with value labels. 

The solution from package foreign: The option "use.value.labels=F" prevents 
SPSS factors (with codes and value labels) from being read into R as factors. 
Instead, the codes are read as numeric values, and the value labels are 
preserved by assigning an attribute "value.labels" to each such variable. My 
issue is that these attributes are lost when subsetting or merging such 
datasets. I have no idea how difficult it would be to change this; if it is 
doable without too much hassle, it would be great. 
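
A minimal sketch of the behaviour described above, using a hand-built data 
frame instead of a real SPSS file. The data, the label values and the helper 
restore_labels() are hypothetical and only serve as illustration; the helper 
is one possible workaround, not something foreign or Hmisc provides.

```r
# Hypothetical data mimicking what read.spss(..., use.value.labels = FALSE)
# is described as returning: numeric codes plus a "value.labels" attribute.
labels <- c(before = 1, after = 2)
dat <- data.frame(id = 1:4, time = c(1, 2, 1, 2))
attr(dat$time, "value.labels") <- labels

# Row subsetting re-extracts each column with `[`, which drops the attribute.
sub <- dat[dat$time == 1, ]
attr(sub$time, "value.labels")   # NULL

# Merging with a second (hypothetical) data frame; inspect the attribute.
other  <- data.frame(id = 1:4, group = c("a", "b", "a", "b"))
merged <- merge(dat, other, by = "id")
attr(merged$time, "value.labels")

# Hypothetical helper: copy one attribute back from the original data frame
# for every column name the two data frames share.
restore_labels <- function(new, old, which = "value.labels") {
  for (nm in intersect(names(new), names(old))) {
    attr(new[[nm]], which) <- attr(old[[nm]], which)
  }
  new
}

sub_fixed    <- restore_labels(sub, dat)
merged_fixed <- restore_labels(merged, dat)
attr(sub_fixed$time, "value.labels")   # the original named vector again
```

The helper has to be called after every subset or merge, which is exactly 
the manual bookkeeping the wish is meant to avoid.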

And by the way - not mentioned in my wish - read.spss also assigns the 
attribute "variable.labels" to the dataset itself. This attribute is 
currently also lost when merging or subsetting. 
(Here, spss.get from Hmisc works differently: it assigns each variable a 
class and a label attribute, both of which are preserved. I suspect that 
this makes spss.get substantially slower than read.spss; on the other hand, 
it makes it easier to use these labels in annotation.)
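
Why a class plus an attribute can survive subsetting may be easier to see 
in a small sketch. The class name "keep_attr" and the constructor below are 
hypothetical and are not Hmisc's actual code; the sketch only shows the 
general S3 mechanism of giving a variable its own `[` method that puts the 
attributes back after extraction.

```r
# Hypothetical constructor: attach attributes and mark the vector with a class.
keep_attr <- function(x, ...) {
  attrs <- list(...)
  for (nm in names(attrs)) attr(x, nm) <- attrs[[nm]]
  oldClass(x) <- c("keep_attr", oldClass(x))
  x
}

# `[` method for the hypothetical class: extract with the default method,
# then copy every attribute except "names" (including the class) back.
`[.keep_attr` <- function(x, ...) {
  out <- NextMethod("[")
  for (nm in setdiff(names(attributes(x)), "names")) {
    attr(out, nm) <- attr(x, nm)
  }
  out
}

labels <- c(before = 1, after = 2)
dat2 <- data.frame(id = 1:4)
# Assign via `$<-` so the classed vector is stored in the data frame as is.
dat2$time <- keep_attr(c(1, 2, 1, 2), value.labels = labels)

# Row subsetting calls `[` on each column, so the method above is used.
rows <- which(unclass(dat2$time) == 1)
sub2 <- dat2[rows, ]
attr(sub2$time, "value.labels")   # still the named vector of labels
```

The extra method dispatch on every extraction is one plausible source of 
overhead for such an approach, though that is an assumption and has not 
been measured here.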

With kind regards, Ulrike


