[Rd] Wishlist: merge and subset to keep attributes (PR#8658)

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sun Mar 12 19:17:31 CET 2006


"Ulrike Grömping" <groemping at tfh-berlin.de> writes:

> > When importing data from SPSS, it is a nice feature of the package
> > foreign that it allows one (option use.value.labels=F) to work with
> > the original SPSS codes while keeping the value labels as information
> > in an attribute. Unfortunately, after merging or subsetting, these
> > attributes disappear.
> > The code below illustrates the problem: Variable time originally has
> > value labels that are gone after merging or subsetting.
> >
> > It would be very helpful if this could be changed.
> > 
> > With kind regards, Ulrike 
> > ------------------------------- 
> > 
> > Ulrike - see the spss.get, label, contents, and describe functions in 
> > the Hmisc package. 
> > 
> > -- 
> > Frank E Harrell Jr   Professor and Chair           School of Medicine 
> >                       Department of Biostatistics   Vanderbilt University 
> ------- End of Original Message -------
> 
> For the sake of completeness of the thread in R-devel:
> After a longer offline exchange, Frank and I have agreed that Hmisc spss.get 
> currently does not offer more than read.spss from package foreign in terms of 
> being able to use both original codes and value labels from SPSS files (which 
> is desirable when working with large datasets from well-documented studies 
> that often require filtering rules based on original codes to be applied 
> while at the same time one wants to preserve annotation with value
> labels). 
> 
> The solution from package foreign: The option "use.value.labels=F" prevents
> SPSS factors (with codes and value labels) from being read into R as factors.
> Instead, codes are read as numeric values, and the value labels are preserved 
> by assigning an attribute "value.labels" to each such variable. My issue is 
> that these attributes are lost when subsetting or merging such datasets. I 
> have no idea how difficult it is to get this changed; if it is doable without 
> too much hassle, it would be great. 

I don't think this is possible. It is happening at the level of "[",
which strips all attributes except names, dim and dimnames. Try for instance

x <- 1:4
attr(x, "foo") <- "bar"
x       # prints the "foo" attribute
x[1]    # the "foo" attribute is gone
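The same thing happens one level up, which is what Ulrike is seeing: "[.data.frame" indexes each column with "[", and subset() goes through "[.data.frame". A minimal R sketch, using a hand-made column as a hypothetical stand-in for one imported with read.spss(..., use.value.labels = FALSE); the layout of the labels vector is an assumption chosen only for illustration:

```r
# Hypothetical stand-in for an imported SPSS variable: numeric codes plus
# a "value.labels" attribute (label layout is illustrative only).
time <- c(1, 2, 1, 3)
attr(time, "value.labels") <- c(early = 1, middle = 2, late = 3)

d <- data.frame(id = 1:4)
d$time <- time                                  # "$<-" keeps the attribute

attr(d$time, "value.labels")                    # labels still attached
attr(d[d$id > 1, ]$time, "value.labels")        # NULL: "[" was applied to the column
attr(subset(d, id > 1)$time, "value.labels")    # NULL: subset() uses "[" as well
```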

It's a bit unclear to me why this is so, but e.g. dimension attributes
do fairly obviously need to be removed. 
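A small R illustration of why the dimension attribute in particular cannot survive arbitrary indexing:

```r
m <- matrix(1:6, nrow = 2)    # carries dim = c(2, 3)
dim(m)                        # [1] 2 3
v <- m[1:4]                   # 4 elements cannot have dim c(2, 3)
dim(v)                        # NULL: the dim attribute was dropped
v                             # [1] 1 2 3 4
```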

It's the sort of thing where you're bound to discover just how much
code is relying on the current behaviour (quite possibly unwittingly)
if you try to change it. 

In general it is not a good idea to change language semantics for
everyone in all contexts, just because someone is unhappy with the
behaviour in one particular context...

If you want different behaviour for a limited scope, you probably need
to do it Frank's way: by defining a class and an indexing method for
it. Or copy over the attributes as needed.
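A minimal sketch of the "class plus indexing method" route, assuming a hypothetical class name "spsscodes" (not part of foreign or Hmisc). The method lets the default "[" do the work and then restores the labels and the class, which is the same pattern base R's "[.factor" uses to keep levels:

```r
# Hypothetical constructor: mark numeric SPSS codes as class "spsscodes"
# and keep their value labels in an attribute.
as.spsscodes <- function(x, labels = attr(x, "value.labels")) {
  attr(x, "value.labels") <- labels
  class(x) <- c("spsscodes", oldClass(x))
  x
}

# Indexing method: default "[" drops the attributes, so copy them back.
"[.spsscodes" <- function(x, ...) {
  y <- NextMethod("[")
  attr(y, "value.labels") <- attr(x, "value.labels")
  class(y) <- oldClass(x)
  y
}

d <- data.frame(id = 1:4)
d$time <- as.spsscodes(c(1, 2, 1, 3),
                       labels = c(early = 1, middle = 2, late = 3))
s <- subset(d, id > 1)        # "[.data.frame" dispatches to "[.spsscodes"
attr(s$time, "value.labels")  # labels survive the subset
```

Operations that do not go through "[" (for example c()) still drop the class and the labels, and would need methods of their own; merge() has not been checked here.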
 
> And by the way - not mentioned in my wish - read.spss also assigns the 
> attribute "variable.labels" to the dataset itself. This attribute is 
> currently also lost when merging or subsetting. 
> (Here, spss.get from Hmisc works differently by assigning each variable a 
> class and a label attribute which are preserved. I have the suspicion that 
> this makes spss.get substantially slower than read.spss; on the other hand, 
> it makes it easier to use these labels in annotation.)
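For the data-frame-level "variable.labels" attribute, the "copy over the attributes as needed" route can be sketched as a small helper applied after merge() or subset(). The helper name is hypothetical, and it assumes the labels form a character vector named by variable; only labels for columns that are still present are kept:

```r
# Hypothetical helper (not part of foreign): re-attach "variable.labels"
# from the original data frame, restricted to the columns of the result.
copy_variable_labels <- function(new, old) {
  labs <- attr(old, "variable.labels")
  if (!is.null(labs)) {
    attr(new, "variable.labels") <- labs[intersect(names(labs), names(new))]
  }
  new
}

spss <- data.frame(id = 1:3, time = c(1, 2, 1))
attr(spss, "variable.labels") <- c(id = "Subject", time = "Time of measurement")
extra <- data.frame(id = 1:3, score = c(10, 20, 30))

m <- copy_variable_labels(merge(spss, extra, by = "id"), spss)
s <- copy_variable_labels(subset(spss, time == 1, select = time), spss)
attr(m, "variable.labels")    # labels for id and time
attr(s, "variable.labels")    # label for time only
```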


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


