[Rd] Wishlist: merge and subset to keep attributes (PR#8658)
Ulrike Grömping
groemping at tfh-berlin.de
Sun Mar 12 17:51:13 CET 2006
> When importing data from SPSS, it is a nice feature of the package
> foreign that it allows (option use.value.labels=F) to work with the
> original SPSS codes while keeping the value labels as information in an
> attribute. Unfortunately, after merging or subsetting, these attributes
> disappear.
> The code below illustrates the problem: Variable time originally has value
> labels that are gone after merging or subsetting.
>
> It would be very helpful, if this could be changed.
>
> With kind regards, Ulrike
> -------------------------------
>
> Ulrike - see the spss.get, label, contents, and describe functions in
> the Hmisc package.
>
> --
> Frank E Harrell Jr Professor and Chair School of Medicine
> Department of Biostatistics Vanderbilt University
------- End of Original Message -------
For the sake of completeness of the thread in R-devel:
After a longer offline exchange, Frank and I have agreed that Hmisc spss.get
currently does not offer more than read.spss from package foreign in terms of
being able to use both original codes and value labels from SPSS files. Having
both is desirable when working with large datasets from well-documented
studies: these often require filtering rules based on the original codes,
while at the same time one wants to preserve the annotation with value
labels.
The solution from package foreign: The option "use.value.labels=F" prevents
SPSS factors (with codes and value labels) from being read into R as factors.
Instead, codes are read as numeric values, and the value labels are preserved
by assigning an attribute "value.labels" to each such variable. My issue is
that these attributes are lost when subsetting or merging such datasets. I
have no idea how difficult it is to get this changed; if it is doable without
too much hassle, it would be great.
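
To make the loss concrete, here is a minimal sketch in base R. It does not read
an SPSS file: the data frame dat and its labels are a made-up stand-in for what
read.spss returns with use.value.labels=F, and restore_value_labels is a
hypothetical helper written for this illustration, not a function of foreign or
Hmisc. It shows one possible user-level workaround (copying the attribute back
by column name), not a change to merge or subset themselves.

```r
# Toy stand-in for read.spss(..., use.value.labels = FALSE) output:
# numeric codes plus a "value.labels" attribute on the variable.
dat <- data.frame(id = 1:4, time = c(1, 2, 1, 2))
attr(dat$time, "value.labels") <- c(before = 1, after = 2)

# Hypothetical helper: copy "value.labels" from the columns of `old`
# to the columns of the same name in `new`.
restore_value_labels <- function(new, old) {
  for (nm in intersect(names(new), names(old))) {
    labs <- attr(old[[nm]], "value.labels")
    if (!is.null(labs)) attr(new[[nm]], "value.labels") <- labs
  }
  new
}

# Row subsetting rebuilds each column with `[`, which drops the attribute.
sub <- dat[dat$time == 1, ]
is.null(attr(sub$time, "value.labels"))

# Copy the labels back from the original data frame.
sub <- restore_value_labels(sub, dat)
attr(sub$time, "value.labels")

# The same repair after a merge with a second (also made-up) data frame.
info <- data.frame(id = 1:4, group = c("a", "b", "a", "b"))
mer <- restore_value_labels(merge(dat, info, by = "id"), dat)
attr(mer$time, "value.labels")
```

The helper matches columns by name only, so it assumes that merge has not
renamed the labelled variables (for example by adding suffixes to duplicated
column names).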
And by the way - not mentioned in my wish - read.spss also assigns the
attribute "variable.labels" to the dataset itself. This attribute is
currently also lost when merging or subsetting.
(Here, spss.get from Hmisc works differently by assigning each variable a
class and a label attribute which are preserved. I have the suspicion that
this makes spss.get substantially slower than read.spss; on the other hand,
it makes it easier to use these labels in annotation.)
With kind regards, Ulrike