[Rd] read.spss issues

Thomas Lumley tlumley at uw.edu
Wed Feb 15 21:28:32 CET 2012


On Wed, Feb 15, 2012 at 7:05 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:

> The second problem is that the SPSS data format allows 'duplicate
> labels' to be specified, whereas this is not allowed for factors.
> read.spss does not deal with this and creates a bad factor:
>
> library(foreign)
> x <- read.spss("http://www.stat.ucla.edu/~jeroen/spss/duplicate_labels.sav",
>                use.value.labels = TRUE)
> levels(x$opinion)
>
> which causes issues downstream. I am not sure if this is an issue in
> read.spss() or as.factor(), but I guess it might be wise to try to
> detect duplicate levels and assign them all with one and the same
> integer value when converting to a factor.

I think this one would be better dealt with by giving an error.

SPSS value labels are just labels, so they don't map very well onto R
factors, which are enumerated types.  Rather than force them and lose
data, I would prefer to make the user decide what to do.
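
As a rough illustration of the "give an error" approach, here is a minimal
sketch. The helper name labels_to_factor and the toy data are made up for
this example and are not part of the foreign package; it assumes the value
labels arrive as a named vector whose names are the label text and whose
values are the codes.

```r
# Hypothetical helper (not in the foreign package): convert coded values
# to a factor, but stop if two different codes share the same label.
labels_to_factor <- function(codes, value_labels) {
  # value_labels: named vector; names are label text, values are codes
  labs <- names(value_labels)
  dup <- unique(labs[duplicated(labs)])
  if (length(dup) > 0) {
    stop("duplicate value labels: ", paste(dup, collapse = ", "),
         "; recode or relabel before converting to a factor")
  }
  factor(codes, levels = unname(value_labels), labels = labs)
}

# Made-up data for illustration only
codes <- c(1, 2, 3, 2)
ok  <- c(Agree = 1, Neutral = 2, Disagree = 3)
bad <- c(Agree = 1, Agree = 2, Disagree = 3)

# Unique labels convert cleanly
f <- labels_to_factor(codes, ok)

# Duplicate labels raise an error; capture the message instead of aborting
res <- tryCatch(labels_to_factor(codes, bad),
                error = function(e) conditionMessage(e))
```

With an error like this the user chooses whether to merge the codes, rename
one label, or keep the raw numeric values, and nothing is collapsed silently.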

    -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


