[R] Converting factors back to numbers. Trouble with SPSS import data

Thomas Lumley tlumley at u.washington.edu
Mon Feb 20 02:16:20 CET 2006


On Sun, 19 Feb 2006, Paul Johnson wrote:

> I'm using Fedora Core 4, R-2.2.
>
> The basic question is: can one recover the numerical values used in
> SPSS after importing data into R with read.spss from the foreign
> library?  Here's why I ask.
>
> My colleague sent an SPSS data set. I must replicate some results she
> calculated in SPSS and one problem is that the numbers used in SPSS
> for variable values are not easily recovered in R.
>
> I'm comparing 2 imported datasets, "eldat" (read.spss with no
> convert-to-factors) and "eldatfac" (read.spss with
> convert-to-factors).
>
> If I bring in the data without conversion to factors:
>
> library(foreign)
> eldat <- read.spss("18CitySCBSsorted.sav", use.value.labels=F,
>                        to.data.frame=T)
>
> I can see the variable HAPPY is coded 0, 1, 2, 3.  Those are the
> numbers that SPSS uses as contrast values when it runs a regression
> with HAPPY.

So, bring in the data without conversion to factors.

Factors in R are not just labels for arbitrary numeric variables. They are a special type of variable for categorical data that happens to be implemented with the integer codes 1,2,3,...
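
A minimal sketch of that point, using made-up values rather than the data in the question (the vector below is hypothetical): whatever codes the source program used, the codes stored inside a factor are just positions in its levels.

```r
# Hypothetical data: labels only, no SPSS codes involved
happy <- factor(c("Happy", "Very happy", "Not very happy", "Happy"),
                levels = c("Not happy at all", "Not very happy",
                           "Happy", "Very happy"))

# The internal codes are positions in levels(happy): 1, 2, 3, ...
codes <- as.integer(happy)
print(codes)

# Indexing the levels by those codes recovers the labels
print(levels(happy)[codes])
```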

If that isn't what you want, don't use factors. read.spss will still return all the labels as attributes of the returned data frame.



> In contrast,  allow R to translate the variables with a few value
> labels into factors.
>
> library(foreign)
> eldatfac <- read.spss("18CitySCBSsorted.sav",
> max.value.labels=7,to.data.frame=T)
>
> Consider the first 50 observations on the variable HAPPY
>
>> f<- eldatfac$HAPPY[1:50]
>> f
> [1] Happy          Happy          Very happy     Happy          Very happy
> [6] Very happy     Happy          Very happy     Happy          Very happy
> [11] Happy          Happy          Not very happy Very happy     Very happy
> [16] Happy          Happy          Very happy     Happy          Happy
> [21] Not very happy Happy          Happy          Very happy     Happy
> [26] Happy          Happy          Happy          Happy          Happy
> [31] Happy          Happy          Happy          Happy          Happy
> [36] Happy          Very happy     Very happy     Happy          Very happy
> [41] Very happy     Very happy     Happy          Very happy     Very happy
> [46] Happy          Happy          Happy          Very happy     Very happy
> 6 Levels: Not happy at all Not very happy Happy Very happy ... Refused
>
>> levels(f)
> [1] "Not happy at all" "Not very happy"   "Happy"            "Very happy"
> [5] "Don't know"       "Refused"
>
>
> I need the numerical values back in order to have a regression like
> SPSS.  Isn't this what ?factor says one ought to do? Why are these all
> missing?
>
>> as.numeric(levels(f))[f]
> [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

No, this is not what ?factor says you should do.  This is what you do if your levels are numbers (in character form) and you want those numbers. "Happy" is not a number.
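
For contrast, a small sketch of the case that idiom is meant for, with a hypothetical factor whose levels are numbers stored as character strings:

```r
# Hypothetical factor whose levels are numbers in character form
f_num <- factor(c("11", "3", "7", "3", "1"))

print(levels(f_num))       # the levels are character strings
print(as.numeric(f_num))   # internal codes, not the original numbers

# Convert the levels to numbers, then index them by the factor
recovered <- as.numeric(levels(f_num))[f_num]
print(recovered)           # the original numbers: 11 3 7 3 1
```

With levels such as "Happy", the as.numeric(levels(...)) step has nothing numeric to convert, which is where the NAs come from.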


>> as.numeric(f)
> [1] 3 3 4 3 4 4 3 4 3 4 3 3 2 4 4 3 3 4 3 3 2 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 4 4
> [39] 3 4 4 4 3 4 4 3 3 3 4 4
>
> Comparing against the "as.numeric" output from the unconverted factor,
> I can see the levels are just one digit different.

Yes, because SPSS used the codes 0,1,2,3 and R uses 1,2,3,4.  You could just subtract 1 if you want the numbers to be smaller by 1.
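
A sketch of the subtraction, rebuilding the first five values of f from the output quoted above (the level order is taken from the levels(f) output; the vector itself is retyped by hand, not read from the file). This shortcut assumes the SPSS codes are consecutive and start at 0, as they appear to for HAPPY.

```r
# Levels in the order shown by levels(f) in the question
lev <- c("Not happy at all", "Not very happy", "Happy",
         "Very happy", "Don't know", "Refused")

# First five observations of f, retyped from the quoted output
f <- factor(c("Happy", "Happy", "Very happy", "Happy", "Very happy"),
            levels = lev)

r_codes    <- as.numeric(f)   # R's codes: 3 3 4 3 4
spss_codes <- r_codes - 1     # shifted to a 0-based scheme: 2 2 3 2 3
print(spss_codes)
```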


>> g <- eldat$HAPPY[1:50]
>> as.numeric(g)
> [1] 2 2 3 2 3 3 2 3 2 3 2 2 1 3 3 2 2 3 2 2 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 3 3
> [39] 2 3 3 3 2 3 3 2 2 2 3 3
>
> I'm more worried about the kinds of variables that are coded
> irregularly 1, 3, 7, 11 in the SPSS scheme.
>

If you want to keep the numeric values, don't change them to factors. That's why there is an option.
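
One way this could look for an irregularly coded variable, as a sketch only: the values, codes, and labels below are all hypothetical. The numeric column is kept as imported, and a labelled factor is built next to it for tables and plots.

```r
# Hypothetical variable imported as plain numbers, coded 1, 3, 7, 11
x <- c(3, 11, 1, 7, 3)

# Hypothetical code-to-label table for that variable
codes <- c(1, 3, 7, 11)
labs  <- c("Low", "Medium", "High", "Very high")

# Labelled copy for display; x itself is left untouched
x_fac <- factor(x, levels = codes, labels = labs)
print(table(x_fac))

# The original numbers can still be recovered through the code table
back <- codes[as.integer(x_fac)]
print(back)
```

Here x can go straight into a regression with the original values, while x_fac carries the labels.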


     -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



