[R] Converting factors back to numbers. Trouble with SPSS importdata
Robert W. Baer, Ph.D.
rbaer at atsu.edu
Sun Feb 19 23:44:24 CET 2006
Quoted directly from the R FAQ (although, granted, I need to look this up
over and over myself; would that it had an easily remembered wrapper function):
7.10 How do I convert factors to numeric?
It may happen that when reading numeric data into R (usually, when reading
in a file), they come in as factors. If f is such a factor object, you can
use
as.numeric(as.character(f))
to get the numbers back. More efficient, but harder to remember, is
as.numeric(levels(f))[as.integer(f)]
In any case, do not call as.numeric() or their likes directly for the task
at hand (as as.numeric() or unclass() give the internal codes).
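
To make the difference concrete, here is a minimal sketch on a made-up factor whose levels are numeric strings. The data are invented for illustration, and fac2num is a hypothetical helper name, not a function in base R; it only wraps the FAQ idiom so that it is easier to remember.

```r
# Toy factor whose levels look like numbers (made-up data, not the SPSS file)
f <- factor(c("10", "3", "7", "3", "10"))

as.numeric(f)                         # internal codes: 1 2 3 2 1
as.numeric(as.character(f))           # FAQ idiom, the values: 10 3 7 3 10
as.numeric(levels(f))[as.integer(f)]  # same values, converting each level once

# Hypothetical convenience wrapper around the FAQ idiom (name is an assumption)
fac2num <- function(x) {
  stopifnot(is.factor(x))
  as.numeric(levels(x))[as.integer(x)]
}

fac2num(f)
```

Note that this only helps when the levels themselves are numbers written as text.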
----- Original Message -----
From: "Paul Johnson" <pauljohn32 at gmail.com>
To: <r-help at stat.math.ethz.ch>
Sent: Sunday, February 19, 2006 2:16 PM
Subject: [R] Converting factors back to numbers. Trouble with SPSS importdata
> I'm using Fedora Core 4, R-2.2.
>
> The basic question is: can one recover the numerical values used in
> SPSS after importing data into R with read.spss from the foreign
> package? Here's why I ask.
>
> My colleague sent an SPSS data set. I must replicate some results she
> calculated in SPSS and one problem is that the numbers used in SPSS
> for variable values are not easily recovered in R.
>
> I'm comparing 2 imported datasets, "eldat" (read.spss without
> conversion to factors) and "eldatfac" (read.spss with conversion to
> factors).
>
> If I bring in the data without conversion to factors:
>
> library(foreign)
> eldat <- read.spss("18CitySCBSsorted.sav", use.value.labels=FALSE,
>                    to.data.frame=TRUE)
>
> I can see the variable HAPPY is coded 0, 1, 2, 3. Those are the
> numbers that SPSS uses as contrast values when it runs a regression
> with HAPPY.
>
> In contrast, allow R to translate the variables with a few value
> labels into factors.
>
> library(foreign)
> eldatfac <- read.spss("18CitySCBSsorted.sav", max.value.labels=7,
>                       to.data.frame=TRUE)
>
> Consider the first 50 observations on the variable HAPPY
>
>> f<- eldatfac$HAPPY[1:50]
>> f
>  [1] Happy          Happy          Very happy     Happy          Very happy
>  [6] Very happy     Happy          Very happy     Happy          Very happy
> [11] Happy          Happy          Not very happy Very happy     Very happy
> [16] Happy          Happy          Very happy     Happy          Happy
> [21] Not very happy Happy          Happy          Very happy     Happy
> [26] Happy          Happy          Happy          Happy          Happy
> [31] Happy          Happy          Happy          Happy          Happy
> [36] Happy          Very happy     Very happy     Happy          Very happy
> [41] Very happy     Very happy     Happy          Very happy     Very happy
> [46] Happy          Happy          Happy          Very happy     Very happy
> 6 Levels: Not happy at all Not very happy Happy Very happy ... Refused
>
>> levels(f)
> [1] "Not happy at all" "Not very happy" "Happy" "Very happy"
> [5] "Don't know" "Refused"
>
>
> I need the numerical values back in order to have a regression like
> SPSS. Isn't this what ?factor says one ought to do? Why are these all
> missing?
>
>> as.numeric(levels(f))[f]
>  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>
>
>> as.numeric(f)
>  [1] 3 3 4 3 4 4 3 4 3 4 3 3 2 4 4 3 3 4 3 3 2 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 4 4
> [39] 3 4 4 4 3 4 4 3 3 3 4 4
>
> Comparing against the "as.numeric" output from the unconverted
> variable, I can see the values differ by exactly one.
>
>> g <- eldat$HAPPY[1:50]
>> as.numeric(g)
>  [1] 2 2 3 2 3 3 2 3 2 3 2 2 1 3 3 2 2 3 2 2 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 3 3
> [39] 2 3 3 3 2 3 3 2 2 2 3 3
>
> I'm more worried about the kinds of variables that are coded
> irregularly 1, 3, 7, 11 in the SPSS scheme.
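
A possible workaround for that case, sketched on made-up data rather than the real file: when the levels are text labels, as.numeric(levels(f)) has nothing numeric to parse and returns NA, so the codes have to come from a label-to-code table. The table spss_codes and the labels below are hypothetical and typed in by hand; in practice they would have to match the value labels defined in the SPSS file.

```r
# Hypothetical label-to-code table (assumed; patterned on the 1, 3, 7, 11 case)
spss_codes <- c("Low" = 1, "Medium" = 3, "High" = 7, "Very high" = 11)

# Made-up factor standing in for a variable imported with value labels
f <- factor(c("High", "Low", "Very high", "Medium", "Low"),
            levels = names(spss_codes))

# The levels are labels, so the FAQ idiom yields only NA (plus a warning)
suppressWarnings(as.numeric(levels(f))[as.integer(f)])

# Look up each observation's label in the table instead
recovered <- unname(spss_codes[as.character(f)])
recovered  # 7 1 11 3 1
```

Reading the file a second time with use.value.labels=FALSE, as done for "eldat" above, gives the original codes without any lookup.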
>
> --
> Paul E. Johnson
> Professor, Political Science
> 1541 Lilac Lane, Room 504
> University of Kansas
>