[R] Problem while working with SPSS data

Sun May 27 11:34:51 CEST 2007

Arun Kumar Saha wrote:
> Dear all R users,
> 
> I ran into a strange problem while working with SPSS data.
> 
> I wrote the following:
> 
> library(foreign)
> data.original = as.data.frame(read.spss(file="c:/Program Files/SPSS/Employee data.sav"))
> 
> data = as.data.frame(cbind(data.original$MINORITY, data.original$EDUC,
> data.original$PREVEXP, data.original$JOBCAT, data.original$GENDER))
> colnames(data) = c('MINORITY', 'EDUC', 'PREVEXP', 'JOBCAT', 'GENDER')
> 
> head( data.original)
> 
>   ID GENDER       BDATE EDUC   JOBCAT SALARY SALBEGIN JOBTIME PREVEXP MINORITY
> 1  1   <NA> 11654150400   15  Manager  57000    27000      98     144       No
> 2  2   <NA> 11852956800   16 Clerical  40200    18750      98      36       No
> 3  3   <NA> 10943337600   12 Clerical  21450    12000      98     381       No
> 4  4   <NA> 11502518400    8 Clerical  21900    13200      98     190       No
> 5  5   <NA> 11749363200   15 Clerical  45000    21000      98     138       No
> 6  6   <NA> 11860819200   15 Clerical  32100    13500      98      67       No
> 
>  head( data)
>   V1 V2  V3 V4 V5
> 1  1  5 144  4 NA
> 2  1  6  36  2 NA
> 3  1  3 381  2 NA
> 4  1  2 190  2 NA
> 5  1  5 138  2 NA
> 6  1  5  67  2 NA
> 
> 
> Here I got the values of variable "V2" as 5, 6, 3, ... which should
> be 15, 16, 12, ...

> can anyone tell me why I got that?

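  Your use of cbind() is the cause.  cbind() builds a matrix, so each
factor is replaced by its internal integer codes instead of its labels.
EDUC was evidently read in as a factor, which is why you see the codes
5, 6, 3, ... rather than 15, 16, 12, ...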
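
  To illustrate the cbind() point, here is a minimal sketch on a small
made-up data frame (the name "toy" and its values are hypothetical
stand-ins, not the SPSS file).  It shows the codes that cbind() produces
and two ways around them: select the columns by name so the factors
survive, or go through as.character() when you want the labels as numbers.

```r
# Hypothetical stand-in for a few columns of the SPSS data
toy <- data.frame(
  EDUC    = factor(c("15", "16", "12", "8")),
  JOBCAT  = factor(c("Manager", "Clerical", "Clerical", "Clerical")),
  PREVEXP = c(144, 36, 381, 190)
)

# cbind() on factor vectors yields a numeric matrix of integer codes,
# not the labels
codes <- cbind(toy$EDUC, toy$JOBCAT, toy$PREVEXP)

# Selecting columns by name keeps the result a data frame,
# so the factors stay factors
subset.df <- toy[, c("EDUC", "JOBCAT", "PREVEXP")]

# To turn factor labels into numbers, convert to character first
educ.num <- as.numeric(as.character(toy$EDUC))
```

  With the real data, the same idea would be to index data.original by
the five column names instead of calling cbind().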

> My second question: in my "data.original", why are the values of
> "GENDER" NA? Is there any way to get the actual values, i.e. "m" and "f"?

  GENDER is stored as a "string" variable in the SPSS file, which seems
to cause a problem when read.spss() applies the SPSS value labels.  You
might set the use.value.labels argument to FALSE:

df <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                to.data.frame=TRUE, use.value.labels=FALSE)

summary(df)
       ID        GENDER      BDATE                EDUC
 Min.   :  1.0   f:216   Min.   :1.093e+10   Min.   : 8.00
 1st Qu.:119.3   m:258   1st Qu.:1.153e+10   1st Qu.:12.00
 Median :237.5           Median :1.197e+10   Median :12.00
 Mean   :237.5           Mean   :1.180e+10   Mean   :13.49
 3rd Qu.:355.8           3rd Qu.:1.208e+10   3rd Qu.:15.00
 Max.   :474.0           Max.   :1.225e+10   Max.   :21.00
                         NA's   :1.000e+00

     JOBCAT          SALARY          SALBEGIN        JOBTIME
 Min.   :1.000   Min.   : 15750   Min.   : 9000   Min.   :63.00
 1st Qu.:1.000   1st Qu.: 24000   1st Qu.:12488   1st Qu.:72.00
 Median :1.000   Median : 28875   Median :15000   Median :81.00
 Mean   :1.411   Mean   : 34420   Mean   :17016   Mean   :81.11
 3rd Qu.:1.000   3rd Qu.: 36938   3rd Qu.:17490   3rd Qu.:90.00
 Max.   :3.000   Max.   :135000   Max.   :79980   Max.   :98.00

    PREVEXP          MINORITY
 Min.   :  0.00   Min.   :0.0000
 1st Qu.: 19.25   1st Qu.:0.0000
 Median : 55.00   Median :0.0000
 Mean   : 95.86   Mean   :0.2194
 3rd Qu.:138.75   3rd Qu.:0.0000
 Max.   :476.00   Max.   :1.0000

  If you want to retain the labels for all of the variables and get
around the problem with gender, you might do this:

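# Read the file twice: once with value labels (factors), once without
df1 <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                 to.data.frame=TRUE, use.value.labels=TRUE)

df2 <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                 to.data.frame=TRUE, use.value.labels=FALSE)

# Drop the all-NA GENDER column from df1, then merge in the raw GENDER
# values from df2; merge() joins on ID, the only column the pieces share
new.df <- merge(df1[,!names(df1) %in% "GENDER"], df2[,c("ID","GENDER")])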

head(new.df)
  ID       BDATE EDUC   JOBCAT SALARY SALBEGIN JOBTIME PREVEXP
1  1 11654150400   15  Manager  57000    27000      98     144
2  2 11852956800   16 Clerical  40200    18750      98      36
3  3 10943337600   12 Clerical  21450    12000      98     381
4  4 11502518400    8 Clerical  21900    13200      98     190
5  5 11749363200   15 Clerical  45000    21000      98     138
6  6 11860819200   15 Clerical  32100    13500      98      67
  MINORITY GENDER
1       No      m
2       No      m
3       No      f
4       No      f
5       No      m
6       No      m
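
  If you want to try the merge step without the SPSS file, here is a
small sketch on made-up data (the names "lab" and "raw" and their values
are hypothetical stand-ins for df1 and df2 above).  It also shows a
shortcut that relies on an assumption: if both data frames have their
rows in the same order, as they should when read from the same file, the
raw column can simply be copied over.

```r
# Hypothetical stand-in for the labelled read: GENDER is all NA
lab <- data.frame(
  ID     = 1:3,
  GENDER = factor(c(NA, NA, NA)),
  JOBCAT = factor(c("Manager", "Clerical", "Clerical"))
)

# Hypothetical stand-in for the unlabelled read: GENDER is intact
raw <- data.frame(
  ID     = 1:3,
  GENDER = c("m", "m", "f"),
  JOBCAT = c(3, 1, 1)
)

# Same pattern as above: drop the bad column, merge the good one in by ID
new.toy <- merge(lab[, !names(lab) %in% "GENDER"], raw[, c("ID", "GENDER")])

# Shortcut, assuming identical row order in both data frames
alt.toy <- lab
alt.toy$GENDER <- raw$GENDER
```

  The merge() version does not depend on row order, which makes it the
safer of the two if either data frame has been sorted or subsetted.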

> Thanks
> Arun

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.