[R] Problem while working with SPSS data
Chuck Cleland
ccleland at optonline.net
Sun May 27 11:34:51 CEST 2007
Arun Kumar Saha wrote:
> Dear all R users,
>
> I ran into a strange problem while working with SPSS data.
>
> I wrote the following:
>
> library(foreign)
> data.original = as.data.frame(read.spss(file="c:/Program Files/SPSS/Employee data.sav"))
>
> data = as.data.frame(cbind(data.original$MINORITY, data.original$EDUC,
> data.original$PREVEXP, data.original$JOBCAT, data.original$GENDER))
> colnames(data) = c('MINORITY', 'EDUC', 'PREVEXP', 'JOBCAT', 'GENDER')
>
> head( data.original)
>
>   ID GENDER       BDATE EDUC   JOBCAT SALARY SALBEGIN JOBTIME PREVEXP MINORITY
> 1  1   <NA> 11654150400   15  Manager  57000    27000      98     144       No
> 2  2   <NA> 11852956800   16 Clerical  40200    18750      98      36       No
> 3  3   <NA> 10943337600   12 Clerical  21450    12000      98     381       No
> 4  4   <NA> 11502518400    8 Clerical  21900    13200      98     190       No
> 5  5   <NA> 11749363200   15 Clerical  45000    21000      98     138       No
> 6  6   <NA> 11860819200   15 Clerical  32100    13500      98      67       No
>
> head( data)
>   V1 V2  V3 V4 V5
> 1  1  5 144  4 NA
> 2  1  6  36  2 NA
> 3  1  3 381  2 NA
> 4  1  2 190  2 NA
> 5  1  5 138  2 NA
> 6  1  5  67  2 NA
>
>
> Here the values of variable "V2" come out as 5, 6, 3, ... when they should
> be 15, 16, 12, ...
> Can anyone tell me why?
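Your use of cbind() converted the factors to numeric. cbind() builds a
matrix from the vectors you pass it, and a factor goes into that matrix as
its internal integer codes rather than its labels. So the 5 you see in V2
is presumably the position of "15" among the levels of EDUC, not the value
itself. Selecting the columns from data.original by name, e.g.
data.original[, c("MINORITY", "EDUC", "PREVEXP", "JOBCAT", "GENDER")],
keeps the factors and the column names intact.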
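To make the cbind() behaviour concrete, here is a minimal sketch on a toy
data frame. The object names (toy, codes, kept, educ_num) and the values are
made up for illustration and are not taken from the Employee data file; the
only assumption is that EDUC arrives as a factor whose labels look like
numbers.

```r
# Toy stand-in for the SPSS data: EDUC is a factor with numeric-looking labels.
toy <- data.frame(
  EDUC    = factor(c(15, 16, 12, 8, 15, 15)),
  PREVEXP = c(144, 36, 381, 190, 138, 67)
)

# cbind() on bare vectors builds a numeric matrix, so the factor is replaced
# by its internal integer codes (its positions in levels(toy$EDUC)).
codes <- as.data.frame(cbind(toy$EDUC, toy$PREVEXP))

# Subsetting the data frame by column name keeps the factor and the names.
kept <- toy[, c("EDUC", "PREVEXP")]

# If the numeric values are wanted, convert via the labels, not the codes.
educ_num <- as.numeric(as.character(toy$EDUC))

print(head(codes))
print(head(kept))
print(educ_num)
```

Going through as.character() first matters: calling as.numeric() directly on
the factor would again return the integer codes.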
> My second question: why are the values of "GENDER" in "data.original" NA?
> Is there any way to get the actual values, i.e. "m" and "f"?
GENDER is of type "string" in the SPSS file, which seems to cause a
problem when read.spss() tries to apply the SPSS value labels. You might
set the use.value.labels argument to FALSE:
df <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                to.data.frame=TRUE, use.value.labels=FALSE)
summary(df)
       ID        GENDER       BDATE                EDUC
 Min.   :  1.0   f:216   Min.   :1.093e+10   Min.   : 8.00
 1st Qu.:119.3   m:258   1st Qu.:1.153e+10   1st Qu.:12.00
 Median :237.5           Median :1.197e+10   Median :12.00
 Mean   :237.5           Mean   :1.180e+10   Mean   :13.49
 3rd Qu.:355.8           3rd Qu.:1.208e+10   3rd Qu.:15.00
 Max.   :474.0           Max.   :1.225e+10   Max.   :21.00
                         NA's   :1.000e+00
     JOBCAT          SALARY           SALBEGIN        JOBTIME
 Min.   :1.000   Min.   : 15750   Min.   : 9000   Min.   :63.00
 1st Qu.:1.000   1st Qu.: 24000   1st Qu.:12488   1st Qu.:72.00
 Median :1.000   Median : 28875   Median :15000   Median :81.00
 Mean   :1.411   Mean   : 34420   Mean   :17016   Mean   :81.11
 3rd Qu.:1.000   3rd Qu.: 36938   3rd Qu.:17490   3rd Qu.:90.00
 Max.   :3.000   Max.   :135000   Max.   :79980   Max.   :98.00
    PREVEXP          MINORITY
 Min.   :  0.00   Min.   :0.0000
 1st Qu.: 19.25   1st Qu.:0.0000
 Median : 55.00   Median :0.0000
 Mean   : 95.86   Mean   :0.2194
 3rd Qu.:138.75   3rd Qu.:0.0000
 Max.   :476.00   Max.   :1.0000
If you want to retain the labels for all of the variables and get
around the problem with gender, you might do this:
df1 <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                 to.data.frame=TRUE, use.value.labels=TRUE)
df2 <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                 to.data.frame=TRUE, use.value.labels=FALSE)
new.df <- merge(df1[,!names(df1) %in% "GENDER"], df2[,c("ID","GENDER")])
head(new.df)
  ID       BDATE EDUC   JOBCAT SALARY SALBEGIN JOBTIME PREVEXP MINORITY GENDER
1  1 11654150400   15  Manager  57000    27000      98     144       No      m
2  2 11852956800   16 Clerical  40200    18750      98      36       No      m
3  3 10943337600   12 Clerical  21450    12000      98     381       No      f
4  4 11502518400    8 Clerical  21900    13200      98     190       No      f
5  5 11749363200   15 Clerical  45000    21000      98     138       No      m
6  6 11860819200   15 Clerical  32100    13500      98      67       No      m
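The merge() call above relies on ID being the only column the two pieces
have in common. As a rough sketch of the same idea with the key named
explicitly, here is a toy version; the data frames with_labels and
without_labels and their values are hypothetical stand-ins for the two
read.spss() results, not the real file.

```r
# Hypothetical stand-in for the read with value labels: GENDER is all NA.
with_labels <- data.frame(
  ID       = 1:3,
  GENDER   = factor(c(NA, NA, NA), levels = c("Female", "Male")),
  MINORITY = factor(c("No", "No", "Yes"))
)

# Hypothetical stand-in for the read without value labels: GENDER is usable.
without_labels <- data.frame(
  ID       = 1:3,
  GENDER   = factor(c("m", "m", "f")),
  MINORITY = c(0, 0, 1)
)

# Drop the unusable GENDER column from the labelled copy, then join the raw
# GENDER column back in, stating the key with by = "ID".
keep <- setdiff(names(with_labels), "GENDER")
combined <- merge(with_labels[, keep, drop = FALSE],
                  without_labels[, c("ID", "GENDER")],
                  by = "ID")

print(combined)
```

Naming the key with by = "ID" guards against an accidental join on some
other column the two pieces happen to share.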
> Thanks
> Arun
--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894