[R] Strange output daply with empty strata
Jan van der Laan
rhelp at eoos.dds.nl
Thu Sep 9 11:43:02 CEST 2010
Dear list,
I get some strange results with daply from the plyr package. In the
example below, the average age per municipality is calculated for
employed and unemployed people. If I do this using tapply (see code
below), I get the following result:
        no      yes
A       NA 36.94931
B 51.22505 34.24887
C 48.05759 51.00198
If I do this using daply:
municipality       no      yes
           A 36.94931 48.05759
           B 51.22505 51.00198
           C 34.24887       NA
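daply generates the same numbers, but they are not in the correct
cells. For example, in municipality A everybody is employed, so the NA
should be in the "no" cell (the unemployed) for municipality A, as it is
in the tapply output. Instead, the five non-missing values appear in the
order of the non-empty strata (A/yes, B/no, B/yes, C/no, C/yes), filled
down the columns, with the NA left over in the last cell.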
Am I using daply incorrectly or is there indeed something wrong with
the output of daply?
Regards,
Jan
I am using version 1.1 of the plyr package.
# Load plyr, which provides daply, ddply and .()
library(plyr)
# Generate some test data
data.test <- data.frame(
  municipality = rep(LETTERS[1:3], each = 10),
  employed = sample(c("yes", "no"), 30, replace = TRUE),
  age = runif(30, 20, 70))
# Make sure everybody is employed in municipality A
data.test$employed[data.test$municipality == "A"] <- "yes"
# Compare the output of tapply:
tapply(data.test$age, list(data.test$municipality, data.test$employed),
  mean)
# to that of daply:
daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
# The results of ddply are the same as those of tapply
ddply(data.test, .(municipality, employed), function(d){mean(d$age)} )
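As a cross-check that does not depend on plyr, the following is a minimal base-R sketch of how the empty stratum could be confirmed. It rebuilds the test data so that it runs on its own. The seed and the explicit factor levels are assumptions added here for reproducibility; they are not part of the original example.

```r
# Assumption: fixed seed, only so that repeated runs give the same data
set.seed(1)

# Same structure as data.test above; the factor levels are set explicitly
# (an assumption) so that "no" and "yes" always exist as levels
data.test <- data.frame(
  municipality = rep(LETTERS[1:3], each = 10),
  employed = factor(sample(c("yes", "no"), 30, replace = TRUE),
                    levels = c("no", "yes")),
  age = runif(30, 20, 70))

# Make sure everybody is employed in municipality A
data.test$employed[data.test$municipality == "A"] <- "yes"

# Number of observations per stratum; a zero marks an empty cell
counts <- table(data.test$municipality, data.test$employed)
print(counts)

# Mean age per stratum; tapply returns NA for empty strata
means <- tapply(data.test$age,
                list(data.test$municipality, data.test$employed), mean)
print(means)

# The NA cells of the means should be exactly the zero-count cells
na.matches.empty <- all(as.vector(is.na(means)) == as.vector(counts == 0))
print(na.matches.empty)
```

If the count for municipality A under "no" is zero and the NA in the table of means sits in that same cell, then the tapply layout is the consistent one, and any output that puts the NA elsewhere has misplaced its values.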