[R] speeding up regressions using ddply
Alison Macalady
ali at kmhome.org
Wed Sep 22 13:05:12 CEST 2010
Hi,
I have a data set that I'd like to run logistic regressions on, using
ddply to speed up the computation of many models with different
combinations of variables. I would like to run regressions on every
unique two-variable combination in a portion of my data set, but I
can't quite figure out how to do it using ddply. The data set looks like
this, with "status" as the binary dependent variable and V1:V8 as
potential independent variables in the logistic regression:
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D', 'L'), each = 6), 3)),
                as.data.frame(m))
I used melt to put my data frame into a more workable format:
require(reshape)
xm <- melt(x, id = 'status')
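To make the shape of xm concrete, here is a rough base-R sketch of what the melt call is assumed to produce: one row per (observation, variable) pair, with columns status, variable and value. The name xm_base and the use of stack() are illustrative assumptions, not part of the reshape package, and set.seed() is added only for reproducibility.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D', 'L'), each = 6), 3)),
                as.data.frame(m))

# Stack the eight predictor columns into one long column of values,
# with 'ind' recording which column each value came from
s <- stack(x[, colnames(m)])

# Hypothetical base-R stand-in for xm: status is repeated once per variable
xm_base <- data.frame(status   = rep(x$status, times = ncol(m)),
                      variable = s$ind,
                      value    = s$values)

dim(xm_base)   # 36 observations x 8 variables = 288 rows, 3 columns
```

In this long format each group of rows defined by variable holds only one predictor, which is why a one-variable model per group is straightforward and a two-variable model is not.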
Here is the basic shape of the function I'd like to apply to every
combination of variables in the dataset:
h <- function(df)
{
  # What I can't figure out is how to specify 2 different variables from
  # xm to include in the model (value1 and value2 are placeholders)
  log.glm <- glm(status ~ value1 + value2, data = df,
                 family = binomial(link = logit), na.action = na.omit)
  glm.summary <- summary(log.glm)
  aic <- extractAIC(log.glm)
  coef <- coef(glm.summary)
  # or whatever other output here
  list(Est1 = coef[1, 2], Est2 = coef[3, 2], AIC = aic[2])
}
And then I'd like to use ddply to speed up the computations.
require(plyr)
output <- ddply(xm, .(variable), as.data.frame.function(h))
output
I can easily do this using ddply when I only want to use 1 variable in
the model, but can't figure out how to do it with two variables.
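For reference, one possible way to sketch the two-variable case is to work on the wide data frame x rather than the melted xm, enumerate the variable pairs with combn(), and build each formula with reformulate(). This is a minimal base-R sketch under stated assumptions, not a ddply solution: fit_pair is a hypothetical helper, the choice to report the two slope estimates is an assumption about the desired output, and set.seed() is added only for reproducibility.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D', 'L'), each = 6), 3)),
                as.data.frame(m))

# All unique two-variable combinations: a 2 x 28 character matrix
pairs <- combn(colnames(m), 2)

# Hypothetical helper: fit status ~ v1 + v2 and return one summary row
fit_pair <- function(vars, data) {
  fit <- glm(reformulate(vars, response = "status"),
             data = data, family = binomial(link = "logit"),
             na.action = na.omit)
  est <- coef(summary(fit))  # rows: (Intercept), vars[1], vars[2]
  data.frame(var1 = vars[1], var2 = vars[2],
             Est1 = est[2, "Estimate"],
             Est2 = est[3, "Estimate"],
             AIC  = extractAIC(fit)[2])
}

# Fit one model per pair and bind the one-row results together
output <- do.call(rbind, lapply(seq_len(ncol(pairs)),
                                function(i) fit_pair(pairs[, i], x)))
head(output)
```

Each row of output corresponds to one pair of predictors, so the table can be sorted by AIC to compare models.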
Many thanks for any hints!
Ali
--------------------
Alison Macalady
Ph.D. Candidate
University of Arizona
School of Geography and Development
& Laboratory of Tree Ring Research