[R] apply lm function to dataset split by two variables
Dennis Murphy
djmuser at gmail.com
Wed Sep 28 12:56:32 CEST 2011
Hi:
Here's one way to do it with the plyr package:
dd <- read.table(textConnection("
year sps cm w
2009 50 16 22
2009 50 17 42
2009 50 18 45
2009 51 15 45
2009 51 16 53
2009 51 17 73
2010 50 15 22
2010 50 16 41
2010 50 16 21
2010 50 17 36
2010 51 15 43
2010 51 16 67
2010 51 17 79"), header = TRUE)
closeAllConnections()
library('plyr')
# Input a data frame, output a list of lm objects
modlist <- dlply(dd, .(sps, year), function(d) lm(w ~ cm, data = d))
# For use in plyr's ldply() function, the utility function should
# return a data frame. We save some effort in simple linear regression
# by noting that the two-sided p-value of the t-test of zero slope is the
# same as that of the overall F test:
extractfun <- function(m) {
cf <- coef(m)
tinfo <- summary(m)$coefficients[2, c(2, 4)]
r2 <- summary(m)$r.squared
data.frame(intercept = cf[1], slope = cf[2], n = length(resid(m)),
slope.se = tinfo[1], pval = tinfo[2], Rsq = r2)
}
# Take a list (of models) as input and output a data frame:
ldply(modlist, extractfun)
sps year intercept slope n slope.se pval Rsq
1 50 2009 -159.1667 11.5 3 4.907477 0.2567749 0.8459488
2 50 2010 -82.0000 7.0 4 7.141428 0.4303481 0.3245033
3 51 2009 -167.0000 14.0 3 3.464102 0.1544210 0.9423077
4 51 2010 -225.0000 18.0 3 3.464102 0.1210377 0.9642857
HTH,
Dennis
On Wed, Sep 28, 2011 at 1:41 AM, Elena Guijarro
<elena.guijarro at vi.ieo.es> wrote:
>
> Dear all,
>
> I am not fluent in R and am struggling to 1) apply a lm to a weight-size
> dataset, thus the model has to run separately for each species, each
> year; 2) extract coefs, r-squared, n, etc. The data look like this:
>
> year sps cm w
> 2009 50 16 22
> 2009 50 17 42
> 2009 50 18 45
> 2009 51 15 45
> 2009 51 16 53
> 2009 51 17 73
> 2010 50 15 22
> 2010 50 16 41
> 2010 50 16 21
> 2010 50 17 36
> 2010 51 15 43
> 2010 51 16 67
> 2010 51 17 79
>
>
>
> The following script works for data from a single year, but I don't find
> a way to subset the data by sps AND year and get the function running:
>
> f <- function(data) lm(log(w) ~ log(cm+0.5), data = data)
> v <- lapply(split(data, data$sps), f)
>
> and then I can extract the data with this script from Peter Solymos
> (although I do not get the number of points used in the analysis):
>
> myFun <-
> function(lm)
> {
> out <- c(lm$coefficients[1],
> lm$coefficients[2],
> length(lm$run1$model$y),
> summary(lm)$coefficients[2,2],
> pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2],
> summary(lm)$fstatistic[3], lower.tail = FALSE),
> summary(lm)$r.squared)
> names(out) <- c("intercept","slope","n","slope.SE","p.value","r.squared")
> return(out)}
>
> results <- list()
> for (i in 1:length(v)) results[[names(v)[i]]] <- myFun(v[[i]])
> as.data.frame(results)
>
> I have checked the plyr package, but the example that fits my data best
> uses a for loop and I would like to avoid these. I have also tried the
> following (among many other options) without results:
>
> v<-tapply(data$w,list(data$cm,data$year),f)
>
> Error in is.function(FUN) : 'FUN' is missing
>
> Any ideas?
>
> Thanks for your help,
>
> Elena
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list