[R-sig-eco] Regression (lm) using parametric bootstrap
Johannes Radinger
JRadinger at gmx.at
Wed Aug 31 10:42:01 CEST 2011
Hello,
First of all, thank you for all your input! I think I managed now to build
a parametric bootstrap-model using the lm-function as statistic and the
runif-function for the random factor, which looks like:
statistic <- function(data)
{
model <- lm(log(data$Y)~data$X1+data$X2)
out <- c(model$coefficients[1], #Intercept
model$coefficients[2], #slope for first independent
model$coefficients[3], #slope for second independent
summary(model)$coefficients[2,4], #single p-value for first independent
summary(model)$coefficients[3,4], #single p-value for second independent
pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
summary(model)$fstatistic[3], lower.tail = FALSE), #overall p-value
summary(model)$r.squared) #r-squared
out
}
X1.gen <- function(data,mle)
{
out <- data
out$X1 <- runif(length(mle$X1a), mle$X1a, mle$X1b)
out
}
regression.boot <- boot(data, statistic, R=10000, sim="parametric", ran.gen=X1.gen, mle=list(X1a=data$X1a,X1b=data$X1b))
colnames(regression.boot$t) <- c(names(coefficients(model)[1]), names(coefficients(model)[2]), names(coefficients(model)[3]),"p-X1","p-X2","p.value","r.squared") #names
summary(regression.boot$t)
I compared the results with my "manual bootstrap approach" (looped repeated regressions), and I get similar results! So I am quite happy that it works so well!
Now some questions came up:
1) Is that what I am doing reasonable? I get all the summaries for the
regression coefficients for the simulated bootstrap samples. And the mean
or median can now be my "real" coefficients.
2) At the moment I am getting the regression coefficients (and names) via the index of the lm-coefficients summary. Is there any other way to get all the coefficients (like here: Intercept, slope X1,...overall p, R-squared)? When I am adding an additional regressor (another X3) to my model I have to adapt the coefficient function and the extraction of the names as well. Is there a more dynamic way that this vector is automatically created depending on the number of regressor (X's)? I hope you understand what I mean.
3) If I want to check if one model fits better then another, I can compare the R-squared of different models with a t-test? Does that makes sense? If it is the case, I think I can also compare the slopes etc. with a similar approach.
4)Can I also get confidence intervals? The boot.ci is called "Non-parametric" Bootstrap CI's, so probably I can't use them for my parametric bootstrap model!?
5)What is the best way to check for normal distribution of my regression-errors (residuals)? I checked them when I was running my regression without bootstrap, and there it became clear that I've to e.g. log-transform the response. Is there any difference in the case of my parametric bootstrap regression?
6) Any tips what I can analyze additionally besides these coefficients? Of course I can know look at the distributions of the single coefficients visually (histogramms etc.). Is there anything special with parametric bootstrap I have to consider?
Thank you!
Johannes
--
More information about the R-sig-ecology
mailing list