[R-sig-eco] Regression (lm) using parametric bootstrap

Wed Aug 31 10:42:01 CEST 2011

Hello,

First of all, thank you for all your input! I think I managed now to build
a parametric bootstrap-model using the lm-function as statistic and the
runif-function for the random factor, which looks like:

statistic <- function(data)
{
	model <- lm(log(data$Y)~data$X1+data$X2)
	out <- c(model$coefficients[1],	#Intercept
	model$coefficients[2],				#slope for first independent
	model$coefficients[3],				#slope for second independent
	summary(model)$coefficients[2,4],	#single p-value for first independent
	summary(model)$coefficients[3,4],	#single p-value for second independent
	pf(summary(model)$fstatistic[1], summary(model)$fstatistic[2],
	summary(model)$fstatistic[3], lower.tail = FALSE),		#overall p-value
	summary(model)$r.squared)								#r-squared
	out
}

X1.gen <- function(data,mle)
{
	out <- data
	out$X1 <- runif(length(mle$X1a), mle$X1a, mle$X1b)
	out
}

regression.boot <- boot(data, statistic, R=10000, sim="parametric", ran.gen=X1.gen, mle=list(X1a=data$X1a,X1b=data$X1b))

colnames(regression.boot$t) <- c(names(coefficients(model)[1]), names(coefficients(model)[2]), names(coefficients(model)[3]),"p-X1","p-X2","p.value","r.squared")		#names

summary(regression.boot$t)

I compared the results with my "manual bootstrap approach" (looped repeated regressions), and I get similar results! So I am quite happy that it works so well!

Now some questions came up:
1) Is that what I am doing reasonable? I get all the summaries for the 
regression coefficients for the simulated bootstrap samples. And the mean
or median can now be my "real" coefficients.

2) At the moment I am getting the regression coefficients (and names) via the index of the lm-coefficients summary. Is there any other way to get all the coefficients (like here: Intercept, slope X1,...overall p, R-squared)? When I am adding an additional regressor (another X3) to my model I have to adapt the coefficient function and the extraction of the names as well. Is there a more dynamic way that this vector is automatically created depending on the number of regressor (X's)? I hope you understand what I mean.

3) If I want to check if one model fits better then another, I can compare the R-squared of different models with a t-test? Does that makes sense? If it is the case, I think I can also compare the slopes etc. with a similar approach.

4)Can I also get confidence intervals? The boot.ci is called "Non-parametric" Bootstrap CI's, so probably I can't use them for my parametric bootstrap model!?

5)What is the best way to check for normal distribution of my regression-errors (residuals)? I checked them when I was running my regression without bootstrap, and there it became clear that I've to e.g. log-transform the response. Is there any difference in the case of my parametric bootstrap regression?

6) Any tips what I can analyze additionally besides these coefficients? Of course I can know look at the distributions of the single coefficients visually (histogramms etc.). Is there anything special with parametric bootstrap I have to consider?

Thank you!

Johannes

--