[R-sig-eco] "average" regression - bootstrap?

Tue Aug 30 09:58:00 CEST 2011

Hi,

As you said, my approach at the moment is to sample the finite population.
I think that if my model is extended, e.g. model <- lm(Y~X1()+X2), the only variation in the resulting distributions (after the 1000 simulations) is caused by the variability of X1, which comes from the runif function.
To overcome this and to get a picture of the "real world", as you said,
bootstrapping can be a good option, especially for comparing results, since the bootstrap results are objects (I think). So I can then compare them (with ANOVAs?) and get confidence intervals etc.

So I am interested in using the runif function with the bootstrap approach, but I don't know how to do that yet. Maybe you can give me some example code for my case. Pedro, you mentioned two different ways of bootstrapping, but they aren't clear to me yet. What are the differences in the results? Can both be applied in my case?
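
A minimal sketch of the two schemes Pedro describes may help to see the difference. It uses base R only, and everything in it is an assumption made for illustration: the simulated population, the names pop, fit_slope, slopes_pop and slopes_boot, and the sample sizes are not taken from the real data.

```r
set.seed(1)

# Hypothetical finite population (all names and values are made up)
N <- 5000
pop <- data.frame(Xa = runif(N, 0, 5))
pop$Xb <- pop$Xa + 2
pop$Y  <- 1 + 0.5 * (pop$Xa + pop$Xb) / 2 + rnorm(N)

# Slope of Y ~ X, with X drawn anew between Xa and Xb for every row
fit_slope <- function(d) {
  x <- runif(nrow(d), d$Xa, d$Xb)
  unname(coef(lm(d$Y ~ x))[2])
}

n <- 50    # sample size
B <- 1000  # number of replicates

# Way 1: draw a fresh sample from the finite population each time
slopes_pop <- replicate(B, fit_slope(pop[sample(N, n), ]))

# Way 2: one sample, resampled with replacement (nonparametric bootstrap)
one_sample  <- pop[sample(N, n), ]
slopes_boot <- replicate(B, fit_slope(one_sample[sample(n, n, replace = TRUE), ]))

# Median and 95% percentile interval of the slope under each scheme
quantile(slopes_pop,  c(0.025, 0.5, 0.975))
quantile(slopes_boot, c(0.025, 0.5, 0.975))
```

Way 1 shows the sampling variability that would be seen if the population could be sampled again and again; way 2 estimates that variability from a single sample, which is what is usually possible with real data. In both, runif is called inside every replicate, so its variability is included.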

So what I have so far in my case (my code which I want to extend with bootstrap):

# X1() draws one random value between X1a and X1b for each observation
X1 <- function() runif(length(X1a), X1a, X1b)
model <- lm(Y~X1())

example1 <- list()

n <- 1000
for(i in 1:n) {
	# fit the regression model; X1() draws new values in every run
	model <- lm(Y~X1())
	example1[[paste("run",i,sep="")]] <- model
	}

# extract intercept, slope, overall p-value and R-squared from one fitted model
info_model_runs <- function(fit){
	s <- summary(fit)
	out <- c(coef(fit)[1],	# intercept
	coef(fit)[2],	# slope
	pf(s$fstatistic[1], s$fstatistic[2],
	s$fstatistic[3], lower.tail = FALSE),	# p-value of the overall F-test
	s$r.squared)	# R-squared
	# take the names from the model passed in, not from the global "model"
	names(out) <- c(names(coef(fit))[1:2], "p.value", "r.squared")
	return(out)
	}

example1.results.model_runs <- list()

for (i in seq_along(example1)) example1.results.model_runs[[names(example1)[i]]] <- info_model_runs(example1[[i]])

example1.results.dataframe <- as.data.frame(t(as.data.frame(example1.results.model_runs)))
summary(example1.results.dataframe)
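
A sketch of how the loop above could be handed to boot() from the boot package, with runif drawn inside the statistic so that each bootstrap replicate gets new X1 values. The data (dat), the helper name boot_stat and all numbers are hypothetical stand-ins, not the real Y, X1a and X1b.

```r
library(boot)  # recommended package, shipped with standard R installations

set.seed(42)

# Hypothetical data standing in for Y, X1a and X1b (values are made up)
n   <- 60
dat <- data.frame(X1a = runif(n, 0, 5))
dat$X1b <- dat$X1a + 2
dat$Y   <- 2 + 0.8 * dat$X1a + rnorm(n)

# Statistic for boot(): refit on the resampled rows, drawing X1 anew each time
boot_stat <- function(data, idx) {
  d    <- data[idx, ]
  d$X1 <- runif(nrow(d), d$X1a, d$X1b)
  fit  <- lm(Y ~ X1, data = d)
  c(coef(fit), r.squared = summary(fit)$r.squared)
}

res <- boot(dat, boot_stat, R = 1000)

res                                     # original estimates, bias, std. error
boot.ci(res, type = "perc", index = 2)  # percentile interval for the slope
```

The result is an object of class "boot", so the replicates are available in res$t for plotting or further summaries. Because X1 is redrawn on every call, the "original" estimates reported by boot (t0) are themselves one random draw, so the reported bias should be read with care.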

Hopefully someone can help me with that....

Thanks
Johannes

-------- Original Message --------
> Date: Mon, 29 Aug 2011 10:56:38 -0400
> From: Pedro Lima Pequeno <pacolipe at gmail.com>
> To: Johannes Radinger <JRadinger at gmx.at>
> CC: R-sig-ecology at r-project.org
> Subject: Re: [R-sig-eco] "average" regression - bootstrap?

> Hi Johannes,
> 
> I think your approach is reasonable. As you pointed out, however,
> generating such parameter distributions as you did is not strictly the
> same as bootstrapping. The bootstrap simulates repeated sampling from
> one original, target population by assuming the available sample is
> representative of that population and resampling it. Hence, it is
> a way of indirectly gathering information about the original
> population. It is used only because, in practice, resampling the
> population of interest is often impossible.
> In your case, you are directly simulating the target population (from
> a uniform distribution with known limits), and thus bootstrapping is not
> needed. However, by directly simulating the target population and its
> samples, your results will mainly reflect properties of this abstract,
> infinite population. If you are also interested in a more "real world"
> setting, you could first simulate a large, but finite population and
> then sample it. At the same time, you could focus on a single, random
> sample from this finite population and then apply the bootstrap, as
> people would usually be able to do with their own data. Then, you could
> compare the results.
> It is also useful to check the shape of the resulting distributions
> before choosing adequate measures to summarize them. For instance,
> the R2 sampling distribution is likely to be skewed, so using the mean
> will emphasize the tail values; the median could be more
> representative of the central tendency of the distribution in this
> case.
> 
> Regards
> 
> 2011/8/29, Johannes Radinger <JRadinger at gmx.at>:
> > Hello,
> >
> > I have a somewhat tricky statistical problem. First of all: I want to do a
> > standard linear regression. My model is:
> >
> > X <- function()runif(length(Xa), Xa, Xb)
> > model <- lm(Y~X())
> >
> > so X is a function drawing a random number between Xa and Xb (that is
> > necessary in my case). What I did so far is:
> >
> > example1 <- list()
> > n=1000
> > for(i in 1:n) {
> > 	model <- lm(Y~X())
> > 	example1[[paste("run",i,sep="")]] <- model
> > 	}
> >
> > So I ran the regression 1000 times and created a list with the
> > regression parameters for each run.
> >
> > How can I analyse these results now? I can get nice mean values for p,
> > R-squared etc. but is that the right way?
> >
> > So I thought, maybe a bootstrap approach can help in this case. Instead
> > of doing the "manual" repeated regression, I can use the bootstrap. But
> > does the boot function allow using the "runif" function for the X
> > variable, so that a new number is drawn in each bootstrap run? If so,
> > it'd be nice, because then I can get summarized results, which is what
> > I want. On the other hand, I don't necessarily need the subsampling of
> > the bootstrap. So in my case the subsample = all cases. Does that make
> > sense?
> >
> > Hopefully you can give me some inputs
> >
> > best regards
> > Johannes
> >
> >
> > --
> >
> > _______________________________________________
> > R-sig-ecology mailing list
> > R-sig-ecology at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >
> 
> 
> -- 
> Pedro A. C. Lima Pequeno
> Programa de Pós-graduação em Ecologia
> Instituto Nacional de Pesquisas da Amazônia
> Manaus, AM, Brasil
