[R] Bootstrap CIs for weighted means of paired differences

ivan i.petzev at gmail.com
Fri Nov 21 15:52:27 CET 2014


I am aware that bootstrapping produces different CIs with every run. I
still believe that the two procedures differ. My understanding is that
setting "w" in the boot() function changes the "importance" of the
observations, i.e., how the bootstrap selects them: observation i no
longer has the same probability of being chosen as observation j. If you
print res_boot, you will see that with "w" set in boot() the output is
headed "weighted bootstrap"; without it, the output is headed "ordinary
nonparametric bootstrap". But maybe I am wrong.
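
One way to check the claim about selection probabilities is to look at how
often each observation is drawn. The following is a minimal sketch, assuming
the boot package is installed. The names idx, res_w, res_o, freq_w and freq_o
are mine and not from the thread, and I spell the argument out as "weights"
(the "w" used above reaches it only through R's partial argument matching).
Note that the boot documentation describes this argument as importance
weights, which is a separate thing from the weights inside weighted.mean().

```r
library(boot)  # recommended package, assumed to be installed

set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x, y, weights)

# Same statistic as in the thread; the index argument is called idx here
# (my naming) so that it is not confused with the weights.
vw_m_diff <- function(dataset, idx) {
  differences <- dataset[idx, 1] - dataset[idx, 2]
  weighted.mean(x = differences, w = dataset[idx, "weights"])
}

# One run with resampling weights, one without.
res_w <- boot(dataset, statistic = vw_m_diff, R = 1000,
              weights = dataset[, "weights"])
res_o <- boot(dataset, statistic = vw_m_diff, R = 1000)

# boot.array() returns the resampling frequencies: one row per replicate,
# one column per observation. Average them over the replicates.
freq_w <- colMeans(boot.array(res_w))
freq_o <- colMeans(boot.array(res_o))

# If the weights steer the resampling, freq_w should track the weights,
# while freq_o should sit close to 1 for every observation.
cor(freq_w, weights)
range(freq_o)
```

If the correlation is close to 1 and the ordinary frequencies stay near 1,
the two calls really do resample differently, whatever that implies for the
resulting intervals.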

On Thu, Nov 20, 2014 at 8:19 PM, David Winsemius <dwinsemius at comcast.net>
wrote:

>
> On Nov 20, 2014, at 2:23 AM, i.petzev wrote:
>
> > Hi David,
> >
> > sorry, I was not clear.
>
> Right. You never were clear about what you wanted and your example was so
> statistically symmetric that it is still hard to see what is needed. The
> examples below show CI's that are arguably equivalent. I can be faulted for
> attempting to provide code that produced a sensible answer to a vague
> question to which I was only guessing at the intent.
>
>
> > The difference comes from defining or not defining “w” in the boot()
> > function. The results with your function and your approach are thus:
> >
> > library(boot)
> > set.seed(1111)
> > x <- rnorm(50)
> > y <- rnorm(50)
> > weights <- runif(50)
> > weights <- weights / sum(weights)
> > dataset <- cbind(x,y,weights)
> >
> > vw_m_diff <- function(dataset,w) {
> >   differences <- dataset[w,1]-dataset[w,2]
> >   weights <- dataset[w, "weights"]
> >   return(weighted.mean(x=differences, w=weights))
> > }
> > res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
> > boot.ci(res_boot)
> >
> > BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> > Based on 1000 bootstrap replicates
> >
> > CALL :
> > boot.ci(boot.out = res_boot)
> >
> > Intervals :
> > Level      Normal              Basic
> > 95%   (-0.5657,  0.4962 )   (-0.5713,  0.5062 )
> >
> > Level     Percentile            BCa
> > 95%   (-0.6527,  0.4249 )   (-0.5579,  0.5023 )
> > Calculations and Intervals on Original Scale
> >
> >
> > **********************************************************************
> >
> > However, without defining “w” in the bootstrap function, i.e., running
> > an ordinary and not a weighted bootstrap, the results are:
> >
> > res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000)
> > boot.ci(res_boot)
> >
> > BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
> > Based on 1000 bootstrap replicates
> >
> > CALL :
> > boot.ci(boot.out = res_boot)
> >
> > Intervals :
> > Level      Normal              Basic
> > 95%   (-0.6265,  0.4966 )   (-0.6125,  0.5249 )
>
> I hope you are not saying that because those CI's are different that there
> is some meaning in that difference. Bootstrap runs will always be
> "different" than each other unless you use set.seed(.) before the runs.
>
> >
> > Level     Percentile            BCa
> > 95%   (-0.6714,  0.4661 )   (-0.6747,  0.4559 )
> > Calculations and Intervals on Original Scale
> >
> > On 19 Nov 2014, at 17:49, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> >>>> vw_m_diff <- function(dataset,w) {
> >>>>   differences <- dataset[w,1]-dataset[w,2]
> >>>>   weights <- dataset[w, "weights"]
> >>>>   return(weighted.mean(x=differences, w=weights))
> >>>> }
> >
>
> David Winsemius
> Alameda, CA, USA
>
>

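The point about set.seed() in the quoted reply can be checked directly, and
the same setup gives a rough feel for how much the interval endpoints move
from run to run. This is a minimal sketch, assuming the boot package is
installed. run_boot() and perc_ends() are hypothetical helpers of mine, and
the seeds are arbitrary. It only compares percentile intervals and does not
settle whether a weighted run is the right procedure.

```r
library(boot)  # recommended package, assumed to be installed

set.seed(1111)
x <- rnorm(50)
y <- rnorm(50)
weights <- runif(50)
weights <- weights / sum(weights)
dataset <- cbind(x, y, weights)

# Statistic from the thread, with the index argument renamed to idx.
vw_m_diff <- function(dataset, idx) {
  differences <- dataset[idx, 1] - dataset[idx, 2]
  weighted.mean(x = differences, w = dataset[idx, "weights"])
}

# Hypothetical helper: fix the seed, then run boot(); extra arguments
# such as weights are passed on to boot().
run_boot <- function(seed, ...) {
  set.seed(seed)
  boot(dataset, statistic = vw_m_diff, R = 1000, ...)
}

res_a <- run_boot(42)
res_b <- run_boot(42)
res_c <- run_boot(43)
identical(res_a$t, res_b$t)  # same seed: same replicates
identical(res_a$t, res_c$t)  # different seed: different replicates

# Hypothetical helper: lower and upper end of the 95% percentile interval.
perc_ends <- function(res) boot.ci(res, type = "perc")$percent[4:5]

# Endpoints over a few seeds, without and with resampling weights.
seeds <- 1:5
ends_ordinary <- t(sapply(seeds, function(s) perc_ends(run_boot(s))))
ends_weighted <- t(sapply(seeds, function(s)
  perc_ends(run_boot(s, weights = dataset[, "weights"]))))
cbind(ends_ordinary, ends_weighted)
```

If the two sets of endpoints overlap about as much as the ordinary runs
overlap with each other, the difference seen in the thread is within
run-to-run noise; if they separate consistently across seeds, it is not.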