[R] Bootstrap CIs for weighted means of paired differences

Sat Nov 22 17:21:18 CET 2014

OK, thanks for the suggestions. I will look into that. And you are absolutely right that I should have been clearer about what type of weighting I want. So to clarify: I run time series regressions of the returns of company i on two different sets of explanatory variables. Then I extract the intercepts of the two regressions and take the difference between them. I repeat this for every company in the sample and then compute the market-value-weighted average of those differences.
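
To make that concrete, here is a minimal sketch of the procedure in R on simulated data. Everything in it is an assumption for illustration: the names firm_data, alpha_diff, mkt_value and vw_mean_diff are made up, the two regressor sets (f1 alone versus f1 and f2) are placeholders, and resampling companies with an ordinary bootstrap while using the market values only inside the statistic is just one possible choice, not necessarily the appropriate one.

```r
library(boot)  # recommended package, normally shipped with R

set.seed(42)
n_firms <- 30  # hypothetical number of companies
n_obs   <- 60  # hypothetical number of return observations per company

# Placeholder explanatory variables: model A uses f1, model B uses f1 and f2
f1 <- rnorm(n_obs)
f2 <- rnorm(n_obs)

# One row per company: difference of the two intercepts and a market value
firm_data <- t(sapply(seq_len(n_firms), function(i) {
  ret <- 0.01 + 0.8 * f1 + 0.3 * f2 + rnorm(n_obs, sd = 0.5)
  alpha_a <- coef(lm(ret ~ f1))[["(Intercept)"]]
  alpha_b <- coef(lm(ret ~ f1 + f2))[["(Intercept)"]]
  c(alpha_diff = alpha_a - alpha_b, mkt_value = rlnorm(1))
}))

# Statistic for boot(): value-weighted mean of the intercept differences,
# computed on the rows (companies) selected by the index vector idx
vw_mean_diff <- function(d, idx) {
  weighted.mean(d[idx, "alpha_diff"], w = d[idx, "mkt_value"])
}

# Point estimate on the full sample
estimate <- vw_mean_diff(firm_data, seq_len(n_firms))

# Ordinary nonparametric bootstrap over companies; no 'weights' argument
# is passed to boot(), so every company is equally likely to be drawn
res_boot <- boot(firm_data, statistic = vw_mean_diff, R = 999)
ci <- boot.ci(res_boot, type = "perc")
print(ci)
```

Note that this treats each company's intercept difference as a fixed number and resamples only across companies, so it ignores the estimation error in the regressions themselves and any correlation between companies.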

On 21 Nov 2014, at 19:18, David Winsemius <dwinsemius at comcast.net> wrote:

> 
> On Nov 21, 2014, at 6:52 AM, ivan wrote:
> 
>> I am aware of the fact that bootstrapping produces different CIs with every run. I still believe that there is a difference between both types of procedures. My understanding is that setting "w" in the boot() function influences the "importance" of observations or how the bootstrap selects the observations. I.e., observation i does not have the same probability of being chosen as observation j when "w" is defined in the boot() function. If you return res_boot you will notice that with "w" being set in the boot() function, the function call states "weighted bootstrap". If not, it states "ordinary nonparametric bootstrap". But maybe I am wrong.
> 
> OK. So in the second call w affects the probability of a case being sent to the boot-function as well as being used in the boot-function; while with the "non-weighted call" the w's are only affecting the individual mean estimates. So the second one is different. And as I suggested earlier you never described the goals of the investigation or the meaning of the variables.
> 
>   I can tell you that when Davison and Hinkley offered examples of using a bootstrap for a weighted bootstrap mean, they compared a stratified analysis with an example where the weighting was only used on the inner function (example 3.2, practical 3.14 pp 72, 131 of their book) with one where the strata parameter was used. But so far I don't think you have ever described what sort of weights these actually are. In that example the weights were the inverse variances of the sample groups. They didn't use a 'weights' parameter in the boot call. I do not know if it was part of the S package that was being used at the time.
> 
> I tried to find an example of a weighted bootstrap in V&R 4e but did not see one. Prof Ripley is the maintainer of the boot package. In the V&R book, Angelo Canty is given the credit for writing the boot package for S. I think you should consult the code, first. And you should also look at the `stype` parameter where "w" is one option.
> 
> -- 
> David.
> 
>> 
>> On Thu, Nov 20, 2014 at 8:19 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>> On Nov 20, 2014, at 2:23 AM, i.petzev wrote:
>> 
>>> Hi David,
>>> 
>>> sorry, I was not clear.
>> 
>> Right. You never were clear about what you wanted and your example was so statistically symmetric that it is still hard to see what is needed. The examples below show CI's that are arguably equivalent. I can be faulted for attempting to provide code that produced a sensible answer to a vague question to which I was only guessing at the intent.
>> 
>> 
>>> The difference comes from defining or not defining “w” in the boot() function. The results with your function and your approach are thus:
>>> 
>>> library(boot)
>>> set.seed(1111)
>>> x <- rnorm(50)
>>> y <- rnorm(50)
>>> weights <- runif(50)
>>> weights <- weights / sum(weights)
>>> dataset <- cbind(x,y,weights)
>>> 
>>> vw_m_diff <- function(dataset,w) {
>>>  differences <- dataset[w,1]-dataset[w,2]
>>>  weights <- dataset[w, "weights"]
>>>  return(weighted.mean(x=differences, w=weights))
>>> }
>>> res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000, w=dataset[,3])
>>> boot.ci(res_boot)
>>> 
>>> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
>>> Based on 1000 bootstrap replicates
>>> 
>>> CALL :
>>> boot.ci(boot.out = res_boot)
>>> 
>>> Intervals :
>>> Level      Normal              Basic
>>> 95%   (-0.5657,  0.4962 )   (-0.5713,  0.5062 )
>>> 
>>> Level     Percentile            BCa
>>> 95%   (-0.6527,  0.4249 )   (-0.5579,  0.5023 )
>>> Calculations and Intervals on Original Scale
>>> 
>>> ********************************************************************************************************************
>>> 
>>> However, without defining “w” in the bootstrap function, i.e., running an ordinary and not a weighted bootstrap, the results are:
>>> 
>>> res_boot <- boot(dataset, statistic=vw_m_diff, R = 1000)
>>> boot.ci(res_boot)
>>> 
>>> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
>>> Based on 1000 bootstrap replicates
>>> 
>>> CALL :
>>> boot.ci(boot.out = res_boot)
>>> 
>>> Intervals :
>>> Level      Normal              Basic
>>> 95%   (-0.6265,  0.4966 )   (-0.6125,  0.5249 )
>> 
>> I hope you are not saying that because those CI's are different that there is some meaning in that difference. Bootstrap runs will always be "different" than each other unless you use set.seed(.) before the runs.
>> 
>>> 
>>> Level     Percentile            BCa
>>> 95%   (-0.6714,  0.4661 )   (-0.6747,  0.4559 )
>>> Calculations and Intervals on Original Scale
>>> 
>>> On 19 Nov 2014, at 17:49, David Winsemius <dwinsemius at comcast.net> wrote:
>>> 
>>>>>> vw_m_diff <- function(dataset,w) {
>>>>>>    differences <- dataset[w,1]-dataset[w,2]
>>>>>>   weights <- dataset[w, "weights"]
>>>>>>   return(weighted.mean(x=differences, w=weights))
>>>>>> }
>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 
>> 
> 
> David Winsemius
> Alameda, CA, USA
