[R] boot with strata: strata argument ignored?

Charles C. Berry cberry at tajo.ucsd.edu
Sat Jun 26 23:09:09 CEST 2010


On Sat, 26 Jun 2010, Bryan Hanson wrote:

> Thanks, Chuck. I understand much better what is going on with your example.
> But I'm still unsure why b2$t does not have dimensions R x (number of
> strata).

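Because the statistic returned by mm() is a scalar, and boot() stores one
column of $t for each element of the value the statistic returns. The number
of columns has nothing to do with whether strata are used or how many there
are.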

Look at what the stratified diff.means example in example(boot) does:

> ncol(boot(grav1, diff.means, R=999, stype="f")$t)
[1] 2
> ncol(boot(grav1, diff.means, R=999, stype="f", strata=grav1[,2])$t)
[1] 2
> diff.means(grav1,1:nrow(grav1))
[1] -4.100549 14.722902
>
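If the goal is one bootstrap column per series, the statistic itself has to return one value per series; the strata argument only controls how rows are resampled. A minimal sketch, assuming what is wanted is the median of each series (the function name mm_by_series is hypothetical, not part of boot):

```r
library(boot)

# Scalar statistic: one median for the whole sample, so $t has 1 column
mm <- function(d, i) median(d[i])

# Hypothetical alternative: one median per series. d is the gravity data
# frame and i the resampled row indices; tapply() returns one value per
# level of series (8 levels), so $t gets 8 columns.
mm_by_series <- function(d, i) tapply(d$g[i], d$series[i], median)

set.seed(1)
b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)
b3 <- boot(gravity, mm_by_series, R = 1000, strata = gravity$series)

dim(b2$t)  # 1000 x 1
dim(b3$t)  # 1000 x 8
```

Because resampling is done within each series, every resample contains all 8 series, so tapply() never returns NA here; without strata a small series could in principle be missed entirely.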

Chuck

>
> Any further insight would be appreciated.  Bryan
> *************
> Bryan Hanson
> Acting Chair
> Professor of Chemistry & Biochemistry
> DePauw University, Greencastle IN USA
>
>
>
> On 6/26/10 12:43 PM, "Charles C. Berry" <cberry at tajo.ucsd.edu> wrote:
>
>> On Sat, 26 Jun 2010, Bryan Hanson wrote:
>>
>>> Hello all. I must be missing something really obvious here:
>>>
>>> library(boot)
>>> mm <- function(d, i) median(d[i])
>>> b1 <- boot(gravity$g, mm, R = 1000)
>>> b1
>>> b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)
>>> b2
>>>
>>> Both b1 and b2 seem to have done (almost) the same thing, but it looks like
>>> the strata argument in b2 has been ignored.  However, str(b1) vs str(b2)
>>> does show that the strata have been noted correctly.  But b2$t is a 1000 x 1
>>> array, not a 1000 x 8 array (gravity$series is a factor with 8 levels).
>>>
>>> There is a more complex example in ?boot using the same data set that gives
>>> a result that seems to make sense (2 levels in the factor, so $t has 2
>>> columns).
>>>
>>> I either misunderstand the expected behavior or I've missed some punctuation
>>> or syntax detail.
>>
>> Your punctuation and syntax are OK.
>>
>> Note:
>>
>>> SISWR <- function(x) sample(x, length(x), replace = TRUE) # resample with replacement
>>> # no strata
>>> var(replicate(1000,median(SISWR(gravity$g))))
>> [1] 0.4588338
>>> # now stratify on series
>>> gsplit <- split(gravity$g,gravity$series)
>>> var(replicate(1000,median(unlist(lapply(gsplit,SISWR)))))
>> [1] 0.3882272
>>>
>>> sqrt(.45) # agrees with the std. error printed for b1
>> [1] 0.6708204
>>> sqrt(.39) # agrees with the std. error printed for b2
>> [1] 0.6244998
>>>
>>
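One way to check that strata is not being ignored is to look at the resamples themselves rather than at the shape of $t. boot.array() with indices = TRUE rebuilds the matrix of resampled indices from the seed stored in the boot object. A minimal sketch (the seed and the helper keeps_counts are arbitrary choices for illustration):

```r
library(boot)

mm <- function(d, i) median(d[i])

set.seed(42)
b1 <- boot(gravity$g, mm, R = 1000)
b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)

# R x n matrices of resampled row indices, regenerated from the saved seed
idx1 <- boot.array(b1, indices = TRUE)
idx2 <- boot.array(b2, indices = TRUE)

# Hypothetical helper: for each resample (row), does it contain exactly as
# many observations from each series as the original data?
orig_counts <- table(gravity$series)
keeps_counts <- function(idx) {
  apply(idx, 1, function(r) all(table(gravity$series[r]) == orig_counts))
}

mean(keeps_counts(idx1))  # close to 0: unstratified resamples rarely keep the counts
mean(keeps_counts(idx2))  # 1: every stratified resample keeps them

# Bootstrap standard errors, the "std. error" shown by print(b1) and print(b2)
c(unstratified = sd(b1$t[, 1]), stratified = sd(b2$t[, 1]))
```

With strata, each series keeps its original size in every resample, which is the same within-series resampling done by hand with split() and lapply() above.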
>> The effect of stratification depends on the relative amount of variation
>> within vs. between strata. This ANOVA suggests the between-series
>> variation is not large relative to the within-series variation:
>>
>>> aov(g~series,gravity)
>> Call:
>>     aov(formula = g ~ series, data = gravity)
>>
>> Terms:
>>                    series Residuals
>> Sum of Squares  2818.624  8239.376
>> Deg. of Freedom        7        73
>>
>> Residual standard error: 10.62394
>> Estimated effects may be unbalanced
>>>
>>
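The ANOVA table can also be read as a rough variance decomposition. A minimal sketch that extracts the same numbers from the fit (the variable names are illustrative):

```r
library(boot)  # provides the gravity data

fit <- aov(g ~ series, data = gravity)
tab <- summary(fit)[[1]]      # ANOVA table as a data frame

sum_sq  <- tab[["Sum Sq"]]    # between-series, within-series (Residuals)
dof     <- tab[["Df"]]
mean_sq <- sum_sq / dof       # mean squares

sum_sq[1] / sum(sum_sq)       # share of total sum of squares between series (about 0.25)
sqrt(mean_sq[2])              # residual standard error (about 10.62, as printed above)
```

Between-series variation accounts for only about a quarter of the total sum of squares, which is in line with the modest drop in standard error from b1 to b2.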
>>
>> HTH,
>>
>> Chuck
>>
>>>
>>> TIA, Bryan
>>>
>>>
>>>> sessionInfo()
>>> R version 2.11.0 (2010-04-22)
>>> x86_64-apple-darwin9.8.0
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] datasets  tools     grid      graphics  grDevices utils     stats
>>> [8] methods   base
>>>
>>> other attached packages:
>>> [1] boot_1.2-42        brew_1.0-3         faraway_1.0.4
>>> [4] GGally_0.2         xtable_1.5-6       mvbutils_2.5.1
>>> [7] ggplot2_0.8.7      digest_0.4.2       reshape_0.8.3
>>> [10] proto_0.3-8        ChemoSpec_1.43     R.utils_1.4.0
>>> [13] R.oo_1.7.2         R.methodsS3_1.2.0  rgl_0.91
>>> [16] lattice_0.18-5     mvoutlier_1.4      plyr_0.1.9
>>> [19] RColorBrewer_1.0-2 chemometrics_0.8   som_0.3-5
>>> [22] robustbase_0.5-0-1 rpart_3.1-46       pls_2.1-0
>>> [25] pcaPP_1.8-1        mvtnorm_0.9-9      nnet_7.3-1
>>> [28] mclust_3.4.4       MASS_7.3-5         lars_0.9-7
>>> [31] e1071_1.5-23       class_7.3-2
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Charles C. Berry                            (858) 534-2098
>>                                              Dept of Family/Preventive Medicine
>> E mailto:cberry at tajo.ucsd.edu             UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>
>>
>



