[Rd] [boot] bootstrap issue when at least one strata has only one observation (PR#13584)

dsoudant at ifremer.fr dsoudant at ifremer.fr
Mon Mar 9 16:35:06 CET 2009


Full_Name: Dominique Soudant
Version: 2.4.1
OS: Winbdows
Submission from: (NULL) (134.246.54.61)


R 2.4.1
boot 1.2-27 

Let us consider the following example with 8 strata, one observation for each :

> library(boot)
> df <- data.frame(Values=runif(8),month=1:8)
> df
      Values month
1 0.02721540     1
2 0.13618392     2
3 0.99979095     3
4 0.80441083     4
5 0.42374115     5
6 0.04928531     6
7 0.34387447     7
8 0.13277458     8
> P90 <- function(DataIn,i){
+   DataIn <- DataIn[i,]
+   P90 <- round(quantile(as.numeric(DataIn$Values)
+                         ,probs=0.9
+                         ,names=FALSE
+                         ,type=4)
+               ,digits=1)
+   return(P90)
+ }
> 
> bootP90 <- boot(df,P90,10,strata=df$month)
> bootP90$t
      [,1]
 [1,]  0.2
 [2,]  0.5
 [3,]  0.8
 [4,]  0.2
 [5,]  0.2
 [6,]  1.0
 [7,]  0.5
 [8,]  1.0
 [9,]  0.8
[10,]  1.0

Results are differents, they should be equal. I guess that the issue is coming
from the function ordinary.array() from the boot package : 

ordinary.array <- function (n, R, strata)
{
    output <- matrix(0, R, n)
    inds <- as.integer(names(table(strata)))
    for (is in inds) {
        gp <- c(1:n)[strata == is]
        output[, gp] <- matrix(sample(gp, R * length(gp), replace = TRUE),
            nrow = R)
    }
    output
}

> ordinary.array(8,10,1:8)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [1,]    1    1    2    2    1    1    5    2
 [2,]    1    1    2    2    3    3    2    5
 [3,]    1    2    1    4    2    1    1    7
 [4,]    1    1    3    2    5    5    3    6
 [5,]    1    1    2    4    5    3    7    1
 [6,]    1    2    2    4    4    6    1    8
 [7,]    1    1    2    2    4    2    7    5
 [8,]    1    2    3    2    3    4    1    7
 [9,]    1    2    2    1    1    3    1    2
[10,]    1    1    2    4    5    1    1    5

In ordinary.array(), the issue is coming from sample() : 
[quote]If x has length 1 and x >= 1, sampling takes place from 1:x[/quote]
Note that when only one strata has only one observation, the problem is
identical 
> ordinary.array(7,10,c(rep(1:3,each=2),4))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]    1    2    3    3    6    5    6
 [2,]    2    2    3    4    5    5    7
 [3,]    2    2    4    4    6    5    5
 [4,]    1    1    4    4    6    5    1
 [5,]    2    2    3    3    6    5    7
 [6,]    1    2    3    4    5    5    1
 [7,]    1    2    4    3    6    6    5
 [8,]    2    1    3    4    5    6    2
 [9,]    2    2    3    4    6    6    4
[10,]    2    2    4    3    6    5    2 
but less detectable in the results.

The same results are obtained with : 
R 2.8.1
boot 1.2-35

Best regards.



More information about the R-devel mailing list