[R] Bootstrapping issues

PIKAL Petr petr.pikal at precheza.cz
Mon Nov 12 10:21:41 CET 2012


Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Clive Nicholas
> Sent: Monday, November 12, 2012 8:06 AM
> To: r-help at r-project.org
> Subject: [R] Bootstrapping issues
> 
> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: i686-pc-linux-gnu (32-bit)
> 
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
> LC_TIME=en_GB.UTF-8
>  [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8
> LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
> LC_ADDRESS=C
> [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8
> LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] boot_1.3-7
> 
> loaded via a namespace (and not attached):
> [1] tools_2.15.2
> 
> 
> Hello. I have a very straightforward question. Here's some simulated data
> (N=500):
> 
> test <- data.frame(A=rnorm(500, mean=2.72, sd=5.36),
>                    B=sample(c(12,20,24,28,32), size=500,
>                             prob=c(0.333,0.026,0.026,0.436,0.179), replace=TRUE),
>                    C=sample(c(0,1), size=500, replace=TRUE),
>                    D=sample(c(0,1), size=500, replace=TRUE))
> 
> 
> head(test)
>           A    B    C    D
> 1  1.181804   28    1    0
> 2 -5.602307   12    1    1
> 3  2.925090   24    1    1
> 4  3.437408   28    1    0
> 5 -6.503531   32    0    0
> 6 11.013888   12    1    1
> 
> 
> which I then bootstrap using
> 
> library(boot)
> 
> bs <- function(formula, data, indices) {
>   test <- data[indices,]
>   fit <- lm(formula, data=test)
>   return(coef(fit))
> }
> 
> 
> The following works
> 
> results <- boot(data=test, statistic=bs, R=1000, A~B+C+D+C*D)
> 

Actually, that does not work for me either:

> results <- boot(data=test, statistic=bs, R=1000, A~B+C+D+C*D)
Error in data[indices, ] : incorrect number of dimensions
>

I am not sure, but I suspect that your bs function expects an indices vector, and that what it actually receives does not match your data.
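
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: boot() calls the statistic as statistic(data, indices, ...), so when the formula is not passed by name, bs() may receive the data frame as its formula argument and the index vector as its data argument. Subsetting a plain vector with data[indices, ] would then give exactly this message. Assuming that is the cause, naming the argument (formula=...) lets boot() forward it to bs() through "...". The set.seed() call and the local name d are my additions, not part of the original post.

```r
library(boot)

set.seed(1)  # assumption: fixed seed, only so the sketch is repeatable

# Simulated data as in the original post (N = 500)
test <- data.frame(
  A = rnorm(500, mean = 2.72, sd = 5.36),
  B = sample(c(12, 20, 24, 28, 32), size = 500,
             prob = c(0.333, 0.026, 0.026, 0.436, 0.179), replace = TRUE),
  C = sample(c(0, 1), size = 500, replace = TRUE),
  D = sample(c(0, 1), size = 500, replace = TRUE)
)

# Statistic: refit the model on the resampled rows and return the coefficients
bs <- function(formula, data, indices) {
  d <- data[indices, ]          # rows selected for this bootstrap replicate
  fit <- lm(formula, data = d)
  coef(fit)
}

# The formula is passed by name, so it reaches bs() via "..."
results <- boot(data = test, statistic = bs, R = 1000,
                formula = A ~ B + C + D + C * D)
results
```

If this runs cleanly, the same named-argument call should also be worth trying on the amended data set with the fixed proportions for D.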

Regards
Petr 

> 
> results
> 
> 
> But when I then amend the dataset by changing the D variable to
> simulate fixed proportions
> 
> D=sample(c(0,1),size=500,prob=c(0.564,0.436),replace=TRUE)
> 
> 
> head(test)
>             A  B C D
> 1  5.73771963 28 0 1
> 2 -0.19040750 12 1 0
> 3  2.22515982 12 0 1
> 4 -0.02905223 32 1 0
> 5  4.68314112 28 0 1
> 6  5.10711732 12 1 0
> 
> 
> the same bootstrapping routine chokes with an error
> 
> results <- boot(data=test, statistic=bs, R=1000, A~B+C+C*D)
> Error in data[indices, ] : incorrect number of dimensions
> 
> 
> despite the fact that the B variable also has simulated fixed
> proportions and yet the original code ran without any errors. I have
> two general observations to make about this:
> 
> (1) this does not make sense; and
> (2) I don't understand this.
> 
> How best to make these two observations go away and run the code to my
> satisfaction?
> 
> Many thanks.
> 
> --
> Clive Nicholas (clivenicholas.posterous.com)
> 
> [Please DO NOT mail me personally here, but at
> <clivenicholas at hotmail.com>.
> Please respond to contributions I make in a list thread here. Thanks!]
> 
> "My colleagues in the social sciences talk a great deal about
> methodology.
> I prefer to call it style." -- Freeman J. Dyson
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



