[R] Bootstrapping in R

Bryan Mac bryanmac.24 at gmail.com
Mon Oct 3 09:24:50 CEST 2016


Hi all,

Here are the first six rows of my data. In total I have 1269 rows.
My goal is to conduct a nonparametric bootstrap with case resampling.
I would like to randomly select 100 of the 1269 rows. After that, I wish to bootstrap that randomly selected subsample of 100.

I assume I need to set the seed to make this randomization reproducible, since bootstrapping would otherwise give different results each time the code is run.

##   NAR  SQRTNAR NIC  SQRTNIC
## 1 2.6 1.612452 5.6 2.366432
## 2 8.1 2.846050 9.9 3.146427
## 3 5.7 2.387467 7.1 2.664583
## 4 8.3 2.880972 8.1 2.846050
## 5 7.3 2.701851 9.9 3.146427
## 6 4.9 2.213594 8.6 2.932576
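
For reference, the subsampling step described above might look like the following minimal sketch. It assumes the full data frame is called n_data (the name used in the boot() call further down). Since only six rows of the real data are shown, a simulated stand-in with the same column names is built here; the seed value 2016 and the names rows and sub_data are arbitrary choices for illustration.

```r
# Simulated stand-in for the real 1269-row data (assumption: the real
# data frame is called n_data and has these four numeric columns).
set.seed(2016)                       # arbitrary seed; fixes the random draws
n_data <- data.frame(NAR = runif(1269, 0, 10),
                     NIC = runif(1269, 0, 10))
n_data$SQRTNAR <- sqrt(n_data$NAR)
n_data$SQRTNIC <- sqrt(n_data$NIC)
n_data <- n_data[, c("NAR", "SQRTNAR", "NIC", "SQRTNIC")]

# Draw 100 distinct row numbers out of 1269 (no replacement at this stage).
rows <- sample(nrow(n_data), size = 100, replace = FALSE)
sub_data <- n_data[rows, ]

dim(sub_data)
```

The resampling with replacement only happens later, inside boot(); this step just fixes which 100 cases are used.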

Here is my definition of the DataSummary function.

DataSummary <- function(df, indices){
  sample <- df[indices, ]
  
  sumry_for_NAR <- summary(sample$NAR)
  nms <- names(sumry_for_NAR)
  nms <- c(nms, 'std')
  out_for_NAR <- c(sumry_for_NAR, sd(sample$NAR))
  names(out_for_NAR) <- nms
  
  sumry_for_SQRTNAR <- summary(sample$SQRTNAR)
  nms <- names(sumry_for_SQRTNAR)
  nms <- c(nms, 'std')
  out_for_SQRTNAR <- c(sumry_for_SQRTNAR, sd(sample$SQRTNAR))
  names(out_for_SQRTNAR) <- nms
  
  sumry_for_NIC <- summary(sample$NIC)
  nms <- names(sumry_for_NIC)
  nms <- c(nms, 'std')
  out_for_NIC <- c(sumry_for_NIC, sd(sample$NIC))
  names(out_for_NIC) <- nms
  
  sumry_for_SQRTNIC <- summary(sample$SQRTNIC)
  nms <- names(sumry_for_SQRTNIC)
  nms <- c(nms, 'std')
  out_for_SQRTNIC <- c(sumry_for_SQRTNIC, sd(sample$SQRTNIC))
  names(out_for_SQRTNIC) <- nms
  
  OUT <- c(out_for_NAR, out_for_SQRTNAR, out_for_NIC, out_for_SQRTNIC)
  
  return(OUT)
}
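
As an aside, the same 28 numbers (six summary values plus the standard deviation for each of the four columns) could probably be produced with less repetition. The sketch below is a hypothetical variant, not the function above: the names DataSummary2 and one_column are made up, it assumes every column of the data frame is numeric, and it computes the quantities directly instead of going through summary().

```r
# Hypothetical compact variant of DataSummary (names are illustrative).
# 'df' is the data frame, 'indices' the row numbers supplied by boot().
DataSummary2 <- function(df, indices) {
  d <- df[indices, , drop = FALSE]

  # Seven statistics for one numeric column, in the order summary() uses,
  # followed by the standard deviation.
  one_column <- function(x) {
    q <- unname(quantile(x, c(0.25, 0.75)))
    c(Min = min(x), Q1 = q[1], Median = median(x), Mean = mean(x),
      Q3 = q[2], Max = max(x), std = sd(x))
  }

  # One named numeric vector: NAR.Min, NAR.Q1, ..., SQRTNIC.std
  unlist(lapply(d, one_column))
}

# Small made-up data frame, only to show the shape of the result.
toy <- data.frame(NAR = c(2.6, 8.1, 5.7, 8.3), NIC = c(5.6, 9.9, 7.1, 8.1))
toy$SQRTNAR <- sqrt(toy$NAR)
toy$SQRTNIC <- sqrt(toy$NIC)
out <- DataSummary2(toy, 1:4)
length(out)
```

Because the result is a plain named numeric vector, it can be passed to boot() as the statistic in the same way as DataSummary.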

Again, here is my attempt at bootstrapping.

library(boot)

result <- boot(n_data, statistic = DataSummary, R = 100)
result
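
On the question of the seed, the following is a rough sketch of how reproducibility could be checked. It is not my real analysis: the data are simulated, the statistic is deliberately tiny, and the names col_means and run_once as well as the seed values are made up for illustration. The idea is that calling set.seed() before both the subsampling and the boot() call makes the whole procedure repeatable.

```r
library(boot)

# Simulated stand-in for the real 1269-row data (assumption).
set.seed(2016)
n_data <- data.frame(NAR = runif(1269, 0, 10),
                     NIC = runif(1269, 0, 10))

# A deliberately small statistic: column means of the resampled rows.
col_means <- function(df, indices) colMeans(df[indices, , drop = FALSE])

# Subsample 100 rows, then bootstrap them, all under one seed.
run_once <- function(seed) {
  set.seed(seed)
  sub_data <- n_data[sample(nrow(n_data), 100), ]
  boot(sub_data, statistic = col_means, R = 100)
}

res_a <- run_once(1007)
res_b <- run_once(1007)

# TRUE: the same seed reproduces the same 100 bootstrap replicates
identical(res_a$t, res_b$t)
```

Without the set.seed() call, two runs would draw different subsamples and different resamples, so the estimates would differ from run to run.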

Per the suggestions, would I go with this code to achieve my goal? I take it the best reference/resource is the boot help page. I found code on various sites and got really confused because the examples were very different from each other.

> library(boot)
> 
> set.seed(1007)
> 
> x <- rnorm(100)
> y <- x + rnorm(100)
> dat <- data.frame(x, y)
> 
> stat2 <- function(DF, f){
> 	model <- lm(y ~ x, data = DF[f,])
> 	coef(model)
> }
> 
> boot(dat, stat2, R = 100)
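
If I understand the quoted explanation correctly, the difference between the two statistic functions can be checked by counting distinct bootstrap replicates. The sketch below reuses stat1 and stat2 from the quoted message (written more compactly); the object names b1 and b2 are just for illustration. In stat1 the formula refers to DF$y and DF$x, which always point at the full data, so the data = DF[f, ] argument has no effect.

```r
library(boot)

set.seed(1007)
x <- rnorm(100)
y <- x + rnorm(100)
dat <- data.frame(x, y)

# Wrong: DF$y and DF$x are taken from the full data on every call.
stat1 <- function(DF, f) coef(lm(DF$y ~ DF$x, data = DF[f, ]))

# Correct: y and x are looked up in the resampled rows DF[f, ].
stat2 <- function(DF, f) coef(lm(y ~ x, data = DF[f, ]))

b1 <- boot(dat, stat1, R = 100)
b2 <- boot(dat, stat2, R = 100)

nrow(unique(b1$t))   # 1: every replicate is the same full-data fit
nrow(unique(b2$t))   # more than 1: the replicates actually vary
```

A bootstrap whose replicates never vary reports a standard error of zero, which is a quick sign that the index argument is not being used.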




Bryan Mac
bryanmac.24 at gmail.com



> On Oct 2, 2016, at 5:37 AM, ruipbarradas at sapo.pt wrote:
> 
> Right.
> To see it in action just compare the results of the two calls to boot.
> 
> library(boot)
> 
> set.seed(1007)
> 
> x <- rnorm(100)
> y <- x + rnorm(100)
> dat <- data.frame(x, y)
> 
> #Wrong
> stat1 <- function(DF, f){
> 	model <- lm(DF$y ~ DF$x, data = DF[f,])  #Doesn't bootstrap DF
> 	coef(model)
> }
> 
> #Correct
> stat2 <- function(DF, f){
> 	model <- lm(y ~ x, data = DF[f,])
> 	coef(model)
> }
> 
> boot(dat, stat1, R = 100)
> boot(dat, stat2, R = 100)
> 
> 
> Rui Barradas
> 
> 
> Citando peter dalgaard <pdalgd at gmail.com>:
> 
>>> On 01 Oct 2016, at 16:11 , Daniel Nordlund <djnordlund at gmail.com> wrote:
>>> 
>>> You haven't told us anything about the structure of your data, or the definition of the DataSummary function.
>> 
>> Yes. Just let me add that a common error with boot() is not to pay attention to the required form of the statistic= function argument. It should depend on the data and a set of indices and (for nonparametric bootstrap) it is the indices that are random.
>> 
>> Typical mistakes are to completely ignore the index argument, or to write clumsy code that ignores the data specification, as in
>> coef(lm(df$y~df$x, data=d[f])).
>> 
>> 
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com