[R] Another question about bootstrapping cor
Conklin, Mike (GfK)
Mike.Conklin at gfk.com
Thu Mar 6 20:39:10 CET 2014
Read the help on boot.
Specifically in non-parametric bootstrapping the statistic function takes a data (the original data) and an index number that shows which rows are taken in the bootstrap sample. So you need to do the following.
x <- 1:15
y <- c(2,4,1,3,5, 7,6, 9,10,8, 14, 13, 11, 15, 12)
x[3] <- NA; x[11] <- NA; x[8] <- NA; y[2] <- NA; y[8] <- NA; y[12] <- NA
cor(x,y,use="complete.obs",method="kendall")
library(boot)
tmpdf <- data.frame(x,y)
cor1 <- function(data,idx) {
df<-data[idx,]
cor(df$x,df$y,use="complete.obs",method="kendall")}
corboot <- boot(tmpdf,cor1,R=999)
Bootstrap Statistics :
original bias std. error
t1* 0.8666667 -0.001852283 0.1099832
in your previous function the index was also getting passed to the cor1 function .
To see the individual results use:
corboot$t
t is a matrix of the bootstrap replicates.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Parkhurst
Sent: Thursday, March 06, 2014 12:28 PM
To: r-help at r-project.org
Subject: [R] Another question about bootstrapping cor
When I use the fix that Arun K. provided to my earlier example, I wonder how to find out where in the 999 bootstrap repetitions the value for the actual data fits.
Here is the fixed code:
x <- 1:15
y <- c(2,4,1,3,5, 7,6, 9,10,8, 14, 13, 11, 15, 12) x[3] <- NA; x[11] <- NA; x[8] <- NA y[2] <- NA; y[8] <- NA; y[12] <- NA
cor(x,y,use="complete.obs",method="kendall")
library(boot)
tmpdf <- data.frame(x,y)
## corboot <- boot(tmpdf,
cor(x,y,use="complete.obs",method="kendall"),R=999)
cor1 <- function(x,y) {cor(x,y,use="complete.obs",method="kendall")}
corboot <- boot(tmpdf,cor1,R=999)
When I run that, I get
> corboot
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = tmpdf, statistic = cor1, R = 999)
Bootstrap Statistics :
original bias std. error
t1* 1.0000000 -1.0062026 0.2490860
t2* 0.8666667 -0.8694414 0.2491383
t2* is the estimate for the actual data. What is t1*? And again, how do I find where in the distribution of the 999 reps does that true value fall?
I would expect print(corboot) to give me more information, but it just repeats that same output.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list