[R] bootstrap query
Silvia Kirkman
silviakirkman at yahoo.com
Fri Nov 5 10:29:00 CET 2004
Hi
I need to bootstrap a function in R and I am
struggling. Can anyone help? The following explains
what I'm trying to do:
I have 2 different matrices, called "x" and "y". Each
has 34 columns, and the number of non-missing values in
each column varies (shorter columns are padded with NA).
I use this data to determine a certain measure (C),
which I've calculated in R as follows:
schoener <- function(x, y, z)
{
    # x - seals
    # y - fishery
    # z - column of matrix

    # bin edges from 0 to 33 in steps of 0.5
    breaks <- c(0:66)/2
    hseal <- hist(na.omit(x[,z]), breaks = breaks, freq = FALSE,
                  include.lowest = FALSE, right = FALSE, plot = FALSE)
    hfish <- hist(na.omit(y[,z]), breaks = breaks, freq = FALSE,
                  include.lowest = FALSE, right = FALSE, plot = FALSE)
    # number of non-missing observations in each sample
    lseal <- length(na.omit(x[,z]))
    lfish <- length(na.omit(y[,z]))
    # proportion of observations in each bin
    pseal <- (hseal$counts)/lseal
    pfish <- (hfish$counts)/lfish
    # C, expressed as a percentage
    C <- (1-sum(abs(pseal-pfish))/2)*100
    C
}
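A small worked example of the last few lines of this calculation may help. It is only a sketch: the counts below are made up (three bins instead of 66) and the names seal_counts and fish_counts are hypothetical, not part of the function above.

```r
# Made-up counts per bin (three bins only, for illustration)
seal_counts <- c(5, 3, 2)     # 10 seal observations
fish_counts <- c(4, 6, 10)    # 20 fishery observations

# Proportion of each sample falling in each bin
pseal <- seal_counts / sum(seal_counts)   # 0.5 0.3 0.2
pfish <- fish_counts / sum(fish_counts)   # 0.2 0.3 0.5

# |0.5-0.2| + |0.3-0.3| + |0.2-0.5| = 0.6, so C = (1 - 0.6/2) * 100 = 70
C <- (1 - sum(abs(pseal - pfish)) / 2) * 100
C
```

Identical proportions in every bin give C = 100, and samples that share no bins give C = 0.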
I've also managed to resample (with replacement) the
data in each column of x and y as follows, to give me
new C values:
resample <- function(x, y, z)
{
    # x - seals
    # y - fishery
    # z - column of matrix

    # number of non-missing observations in each sample
    lseal <- length(na.omit(x[,z]))
    lfish <- length(na.omit(y[,z]))
    # draw a sample of the same size from each column, with replacement
    resampleseal <- sample(na.omit(x[,z]), lseal, replace = TRUE)
    resamplefish <- sample(na.omit(y[,z]), lfish, replace = TRUE)
    # bin edges from 0 to 33 in steps of 0.5
    breaks <- c(0:66)/2
    hseal <- hist(resampleseal, breaks = breaks, freq = FALSE,
                  include.lowest = FALSE, right = FALSE, plot = FALSE)
    hfish <- hist(resamplefish, breaks = breaks, freq = FALSE,
                  include.lowest = FALSE, right = FALSE, plot = FALSE)
    # proportion of resampled observations in each bin
    pseal <- (hseal$counts)/lseal
    pfish <- (hfish$counts)/lfish
    (1-sum(abs(pseal-pfish))/2)*100
}
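Each call to a function of this kind yields one resampled C, so repeated calls can be collected with replicate() and summarised with quantile(). The following is a minimal sketch of that idea, not a tested solution for the real data: the vectors seal and fish are made-up stand-ins for one column of x and y after na.omit(), and the helpers bin_props(), overlap() and one_resample() are hypothetical names introduced only for this illustration.

```r
set.seed(42)  # only to make the sketch reproducible

# Made-up stand-ins for one column of x and y after na.omit()
seal <- runif(40, min = 5, max = 25)
fish <- runif(60, min = 10, max = 30)

breaks <- (0:66) / 2

# Proportion of the values of v falling in each bin
bin_props <- function(v) {
  h <- hist(v, breaks = breaks, include.lowest = FALSE,
            right = FALSE, plot = FALSE)
  h$counts / length(v)
}

# C for one pair of samples
overlap <- function(a, b) {
  (1 - sum(abs(bin_props(a) - bin_props(b))) / 2) * 100
}

# One bootstrap replicate: resample each group separately, with replacement
one_resample <- function() {
  overlap(sample(seal, replace = TRUE), sample(fish, replace = TRUE))
}

# 10 000 resampled C values and their 2.5% and 97.5% quantiles
Cboot <- replicate(10000, one_resample())
limits <- quantile(Cboot, c(0.025, 0.975))
limits
```

The two quantiles are simple percentile limits. Other kinds of bootstrap interval exist and may behave differently for a bounded measure such as C.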
What I want to be able to do is to obtain 10 000 C
values so that I can get the 95% confidence limits. In
other words, resample 10 000 times. I have tried to
use the "boot" function in R, but I just can't get it
right:
boot(data, statistic, R, sim="ordinary", stype="i",
strata=rep(1,n), L=NULL, m=0, weights=NULL,
ran.gen=function(d, p) d, mle=NULL, ...)
According to the above, "statistic=resample" (as I've
defined above), "R=10000", and "data" would be x and
y. I'm obviously not understanding something,
especially how to refer to x and y for "data". I'm
sure what I want to do must be quite simple - I
wonder if anyone out there can explain it to me.
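One possible way to pass two samples to boot() is to stack them in a single data frame and resample within each group through the strata argument. The sketch below assumes that setup. It is not something confirmed in this post, it uses made-up vectors in place of one column of x and y, and the names d, bin_props() and schoener_stat() are hypothetical. It also assumes the boot package is installed (it normally ships with R as a recommended package).

```r
library(boot)

set.seed(1)  # only to make the sketch reproducible

# Made-up stand-ins for one column of x and y after na.omit()
seal <- runif(40, min = 5, max = 25)
fish <- runif(60, min = 10, max = 30)

# Stack both samples; 'group' records which sample each value came from
d <- data.frame(
  value = c(seal, fish),
  group = factor(rep(c("seal", "fish"), c(length(seal), length(fish))))
)

breaks <- (0:66) / 2

# Proportion of the values of v falling in each bin
bin_props <- function(v) {
  h <- hist(v, breaks = breaks, include.lowest = FALSE,
            right = FALSE, plot = FALSE)
  h$counts / length(v)
}

# boot() calls the statistic with the data and a vector of row indices;
# with stype = "i" (the default) those indices define the resample
schoener_stat <- function(d, i) {
  b <- d[i, ]
  pseal <- bin_props(b$value[b$group == "seal"])
  pfish <- bin_props(b$value[b$group == "fish"])
  (1 - sum(abs(pseal - pfish)) / 2) * 100
}

# strata keeps the resampling within each group, so group sizes stay fixed
bt <- boot(d, schoener_stat, R = 10000, strata = d$group)
ci <- boot.ci(bt, type = "perc")
ci
```

Here bt$t0 holds C for the original data and bt$t holds the 10 000 resampled values. For the real matrices, the same call would need to be repeated for each of the 34 columns.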
Many thanks.
Silvia