[RsR] R for Dummies (update)

Wed Jul 11 14:27:50 CEST 2007

A few days ago I posted a query for help in understanding bootstrapping, calling it “Bootstrapping for Dummies”. I was awed by the number of responses that I got, with lots of useful information. I thought it might be nice at this point to put together some of the info for others, and also to report on the state of my quest for understanding.

First, to point out the basic level that I am operating at, I want to show you what I do with functions that I discuss in my book:
t.test(torres$GRAMMAR, alternative='two.sided', mu=3, conf.level=.95)
t.test (x, . . .) # gives the command for all t-tests, not just the one-sample test
torres$GRAMMAR # this is the GRAMMAR variable in the torres data set
alternative="two.sided" # this default calls for a two-sided hypothesis test; other alternatives: "less", "greater"
mu=3 # this is the hypothesized mean; in a one-sample test R compares your measured mean score to this externally measured mean
conf.level=.95 # sets the confidence level for the confidence interval of the mean (or of the mean difference in a two-sample test)
(by the way, I put it in a table and it looks nicer than this . . . )

See? Really for dummies. So what I’m working on now is doing one-sample and two-sample t-tests. Wilcox (2003) says that the best thing to do is percentile bootstrap with 20% trimmed means. But I haven’t found anything built in to R that does this. And I know that if I understood bootstrapping I could easily do it in some built-in library, but I don’t understand the steps of bootstrap enough to do it. Wilcox’s trimpb and trimpb2 functions are what I’m currently using, and they return a 95% CI, and a p-value. I’m not sure a p-value is necessary when I have the CI, but that’s fine. I would also like to say something about the effect size of such a t-test. But I would need a standardizer for that, and I’m not sure what calculation I should use to get a good effect size for a bootstrapped trimmed mean calculation. 
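As far as I can piece it together, the one-sample version of a percentile bootstrap with 20% trimmed means can be sketched in base R as below. This is only a sketch under assumptions: the scores are made-up numbers, mu0 and nboot are names I chose for illustration, and it is not Wilcox’s trimpb code, although the idea should be similar.

```r
# Percentile bootstrap for a 20% trimmed mean, one-sample case, base R only.
set.seed(1)                                  # so the resampling can be repeated
scores <- c(2, 3, 3, 4, 4, 4, 5, 5, 1, 3, 4, 5, 2, 4, 3)  # made-up data
mu0    <- 3                                  # hypothesized value (assumption)
nboot  <- 2000                               # number of bootstrap resamples

boot_tm <- numeric(nboot)                    # storage for the replicates
for (b in seq_len(nboot)) {
  resample   <- sample(scores, replace = TRUE)   # draw n cases with replacement
  boot_tm[b] <- mean(resample, trim = 0.2)       # 20% trimmed mean of the resample
}

# 95% percentile CI: the middle 95% of the bootstrap trimmed means
ci <- quantile(boot_tm, c(0.025, 0.975))

# Percentile-style two-sided p-value: how often the replicates fall
# below the hypothesized value (ties counted as half)
p_low   <- mean(boot_tm < mu0) + 0.5 * mean(boot_tm == mu0)
p_value <- 2 * min(p_low, 1 - p_low)

ci
p_value
```

The only statistical ingredients are sample() with replace = TRUE, mean() with trim = 0.2, and quantile(); everything else is bookkeeping.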

The information that seemed most pertinent to me was:

http://www.mayin.org/ajayshah/KB/R/documents/boot.html  (sent to me by Ajay Shah)
Explains step by step how R library boot does bootstrapping
This was extremely useful in helping me start to understand how indexing can work in a bootstrap. However, I am still confused as to how one understands how to write the functions (for returning means, etc), and then what to do with the boot object after it is returned. 

In conjunction with Ajay’s tutorial, I also read:
Article by Angelo Canty in the Dec 2002 R News, re the boot library
This gave me some more info about what boot can do. I found out that the “t” returned in the output is the matrix of bootstrap replicates.
The t0 is the result of evaluating the statistic on the original dataset.
The two main methods for boot objects are print and plot.
The most common thing you want to pull out is the CI; use boot.ci for this.
I don’t understand (and didn’t read) about censored bootstraps and time series

Patrick Burns’ bootstrap tutorial on  www.burns-stat.com (thanks to several for this)
This gives examples of how to write your own functions for bootstrapping. Patrick says that he doesn’t use any libraries, but just writes his own functions, using a for loop. However, there is not enough explanation here for me to figure out how I could write my own bootstrap function for returning the 20% trimmed means of a two-sample t-test.
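A hand-written, for-loop bootstrap of the two-sample case might look roughly like the sketch below. The function name trimdiff_boot, its arguments, and the two groups of numbers are all hypothetical, invented for illustration; it bootstraps the difference between the two 20% trimmed means rather than a t statistic.

```r
# Hypothetical helper: percentile bootstrap for the difference between
# two 20% trimmed means, written with a plain for loop (base R only).
trimdiff_boot <- function(x, y, nboot = 2000, trim = 0.2, conf = 0.95) {
  diffs <- numeric(nboot)
  for (i in seq_len(nboot)) {
    xs <- sample(x, replace = TRUE)          # resample each group separately
    ys <- sample(y, replace = TRUE)
    diffs[i] <- mean(xs, trim = trim) - mean(ys, trim = trim)
  }
  alpha <- 1 - conf
  list(
    estimate   = mean(x, trim = trim) - mean(y, trim = trim),  # on original data
    ci         = unname(quantile(diffs, c(alpha / 2, 1 - alpha / 2))),
    replicates = diffs
  )
}

set.seed(42)
group1 <- c(12, 15, 11, 18, 14, 16, 13, 17, 19, 15)   # made-up scores
group2 <- c(10, 12,  9, 14, 11, 13, 10, 12, 15, 11)   # made-up scores
res <- trimdiff_boot(group1, group2)
res$estimate   # difference in trimmed means on the original data
res$ci         # 95% percentile interval for that difference
```

If the interval in res$ci excludes 0, that would correspond to rejecting the hypothesis of equal trimmed means at the 5% level, which is the same logic as reading a CI from a t-test.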

However, somewhere on the Burns website I found an article which listed R libraries involving bootstrapping, and saw that the simpleboot library has some functions that could work to do one-sample and two-sample t-tests. The problem is, I can’t figure out how to get them to return the 20% trimmed means. I like the boot.ci because it gives me various kinds of bootstraps, and I can get a histogram that I can modify if I want to. A lot of other data comes with the simpleboot commands, and I don’t think I understand all of it, but perhaps I don’t need to.

Bob Pruzek also sent me a link (but it needed a password, so I’m not sure I should reveal it without his permission) to course pages at http://hsta559s07.pbwiki.com. Under his Class Notes there was a file about Bootstrapping, and the intro was very clear to me. I was trying to follow along and replicate in R but when it used the bootstrap library, my R wouldn’t follow along:
> mns4.y=bootstrap(y,nboots=1000,means4)
Error in n * nboot : non-numeric argument to binary operator
and that’s where I got lost. It looked really pertinent to what I was doing, but unfortunately I got lost.

SO . . . let me do this. I will lay out a command in the boot library and explain what I think is going on (as you will see, I really still don’t understand). I will use the first example in the boot(boot) help documentation.

# usual bootstrap of the ratio of means using the city data
ratio <- function(d, w)
     sum(d$x * w)/sum(d$u * w)
boot(city, ratio, R=999, stype="w")

ratio=function (d, w)
       sum(d$x*w)/sum(d$u*w)
ratio=function #first, the function is named; BTW, why use <- when you can use = just as well? I prefer the simpler “=”; can anyone tell me why I shouldn’t use that?
function (d, w) #writing ‘function’ sets up what follows as a function; w is the index (maybe), but I’m not sure what d is
sum(d$x*w) #this multiplies each case of d$x by w and sums the results, but what is d$x?
sum()/sum() #gives the ratio of the sums; where are the means? what is in w?
boot () #this is the call to bootstrap
city #this is the data
ratio #this is the function we’ve created
R=999 #this is the number of bootstrap replicates
stype="w" #this is the statistic type but what is w?
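According to the boot help page, stype says what the second argument of the statistic function represents: "i" for indices (the default), "f" for frequencies, or "w" for weights. The sketch below tries to show, in base R only, how the same resample can be written all three ways and why the weighted sums give a ratio of means. The small data frame is a made-up stand-in with columns named u and x, as the ratio function assumes; it is not the real city data.

```r
# Made-up stand-in for the city data: two measurements (u and x) per case
d <- data.frame(u = c(10, 20, 30, 40), x = c(12, 25, 33, 50))
n <- nrow(d)

ratio <- function(d, w) sum(d$x * w) / sum(d$u * w)

# With equal weights 1/n, the weighted sums are just the means,
# so this is the ratio of means on the original data
w0 <- rep(1 / n, n)
ratio(d, w0)
mean(d$x) / mean(d$u)            # same value

# One bootstrap resample, expressed three ways
set.seed(3)
i <- sample(n, replace = TRUE)   # indices: which cases were drawn
f <- tabulate(i, nbins = n)      # frequencies: how often each case was drawn
w <- f / n                       # weights: frequencies scaled to sum to 1

ratio(d, w)                      # ratio of means in the resample, via weights
mean(d$x[i]) / mean(d$u[i])      # same value, via indices
```

So d is the whole data set passed through unchanged, and w carries the information about which cases ended up in each resample.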

I understand that if I wanted to do a bootstrap of a two-sample t-test with 20% means trimming, I would first need to write a function that would set up a two-sample t-test, and set up the means trimming inside it (let’s call it “t20trim=function”). Then, bootstrapping it would be a simple matter:
boot(data, t20trim, R=999, stype="w") #I’m guessing on the last part because I still don’t understand what stype is
I’d be able to pull the CI out, and plot it, but I still wouldn’t know what to do with effect sizes.
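Here is a hedged sketch of what that plan could look like with the boot library, using the default stype="i" so that the second argument of the statistic is a vector of row indices. The data frame, its column names score and group, and the numbers in it are made up for illustration, and this t20trim returns the difference between the two 20% trimmed means rather than a t statistic.

```r
library(boot)   # boot() and boot.ci() come from the boot package

# Made-up two-group data; 'score' and 'group' are hypothetical column names
dat <- data.frame(
  score = c(12, 15, 11, 18, 14, 16, 13, 17, 19, 15,
            10, 12,  9, 14, 11, 13, 10, 12, 15, 11),
  group = factor(rep(c("a", "b"), each = 10))
)

# Statistic in the form boot expects with stype = "i":
# first argument is the data, second is the resampled row indices
t20trim <- function(d, i) {
  resampled <- d[i, ]
  mean(resampled$score[resampled$group == "a"], trim = 0.2) -
    mean(resampled$score[resampled$group == "b"], trim = 0.2)
}

set.seed(7)
# strata = dat$group resamples within each group, keeping group sizes fixed
b <- boot(dat, t20trim, R = 999, strata = dat$group)

b$t0                                          # statistic on the original data
ci <- boot.ci(b, conf = 0.95, type = "perc")  # percentile interval
ci$percent[4:5]                               # lower and upper limits
```

plot(b) should then give the histogram of the 999 replicates stored in b$t.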

I would also like to see some real examples of real boots that people have used and be talked through these. For example, Ajay’s tutorial was really interesting but why would I want to index the 3, 2, 2 position? I wonder how this works when you really want to run a real bootstrap. What *do* I want to index? My confusion is legion, but maybe spelling it out like this will make it easier for you to understand what I’m missing.

Thanks in advance for bearing with me.
Jenifer

Dr. Jenifer Larson-Hall
Assistant Professor of Linguistics
University of North Texas
(940)369-8950