[R] analyze summary data

Sun Jun 25 19:13:46 CEST 2006

Thierry Girard <thierry.girard <at> unibas.ch> writes:

> I do have summary data (mean, standard deviation and sample size n)  
> and want to analyze this data.
> The summary data is supposed to be from a normal distribution.
> 
> I need the following calculations on this summary data (no, I do not  
> have the original data):
> 
> - one sample t-test against a known mu
> - two sample t-test
> - analysis of variance between 4 groups.
> 
> I would appreciate any help available.
> 
> One possible solution could be to simulate the data using rnorm with  
> the appropriate n, mu and sd, but I don't know if there would be a  
> more accurate solution.

  This is the kind of situation where you need to go back to the basics --
knowing what computations these statistical tests are _actually
doing_ -- which you should be able to find in any basic stats book,
or by digging into the guts of the R functions.  The only other thing
you need to know is R's cumulative distribution functions: pt
(for the t distribution) and pf (for the F distribution).

  For example:

   stats:::t.test.default

has lots of complicated stuff inside, but the key lines
(for a one-sample test) are

  nx <- length(x)   ## with summary data, nx is just your sample size n
  df <- nx - 1
  stderr <- sqrt(vx/nx)   ## vx is var(x), computed earlier in the function
  # if you already have the standard deviation then you want
  # sqrt(sd^2/nx), i.e. sd/sqrt(nx)
  tstat <- (mx - mu)/stderr   ## mx is mean(x); mu is the known mean you're testing against
  pval <- 2 * pt(-abs(tstat), df)

(assuming 2-tailed)
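
  To make that concrete, here is a minimal sketch that wraps those
lines into a function taking only the mean, standard deviation and
sample size.  The name t_test_one_summary and the numbers in the
example call are made up for illustration; nothing like this ships
with base R as far as this post is concerned.

```r
# One-sample, two-tailed t-test from summary statistics only.
# t_test_one_summary is a hypothetical helper name, not a base R function.
t_test_one_summary <- function(m, s, n, mu = 0) {
  df <- n - 1
  stderr <- s / sqrt(n)            # same as sqrt(s^2 / n)
  tstat <- (m - mu) / stderr
  pval <- 2 * pt(-abs(tstat), df)  # two-tailed p-value
  list(statistic = tstat, parameter = df, p.value = pval)
}

# Made-up summary values, for illustration only
res <- t_test_one_summary(m = 5.2, s = 1.1, n = 20, mu = 5)
res$p.value
```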

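  You will find similar stuff for the two-sample t-test;
the details depend on your particular choices, mainly whether
you assume equal variances (pooled test) or not (Welch test,
which is what t.test does by default).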
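
  A sketch of the two-sample case from summary statistics, covering
both the pooled and the Welch variants, might look like the
following.  The function name t_test_two_summary and the example
numbers are hypothetical; the formulas are the standard textbook ones.

```r
# Two-sample, two-tailed t-test from summary statistics only.
# t_test_two_summary is a hypothetical helper name, not a base R function.
t_test_two_summary <- function(m1, s1, n1, m2, s2, n2, var.equal = FALSE) {
  if (var.equal) {
    # Pooled-variance test
    df <- n1 + n2 - 2
    v <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df
    stderr <- sqrt(v * (1 / n1 + 1 / n2))
  } else {
    # Welch test with Welch-Satterthwaite degrees of freedom
    se1 <- s1^2 / n1
    se2 <- s2^2 / n2
    stderr <- sqrt(se1 + se2)
    df <- (se1 + se2)^2 / (se1^2 / (n1 - 1) + se2^2 / (n2 - 1))
  }
  tstat <- (m1 - m2) / stderr
  pval <- 2 * pt(-abs(tstat), df)
  list(statistic = tstat, parameter = df, p.value = pval)
}

# Made-up summary values, for illustration only
res_welch  <- t_test_two_summary(5.2, 1.1, 20, 4.6, 1.4, 25)
res_pooled <- t_test_two_summary(5.2, 1.1, 20, 4.6, 1.4, 25, var.equal = TRUE)
```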

  The 1-way ANOVA might be harder to dig out of the R code;
there you're better off going back and (re)learning from
a basic stats treatment how to compute the between-group and
(pooled) within-group variances.  Their ratio is the F statistic
whose tail probability you get from pf.
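
  For what it's worth, the textbook computation is short enough to
sketch.  The function below takes vectors of group means, standard
deviations and sample sizes; the name anova_summary and the four
groups of example numbers are invented for illustration.

```r
# One-way ANOVA F test from per-group summary statistics.
# anova_summary is a hypothetical helper name, not a base R function.
anova_summary <- function(m, s, n) {
  k <- length(m)                         # number of groups
  N <- sum(n)                            # total sample size
  grand <- sum(n * m) / N                # weighted grand mean
  ss_between <- sum(n * (m - grand)^2)   # between-group sum of squares
  ss_within <- sum((n - 1) * s^2)        # pooled within-group sum of squares
  df1 <- k - 1
  df2 <- N - k
  Fstat <- (ss_between / df1) / (ss_within / df2)
  pval <- pf(Fstat, df1, df2, lower.tail = FALSE)
  list(statistic = Fstat, df = c(df1, df2), p.value = pval)
}

# Made-up summary values for 4 groups, for illustration only
m <- c(5.2, 4.6, 5.9, 5.0)
s <- c(1.1, 1.4, 0.9, 1.2)
n <- c(20, 25, 18, 22)
res_aov <- anova_summary(m, s, n)
```

  Note that the F test uses the upper tail only, hence
lower.tail = FALSE rather than the doubling used for the t-tests.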

  Bottom line is that, except for knowing about pt and pf,
this is really a basic statistics question rather than an
R question.

  good luck
    Ben Bolker

PS: it is too bad, but the increasing sophistication of R is
making it harder for beginners to explore the guts --- e.g.
knowing to look for "stats:::t.test.default" in order to find
the code ...