[R] ANOVA question

Robert Latest boblatest at gmail.com
Fri May 11 09:37:24 CEST 2012


Hello all,

I'm very satisfied to say that my grip on both R and statistics is
showing the first hints of firmness, on a very greenhorn level.

I'm faced with a problem that I intend to analyze using ANOVA, and to
test my understanding of a primitive, one-way ANOVA I've written the
self-contained practice script below. It works as expected.

But here's my question: How can I not only get the values of the
coefficients for the different levels of the explanatory factor(s),
but also the corresponding standard errors and confidence levels?
Below I have started doing that "on foot" by looping over the levels
of my single factor, but I suppose this gets complicated and messy
with more complex models. Any ideas?

Thanks,
robert


set.seed(0)

N <- 100 # sample size

MEAN <- c(10, 20, 30, 40, 50)
VAR <- c(20,20,1, 20, 20)
LABELS <- c("A", "B", "C", "D", "E")

# create a data frame with labels
df <- data.frame(Label=rep(LABELS, each=N))
df$Value <- NA
# fill in random data for each factor level
for (i in 1:length(MEAN)) {
    df$Value[(1+N*(i-1)):(N*i)] <- rnorm(N, MEAN[i], sqrt(VAR[i]))
}



par(mfrow=c(2,2))
plot(df)          # Box plot of the data
plot(df$Value)    # scatter plot

mod_aov <- aov(Value ~ Label, data=df)

print(summary(mod_aov))
print(mod_aov$coefficients)

rsd <- mod_aov$residuals

plot(rsd)

# find and print mean() and var() for each level
for (l in levels(df$Label)) {
    index <- df$Label == l

    # Method 1: directly from data
    smp <- df$Value[index]  # extract sample for this label
    ssq_smp <-  var(smp)*(length(smp)-1) # sum of squares is variance
                                         # times d.f.
    # Method 2: from ANOVA residuals
    rsd_grp <- rsd[index]                # extract residuals
    ssq_rsd <- sum(rsd_grp **2)          # compute sum of squares

    # print mean, variance, and difference between SSQs from the two
    # methods.
    write(sprintf("%s: mean=%5.1f var=%5.1f (%.2g)", l,
        mean(smp), var(smp),
        ssq_smp-ssq_rsd), "")
# ...and it works like expected! But is there a shortcut that would give me
# the same result in a one-liner?
}



More information about the R-help mailing list