[BioC] limma eBayes: how to determine goodness of fit?

Fri Jun 1 01:08:00 CEST 2007

Hi Paul.

Some are stored in the "MArrayLM" object; others require a couple of commands.

For example, using the data example in lmFit help:
------------
sd <- 0.3*sqrt(4/rchisq(100,df=4))
y <- matrix(rnorm(100*6,sd=sd),100,6)
rownames(y) <- paste("Gene",1:100)
y[1:2,4:6] <- y[1:2,4:6] + 2
design <- cbind(Grp1=1,Grp2vs1=c(0,0,0,1,1,1))

# Ordinary fit (lmFit and eBayes come from the limma package)
library(limma)
fit <- lmFit(y,design)
fit <- eBayes(fit)
------------

You can recreate what you want from the linear model as follows:

1) Residual standard error

 > lm.first<-lm(y[1,]~-1+design)
 > sqrt(anova(lm.first)["Residuals","Mean Sq"])
[1] 0.2990779
 > fit$sigma[1]
    Gene 1
0.2990779

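2) R-Squared (uncentered, since the lm formula above drops the
intercept term with -1, so the total sum of squares is taken about zero)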

 > sst<-rowSums(y^2)
 > ssr<-sst-fit$df.residual*fit$sigma^2
 > (ssr/sst)[1]
   Gene 1
0.964451
 > summary(lm.first)$r.squared
[1] 0.964451
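
If the usual (centered) R-squared and the adjusted R-squared are wanted
instead, the same residual sums of squares can be reused. The following is
only a sketch in base R: the seed, the simulated data, and the names group,
X, rss, r2 and adj.r2 are illustrative assumptions, not part of limma. The
per-gene rss computed here is the same quantity as
fit$df.residual*fit$sigma^2 from lmFit.

```r
# Illustrative data only: the seed and noise level are arbitrary assumptions.
set.seed(1)
y <- matrix(rnorm(100 * 6, sd = 0.3), 100, 6)
rownames(y) <- paste("Gene", 1:100)
y[1:2, 4:6] <- y[1:2, 4:6] + 2
group <- c(0, 0, 0, 1, 1, 1)
X <- cbind(Grp1 = 1, Grp2vs1 = group)

# Least-squares fit of every gene at once (genes become columns of t(y)).
res <- lm.fit(X, t(y))$residuals
rss <- colSums(res^2)            # equals fit$df.residual * fit$sigma^2 in limma

# Centered total sum of squares, as summary.lm uses when an intercept is present.
sst <- rowSums((y - rowMeans(y))^2)
n <- ncol(y)                     # number of arrays
p <- ncol(X)                     # number of coefficients, intercept included

r2 <- 1 - rss / sst
adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - p)

# Cross-check for the first gene against lm with an explicit intercept.
lm.check <- lm(y[1, ] ~ group)
c(r2[1], summary(lm.check)$r.squared)
c(adj.r2[1], summary(lm.check)$adj.r.squared)
```

Both pairs printed at the end should agree. Note that these values will not
match the uncentered R-squared shown above, because the reference sum of
squares is different.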

...

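4)  The F-statistic will differ from the lm value because eBayes
moderates the variances: the gene-wise variance fit$sigma^2 is replaced
by the posterior variance fit$s2.post, and the denominator degrees of
freedom are increased by the prior degrees of freedom (fit$df.prior).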
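
To make the relationship concrete, here is a hedged sketch that rebuilds
both the ordinary and the moderated F-statistic from components of the
fitted object. It assumes limma is installed and that the design has full
column rank; the seed and the names ssr, F.ord and F.mod are illustrative
choices. With this design, F tests both coefficients (Grp1 and Grp2vs1)
being zero.

```r
library(limma)  # Bioconductor package, assumed to be installed

set.seed(1)     # arbitrary seed, only so the sketch is repeatable
sd <- 0.3 * sqrt(4 / rchisq(100, df = 4))
y <- matrix(rnorm(100 * 6, sd = sd), 100, 6)
rownames(y) <- paste("Gene", 1:100)
y[1:2, 4:6] <- y[1:2, 4:6] + 2
design <- cbind(Grp1 = 1, Grp2vs1 = c(0, 0, 0, 1, 1, 1))

fit <- eBayes(lmFit(y, design))

# Sum of squares explained by the two coefficients (uncentered).
ssr <- rowSums(y^2) - fit$df.residual * fit$sigma^2
df1 <- ncol(design)

# Ordinary F uses the gene-wise residual variance.
F.ord <- (ssr / df1) / fit$sigma^2

# Moderated F keeps the numerator but divides by the posterior variance.
F.mod <- (ssr / df1) / fit$s2.post

# The reconstruction should agree with what eBayes stored.
all.equal(unname(F.mod), unname(fit$F))

# Ordinary versus moderated F for the first few genes.
head(cbind(ordinary = F.ord, moderated = F.mod))
```

Genes with unusually small fit$sigma are the ones whose F-statistic shrinks
the most after moderation.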

Cheers,
Mark

On 01/06/2007, at 5:38 AM, Paul Shannon wrote:

> A summary of an lm result includes some readily-understood
> goodness of fit information:
>
>  1) Residual standard error
>  2) Multiple R-Squared
>  3) Adjusted R-Squared
>  4) F-statistic
>
> With limma, and eBayes, I deduce (incorrectly?) that efit$F and
> efit$F.p.value convey information similar to number 4 above.  How
> about the first three measures -- is there any way to get equivalent
> information for the linear model eFit produces?
>
> Thanks!
>
>  - Paul