[BioC] [Bioc-devel] help with limma design

Mon Jun 22 16:50:32 CEST 2009

Hi Kaiyu,

First off, this isn't an appropriate question for Bioc-devel. That list 
is intended for questions about developing Bioconductor packages, not 
questions about how to use the packages. I have re-directed to the 
correct list.

Kaiyu Shen wrote:
> Hello, folks:
> I am now using limma package to analyze the two-color arrays. Here are
> the six arrays that I have:
> 
> #         Cy3  Cy5
> Array1    MU1   WT
> Array2    WT    MU1
> Array3    MU2   WT
> Array4    WT    MU2
> Array5    MU3   WT
> Array6    WT    MU3
> 
> What I want to study is MU1 vs WT.
> I tried two analyses; to keep things simple, I have not applied any
> pre-processing beyond the default normalization:
> 
> A. Just have the first two arrays for analysis
> 
> #         Cy3  Cy5
> Array1    MU1   WT
> Array2    WT    MU1
> 
> object=readTargets("limma.txt")
> RG=read.maimages(object,source="agilent")
> MA=normalizeWithinArrays(RG)
> design=c(1,-1)
> fit=lmFit(MA,design)
> fit=eBayes(fit)
> topTable(fit)
> 
> 
> B. I include all six arrays to have other analysis simultaneously
> 
> #         Cy3  Cy5
> Array1    MU1   WT
> Array2    WT    MU1
> Array3    MU2   WT
> Array4    WT    MU2
> Array5    MU3   WT
> Array6    WT    MU3
> 
> object=readTargets("limma.txt")
> RG=read.maimages(object,source="agilent")
> MA=normalizeWithinArrays(RG)
> design=cbind(mu1=c(1,-1,0,0,0,0),mu2=c(0,0,1,-1,0,0),mu3=c(0,0,0,0,1,-1))
> cont.matrix=makeContrasts(mu1,mu2,mu3,levels=design)
> fit=lmFit(MA,design)
> fit2=contrasts.fit(fit,cont.matrix)
> fit2=eBayes(fit2)
> topTable(fit2,coef=1) # first coefficient (mu1, estimated from arrays 1 and 2)
> 
> 
> However, these two methods do not give me the same results.
> Could somebody comment on the difference between these two methods?

The differences arise primarily because you are fitting a linear model, 
so the denominator of your t-statistic is an estimate of the residual 
variability across all the arrays included in the fit. In the first 
case that estimate comes from only two arrays, whereas in the second 
case it comes from all six.
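
As a rough illustration, here is a minimal base-R sketch (it does not 
call limma) of how many residual degrees of freedom each design leaves 
for estimating that denominator. The names designA, designB, dfA and 
dfB are mine, chosen for illustration; the design matrices are the ones 
from the quoted code.

```r
# Design A: two dye-swapped arrays, one coefficient
designA <- matrix(c(1, -1), ncol = 1, dimnames = list(NULL, "mu1"))

# Design B: six arrays, three coefficients, as in the quoted code
designB <- cbind(mu1 = c(1, -1, 0, 0, 0, 0),
                 mu2 = c(0, 0, 1, -1, 0, 0),
                 mu3 = c(0, 0, 0, 0, 1, -1))

# Residual degrees of freedom = number of arrays - rank of the design
dfA <- nrow(designA) - qr(designA)$rank
dfB <- nrow(designB) - qr(designB)$rank

print(c(A = dfA, B = dfB))
```

Design B leaves more residual degrees of freedom than design A, so the 
per-gene variance estimate rests on more information.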

How this affects your results depends on the data. In the second case 
more arrays contribute to the residual variance estimate (the error sum 
of squares, SSE, divided by its degrees of freedom), so the estimate is 
more stable and the t-statistic has more degrees of freedom, which 
might result in more genes being significant. However, if the 
variability in the MU2 and MU3 arrays is much higher than in the MU1 
arrays, then this will tend to inflate the pooled estimate, and you 
will get fewer genes.
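
To see the effect for a single gene, here is a small base-R sketch using 
lm.fit() rather than limma. The log-ratios in M are made-up values, 
chosen so that the MU2 and MU3 arrays are noisier than the MU1 arrays; 
they are not from any real data set, and all object names are 
hypothetical.

```r
# Hypothetical log-ratios for one gene on the six arrays (invented values)
M <- c(1.1, -0.9, 0.2, -2.0, 1.5, -0.1)

designA <- matrix(c(1, -1), ncol = 1, dimnames = list(NULL, "mu1"))
designB <- cbind(mu1 = c(1, -1, 0, 0, 0, 0),
                 mu2 = c(0, 0, 1, -1, 0, 0),
                 mu3 = c(0, 0, 0, 0, 1, -1))

# Ordinary least squares: arrays 1-2 only (A) versus all six arrays (B)
fitA <- lm.fit(designA, M[1:2])
fitB <- lm.fit(designB, M)

# The mu1 estimate is the same in both fits
coefA <- fitA$coefficients["mu1"]
coefB <- fitB$coefficients["mu1"]

# Residual variance = SSE / residual degrees of freedom
s2A <- sum(fitA$residuals^2) / fitA$df.residual
s2B <- sum(fitB$residuals^2) / fitB$df.residual

# Ordinary (unmoderated) t-statistics for mu1
tA <- coefA / sqrt(s2A * solve(crossprod(designA))[1, 1])
tB <- coefB / sqrt(s2B * solve(crossprod(designB))[1, 1])

print(c(coefA = unname(coefA), coefB = unname(coefB)))
print(c(s2A = s2A, s2B = s2B))
print(c(tA = unname(tA), tB = unname(tB)))
```

With these particular values the estimate of mu1 is unchanged, but the 
pooled variance is larger in the six-array fit, so the t-statistic is 
smaller. Note that eBayes() additionally shrinks the per-gene variances 
toward a common value, so the moderated statistics from limma will not 
match these ordinary t-statistics exactly.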

Best,

Jim

> 
> Thank you very much

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826