[BioC] Question on limma single-channel analysis of dual-channel microarray

Sun Sep 16 03:21:38 CEST 2007

 Thanks, Naomi.

When I looked at the simple limma analysis of the 3 biological replicates (disease vs normal) the results closely correlated to some immunohistochemistry results we had -- so I tend to believe (even tho a small sample size) the simple limma analysis results on the three biological replicates. 

The differences in the IHC-measured genes in the single-channel approach did not match the IHC results at all, so I have a more difficult time believing the single-channel approach is approximating the biology.

However, I do want to look across disease and across dose, so the single channel method is my only option it seems.

I appreciate your suggestion (changing the intercept to "1"?) to look for deviations from the mean (as in model.matrix(~1 + f)  with the single channel approach.

Deanne

---
Deanne Taylor PhD
Department of Biostatistics
Harvard School of Public Health
655 Huntington Avenue
Boston, MA 02115
dtaylor at hsph.harvard.edu

>>> Naomi Altman <naomi at stat.psu.edu> 09/15/07 8:51 PM >>> 
Hi Deanne,
Gordon can correct me here if I am wrong, but I think that when there 
is no intercept in the design matrix then intraspotCorrelation uses 
deviations from 0, rather than deviations from the mean, which is the 
wrong type of correlation to account for the array effect.

Besides this, I have noticed (and I think others have noted on this 
list) that small changes to microarray data, such as the 
normalization method, exclusion of a few arrays, etc, can lead to 
large changes in the list of significant genes.  On the other hand, 
the genes that jump on and off the list are often low expressing 
genes. For these genes, the F or t-test denominator is very sensitive 
to small changes in the data.  Regularization methods such as limma 
and SAM damp this somewhat, but in small sample sizes you can still 
get sizeable effects.  If you use all of the arrays, you are doing a 
much different denominator regularization than if you use only a few arrays.

--Naomi

At 08:21 PM 9/15/2007, Deanne Taylor wrote:
>Hi...
>
>I have a few complex analyses I am trying to perform on some older 
>data. I've tried looking through the mailing lists and literature 
>and perhaps what I have here is a difficulty in understanding. So, 
>on to my question...
>
>I have 48 microarrays of two-channel Agilent data, no technical 
>replicates or dye swaps, but 3 biological replicates of 16 groups.
>
>For instance, one one chip would be Disease_24h_1mg vs 
>Normal_24h_1mg. And so forth...
>
>There need to be comparisons made between the groups of doses and 
>time series, which I'm comfortable with (at least conceptually and 
>with single-channel data). There is no common reference, and most of 
>this data are time series comparing two conditions and doses of 
>treatment on diseased and disease-free:
>
>
>Condition     Time     Treatment
>Disease        24h       1mg
>Normal         24h       1mg
>Disease        48h       1mg
>Normal         48h       1mg
>                (...)
>
>Disease        48h       2mg
>Normal         48h       2mg
>Disease        24h       2mg
>Normal         24h       2mg
>               (...)
>
>(and so forth)
>
>I would like to make contrasts between diseased at different time 
>series (24, 48, 120hours, etc) as well as across doses (1mg, 2mg, 
>etc) and compare all of it to some undosed normal controls. I think 
>I'm almost there, but I find I'm getting unexpected results.
>
>
>I have taken the following steps which all seem to work. In this 
>script, I have left in the control spots in MA because I found some 
>odd behavior when I tried to take them out with MA$genes$ControlType ==0
>
>library(limma)
>ID="Targets"
>targets<-readTargets(paste(ID, ".csv",sep=""), sep=",")
>files<-targets$FileName
>RG<-read.maimages(files, source="agilent")
>RG_norm<-normalizeWithinArrays(RG, bc.method="none", method="loess") 
>#is this step necessary here?
>MA<-normalizeBetweenArrays(RG_norm, method="Aquantile")
>targets2 <- targetsA2C(targets)
>targets2
>u <- unique(targets2$Target)
>f <- factor(targets2$Target, levels=u)
>design <- model.matrix(~0+f)
>colnames(design) <- u
>corfit <- intraspotCorrelation(MA, design)
>fit <- lmscFit(MA, design, correlation=corfit$consensus)
>#Now, I wanted to check out the first "test" of what I can get with 
>a normal contrast between Cy3 and Cy5 on the chips themselves:
>cont.dif<-makeContrasts(Disease_24_1mg - Normal_24_1mg, levels=design)
>fit2<-contrasts.fit(fit, cont.dif)
>fit2<-eBayes(fit2)
>toptable(fit2)
>
>
>Now, what i've just done up there is compare two groups that are on 
>the same arrays, but the p-values and gene IDs I get after lmscFit 
>are very dissimilar to the ones I get if I just run a simple limma 
>analysis across JUST the three two channel chips representing 
>Disease_24_1mg/Normal_24_1mg.
>
>I can't understand why there is such a complete difference between 
>what one gets with a fit across three two-color microarrays (Cy3 vs 
>Cy5) and when comparing the same two channels that were once 
>represented on three single chips, together after the single-channel fit.
>
>Is my model wrong? Can someone point me to an answer? Thank you so 
>much for your help.
>
>
>
>
>
>---
>Deanne Taylor PhD
>Department of Biostatistics
>Harvard School of Public Health
>655 Huntington Avenue
>Boston, MA 02115
>dtaylor at hsph.harvard.edu
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111