[BioC] Problem with gcrma

Fri Jun 4 16:28:40 CEST 2010

Hi Casper.

Casper Shyr wrote:
> Hello all,
>
> I am having trouble normalizing with the gcrma package. I start off with
> 4 Affy chip CEL files. I read them in, and then performed gcrma with the
> default parameters. Then I made a boxplot of the normalized data. In the
> resulting plot, the tail below the 1st quartile is either very short or
> completely missing. The median for each assay, although lined up, is very
> close to the 1st quartile. Judging from the nature of the data, I know
> this is wrong. I also tested on the example Dilution data, and got a
> similar result as well (i.e. lower tail absent).
>
> My code is simply:
>
> DataWTPBS <- ReadAffy(celfile.path="data/WTPBS/");
> WTPBSgcRMA <- gcrma(DataWTPBS);
> boxplot(exprs(WTPBSgcRMA));
>
> I've checked if any of my packages need to be updated. I also made the
> boxplot of unnormalized data and it looked fine.
>
> Any suggestion on why this might be the case?

This isn't a problem with gcrma(); in fact it is expected. What both RMA 
and GCRMA are trying to do is subtract background from the data without 
unduly affecting the data from truly expressed genes. Rather than using 
boxplots, it might be instructive for you to look at density plots.

As an example using the Dilution data set:

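library(gcrma)
library(affydata)
data(Dilution)
## ExpressionSet without background correction:
## quantile normalization, PM only, median polish summarization
norm <- normalize(Dilution, "quantiles")
eset.nobg <- computeExprSet(norm, summary.method = "medianpolish",
                            pmcorrect.method = "pmonly")
## Same data with RMA and GCRMA background correction
eset.rma <- rma(Dilution)
eset.gcrma <- gcrma(Dilution)

## Densities for the first array:
## solid = no background correction, dashed = RMA, dotted = GCRMA
plot(density(exprs(eset.nobg)[,1]), xlim = c(1,14))
lines(density(exprs(eset.rma)[,1]), lty=2)
lines(density(exprs(eset.gcrma)[,1]), lty=3)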

You can see here that the data without background correction (solid
line) form pretty much a single peak, where it is difficult to
distinguish truly expressed genes from background. The RMA data (dashed
line) look semi-bimodal (and usually look better than this), with some
differentiation between background and expressed data. The GCRMA data
(dotted line) show a clear separation between data that are assumed to
be from unexpressed genes, and data from expressed genes.
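
To tie this back to the boxplot, here is a rough sketch using simulated
values rather than real GCRMA output. The mixture proportions, means and
standard deviations are invented purely for illustration. It shows how a
narrow peak of low "background" values plus a broad spread of "expressed"
values gives a short lower whisker and a median sitting close to the
1st quartile:

```r
## Simulated log2 expression values; NOT real gcrma output.
## All numbers below are assumptions chosen for illustration only.
set.seed(1)
background <- rnorm(6000, mean = 2.2, sd = 0.15)  # narrow peak, "unexpressed"
expressed  <- rnorm(4000, mean = 8,   sd = 2)     # broad spread, "expressed"
sim <- c(background, expressed)

## boxplot.stats()$stats holds:
## lower whisker, lower hinge (Q1), median, upper hinge (Q3), upper whisker
s <- boxplot.stats(sim)$stats
names(s) <- c("lower", "Q1", "median", "Q3", "upper")

lower.tail  <- unname(s["Q1"] - s["lower"])     # length of the lower whisker
upper.tail  <- unname(s["upper"] - s["Q3"])     # length of the upper whisker
med.to.q1   <- unname(s["median"] - s["Q1"])    # median distance to Q1
med.to.q3   <- unname(s["Q3"] - s["median"])    # median distance to Q3

print(round(s, 2))
cat("lower whisker:", round(lower.tail, 2),
    " upper whisker:", round(upper.tail, 2), "\n")
cat("median - Q1:", round(med.to.q1, 2),
    " Q3 - median:", round(med.to.q3, 2), "\n")
```

Calling boxplot(sim) and plot(density(sim)) on the same vector draws the
two views side by side: the bimodal density and the box with the
compressed lower end.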

Best,

Jim

> Thank you!
> Sincerely,
> Casper
> University of British Columbia

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826