[BioC] Problem with gcrma
James W. MacDonald
jmacdon at med.umich.edu
Fri Jun 4 16:28:40 CEST 2010
Hi Casper.
Casper Shyr wrote:
> Hello all, I am having trouble normalizing with the gcrma package. I start with 4 Affy chip CEL files, read them in, and run gcrma with default parameters. Then I make a boxplot of the normalized data. In the plot, the tail below the 1st quartile is either very short or completely missing. The median for each assay, although lined up across chips, is very close to the 1st quartile. Judging from the nature of the data, I know this is wrong. I also tested on the example Dilution data and got a similar result (i.e. lower tail absent).
> My code is simply:
> DataWTPBS <- ReadAffy(celfile.path="data/WTPBS/");
> WTPBSgcRMA <- gcrma(DataWTPBS);
> boxplot(exprs(WTPBSgcRMA));
> I've checked if any of my packages need to be updated. I also made the boxplot of unnormalized data and it looked fine.
> Any suggestion on why this might be the case?
This isn't a problem with gcrma(); in fact it is expected. What both RMA
and GCRMA are trying to do is subtract background from the data without
unduly affecting the data from truly expressed genes. Rather than using
boxplots, it might be instructive for you to look at density plots.
As an example using the Dilution data set:
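library(affy)      # rma(), normalize(), computeExprSet()
library(gcrma)     # gcrma()
library(affydata)  # Dilution example data
data(Dilution)

## ExpressionSet without background correction:
## quantile-normalize, then summarize PM-only probes by median polish
dil.norm <- normalize(Dilution, "quantiles")
eset.nobg <- computeExprSet(dil.norm, summary.method = "medianpolish",
                            pmcorrect.method = "pmonly")

## Same data with RMA and GCRMA background correction
eset.rma <- rma(Dilution)
eset.gcrma <- gcrma(Dilution)

## Overlay the densities for the first array
plot(density(exprs(eset.nobg)[,1]), xlim = c(1,14))  # solid: no background correction
lines(density(exprs(eset.rma)[,1]), lty=2)           # dashed: RMA
lines(density(exprs(eset.gcrma)[,1]), lty=3)         # dotted: GCRMA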
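You can see here that the data without background correction (solid line)
form pretty much a single peak, where it is difficult to distinguish truly
expressed data from background. The RMA data (dashed line) look
semi-bimodal (and usually look better than this), with some
differentiation between background and expressed data. The GCRMA data
(dotted line) show a clear separation between data that are assumed to be
from unexpressed genes and data from expressed genes. Because so many
probesets end up in the low-intensity peak, the first quartile and the
median of each array sit close together near the bottom of the range,
which is why the lower tail of your boxplots looks short or missing.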
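
To connect the density picture back to the boxplot, here is a minimal base-R sketch on simulated values. Nothing in it comes from real array data: the peak locations, spreads, and the 60/40 split between "unexpressed" and "expressed" values are assumptions picked only to mimic a narrow low peak plus a broad high peak.

```r
## Simulated log2 expression values. Every number here is a made-up
## assumption used for illustration; this is not real array data.
set.seed(1)
n.unexpressed <- 6000   # hypothetical count of unexpressed probesets
n.expressed   <- 4000   # hypothetical count of expressed probesets
unexpressed <- rnorm(n.unexpressed, mean = 2.5, sd = 0.15)  # narrow low peak
expressed   <- rnorm(n.expressed,   mean = 8,   sd = 2)     # broad high peak
sim <- c(unexpressed, expressed)

## boxplot.stats() returns the five numbers that boxplot() draws
stats <- boxplot.stats(sim)$stats
names(stats) <- c("lower.whisker", "q1", "median", "q3", "upper.whisker")
print(round(stats, 2))

## Compare the two halves of the box and the two whiskers
box.lower     <- unname(stats["median"] - stats["q1"])
box.upper     <- unname(stats["q3"] - stats["median"])
whisker.lower <- unname(stats["q1"] - stats["lower.whisker"])
whisker.upper <- unname(stats["upper.whisker"] - stats["q3"])
cat("box halves: lower", round(box.lower, 2), "upper", round(box.upper, 2), "\n")
cat("whiskers:   lower", round(whisker.lower, 2), "upper", round(whisker.upper, 2), "\n")

## Density and boxplot of the same values, side by side
op <- par(mfrow = c(1, 2))
plot(density(sim), main = "simulated density")
boxplot(sim, main = "simulated boxplot")
par(op)
```

In the printed summary the median should sit much closer to the first quartile than to the third, and the lower whisker should be the shorter of the two, which is the same pattern you describe for the boxplots of your gcrma() output.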
Best,
Jim
> Thank you! Sincerely, Casper
> University of British Columbia
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826