[BioC] RMA + loess normalisation and filtering
Wolfgang Huber
huber at ebi.ac.uk
Tue Apr 19 17:48:26 CEST 2005
Hi Katleen,
> question 1: I have performed *RMA normalisation* of my Affymetrix data.
> However, for further analysis I think it is necessary to *filter* the
> data (non-expressed genes or below background). However I don't know the
> best way to filter the genes that are not expressed or very low
> expressed (below the background), based on the RMA normalisation data.
My preference is to select genes based on their overall variability,
using a criterion such as
z = apply(exprs(x), 1, IQR)
(see also rowQ from Biobase-devel, or rowSds from the vsn package). The
rationale is that it is difficult to decide on an absolute number that
corresponds to "present" or "absent" (e.g. due to different AT-content),
but if the values vary across the experiment there is some hope that the
probe set is really detecting a transcript. I have no good suggestion for
choosing a threshold though - I'd usually take the top 50% or so, depending
on chip type and on how the histogram of "z" looks.
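A minimal sketch of this kind of variability filter, in base R only. The
matrix "e" is simulated and merely stands in for exprs(x); the 50% cutoff
and the names "cutoff", "keep" and "e_filtered" are illustrative
assumptions, not a recommendation for any particular chip type.

```r
# Simulated stand-in for exprs(x): 1000 probe sets x 8 arrays, log2 scale.
set.seed(1)
e <- matrix(rnorm(1000 * 8, mean = 8, sd = 0.3), nrow = 1000,
            dimnames = list(paste0("probe", 1:1000), paste0("array", 1:8)))
# Give the first 100 probe sets extra variability across arrays.
e[1:100, ] <- e[1:100, ] + rnorm(100 * 8, sd = 2)

# Per-probe-set interquartile range across arrays.
z <- apply(e, 1, IQR)

# Look at the distribution of z before settling on a cutoff, e.g.:
# hist(z, breaks = 50)

# Assumed cutoff: the median of z, i.e. keep the more variable half.
cutoff <- quantile(z, 0.5)
keep <- z > cutoff
e_filtered <- e[keep, , drop = FALSE]
```

With real data, replace "e" by exprs(x) and adjust the quantile after
looking at the histogram of z.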
> question 2: In a paper of Choe et al (2005, Genome Biology) I have read
> that *loess normalisation* after the first normalisation step is
> important in order to detect most true positive differentially expressed
> genes. However when I perform
> />normdatabis<-normalize.exprSet.loess(RMAdata,transfn="antilog")/
> following warnings appear: /k-d tree limited by memory ncmax=5002/
> I guess that the loess normalization was only based on the 5002 first
> probe set id's or what does this mean?
> Is it ok or do I need to follow another strategy for the second loess
> normalisation step?
I don't think combining multiple normalization steps in this way is
appropriate. RMA is a model-based normalization method and the results
from it should be fine as is. If they aren't, then the model does not
fit -- which means that either you have a data quality problem or you
shouldn't use RMA in the first place.
Also, with so much normalization you are likely to remove not just
technical variation but also biological signal, and hence to find *fewer*
differentially expressed genes.
Best regards
Wolfgang
-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber