[BioC] lowess vs. loess [marrayNorm and limma]
Gordon Smyth
smyth at wehi.edu.au
Wed Sep 3 13:21:49 MEST 2003
To follow up on my general remarks on lowess and loess, I should also
explain the slight differences between loess normalization in marrayNorm
and limma.
I am the one mostly to blame for the fact that marrayNorm and limma are not
exactly the same for loess normalization. Jean and I co-ordinated
marrayNorm and limma earlier in the year to use:
span=0.4
degree=1
iterations=5
surface="direct"
(See my last post to BioC for the meaning of these parameters.) These
parameter settings are conservative choices. They result in a relatively
stiff curve, with a high degree of robustness and with exact loess
calculations involving no interpolation. The decision to avoid
interpolation was motivated more by the desire to avoid confusing warning
messages from 'loess' than by any concern that interpolation is inaccurate.
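The following is a minimal Python sketch of Cleveland-style robust locally weighted regression. It shows mechanically what span, degree=1 and iterations do. It is an illustration only, not the code used by limma or marrayNorm. Assumptions: tricube neighbourhood weights, bisquare robustness weights scaled by six times the median absolute residual, a neighbourhood of floor(span × n) points, and `iterations` counting the total number of fits (one ordinary fit plus iterations - 1 robustness refits). Every fitted value gets its own weighted regression, in the spirit of surface="direct".

```python
import statistics


def tricube(u):
    """Neighbourhood weight (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    """Robustness weight (1 - u^2)^2 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0


def local_line(xs, ys, ws, x0):
    """Weighted least-squares straight line (degree=1) evaluated at x0."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx < 1e-12:  # all weight on one x value: fall back to weighted mean
        return ybar
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


def robust_lowess(x, y, span=0.3, iterations=4):
    """Robust local linear fit evaluated exactly at every x (no interpolation).

    Assumption: `iterations` counts all fits, i.e. one ordinary fit followed
    by iterations - 1 refits with bisquare robustness weights.
    """
    n = len(x)
    k = max(2, min(n, int(span * n + 1e-7)))  # points per local regression
    robustness = [1.0] * n
    fitted = list(y)
    for it in range(iterations):
        for i in range(n):
            # Window half-width: distance to the k-th nearest x value.
            h = max(sorted(abs(xj - x[i]) for xj in x)[k - 1], 1e-12)
            kernel = [tricube((xj - x[i]) / h) for xj in x]
            ws = [kw * r for kw, r in zip(kernel, robustness)]
            if sum(ws) == 0.0:
                ws = kernel  # safeguard: every neighbour was down-weighted
            fitted[i] = local_line(x, y, ws, x[i])
        if it == iterations - 1:
            break
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        s = statistics.median([abs(e) for e in residuals])
        if s <= 1e-7 * statistics.mean([abs(yi) for yi in y]):
            break  # residuals are essentially zero; nothing to down-weight
        robustness = [bisquare(e / (6.0 * s)) for e in residuals]
    return fitted
```

The call robust_lowess(x, y, span=0.4, iterations=5) mirrors the marrayNorm-style settings. Each of the n fitted values requires a pass over all n points, so this exact evaluation scales roughly quadratically with the number of spots.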
As some users have noted on this mailing list, the avoidance of
interpolation results in very, very slow fits for some data sets. It was
much, much too slow for me anyway. So I re-introduced interpolation to
limma and have implemented some warning suppression at a lower code level
to avoid the confusing warning messages. limma currently uses the default values:
span=0.3
degree=1
iterations=4
interpolation: 'lowess'-style interpolation where possible, otherwise
'loess'-style
The default values in limma agree exactly with the earlier software SMA,
i.e., with the software that was used for the original papers on loess
normalization. If you want limma to produce the slightly stiffer, slightly
more robust curves produced by marrayNorm, you can use
normalizeWithinArrays(RG, span=0.4, iterations=5)
Any remaining difference between limma and marrayNorm will then be due to
the interpolation used by limma.
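Interpolation saves time by running the local regression at only a limited number of points and interpolating everywhere else. The Python sketch below is illustrative only. It fits a non-robust tricube-weighted line at `n_vertices` evenly spaced vertices and interpolates linearly to every spot. Both the vertex scheme and the name `n_vertices` are hypothetical choices made for illustration. Neither R's lowess nor loess works exactly this way: lowess skips the regression at points within `delta` of the last point where it was computed and interpolates linearly, and loess with surface="interpolate" evaluates the fit at the vertices of a kd tree and interpolates between them.

```python
import bisect


def local_linear(x, y, x0, span):
    """Tricube-weighted straight-line fit at x0 using about span * n points."""
    n = len(x)
    k = max(2, min(n, int(span * n + 1e-7)))
    # Distance to the k-th nearest point, widened by 0.1% so that point
    # keeps a small positive weight (avoids an all-zero window at ties).
    h = max(sorted(abs(xi - x0) for xi in x)[k - 1], 1e-12) * 1.001
    ws = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(ws)
    xbar = sum(w * xi for w, xi in zip(ws, x)) / sw
    ybar = sum(w * yi for w, yi in zip(ws, y)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(ws, x))
    if sxx < 1e-12:
        return ybar
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(ws, x, y))
    return ybar + (sxy / sxx) * (x0 - xbar)


def interpolated_fit(x, y, span=0.3, n_vertices=10):
    """Fit exactly at n_vertices evenly spaced points; interpolate elsewhere.

    Returns (fitted values, number of local regressions actually run).
    Assumes n_vertices >= 2 and that x is not constant.
    """
    lo, hi = min(x), max(x)
    vertices = [lo + (hi - lo) * j / (n_vertices - 1) for j in range(n_vertices)]
    values = [local_linear(x, y, v, span) for v in vertices]
    fitted = []
    for xi in x:
        # Locate the vertex interval [vertices[j], vertices[j + 1]] holding xi.
        j = min(max(bisect.bisect_right(vertices, xi) - 1, 0), n_vertices - 2)
        t = (xi - vertices[j]) / (vertices[j + 1] - vertices[j])
        fitted.append(values[j] + t * (values[j + 1] - values[j]))
    return fitted, len(values)
```

With 200 spots and 10 vertices, this sketch runs 10 local regressions instead of 200. The price is a small approximation error wherever the fitted curve bends between vertices.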
Which parameter settings are best? Data analysis is not such a precise
science that it is possible to give categorical answers. Either span=0.3 or
span=0.4 is acceptable. In general, a higher value for span is appropriate
if your data doesn't show much intensity-dependence in the dye-bias, and a
lower value if it does. Setting iterations=4 produces a reasonably robust fit. If you
desperately need a very robust procedure, perhaps because you have a very
high proportion of differentially expressed genes, then the most robust
possible print-tip intensity-based normalization procedure is available from
normalizeWithinArrays(RG, method="robustspline", robust="MM")
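To get a feel for how stiff span=0.3 and span=0.4 are, note that span is the fraction of the spots used in each local regression. The Python sketch below converts span into a neighbourhood size. It assumes a neighbourhood of floor(span × n) points, clipped to between 2 and n. The spot counts are hypothetical examples, not figures from any particular array. With print-tip normalization, the relevant n is the number of spots in one print-tip group rather than on the whole array.

```python
def neighbourhood_size(n_spots, span):
    """Spots entering each local regression, assuming floor(span * n)
    clipped to [2, n]; the 1e-7 guards against float round-off."""
    return max(2, min(n_spots, int(span * n_spots + 1e-7)))


# Hypothetical group sizes: a small print-tip group and a whole array.
for n_spots in (400, 20000):
    for span in (0.3, 0.4):
        print(f"n={n_spots:>5}  span={span}  ->  {neighbourhood_size(n_spots, span)} spots")
```

A larger span averages over more spots at each intensity. The curve is therefore stiffer and tracks local wiggles in the dye-bias less closely.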
Gordon
At 12:28 AM 3/09/2003, nataraja at mit.edu wrote:
>Hello! I have noticed a distinction being made
>between lowess and loess for the normalization
>of microarray data, but I'm not quite clear about
>what the difference is between the two techniques.
>From William Cleveland's website, it seems that the
>major difference is that lowess uses only one
>predictor variable, whereas loess can be used with
>more than one predictor:
>http://cm.bell-labs.com/cm/ms/departments/sia/wsc/smoothsoft.html
>For intensity-based normalization (one predictor) wouldn't
>the two algorithms boil down to the same thing?
>Any insight would be greatly appreciated!!
>
>Thank you,
>Sripriya Natarajan
>Graduate Student, Center for Vascular Biology
>Dept. of Pathology, Brigham and Women's Hospital