[BioC] lowess vs. loess [marrayNorm and limma]

Wed Sep 3 13:21:49 MEST 2003

To follow up on my general remarks on lowess and loess, I should also 
explain the slight differences between loess normalization in marrayNorm 
and limma.

I am mostly to blame for the fact that marrayNorm and limma do not give 
exactly the same results for loess normalization. Jean and I co-ordinated 
marrayNorm and limma earlier in the year to use:

span=0.4
degree=1
iterations=5
surface="direct"

(See my last post to BioC for the meaning of these parameters.) These 
parameter settings are conservative choices. They give a relatively stiff 
curve with a high degree of robustness, and the loess calculations are 
exact, with no interpolation. The decision to avoid interpolation was 
motivated more by the desire to avoid confusing warning messages from 
'loess' than by any concern that interpolation is inaccurate.
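To make the four settings concrete, here is a minimal sketch that calls base R's stats::loess directly. It is not marrayNorm's own code: the simulated A and M values are hypothetical, and family = "symmetric" is assumed here as the way to switch on loess's robustness iterations so that the iterations setting has an effect.

```r
# Hypothetical simulated two-colour data: A = average log-intensity,
# M = log-ratio with an intensity-dependent dye bias plus noise.
set.seed(1)
n <- 1000
A <- runif(n, 6, 15)
M <- 0.4 * sin(A / 2) + rnorm(n, sd = 0.3)

# span = 0.4: each local fit uses the nearest 40% of the points.
# degree = 1: locally linear fits.
# family = "symmetric": robust fitting with bisquare reweighting;
#   iterations = 5 is "the number of iterations used in robust fitting".
# surface = "direct": exact computation at every point, no interpolation.
fit_direct <- loess(M ~ A, span = 0.4, degree = 1, family = "symmetric",
                    control = loess.control(surface = "direct",
                                            iterations = 5))

# Normalized log-ratios are the residuals from the fitted dye-bias curve.
M_norm <- M - fitted(fit_direct)
summary(M_norm)
```

With surface = "direct" the local regression is solved at every one of the n points, which is what makes this setting slow on large arrays.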

As some users have noted on this mailing list, avoiding interpolation makes 
the fits very, very slow for some data sets. It was much, much too slow for 
me anyway. So I re-introduced interpolation to limma and implemented some 
warning suppression at a lower code level to avoid the confusing warning 
messages. limma currently uses the default values:

span=0.3
degree=1
iterations=4
interpolation: 'lowess'-style interpolation where possible, otherwise 
'loess'-style
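The two kinds of interpolation can be compared on simulated data with base R alone. This is a sketch under stated assumptions, not limma's internal code: the data are hypothetical; surface = "interpolate" stands in for 'loess'-style interpolation; lowess's delta argument (default 1% of the range of A, which lets lowess skip nearby points and interpolate linearly between them) is presumed to be what 'lowess'-style interpolation refers to; and lowess iter = 3 is assumed to match loess iterations = 4, because lowess counts only the robustifying passes after the initial fit.

```r
# Hypothetical simulated data, as in a typical M vs A plot.
set.seed(1)
n <- 1000
A <- runif(n, 6, 15)
M <- 0.4 * sin(A / 2) + rnorm(n, sd = 0.3)

# Exact (direct) robust loess with span = 0.3, iterations = 4.
fit_direct <- loess(M ~ A, span = 0.3, degree = 1, family = "symmetric",
                    control = loess.control(surface = "direct",
                                            iterations = 4))

# Same fit evaluated through loess's kd-tree interpolation surface.
fit_interp <- loess(M ~ A, span = 0.3, degree = 1, family = "symmetric",
                    control = loess.control(surface = "interpolate",
                                            iterations = 4))

# lowess: f is the span; iter = 3 robustifying passes (assumed to match
# loess iterations = 4); default delta gives lowess-style interpolation.
lw <- lowess(A, M, f = 0.3, iter = 3)

# lowess returns its curve sorted by A, so put the loess fits in that order.
ord <- order(A)
diff_interp <- max(abs(fitted(fit_interp) - fitted(fit_direct)))
diff_lowess <- max(abs(lw$y - fitted(fit_direct)[ord]))
c(interpolate = diff_interp, lowess = diff_lowess)
```

Both maximum differences should be small compared with the size of the dye-bias curve itself, while the interpolated fits avoid solving a local regression at every spot.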

The default values in limma agree exactly with the earlier software SMA, 
i.e., with the software that was used for the original papers on loess 
normalization. If you want limma to produce the slightly stiffer, slightly 
more robust curves produced by marrayNorm, you can use

normalizeWithinArrays(RG, span=0.4, iterations=5)

Any remaining difference between limma and marrayNorm will then be due to 
the interpolation used by limma.

Which parameter settings are best? Data analysis is not such a precise 
science that it is possible to give categorical answers. Either span=0.3 or 
span=0.4 is acceptable. In general, a higher value for span is appropriate 
if your data don't show much intensity dependence in the dye bias, and a 
lower value is appropriate if they show a lot. iterations=4 produces a 
reasonably robust fit. If you desperately need a very robust procedure, 
perhaps because you have a very high proportion of differentially expressed 
genes, then the most robust print-tip intensity-based normalization 
procedure is available from

normalizeWithinArrays(RG, method="robustspline", robust="MM")
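One way to see what "stiffer" means is loess's equivalent number of parameters (enp), which R reports for every loess fit: a larger enp means a more flexible curve. The sketch below uses base R's stats::loess on hypothetical simulated data rather than limma itself, comparing the limma defaults (span=0.3, iterations=4) with the marrayNorm settings (span=0.4, iterations=5).

```r
# Hypothetical simulated data with an intensity-dependent dye bias.
set.seed(1)
n <- 1000
A <- runif(n, 6, 15)
M <- 0.4 * sin(A / 2) + rnorm(n, sd = 0.3)

# limma-style defaults: span = 0.3, iterations = 4.
fit03 <- loess(M ~ A, span = 0.3, degree = 1, family = "symmetric",
               control = loess.control(iterations = 4))

# marrayNorm-style settings: span = 0.4, iterations = 5.
fit04 <- loess(M ~ A, span = 0.4, degree = 1, family = "symmetric",
               control = loess.control(iterations = 5))

# Equivalent number of parameters: the larger span gives a smaller enp,
# i.e. a stiffer curve that follows local wiggles in M less closely.
c(span0.3 = fit03$enp, span0.4 = fit04$enp)
```

If a plot of M against A shows little curvature, the stiffer span=0.4 fit loses little; strong intensity-dependent curvature favours the more flexible span=0.3.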

Gordon

At 12:28 AM 3/09/2003, nataraja at mit.edu wrote:
>Hello!  I have noticed a distinction being made
>between lowess and loess for the normalization
>of microarray data, but I'm not quite clear about
>what the difference is between the two techniques.
>From William Cleveland's website, it seems that the
>major difference is that lowess uses only one
>predictor variable, whereas loess can be used with
>more than one predictor:
>http://cm.bell-labs.com/cm/ms/departments/sia/wsc/smoothsoft.html
>For intensity-based normalization (one predictor) wouldn't
>the two algorithms boil down to the same thing?
>Any insight would be greatly appreciated!!
>
>Thank you,
>Sripriya Natarajan
>Graduate Student, Center for Vascular Biology
>Dept. of Pathology, Brigham and Women's Hospital