[BioC] loess and limma

Thu Sep 14 14:33:46 CEST 2006

On Thursday 14 September 2006 08:02, john seers (IFR) wrote:
> Hello Anyone
>
> I was just experimenting with some made up data in limma and could not
> understand the results of the "normalisation within array" using loess
> smoothing. Can anybody explain for me why the results are completely
> smoothed despite there being a strongly upregulated gene?
>
> My first reaction was that it was because there were too few points for
> loess to handle it sensibly. But would you expect it to completely
> smooth the results?
>
> Here is my artificial data with the columns in 598new and 600new
> identical and with the gene in row 12 with large signal values.
>
> > RG$R
>
>       598new 599new 600new 617new 621new 637new
>  [1,]    310    280    310    321    290    291
>  [2,]    290    285    290    311    291    292
>  [3,]    295    290    295    291    301    293
>  [4,]    298    295    298    281    311    294
>  [5,]    301    301    301    292    321    295
>  [6,]    302    305    302    302    331    296
>  [7,]    303    294    303    313    281    297
>  [8,]    304    293    304    324    291    298
>  [9,]    305    311    305    334    303    321
> [10,]    306    312    306    314    304    322
> [11,]    307    313    307    312    312    323
> [12,]   4800    314   4800    287    324    324
>
> > RG$G
>
>       598new 599new 600new 617new 621new 637new
>  [1,]    301    320    301    330    280    311
>  [2,]    299    319    299    335    290    321
>  [3,]    302    280    302    295    300    291
>  [4,]    298    281    298    280    310    293
>  [5,]    303    282    303    285    320    295
>  [6,]    297    284    297    287    330    285
>  [7,]    304    310    304    289    335    321
>  [8,]    296    311    296    298    290    331
>  [9,]    305    319    305    284    320    297
> [10,]    295    322    295    290    321    296
> [11,]    308    317    308    321    324    295
> [12,]    301    325    301    324    325    294
>
>  > RG$Rb
>
>       598new 599new 600new 617new 621new 637new
>  [1,]     30     30     30     30     30     30
>  [2,]     30     30     30     30     30     30
>  [3,]     30     30     30     30     30     30
>  [4,]     30     30     30     30     30     30
>  [5,]     30     30     30     30     30     30
>  [6,]     30     30     30     30     30     30
>  [7,]     30     30     30     30     30     30
>  [8,]     30     30     30     30     30     30
>  [9,]     30     30     30     30     30     30
> [10,]     30     30     30     30     30     30
> [11,]     30     30     30     30     30     30
> [12,]     30     30     30     30     30     30
>
> > RG$Gb
>
>       598new 599new 600new 617new 621new 637new
>  [1,]     30     30     30     30     30     30
>  [2,]     30     30     30     30     30     30
>  [3,]     30     30     30     30     30     30
>  [4,]     30     30     30     30     30     30
>  [5,]     30     30     30     30     30     30
>  [6,]     30     30     30     30     30     30
>  [7,]     30     30     30     30     30     30
>  [8,]     30     30     30     30     30     30
>  [9,]     30     30     30     30     30     30
> [10,]     30     30     30     30     30     30
> [11,]     30     30     30     30     30     30
> [12,]     30     30     30     30     30     30
>
>
>
> After normalising:
>
> # Normalise within arrays
> MA.nwa <- normalizeWithinArrays(RG, method="loess")
>
> what looks like completely smoothed values. Any effects of the one
> upregulated gene are gone. i.e. everything is very close to 1.
>
> > 2^MA.nwa$M
>
>          598new    599new    600new    617new    621new    637new
>  [1,] 1.0369680 0.9770115 1.0369680 1.0000000 1.0000000 1.0000000
>  [2,] 1.0000000 1.0000000 1.0000000 0.9498056 1.0000000 0.8084718
>  [3,] 0.9675804 1.0000000 0.9675804 1.0000000 1.0000000 1.0000000
>  [4,] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0038023
>  [5,] 0.9775535 1.0000000 0.9775535 1.0000000 1.0461482 1.0000000
>  [6,] 0.9971243 1.0000000 0.9971243 1.0745859 1.0067460 1.0000000
>  [7,] 1.0000000 1.0036845 1.0000000 1.0000000 0.8200222 0.8418535
>  [8,] 1.0012844 0.9963297 1.0012844 1.0000000 1.0000000 1.0000000
>  [9,] 1.0036431 0.9930062 1.0036431 1.0910065 0.9380292 1.0000000
> [10,] 1.0055536 0.9999995 1.0055536 0.9996738 1.0000000 1.0005323
> [11,] 1.0000000 1.0070430 1.0000000 0.8833719 1.0000000 1.0005324
> [12,] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
>
>
> Just for comparison if I do a different type of normalisation you can
>
> see the upregulated gene standing out:
> > MA.nwa <- normalizeWithinArrays(RG, method="median")
> >
> >
> > 2^MA.nwa$M
>
>           598new    599new     600new    617new    621new    637new
>  [1,]  1.0332103 0.8896202  1.0332103 0.9550461 1.0364855 0.9235331
>  [2,]  0.9665428 0.9105525  0.9665428 0.9071081 1.0004538 0.8952134
>  [3,]  0.9742647 1.0732378  0.9742647 0.9697219 1.0003118 1.0019211
>  [4,]  1.0000000 1.0895189  1.0000000 0.9885219 1.0001800 0.9980826
>  [5,]  0.9926740 1.1097659  0.9926740 1.0116114 1.0000573 0.9943019
>  [6,]  1.0187266 1.1172789  1.0187266 1.0420495 0.9999427 1.0371934
>  [7,]  0.9963504 0.9729903  0.9963504 1.0758191 0.8201698 0.9122977
>  [8,]  1.0300752 0.9658553  1.0300752 1.0801029 1.0004538 0.8852921
>  [9,]  1.0000000 1.0033931  1.0000000 1.1783992 0.9381981 1.0836774
> [10,]  1.0415094 0.9966184  1.0415094 1.0754682 0.9383988 1.0914894
> [11,]  0.9964029 1.0175767  0.9964029 0.9541325 0.9559423 1.0993603
> [12,] 17.6014760 0.9934796 17.6014760 0.8606734 0.9932423 1.1072908
>
>
>
> Is this just because there are too few points? If so how do you
> determine how many points are needed to make a loess normalisation
> valid?

>From the loess help page:

     Fitting is done locally.  That is, for the fit at point x, the fit
     is made using points in a neighbourhood of x, weighted by their
     distance from x (with differences in 'parametric' variables being
     ignored when computing the distance). The size of the
     neighbourhood is controlled by alpha (set by 'span' or
     'enp.target').  For alpha < 1, the neighbourhood includes
     proportion alpha of the points, and these have tricubic weighting
     (proportional to (1 - (dist/maxdist)^3)^3. For alpha > 1, all
     points are used, with the 'maximum distance' assumed to be
     alpha^1/p times the actual maximum distance for p explanatory
     variables.

Notice that the "span" parameter controls the amount of data that is used in 
the local fit.  Also, note that the span is measured in the "A" axis of the 
MA plot.  The default span=0.3 in limma; looking at the formula for the 
weighting above one can see that the weights fall off with distance from the 
point in question.  For your data, you have one point that has a quite high 
intensity in the red channel, leading to a very large A value.  Since the 
loess normalization aims to fit a smoothed line and then bring that line to 
zero, the line at the differentially-expressed gene goes through that point, 
since there is little other data included in the fit, and the resulting 
normalized value is zero (or a ratio of one).  

I don't find loess to be problematic for "real" data, unless of course, it is 
applied where there are EXPECTED to be trends of ratio with average 
intensity, such as with CGH data.  You could increase the value of span to 
see how it affects your results.

Sean