[BioC] loess and limma
Sean Davis
sdavis2 at mail.nih.gov
Thu Sep 14 14:33:46 CEST 2006
On Thursday 14 September 2006 08:02, john seers (IFR) wrote:
> Hello Anyone
>
> I was just experimenting with some made up data in limma and could not
> understand the results of the "normalisation within array" using loess
> smoothing. Can anybody explain for me why the results are completely
> smoothed despite there being a strongly upregulated gene?
>
> My first reaction was that it was because there were too few points for
> loess to handle it sensibly. But would you expect it to completely
> smooth the results?
>
> Here is my artificial data with the columns in 598new and 600new
> identical and with the gene in row 12 with large signal values.
>
> > RG$R
>
> 598new 599new 600new 617new 621new 637new
> [1,] 310 280 310 321 290 291
> [2,] 290 285 290 311 291 292
> [3,] 295 290 295 291 301 293
> [4,] 298 295 298 281 311 294
> [5,] 301 301 301 292 321 295
> [6,] 302 305 302 302 331 296
> [7,] 303 294 303 313 281 297
> [8,] 304 293 304 324 291 298
> [9,] 305 311 305 334 303 321
> [10,] 306 312 306 314 304 322
> [11,] 307 313 307 312 312 323
> [12,] 4800 314 4800 287 324 324
>
> > RG$G
>
> 598new 599new 600new 617new 621new 637new
> [1,] 301 320 301 330 280 311
> [2,] 299 319 299 335 290 321
> [3,] 302 280 302 295 300 291
> [4,] 298 281 298 280 310 293
> [5,] 303 282 303 285 320 295
> [6,] 297 284 297 287 330 285
> [7,] 304 310 304 289 335 321
> [8,] 296 311 296 298 290 331
> [9,] 305 319 305 284 320 297
> [10,] 295 322 295 290 321 296
> [11,] 308 317 308 321 324 295
> [12,] 301 325 301 324 325 294
>
> > RG$Rb
>
> 598new 599new 600new 617new 621new 637new
> [1,] 30 30 30 30 30 30
> [2,] 30 30 30 30 30 30
> [3,] 30 30 30 30 30 30
> [4,] 30 30 30 30 30 30
> [5,] 30 30 30 30 30 30
> [6,] 30 30 30 30 30 30
> [7,] 30 30 30 30 30 30
> [8,] 30 30 30 30 30 30
> [9,] 30 30 30 30 30 30
> [10,] 30 30 30 30 30 30
> [11,] 30 30 30 30 30 30
> [12,] 30 30 30 30 30 30
>
> > RG$Gb
>
> 598new 599new 600new 617new 621new 637new
> [1,] 30 30 30 30 30 30
> [2,] 30 30 30 30 30 30
> [3,] 30 30 30 30 30 30
> [4,] 30 30 30 30 30 30
> [5,] 30 30 30 30 30 30
> [6,] 30 30 30 30 30 30
> [7,] 30 30 30 30 30 30
> [8,] 30 30 30 30 30 30
> [9,] 30 30 30 30 30 30
> [10,] 30 30 30 30 30 30
> [11,] 30 30 30 30 30 30
> [12,] 30 30 30 30 30 30
>
>
>
> After normalising:
>
> # Normalise within arrays
> MA.nwa <- normalizeWithinArrays(RG, method="loess")
>
> what looks like completely smoothed values. Any effects of the one
> upregulated gene are gone. i.e. everything is very close to 1.
>
> > 2^MA.nwa$M
>
> 598new 599new 600new 617new 621new 637new
> [1,] 1.0369680 0.9770115 1.0369680 1.0000000 1.0000000 1.0000000
> [2,] 1.0000000 1.0000000 1.0000000 0.9498056 1.0000000 0.8084718
> [3,] 0.9675804 1.0000000 0.9675804 1.0000000 1.0000000 1.0000000
> [4,] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0038023
> [5,] 0.9775535 1.0000000 0.9775535 1.0000000 1.0461482 1.0000000
> [6,] 0.9971243 1.0000000 0.9971243 1.0745859 1.0067460 1.0000000
> [7,] 1.0000000 1.0036845 1.0000000 1.0000000 0.8200222 0.8418535
> [8,] 1.0012844 0.9963297 1.0012844 1.0000000 1.0000000 1.0000000
> [9,] 1.0036431 0.9930062 1.0036431 1.0910065 0.9380292 1.0000000
> [10,] 1.0055536 0.9999995 1.0055536 0.9996738 1.0000000 1.0005323
> [11,] 1.0000000 1.0070430 1.0000000 0.8833719 1.0000000 1.0005324
> [12,] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
>
>
> Just for comparison if I do a different type of normalisation you can
>
> see the upregulated gene standing out:
> > MA.nwa <- normalizeWithinArrays(RG, method="median")
> >
> >
> > 2^MA.nwa$M
>
> 598new 599new 600new 617new 621new 637new
> [1,] 1.0332103 0.8896202 1.0332103 0.9550461 1.0364855 0.9235331
> [2,] 0.9665428 0.9105525 0.9665428 0.9071081 1.0004538 0.8952134
> [3,] 0.9742647 1.0732378 0.9742647 0.9697219 1.0003118 1.0019211
> [4,] 1.0000000 1.0895189 1.0000000 0.9885219 1.0001800 0.9980826
> [5,] 0.9926740 1.1097659 0.9926740 1.0116114 1.0000573 0.9943019
> [6,] 1.0187266 1.1172789 1.0187266 1.0420495 0.9999427 1.0371934
> [7,] 0.9963504 0.9729903 0.9963504 1.0758191 0.8201698 0.9122977
> [8,] 1.0300752 0.9658553 1.0300752 1.0801029 1.0004538 0.8852921
> [9,] 1.0000000 1.0033931 1.0000000 1.1783992 0.9381981 1.0836774
> [10,] 1.0415094 0.9966184 1.0415094 1.0754682 0.9383988 1.0914894
> [11,] 0.9964029 1.0175767 0.9964029 0.9541325 0.9559423 1.0993603
> [12,] 17.6014760 0.9934796 17.6014760 0.8606734 0.9932423 1.1072908
>
>
>
> Is this just because there are too few points? If so how do you
> determine how many points are needed to make a loess normalisation
> valid?
>From the loess help page:
Fitting is done locally. That is, for the fit at point x, the fit
is made using points in a neighbourhood of x, weighted by their
distance from x (with differences in 'parametric' variables being
ignored when computing the distance). The size of the
neighbourhood is controlled by alpha (set by 'span' or
'enp.target'). For alpha < 1, the neighbourhood includes
proportion alpha of the points, and these have tricubic weighting
(proportional to (1 - (dist/maxdist)^3)^3. For alpha > 1, all
points are used, with the 'maximum distance' assumed to be
alpha^1/p times the actual maximum distance for p explanatory
variables.
Notice that the "span" parameter controls the amount of data that is used in
the local fit. Also, note that the span is measured in the "A" axis of the
MA plot. The default span=0.3 in limma; looking at the formula for the
weighting above one can see that the weights fall off with distance from the
point in question. For your data, you have one point that has a quite high
intensity in the red channel, leading to a very large A value. Since the
loess normalization aims to fit a smoothed line and then bring that line to
zero, the line at the differentially-expressed gene goes through that point,
since there is little other data included in the fit, and the resulting
normalized value is zero (or a ratio of one).
I don't find loess to be problematic for "real" data, unless of course, it is
applied where there are EXPECTED to be trends of ratio with average
intensity, such as with CGH data. You could increase the value of span to
see how it affects your results.
Sean
More information about the Bioconductor
mailing list