[BioC] Limma: questions about data pre-processing

Tue Feb 7 14:25:10 CET 2012

Dear limma experts,

While building a pipeline for differential gene expression analysis with  
limma,
several questions have arisen.

Experiment:
I have 62 two-color Agilent human arrays.
The samples come from several more or less related human disorders  
and vary in age, sex, disease duration and diagnosis.
The company that performed the hybridizations ran all of them in one  
direction (no dye-swaps),
with all samples in the G channel and a common reference in the R channel,
and unfortunately provided us only with excerpts of the Feature Extraction  
output,
which contain G, Gb, R, Rb and FNO (non-uniformity outliers),  
plus a separate gene annotation table.

I performed a generic import of the data and assigned zero weight to the  
FNO spots.
I then examined density plots and MA-plots, box-plots of the M-values and  
of the G and R channels, and box-plots of the background intensities,
and removed one array with an aberrant raw G-channel density from the  
experiment.
(I will describe the experimental design later, when I come to the linear  
model.)
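As a rough illustration of this import step, the sketch below assumes that each export is a tab-delimited file with columns literally named R, G, Rb, Gb and FNO, and that FNO is non-zero for flagged spots. The file names and the helper name `fno.weights` are hypothetical; only `read.maimages()` and its `source`, `columns` and `wt.fun` arguments come from limma.

```r
# Hypothetical weight function: weight 0 for spots flagged as
# non-uniformity outliers, weight 1 otherwise.
# Assumes a single flag column named "FNO" (an assumption about the export).
fno.weights <- function(x) as.numeric(x$FNO == 0)

# Toy check on a three-spot table: the second spot is flagged.
toy <- data.frame(G = c(120, 95, 300), FNO = c(0, 1, 0))
fno.weights(toy)   # 1 0 1

# Hypothetical file names; the import only runs if the files exist.
files <- c("array01.txt", "array02.txt")
if (all(file.exists(files))) {
  library(limma)
  RG.raw <- read.maimages(files, source = "generic",
                          columns = list(R = "R", G = "G", Rb = "Rb", Gb = "Gb"),
                          wt.fun = fno.weights)
}
```

The resulting weights would be stored in `RG.raw$weights` and picked up by the later normalization steps.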

Q1: Is there a rationale for down-weighting the FNO spots (around 100-200  
per array) for background correction and subsequent normalization?
Q2: Is there a way to produce an image representation of an Agilent  
microarray (for each channel and for the backgrounds)?
     In other words, is there a known 'layout' for the human 44K Agilent  
array?

Next I corrected the background with:
> RG.b <- backgroundCorrect(RG.raw, method="minimum", offset=50)
(The recommended method="normexp" produced shifted density curves for  
five arrays,
and the box-plots for the separate G and R channels also look less uniform  
than with the 'minimum' method.)
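The kind of side-by-side comparison described above can be sketched as follows. The data here are simulated purely for illustration (the object `RG.sim` and the helper functions are hypothetical and stand in for the real `RG.raw`), so the plots say nothing about the actual arrays.

```r
library(limma)

set.seed(1)
n.spots  <- 1000
n.arrays <- 4

# Simulated intensities (illustrative only): background noise around 60,
# foreground = independent noise of the same kind plus exponential signal.
bg <- function() matrix(rnorm(n.spots * n.arrays, mean = 60, sd = 5),
                        nrow = n.spots, ncol = n.arrays)
fg <- function() bg() + matrix(rexp(n.spots * n.arrays, rate = 1 / 500),
                               nrow = n.spots, ncol = n.arrays)

RG.sim <- new("RGList", list(R = fg(), G = fg(), Rb = bg(), Gb = bg()))

# The two background correction methods being compared, same offset.
RG.min  <- backgroundCorrect(RG.sim, method = "minimum", offset = 50)
RG.nexp <- backgroundCorrect(RG.sim, method = "normexp", offset = 50)

# Per-array box-plots of the corrected log2 G-channel intensities.
op <- par(mfrow = c(1, 2))
boxplot(log2(RG.min$G),  main = "G, minimum")
boxplot(log2(RG.nexp$G), main = "G, normexp")
par(op)
```

The same pair of plots can be drawn for the R channel by replacing `$G` with `$R`.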

Q3: I guess it is also possible to remove those 5 arrays from the  
experiment. Would that be justified?
Q4: What kind of reasoning should guide the choice between  
background correction methods?

Then I performed standard loess within-array normalization:
> MA.loess <- normalizeWithinArrays(RG.b, method="loess",bc.method="none")

Q5: Do I need to perform between-array normalization?
     How should I judge which of the methods (none, scale, quantile,  
Aquantile) is best for my experiment?
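One way to look at the four candidates side by side is sketched below. Again the intensities are simulated for illustration only (`RG.sim` is a hypothetical stand-in for the background-corrected `RG.b`), so this shows the mechanics of the comparison rather than an answer for the real data.

```r
library(limma)

set.seed(2)
n.spots  <- 1000
n.arrays <- 4

# Simulated, already background-corrected intensities (illustrative only).
intens <- function() matrix(50 + rexp(n.spots * n.arrays, rate = 1 / 500),
                            nrow = n.spots, ncol = n.arrays)
RG.sim <- new("RGList", list(R = intens(), G = intens()))

# Within-array loess, as in the call above.
MA.loess <- normalizeWithinArrays(RG.sim, method = "loess", bc.method = "none")

# Apply each between-array method to the same within-normalized data.
methods <- c("none", "scale", "quantile", "Aquantile")
MA.between <- lapply(methods, function(m)
  normalizeBetweenArrays(MA.loess, method = m))
names(MA.between) <- methods

# Box-plots of M-values per array, one panel per method.
op <- par(mfrow = c(2, 2))
for (m in methods) boxplot(MA.between[[m]]$M, main = m, ylab = "M")
par(op)
```

Note that "Aquantile" changes only the A-values, so its M-value box-plots are expected to match those for "none".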

For now I have decided to stick with background=minimum, within=loess,  
while between-array normalization is still an open question.

Next I would like to ask questions about the
linear model for my experiment, but I will do that in a separate help  
request.

Thanks a lot in advance

and finally
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252
[3] LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Netherlands.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] limma_3.10.2
>

With kind regards
Vladimir
--