[BioC] Limma: questions about data pre-processing

axel.klenk at actelion.com axel.klenk at actelion.com
Tue Feb 7 15:32:03 CET 2012


Dear Vladimir,

I'll only answer or comment on some of your questions and leave
the others for the true experts...

Q2: yes, for example using package arrayQualityMetrics, if you know
the array layout. FES output usually contains columns Col and Row for 
spot coordinates but apparently your "service provider" has removed
them. I could send you a coordinates <--> oligo mapping by email if you
can tell me your array type -- is it 1x44K, 4x44K or 4x44Kv2?
Alternatively, you can try to find that information on Agilent's
eArray web site: earray.chem.agilent.com
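
Once Row and Col are available for every spot, a pseudo-image of a
channel can also be drawn with base R alone. The following is only a
minimal sketch on made-up data: the 20 x 30 grid, the data frame "spots"
and the helper spot_matrix() are hypothetical, and on a real array the
values would be e.g. log2 intensities of one channel.

```r
# Hypothetical example data: a small 20 x 30 "array", one value per spot.
# On a real array, Row and Col would come from the FES output or from a
# coordinates <--> oligo mapping.
set.seed(1)
spots <- expand.grid(Row = 1:20, Col = 1:30)
spots$value <- rnorm(nrow(spots), mean = 8, sd = 1)

# Fill a matrix so that entry [Row, Col] holds the value of that spot;
# positions without a spot stay NA.
spot_matrix <- function(row, col, value) {
  m <- matrix(NA_real_, nrow = max(row), ncol = max(col))
  m[cbind(row, col)] <- value
  m
}

img <- spot_matrix(spots$Row, spots$Col, spots$value)

# Draw the pseudo-image; spatial artefacts would show up as blotches
# or gradients rather than random speckle.
image(x = seq_len(ncol(img)), y = seq_len(nrow(img)), z = t(img),
      xlab = "Col", ylab = "Row", main = "Pseudo-image of one channel",
      col = heat.colors(64))
```

The same helper can be applied to G, R, Gb and Rb in turn to get one
image per channel and background.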

Q5: for a common reference design, dye swaps are not required and 
I would not apply a loess normalization -- depending on what you have
hybridized as the common reference, the assumptions may not hold.
As for the between-array normalization, Rquantile may also be an 
option for your design and boxplots and density plots may be used 
for judging the results.
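
As a rough sketch of how that comparison could look (simulated data
only; the object names MA and MA.rq and the four fake arrays are
assumptions, not your data), Rquantile makes the R (reference) channel
distributions identical across arrays while leaving the M-values as
they are:

```r
library(limma)

# Simulated stand-in for within-array-normalized data: 1000 spots on
# 4 arrays with deliberately different intensity levels.
set.seed(1)
n <- 1000
A <- sapply(c(8, 9, 10, 8.5), function(m) rnorm(n, mean = m, sd = 1.5))
M <- matrix(rnorm(n * 4, sd = 0.5), n, 4)
colnames(A) <- colnames(M) <- paste0("array", 1:4)
MA <- new("MAList", list(M = M, A = A))

# Quantile-normalize the R channel across arrays; M is left unchanged.
MA.rq <- normalizeBetweenArrays(MA, method = "Rquantile")

# Single-channel densities before and after normalization.
op <- par(mfrow = c(1, 2))
plotDensities(MA)
plotDensities(MA.rq)
par(op)

# Box-plots of log2 R per array after normalization.
boxplot(MA.rq$A + MA.rq$M / 2, main = "log2 R after Rquantile")
```

If the reference-channel curves coincide afterwards and the sample
channel does not look distorted, the method is probably reasonable for
the design.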

Cheers,

 - axel


Axel Klenk
Research Informatician
Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil / 
Switzerland




From:
"Vladimir Krasikov" <v.v.krasikov at gmail.com>
To:
bioconductor at r-project.org
Date:
07.02.2012 14:27
Subject:
[BioC] Limma: questions about data pre-processing
Sent by:
bioconductor-bounces at r-project.org



Dear limma experts

While building a pipeline for differential gene expression analysis
with limma, several questions have arisen.

Experiment:
I have 62 two-color Agilent human arrays.
The samples come from several more or less related human disorders
and vary in age, sex, disease duration and diagnosis.
The company that ran the hybridizations performed them all in one
direction (no dye-swaps), with all samples in the G channel and a common
reference in the R channel. Unfortunately, it provided us with only
excerpts of the Feature Extraction output, containing G, Gb, R, Rb and
FNO (non-uniformity outliers), plus a separate gene annotation table.

I performed a generic import of the data and assigned zero weight to the
FNO spots.
I analyzed density plots, MA-plots, box-plots of M-values and of the G
and R channels, and box-plots of background intensities,
and removed 1 array with an aberrant raw G-channel density from the
experiment.
(I will describe the experiment in more detail later, when I come to the
linear model.)

Q1: Is there a rationale for down-weighting FNO spots (around 100-200
per array) during background correction and further normalization?
Q2: Is there a way to produce an image representation of an Agilent
microarray (for each channel and the backgrounds)?
     In other words, is there a known 'layout' for the human 44K Agilent
array?

Next I corrected the background with:
> RG.b <- backgroundCorrect(RG.raw, method="minimum", offset=50)
(The recommended method="normexp" produced shifted density curves for
five arrays, and the box-plots of the separate G and R channels also
looked less uniform than with the 'minimum' method.)

Q3: I guess it is also possible to remove those 5 arrays from the
experiment. Would that be justified?
Q4: What reasoning should guide the choice between background
correction methods?
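
One way to look at the two methods side by side is sketched below. It
uses simulated intensities only; the helper sim_channel(), the object
names and all numbers are made up for illustration, and it does not by
itself say which method is better.

```r
library(limma)

# Simulated raw data: exponential signal plus normal background,
# 2000 spots on 3 arrays (hypothetical numbers).
set.seed(1)
n <- 2000
k <- 3
sim_channel <- function() {
  matrix(rexp(n * k, rate = 1 / 500) + rnorm(n * k, 100, 20), n, k)
}
sim_background <- function() matrix(rnorm(n * k, 100, 20), n, k)
RG.sim <- new("RGList", list(R = sim_channel(), G = sim_channel(),
                             Rb = sim_background(), Gb = sim_background()))

# Same offset for both methods, so only the correction method differs.
RG.min <- backgroundCorrect(RG.sim, method = "minimum", offset = 50)
RG.ne  <- backgroundCorrect(RG.sim, method = "normexp", offset = 50)

# Single-channel densities for each method, side by side.
op <- par(mfrow = c(1, 2))
plotDensities(RG.min)
plotDensities(RG.ne)
par(op)
```

Both methods return strictly positive intensities, so log-ratios are
defined for every spot; the difference shows up mainly in the
low-intensity tail.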

Then I performed standard loess within-array normalization:
> MA.loess <- normalizeWithinArrays(RG.b, method="loess",bc.method="none")

Q5: Do I need to perform between-array normalization?
     How do I judge which of the methods (none, scale, quantile,
Aquantile) is best for my experiment?

For now I have decided to stick with background=minimum and within=loess;
the between-array method is still an open question.

Next I would like to ask questions about the linear model for my
experiment, but I will do that in a separate help request.

Thanks a lot in advance

and finally
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252
[3] LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
[5] LC_TIME=Dutch_Netherlands.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] limma_3.10.2
>

With kind regards
Vladimir
--

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: 
http://news.gmane.org/gmane.science.biology.informatics.conductor



