[BioC] Opinions on array design, normalization, and linear modeling with LIMMA

Jianping Jin jjin at email.unc.edu
Thu Nov 1 17:46:54 CET 2007


Hi Yong,

I have never seen an MA plot with such widely spread spots. The spread may 
reflect real biology or technical artifacts. My suggestion is to do more 
data quality assessment, for example with "plotDensities". Dye-swap labeling 
or a common reference RNA may help to confirm whether the difference is 
real or a problem.
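
For example, a rough sketch of that check. The helper name check_densities 
is made up for illustration and is not part of limma; it assumes RGLW is 
the RGList from your commands below and reuses your normexp and loess 
settings:

```r
library(limma)

# Hypothetical helper (not a limma function): plot the red/green intensity
# densities before and after the normalization steps quoted in this thread.
check_densities <- function(RG, offset = 50) {
  # Same two steps as in the quoted commands
  RGc <- backgroundCorrect(RG, method = "normexp", offset = offset)
  MA  <- normalizeWithinArrays(RGc, method = "loess")

  plotDensities(RG)  # raw channels
  plotDensities(MA)  # channels after normexp + global loess

  invisible(MA)      # return the normalized MAList for further use
}

# Usage, assuming RGLW has already been read in:
# MALWC <- check_densities(RGLW)
```

If the red and green curves differ strongly in shape before normalization, 
that may point to a labeling or scanning problem rather than biology.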

JJ-

--On Thursday, November 01, 2007 11:13 AM -0500 Yong Yin 
<yyin at watson.wustl.edu> wrote:

> Dear list,
>
>
> I think I need to simplify my question.
>
>
> I have two samples, each from a different time point in embryogenesis. They
> were applied to a two-color Agilent array to compare them with each other.
>
>
> The raw data have an MA plot like this:
>
>
>
> ftp://genome.wustl.edu/private/272205387781472/yong_data.071031/MA_RGLW1.pdf
>
>
> After "normexp" and global loess, the MA plot does change its shape, as
> seen here:
>
>
>
> ftp://genome.wustl.edu/private/272205387781472/yong_data.071031/MA_MALWC1.pdf
>
>
> My 1st question:
>
>
> Does my data have too much differential expression, according to your
> experience?
>
>
> Apparently, Jianping thinks so.
>
>
> Then my 2nd question:
>
>
> Is it still ok to use global loess for normalization?
>
>
> Thanks so much, I need your opinions.
>
>
> I am running the latest R and all packages. Commands I used are:
>
>
>
>> RGLWC <- backgroundCorrect(RGLW, method="normexp", offset=50)
>
>> MALWC <- normalizeWithinArrays(RGLWC, method="loess")
>
>
>
>
> Best,
>
>
> Yong
>
>
>
>
> On Nov 1, 2007, at 8:47 AM, Jianping Jin wrote:
>
>
> Yong,
>
>
> What is your reference sample(s) for this test run? Looks like the
> experiment and reference samples are quite different.
>
>
> JJ-
>
>
> --On Wednesday, October 31, 2007 4:40 PM -0500 Yong Yin
> <yyin at watson.wustl.edu> wrote:
>
>
>
>
> Dear list,
>
>
> I am new to BioConductor, so please forgive me if my questions are
> naive to you.
>
>
> We designed an Agilent 4x44k array, with the same 44K probes printed
> 4 times in the 4 blocks. These 44K probes are designed based on a low-
> coverage genome sequencing project for a parasitic nematode. Our
> purpose is to investigate gene expression during early embryogenesis
> of the nematode.
>
>
> We have received results from a test run to evaluate the array
> quality. Samples applied on the chip were from two time points during
> the nematode embryogenesis. As an experiment, I have been following
> the LIMMA manual step by step, treating the results as a simple
> two-sample comparison with both technical and biological replication. I
> have uploaded 3 images in the following location and would love to
> hear what you folks think:
>
>
> ftp://genome.wustl.edu/private/272205387781472/yong_data.071031/
>
>
> The general quality of the array is very good; I can't find any
> indication of a quality problem. The file "MA_RGLW1.pdf" is an MA plot
> of the raw RG data for one of the 4 blocks. After background correction
> with "normexp" and within-array normalization with global loess, the
> MA plot looks as shown in "MA_MALWC1.pdf".
>
>
> Given that we are studying early embryogenesis, we should expect that
> a lot of genes are differentially expressed at these two time points.
> In the MA plots, I think we indeed see lots of DE.  However,
> according to what I read, the underlying assumption for such
> normalization is that the majority of the genes under investigation
> should not be differentially expressed. I also read from other
> people's posts that I should keep the normalization as simple as
> possible and the "good" data will always be good.
>
>
> From my MA plots, do you think my normalization is reasonable for
> these data? If not, do you have suggestions for what to do: use a different
> normalization method, or even change the array design to include a
> set of spike-in control probes for normalization?
>
>
> The two time points in this test run are actually the beginning and
> the ending points of the developmental stages that we are planning to
> investigate. We are considering using a pooled sample as a common
> reference. We hope a pooled reference like this will decrease the
> degree of differential expression between any two samples in our
> study. Does this sound like a good idea?
>
>
> After normalization with loess, I went ahead to the linear
> modeling step with eBayes and got the following QQ plot:
> "QQPlot_fitLWC2eBayes.pdf".
>
>
> Does the modeling look reasonable, according to your experience?
>
>
> Any opinions and advice are greatly appreciated.
>
>
> Best,
>
>
> Yong Yin, Ph.D.
>
>
> Senior Scientist
> Genome Sequencing Center
> Washington University School of Medicine, Campus box 8501
> 4444 Forest Park
> Saint Louis, MO 63108
>
>
> Tel: (314) 286-1415
>
>
>
>
>
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
>
>
>
> ##################################
> Jianping Jin Ph.D.
> Bioinformatics scientist
> Center for Bioinformatics
> Room 3133 Bioinformatics building
> CB# 7104
> University of Chapel Hill
> Chapel Hill, NC 27599
> Phone: (919)843-6105
> FAX:   (919)843-3103
> E-Mail: jjin at email.unc.edu
>
>
>
>
>
>





