[BioC] problems about cDNA vs genomic arrays normalization
Jenny Drnevich
drnevich at uiuc.edu
Mon Nov 20 19:37:37 CET 2006
Hi Yanju,
>After reading your explanation, I still have two questions.
>1. Previously I also applied the normalizeWithinArrays() method to this
>dataset. Do you think it is correct or necessary in my case?
No, you should not use normalizeWithinArrays()! It assumes that most genes
are not changing expression between the two samples on one array, and in
your case you have every reason to expect that the 'expression' levels of
genomic DNA will not be anything like those of the cDNA from your
experimental groups, as you mentioned in your first post.
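To see why, here is a rough illustration with simulated numbers (not your
data; the means and SDs are made up). It uses base R's loess() as a stand-in
for an intensity-dependent within-array normalization, not limma's own code.
Because the genomic channel is nearly flat, M is almost a straight-line
function of A, so removing the M-vs-A trend removes most of the real
expression signal:

```r
set.seed(1)
n <- 500
log2G <- rnorm(n, mean = 10, sd = 0.2)  # genomic DNA channel: nearly constant
log2R <- rnorm(n, mean = 9,  sd = 2)    # cDNA channel: wide range of expression

M <- log2R - log2G        # log-ratio
A <- (log2R + log2G) / 2  # average log-intensity

# Stand-in for a loess-type within-array normalization:
# subtract the fitted M-vs-A trend from M
M.norm <- residuals(loess(M ~ A))

# Spread of the log-ratios before and after removing the trend
c(before = sd(M), after = sd(M.norm))

# How well the log-ratios still track the cDNA signal
c(before = cor(M, log2R), after = cor(M.norm, log2R))
```

In this toy setup the "normalized" log-ratios are much less variable and
barely follow the cDNA intensities any more.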
>2. You said "For the statistical analysis, you use the R values
>directly." But normalizeBetweenArrays() generated an MAList. It
>contains M and A values etc., but no R values (red channel
>intensities).
It's easy to convert between RGLists, which contain R and G values, and
MALists, which have M and A values. See 'RG.MA' and 'MA.RG' - they're
explained at the end of the Details section of the help page for
'normalizeWithinArrays'. Another thing - are you doing a background
correction first? If you don't, and you run 'normalizeWithinArrays' or
'normalizeBetweenArrays' on an RGList that still has the Rb and Gb items in
it, a simple background subtraction will be done automatically. This is not
necessarily a good thing IMO, because a negative R or G value after
subtraction will cause the M & A values for that spot to be lost, so that
you cannot recreate the R & G values again. Let's say for simplicity's sake
that RG is your original RGList before any pre-processing, and the genomic
DNA is in the Green channel on each slide. I would do something like this:
RG.nobg <- backgroundCorrect(RG, method="none")
# or maybe pick "half" to avoid neg. values
MA.nobg.Gquant <- normalizeBetweenArrays(RG.nobg,method="Gquantile")
# do a quantile normalization on the G / genomic values
RG.nobg.Gquant <- RG.MA(MA.nobg.Gquant)
# convert the MAList back to a RGList
MA.fake <- MA.nobg.Gquant
# create a MAList to manipulate
MA.fake$M <- log2(RG.nobg.Gquant$R)
# replace the M values with the log2(R) values so you can do the
# analysis on them
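In case it helps to see what these steps amount to, here is a base-R sketch
on simulated intensities. It is not limma's implementation:
quantileNormalize() is a hypothetical helper written for this example, and
all the numbers are made up.

```r
set.seed(2)
# Toy intensities: 200 spots on 2 arrays, one column per array (made-up numbers)
G <- 2^cbind(rnorm(200, 10, 0.3), rnorm(200, 11, 0.5))  # genomic DNA
R <- 2^cbind(rnorm(200, 9, 2),    rnorm(200, 9.5, 2))   # cDNA

# RG -> MA on the log2 scale, and back again
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2
stopifnot(all.equal(2^(A + M / 2), R), all.equal(2^(A - M / 2), G))

# A negative intensity after background subtraction has no log2 value
suppressWarnings(log2(-5))  # NaN

# Hypothetical helper: give every column the same empirical distribution
quantileNormalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))  # mean of the sorted columns
  apply(ranks, 2, function(r) target[r])
}

# Gquantile-style step: normalize log2(G) across arrays, keep M as it was
log2G.q <- quantileNormalize(log2(G))
log2R.q <- log2G.q + M

# M is unchanged, so only log2(R) carries the effect of the normalization
stopifnot(all.equal(log2R.q - log2G.q, M))
range(log2R.q - log2(R))
```

The last line shows how far the log2(R) values moved, even though the
log-ratios are exactly what they were before.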
You can now proceed with the analysis as if you had Affymetrix-type data.
You'll have to change your design matrix accordingly (no -1s!), but the
rest of your analysis should be the same as you have below. It gets a bit
more complicated if the genomic DNA is not all in the G channel - after the
background correction you have to switch the R & G values for the arrays
that have genomic DNA in the R channel, then account for the dye effect by
fitting a block effect using 'duplicateCorrelation'. It's very similar to
the Technical Replication/Randomized Block section of the limma vignette.
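For the mixed-channel case, the bookkeeping might look roughly like this.
Everything here is made up for illustration: the intensities, the gDNA.in.R
indicator and the wt/mu layout. The limma calls at the end are left as
comments because they need limma loaded; check them against your own targets
file before relying on them.

```r
set.seed(3)
n.spots <- 100
cDNA <- matrix(2^rnorm(n.spots * 6, 9, 2),    n.spots, 6)
gDNA <- matrix(2^rnorm(n.spots * 6, 10, 0.3), n.spots, 6)

# Assumed layout: genomic DNA is in the R channel on arrays 2 and 5
gDNA.in.R <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
R <- cDNA
G <- gDNA
R[, gDNA.in.R] <- gDNA[, gDNA.in.R]
G[, gDNA.in.R] <- cDNA[, gDNA.in.R]

# Switch R and G on those arrays so genomic DNA is always in G
tmp <- R[, gDNA.in.R]
R[, gDNA.in.R] <- G[, gDNA.in.R]
G[, gDNA.in.R] <- tmp

# Single-channel style design: one 0/1 column per group, no -1s
group <- factor(c("wt", "wt", "wt", "mu", "mu", "mu"), levels = c("wt", "mu"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Channel the cDNA was originally in on each array, used as the block
cDNA.dye <- ifelse(gDNA.in.R, "G", "R")

# With limma loaded, and y = log2(R) taken after the Gquantile step:
# corfit <- duplicateCorrelation(y, design, block = cDNA.dye)
# fit <- lmFit(y, design, block = cDNA.dye, correlation = corfit$consensus)
```

The contrast wt-mu can then be built from this design in the same way as in
your code below.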
Good luck,
Jenny
>And then I fitted my MAList to the linear model by using:
> design<-modelMatrix(targets, ref="gDNA")
> fit<-lmFit(ma.paq,design)
>I think all my following analyses are based on the M values. Finally, I
>used the eBayes function to compute summary statistics in order to detect
>the most differentially expressed genes.
> cont.matrix<-makeContrasts( WTvsMU=wt-mu,levels=design)
> fit2<-contrasts.fit(fit,cont.matrix)
> fit2<-eBayes(fit2)
>So, I have no idea how to use R values directly. Was my code wrong?
>I was not quite sure about my code or method, because in the end I got
>some uninterpretable results which did not meet the expectations of the
>biologists. That is why I am now rechecking my code and methods. Thank you
>again, and also Wolfgang, for your kind help.
>
>Kind regards,
>Yanju
>
>
>
>Jenny Drnevich wrote:
>
>>Hi Yanju,
>>
>>I have just been working with a couple of data sets similar to yours
>>where a) one channel has the same reference and b) the assumptions of few
>>differences between sample and reference are not necessarily upheld. In
>>these cases I have been using the Rquantile or Gquantile methods of
>>normalizeBetweenArrays() in limma. These methods will do a quantile
>>normalization on the R or G channel indicated so they have the "same
>>empirical distribution across arrays, leaving the M-values (log-ratios)
>>unchanged." Say your reference is in the green channel - doing a
>>Gquantile normalization would force all the reference values to have the
>>same distribution, and then adjust the R channel values accordingly. For
>>the statistical analysis, you use the R values directly because if you
>>use the M values, it would be like you never did the normalization. If
>>the reference is not all in the same channel, I manipulate the RGList so
>>that they are all in the same channel, but then I also include 'dye' as a
>>batch effect in the model.
>>
>>HTH,
>>Jenny
>>
>>At 10:32 AM 11/20/2006, yanju wrote:
>>
>>>Dear all,
>>>
>>>I have a microarray dataset from a common reference design.
>>>The common reference is genomic DNA. In normal normalization, we assume
>>>that a large fraction of genes are not differentially expressed, and
>>>adjustment strategies are used to give the log-ratios a median (mean)
>>>of 0. But in my case, every spot would have the same observed signal in
>>>the genomic channel while the signals in the cDNA channel vary greatly.
>>>Therefore, the strategies that I just mentioned are not suitable. I was
>>>wondering how to normalize this kind of data. Are there any existing
>>>packages or functions for this? I look forward to your reply.
>>>
>>>Regards,
>>>Yanju
>>>
>>
>>
>>Jenny Drnevich, Ph.D.
>>
>>Functional Genomics Bioinformatics Specialist
>>W.M. Keck Center for Comparative and Functional Genomics
>>Roy J. Carver Biotechnology Center
>>University of Illinois, Urbana-Champaign
>>
>>330 ERML
>>1201 W. Gregory Dr.
>>Urbana, IL 61801
>>USA
>>
>>ph: 217-244-7355
>>fax: 217-265-5066
>>e-mail: drnevich at uiuc.edu