[BioC] problems about cDNA vs genomic arrays normalization
yanju
yanju at liacs.nl
Tue Nov 21 13:05:08 CET 2006
Dear Jenny,
I mostly get your point, but one thing is still not clear. You mentioned
"you'll change your design matrix accordingly (no -1s!)". May I know
the reason why? I ask because I generated the design matrix like this:
> design <- modelMatrix(targets, ref="gDNA")
> design
        wt16 wt20 wt24
sample1   -1    0    0
sample2   -1    0    0
sample3   -1    0    0
My dataset was generated on dual-channel arrays without dye swaps. How
should I change my design matrix?
Regards,
Yanju
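
For the design-matrix question above, one way to read "no -1s" is that once each array contributes a single log2 intensity per spot (as with Affymetrix-type data), the design only needs to say which group each array belongs to. A minimal base R sketch, assuming a hypothetical targets table with a made-up Sample column and two arrays per group (the real targets file is not shown in this thread):

```r
# Hypothetical targets table: one row per array. The group labels reuse the
# column names of the design printed above; the number of arrays per group
# and the column names are assumptions for illustration only.
targets <- data.frame(
  SlideNumber = 1:6,
  Sample = c("wt16", "wt16", "wt20", "wt20", "wt24", "wt24")
)

# One column per group and no intercept: each array gets a 1 in the column
# of its own group and 0 elsewhere, so no -1 entries can occur.
group <- factor(targets$Sample, levels = c("wt16", "wt20", "wt24"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
print(design)
```

With a design like this, each coefficient from lmFit() would be a group mean of log2 intensity rather than a log-ratio against gDNA, so comparisons between groups still go through makeContrasts() as before. As far as the limma documentation describes it, modelMatrix() produces -1 entries when the reference is in the Cy5 channel; if that is the case here, the roles of R and G in the quoted recipe would presumably be swapped.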
Jenny Drnevich wrote:
> Hi Yanju,
>
>
>> After reading your explanation, I still have two questions.
>> 1. I had previously also applied the normalizeWithinArrays() method
>> to this dataset. Do you think that is correct or necessary in my case?
>
>
> No, you should not do normalizeWithinArrays! This assumes that most
> genes are not changing expression between the two samples on one
> array, and in your case you have every reason to expect that the
> 'expression' levels of genomic DNA will not be anything like cDNA from
> your experimental groups, as you mentioned in your first post.
>
>
>> 2. You said "For the statistical analysis, you use the R values
>> directly." But normalizeBetweenArrays() generates an MAList, which
>> contains M and A values etc. but no R values (red channel
>> intensities).
>
>
> It's easy to convert between RGLists, which contain R and G values,
> and MALists, which have M and A values. See 'RG.MA' and 'MA.RG' -
> they're explained at the end of the details section of the help page
> for 'normalizeWithinArrays'. Another thing - Are you doing a
> background correction first? Because if you don't, and do
> 'normalizeWithinArrays' or 'normalizeBetweenArrays' on a RGList that
> still has the Rb and Gb items in it, a simple background subtraction
> will be done automatically. This is not necessarily a good thing IMO
> because negative R or G values in either channel will cause the M &
> A values to be lost, so that you cannot recreate the R & G values
> again. Let's say for simplicity's sake that RG is your original RGList
> before any pre-processing, and the genomic DNA is in the Green channel
> on each slide. I would do something like this:
>
> RG.nobg <- backgroundCorrect(RG, method="none")
> # or maybe pick "half" to avoid neg. values
>
> MA.nobg.Gquant <- normalizeBetweenArrays(RG.nobg,method="Gquantile")
> # do a quantile normalization on the G / genomic values
>
> RG.nobg.Gquant <- RG.MA(MA.nobg.Gquant)
> # convert the MAList back to a RGList
>
> MA.fake <- MA.nobg.Gquant
> # create a MAList to manipulate
>
> MA.fake$M <- log2(RG.nobg.Gquant$R)
> # replace the M values with the log2(R) values so you can do
> # the analysis on them
>
> You can now proceed with the analysis as if you had Affymetrix-type
> data. You'll have to change your design matrix accordingly (no -1s!),
> but the rest of your analysis should be the same as you have below. It
> gets a bit more complicated if the genomic DNA is not all in the G
> channel - after the background correction you have to switch the R & G
> values for the arrays that have genomic DNA in the R channel, then
> account for the dye effect by fitting a block effect using
> 'duplicateCorrelation'. It's very similar to the Technical
> Replication/Randomized Block section of the limma vignette.
>
> Good luck,
> Jenny
>
>
>
>> And then I fitted my MAList to the linear model by using:
>> design<-modelMatrix(targets, ref="gDNA")
>> fit<-lmFit(ma.paq,design)
>> I think all my subsequent analysis is based on the M values. Finally,
>> I used the eBayes function to compute summary statistics in order to
>> detect the most differentially expressed genes.
>> cont.matrix<-makeContrasts( WTvsMU=wt-mu,levels=design)
>> fit2<-contrasts.fit(fit,cont.matrix)
>> fit2<-eBayes(fit2)
>> So I have no idea how to use the R values directly. Was my code wrong?
>> I am not quite sure about my code or method, because in the end I
>> got some uninterpretable results which did not meet the expectations
>> of the biologists. That is why I am now rechecking my code and methods.
>> Thank you again, and also Wolfgang, for your kind help.
>>
>> Kind regards,
>> Yanju
>>
>>
>>
>> Jenny Drnevich wrote:
>>
>>> Hi Yanju,
>>>
>>> I have just been working with a couple of data sets similar to yours
>>> where a) one channel has the same reference and b) the assumptions
>>> of few differences between sample and reference are not necessarily
>>> upheld. In these cases I have been using the Rquantile or Gquantile
>>> methods of normalizeBetweenArrays() in limma. These methods will do
>>> a quantile normalization on the R or G channel indicated so they
>>> have the "same empirical distribution across arrays, leaving the
>>> M-values (log-ratios) unchanged." Say your reference is in the green
>>> channel - doing a Gquantile normalization would force all the
>>> reference values to have the same distribution, and then adjust the
>>> R channel values accordingly. For the statistical analysis, you use
>>> the R values directly because if you use the M values, it would be
>>> like you never did the normalization. If the reference is not all in
>>> the same channel, I manipulate the RGList so that they are all in
>>> the same channel, but then I also include 'dye' as a batch effect in
>>> the model.
>>>
>>> HTH,
>>> Jenny
>>>
>>> At 10:32 AM 11/20/2006, yanju wrote:
>>>
>>>> Dear all,
>>>>
>>>> I have a microarray dataset from a common reference design, where
>>>> the common reference is genomic DNA. Normal normalization assumes
>>>> that a large fraction of genes is not differentially expressed, and
>>>> the adjustment strategies then bring the log-ratios to a median (or
>>>> mean) of 0. But in my case, every spot would have the same observed
>>>> signal in the genomic channel, while the signals in the cDNA channel
>>>> vary greatly, so the strategies I just mentioned are not suitable.
>>>> How should I normalize this kind of data? Are there any existing
>>>> packages or functions for it? I look forward to your reply.
>>>>
>>>> Regards,
>>>> Yanju
>>>>
>>>
>>>
>>>
>>> Jenny Drnevich, Ph.D.
>>>
>>> Functional Genomics Bioinformatics Specialist
>>> W.M. Keck Center for Comparative and Functional Genomics
>>> Roy J. Carver Biotechnology Center
>>> University of Illinois, Urbana-Champaign
>>>
>>> 330 ERML
>>> 1201 W. Gregory Dr.
>>> Urbana, IL 61801
>>> USA
>>>
>>> ph: 217-244-7355
>>> fax: 217-265-5066
>>> e-mail: drnevich at uiuc.edu
>>
>>
>
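
The reasoning in the quoted replies (quantile-normalize the reference channel, leave the M-values alone, then analyse log2(R)) can be illustrated without limma. The following is only a base R sketch on made-up numbers, assuming the genomic DNA is in the green channel; quantile_normalize() is a hypothetical helper written for this example and is not limma's own implementation, which for instance treats ties differently.

```r
# Toy log2 intensities: 5 spots (rows) x 3 arrays (columns), made-up numbers.
logG <- matrix(c(8, 9, 10, 11, 12,
                 9, 10, 11, 12, 13,
                 7, 9, 10, 12, 14), nrow = 5)   # genomic DNA channel
logR <- matrix(c(6, 12, 9, 10, 13,
                 8, 11, 9, 14, 12,
                 7, 10, 8, 13, 15), nrow = 5)   # cDNA channel

# Log-ratio per spot, as stored in the M component of an MAList.
M <- logR - logG

# Plain quantile normalization: every column ends up with the same set of
# values, namely the row means of the column-wise sorted matrix.
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)      # sort each column separately
  target <- rowMeans(sorted)       # average value at each rank
  apply(x, 2, function(col) target[rank(col, ties.method = "first")])
}

# Normalize the reference channel only, then shift R so that M is unchanged.
logG.q <- quantile_normalize(logG)
logR.q <- logG.q + M

# M is identical before and after, so a model fitted to M cannot see the
# normalization; logR.q is what carries the adjustment.
print(all.equal(logR.q - logG.q, M))
print(logR.q)
```

In this sketch the adjusted red-channel values play the role of log2(RG.nobg.Gquant$R) in the quoted recipe, which is why the recipe analyses them instead of M.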