[BioC] Normalization between arrays for common reference, time course and direct two color designs

Jenny Drnevich drnevich at uiuc.edu
Fri Dec 8 19:35:44 CET 2006


Hi Yanju,

You forgot to put 'ndups=2' in the call to 'lmFit'.  In 
'duplicateCorrelation' the default is ndups=2, but in 'lmFit' the default is 
ndups=1. With ndups=1, 'lmFit' treats every spot as a separate gene, which 
is why FIT has as many rows as MA. In the newest version (29 August 2006) 
of the limma vignette, the example for Within-Array Replicate Spots is 
harder to find - it's in Section 11, Case Studies, 11.6 Bob Mutant Data: 
Within-Array Replicate Spots. In my old, old limma vignette that I printed 
out, it was its own section.
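
For anyone following along, here is a minimal sketch of the difference on simulated data. The matrix M, the two-group design and all object names are invented for illustration and are not Yanju's data; duplicate spots are assumed to sit in adjacent rows (spacing=1).

```r
library(limma)

set.seed(1)
n_genes  <- 100
n_arrays <- 6

# Hypothetical data: each gene is spotted twice in adjacent rows.
# The two spots share a gene-by-array effect, so they are correlated.
shared <- matrix(rnorm(n_genes * n_arrays), n_genes, n_arrays)
M <- shared[rep(seq_len(n_genes), each = 2), ] +
  matrix(rnorm(2 * n_genes * n_arrays, sd = 0.5), 2 * n_genes, n_arrays)

# Hypothetical two-group design (not the design from this thread)
design <- cbind(Intercept = 1, Group = rep(c(0, 1), each = 3))

# Estimate the consensus correlation between duplicate spots
corfit <- duplicateCorrelation(M, design, ndups = 2, spacing = 1)

# ndups must be given to lmFit as well, otherwise it stays at 1
fit <- lmFit(M, design, ndups = 2, spacing = 1,
             correlation = corfit$consensus.correlation)

# Same call without ndups: one row per spot, not per gene
fit_per_spot <- lmFit(M, design,
                      correlation = corfit$consensus.correlation)

nrow(M)                          # number of spots
nrow(fit$coefficients)           # number of genes
nrow(fit_per_spot$coefficients)  # still the number of spots
```

With ndups=2 the fit has one row per gene (half the rows of M); without it the fit keeps one row per spot, which matches what Yanju saw.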

Jenny


At 07:50 AM 12/8/2006, yanju wrote:
>Hello Jenny and all,
>
>I tried to use "duplicateCorrelation", then fit the MAList to the linear 
>model using the correlation. Theoretically, FIT should contain half as 
>many rows as MA (2 replicates within one array). But in my case, the number 
>of genes in FIT does not change; I still have replicates within the array. Why?
>
>MA.norep<-duplicateCorrelation(MA,design) #then, MA.norep$cor=0.67
>FIT<-lmFit(MA, design, cor=MA.norep$cor)
>
>MA is an MAList object. As the user guide says, if the object is an 
>MAList then the arguments will be extracted from it, which means we don't 
>need to specify them. Any explanations?
>
>Cheers,
>Yanju Zhang
>
>Jenny Drnevich wrote:
>
>>Hi Weiyin,
>>
>>Sorry - the object name in the code is arbitrary, so 'MA.norm' is a 
>>MAList object with your data in it. Besides changing $ID to $ProbeName as 
>>you did below, you need to change 'MA.norm' to the name of your MAList. I 
>>probably should have specifically said something like: "if your 
>>normalized data is in a MAList object named 'MA.norm', and your spot ID 
>>names are found in MA.norm$genes$ID, then this code should work."
>>
>>Note that this code does not average duplicate spots. Instead, it 
>>arranges them with spacing=1 so you can use the 'duplicateCorrelation' 
>>function before lmFit, which is better than averaging the spots. See the 
>>Within-Array Replicate Spots section of the limma vignette for an example 
>>of how to do this.
>>
>>Cheers,
>>Jenny
>>
>>
>>
>>
>>At 01:33 PM 12/7/2006, Weiyin Zhou wrote:
>>
>>
>>>Hi Jenny,
>>>
>>>I have a related problem with Agilent two-color arrays.  All of the spots
>>>are duplicated twice (have the same "ProbeName"), except the positive and
>>>negative controls, which are duplicated multiple times.  The column
>>>"ControlType" identifies their type.  I use the limma package to read in
>>>the data (ProcessedSignal, which is already background corrected and loess
>>>normalized), then I did between-array quantile normalization.
>>>
>>>Before I do lmFit and differential expression analysis, I think I should
>>>remove the control spots and also average the duplicated spots, so that I
>>>have a p-value for each unique ProbeName.  I just tried your code, but got
>>>an error message.
>>>
>>>
>>>
>>>>MA.norm <- MA.norm[order(MA.norm$genes$ProbeName),]
>>>>
>>>Error: object "MA.norm" not found
>>>
>>>
>>>Could you give me some advice?
>>>
>>>Thanks in advance,
>>>
>>>Weiyin Zhou
>>>Statistics and Data Analyst
>>>ExonHit Therapeutics, Inc.
>>>217 Perry Parkway, Building # 5
>>>Gaithersburg, MD 20877
>>>
>>>email: Weiyin.zhou at exonhit-usa.com
>>>phone: 240.404.0184
>>>fax: 240.683.7060
>>>
>>>
>>>
>>>-----Original Message-----
>>>From: bioconductor-bounces at stat.math.ethz.ch
>>>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny
>>>Drnevich
>>>Sent: Thursday, December 07, 2006 12:17 PM
>>>To: Vinoy Kumar Ramachandran
>>>Cc: bioconductor at stat.math.ethz.ch
>>>Subject: Re: [BioC] Normalization between arrays for common reference,
>>>time course and direct two color designs
>>>
>>>Hi Vinoy,
>>>
>>>It's better to keep the discussions on the list for other users that may
>>>have the same question. If the replicate spots are not evenly spaced,
>>>after the normalizations you can rearrange the MA object so that they are
>>>evenly spaced, at least the 90% that are spotted twice. The ones that are
>>>spotted 26 times are likely some sort of control spots, and you can
>>>probably safely ignore them. Why are some spotted three times? If you
>>>want to keep these genes in, a quick-and-dirty solution would be to just
>>>pick two of the three spots. The following code *should* work to
>>>rearrange the order of the genes, then pick out the first two spots for
>>>each unique ID.
>>>
>>>MA.norm <- MA.norm[order(MA.norm$genes$ID),]
>>>
>>>x <- unique(MA.norm$genes$ID)
>>>
>>># start with an empty replicate counter, one entry per spot
>>>MA.norm$genes$spotrep <- rep(NA_integer_, nrow(MA.norm$genes))
>>>
>>># I'm sure there's a better, faster way to do the following, but this is
>>># the only way I know how:
>>>
>>>for (i in seq_along(x)) {
>>>     y <- which( MA.norm$genes$ID == x[i] )
>>>     MA.norm$genes$spotrep[y] <- seq_along(y)
>>>     }
>>>
>>>MA.norm.2spot <- MA.norm[MA.norm$genes$spotrep <= 2 , ]
>>># now your spacing=1 and ndups=2
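
One possible vectorized alternative to the loop above, sketched in base R on a toy data frame. Here 'genes' is a hypothetical stand-in for MA.norm$genes and the IDs are invented; ave() numbers the spots within each ID in a single call.

```r
# Toy stand-in for MA.norm$genes (IDs are invented)
genes <- data.frame(
  ID = c("g2", "g1", "g3", "g1", "g2", "g3", "g3", "g1"),
  stringsAsFactors = FALSE
)

# Sort by ID so that replicate spots become adjacent (spacing=1)
genes <- genes[order(genes$ID), , drop = FALSE]

# Number the spots within each ID: 1, 2, 3, ... per group
genes$spotrep <- ave(seq_along(genes$ID), genes$ID, FUN = seq_along)

# Keep only the first two spots of every ID
genes.2spot <- genes[genes$spotrep <= 2, , drop = FALSE]
genes.2spot
```

On a real MAList the same ave() call should work on MA.norm$genes$ID, followed by the row subset from the original code; that part is an assumption and is not run here. IDs that occur only once would still need to be dropped before using ndups=2.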
>>>
>>>HTH,
>>>Jenny
>>>
>>>
>>>
>>>
>>>At 10:36 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
>>>
>>>
>>>>Hi Jenny,
>>>>
>>>>Thanks a lot for the valuable information. I will try to do loess first
>>>>and then do a scale normalization if necessary. Regarding the
>>>>correlation in lmFit: the spots on my array are not evenly spaced and
>>>>not evenly replicated. 90% of the spots are spotted twice, 8% three
>>>>times and 2% are spotted 26 times. I found this code in a posting in the
>>>>limma user forum and tried to adapt it to my data. Is there a more
>>>>elegant way to deal with this kind of replication?
>>>>
>>>>Once again, thanks for the information.
>>>>
>>>>with regards,
>>>>vinoy
>>>>On 12/7/06, Jenny Drnevich <drnevich at uiuc.edu> wrote:
>>>>Hi Vinoy,
>>>>
>>>>Using the 'Gquantile' between-array normalization is not appropriate in
>>>>your case because your reference is not always in the Green channel. The
>>>>values you are using for Exp3 and Exp6 in the linear model are actually
>>>>from the reference, so it's no wonder your gene lists don't make sense.
>>>>To clarify, the discussion we were having recently on the mailing list
>>>>about using Gquantile applies when your experimental samples are
>>>>expected to be VERY different from the reference, such that the
>>>>assumption of a within-array normalization may not be met. In your case
>>>>(and in most reference designs) you probably meet the assumption of most
>>>>genes not changing, and so should first do a within-array loess-type
>>>>normalization to help remove dye bias. Then check to see if the
>>>>resulting distributions of M values are similar between arrays. If they
>>>>are very different, and you would expect them not to be very different,
>>>>do a between-array normalization on the M values - the scale method of
>>>>'normalizeBetweenArrays' is my favorite. The design matrix you have below
>>>>will correctly adjust for dye swaps, assuming that the 'dye swaps' are
>>>>all biological replicates and not technical replicates.
>>>>
>>>>I'm a little confused about the way you're calling the 'lmFit' function.
>>>>Your arrays appear to have duplicate spots, but you have the correlation
>>>>set to zero. Something is very wrong with your arrays if there is zero
>>>>correlation between the duplicate spots! I suggest you read the limma
>>>>vignette very closely, especially the sections on common reference
>>>>designs and within-array replicate spots.
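
A rough sketch of the normalization order described above (within-array loess first, then the scale method between arrays), run on simulated intensities. The RGList, the array names and the per-array spreads are all invented for illustration; real data would come from read.maimages.

```r
library(limma)

set.seed(2)
n_spots  <- 500
n_arrays <- 4

# Hypothetical green-channel intensities
G <- matrix(2^rnorm(n_spots * n_arrays, mean = 9, sd = 1.5),
            n_spots, n_arrays)

# Hypothetical red channel: green times noise, with a different
# log-ratio spread on every array
spread <- c(0.3, 0.5, 0.8, 1.2)
R <- G * 2^(matrix(rnorm(n_spots * n_arrays), n_spots, n_arrays) %*%
              diag(spread))
colnames(R) <- colnames(G) <- paste0("Exp", seq_len(n_arrays))

RG <- new("RGList", list(R = R, G = G))

# Step 1: within-array loess normalization to remove dye bias
MA.loess <- normalizeWithinArrays(RG, method = "loess")

# Step 2: compare the spread of M values between arrays
# (boxplot(as.data.frame(MA.loess$M)) gives the visual check)
spread_before <- apply(abs(MA.loess$M), 2, median, na.rm = TRUE)

# Step 3: only if the spreads differ more than expected,
# scale the M values between arrays
MA.scale <- normalizeBetweenArrays(MA.loess, method = "scale")
spread_after <- apply(abs(MA.scale$M), 2, median, na.rm = TRUE)

round(spread_before, 3)
round(spread_after, 3)
```

After the scale step the median absolute M value is the same on every array, while the loess step has already centered each array's M values around zero across the intensity range.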
>>>>
>>>>Good luck,
>>>>Jenny
>>>>
>>>>At 12:58 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
>>>>
>>>>
>>>>>Dear Limma users,
>>>>>
>>>>>I am working on custom spotted 70mer oligo arrays and use Bluefuse to
>>>>>analyse the images. With the help of the excellent user guide and the
>>>>>Bioconductor user forum (GMANE), I have analysed my direct comparison
>>>>>experiments. I also have common reference, time course and direct
>>>>>two color design experiments to analyse. I have read the recent posting
>>>>>on the list about using Rquantile or Gquantile for normalizing between
>>>>>arrays in common reference experiments. I tried to do a common
>>>>>reference analysis using the discussed code, but the resulting gene
>>>>>list is different from the expected list. I am also wondering how to
>>>>>account for dye swaps. I have pasted the code which I used for the
>>>>>common reference analysis.
>>>>>
>>>>>It would also be very useful if anyone could tell me how to use
>>>>>normalization between arrays for direct two color designs.
>>>>>
>>>>>My experiment design is
>>>>>        Cy3    Cy5
>>>>>____________________
>>>>>Exp1    Ref    CpdA
>>>>>Exp2    Ref    CpdA
>>>>>Exp3    CpdA   Ref
>>>>>
>>>>>Exp4    Ref    CpdB
>>>>>Exp5    Ref    CpdB
>>>>>Exp6    CpdB   Ref
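
To make the dye-swap issue in this design concrete, here is a small sketch that rebuilds the table above as a targets frame and passes it to modelMatrix. The data frame is typed in by hand rather than read from "commonref.txt", so the object is an assumption about what that file contains.

```r
library(limma)

# The design table from the post, typed in by hand
targets <- data.frame(
  Cy3 = c("Ref",  "Ref",  "CpdA", "Ref",  "Ref",  "CpdB"),
  Cy5 = c("CpdA", "CpdA", "Ref",  "CpdB", "CpdB", "Ref"),
  row.names = paste0("Exp", 1:6),
  stringsAsFactors = FALSE
)

# One coefficient per compound, each relative to the reference.
# M is log2(Cy5/Cy3), so arrays with the reference in Cy5
# (Exp3 and Exp6) get a negative sign.
design <- modelMatrix(targets, ref = "Ref")
design
```

The rows for Exp3 and Exp6 carry -1 instead of +1, which is how the dye swaps are handled when the model is fitted to M values. It also shows why a green-channel-only normalization does not fit this design: on those two arrays the green channel holds the compound, not the reference.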
>>>>>
>>>>>Code which I used for analysing the common reference design:
>>>>>
>>>>>------------------------------------------------------------------------
>>>>>
>>>>>library(limma)
>>>>>targets <- readTargets("commonref.txt", row.names= "Name")
>>>>>RG <- read.maimages(targets$FileName, source="bluefuse")
>>>>>RG$genes <- readGAL()
>>>>>RG$printer <- getLayout(RG$genes)
>>>>>spottypes <- readSpotTypes()
>>>>>RG$genes$Status <- controlStatus(spottypes, RG)
>>>>>isGene <- RG$genes$Status == "oligos"
>>>>>MA.Gquantile <- normalizeBetweenArrays(RG[isGene,], method="Gquantile")
>>>>>RG.Gquantile <- RG.MA(MA.Gquantile)
>>>>>MA.dummy <- MA.Gquantile
>>>>>MA.dummy$M <- log2(RG.Gquantile$R)
>>>>>o <- order(MA.dummy$genes$ID)
>>>>>MA.sorted <- MA.dummy[o,]
>>>>>design <- modelMatrix(targets, ref="Ref")
>>>>>fit <- lmFit(MA.sorted, design, ndups=2, spacing=1, correlation=0)
>>>>>fit.eb <- eBayes(fit)
>>>>>write.fit(fit.eb, file="data/commonref.xls", adjust="BH")
>>>>>
>>>>>------------------------------------------------------------------------
>>>>>
>>>>>Thanks in advance,
>>>>>
>>>>>with regards,
>>>>>Vinoy......
>>>>>
>>>>>
>>>>>_______________________________________________
>>>>>Bioconductor mailing list
>>>>>Bioconductor at stat.math.ethz.ch
>>>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>Search the archives:
>>>>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>Jenny Drnevich, Ph.D.
>>>>
>>>>Functional Genomics Bioinformatics Specialist
>>>>W.M. Keck Center for Comparative and Functional Genomics
>>>>Roy J. Carver Biotechnology Center
>>>>University of Illinois, Urbana-Champaign
>>>>
>>>>330 ERML
>>>>1201 W. Gregory Dr.
>>>>Urbana, IL 61801
>>>>USA
>>>>
>>>>ph: 217-244-7355
>>>>fax: 217-265-5066
>>>>e-mail: <mailto:drnevich at uiuc.edu>drnevich at uiuc.edu
>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>Vinoy......
>>>>
>>>
>>
>>
>>

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu


