[BioC] Normalization between arrays for common reference, time course and direct two color designs

Fri Dec 8 14:50:04 CET 2006

Hello Jenny and all,

I tried to use "duplicateCorrelation", then fit the MAList to the linear
model using the correlation. Theoretically, FIT should contain half as
many rows as MA (2 replicates within one array). But in my case, the gene
number in FIT does not change. I still get the replicates within the
array. Why?

MA.norep<-duplicateCorrelation(MA,design) #then, MA.norep$cor=0.67
FIT<-lmFit(MA, design, cor=MA.norep$cor)

MA is an MAList object. According to the user guide, if the object is an
MAList then the arguments will be extracted from it, which means we don't
need to specify them. Any explanations?
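
One possible explanation, offered as an assumption rather than a confirmed diagnosis: duplicateCorrelation defaults to ndups=2, whereas lmFit treats every row as a separate probe unless the duplicate layout is stored in the object or passed explicitly. The minimal sketch below uses simulated data (the matrix M, the sizes and the one-column design are all made up for illustration) to show what passing ndups and spacing to both calls looks like.

```r
library(limma)

set.seed(1)
n.genes  <- 100
n.arrays <- 6

# Simulated log-ratios: each gene is printed twice in adjacent rows
# (spacing = 1). The two copies share a per-array effect, so duplicate
# spots are correlated within an array.
shared <- matrix(rnorm(n.genes * n.arrays), n.genes, n.arrays)
M <- shared[rep(seq_len(n.genes), each = 2), ] +
  matrix(rnorm(2 * n.genes * n.arrays, sd = 0.5), 2 * n.genes, n.arrays)

# Hypothetical one-column design (intercept only)
design <- cbind(Intercept = rep(1, n.arrays))

# Pass the duplicate layout to BOTH functions
dupcor <- duplicateCorrelation(M, design, ndups = 2, spacing = 1)
fit <- lmFit(M, design, ndups = 2, spacing = 1,
             correlation = dupcor$consensus.correlation)

print(nrow(M))                 # number of spots
print(nrow(fit$coefficients))  # number of fitted genes
```

With real data the same two arguments would go into the calls on the MAList, e.g. lmFit(MA, design, ndups=2, spacing=1, correlation=...), provided the duplicates really are adjacent.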

Cheers,
Yanju Zhang

Jenny Drnevich wrote:

>Hi Weiyin,
>
>Sorry - the object name in the code is arbitrary, so 'MA.norm' is a MAList 
>object with your data in it. Besides changing $ID to $ProbeName as you did 
>below, you need to change 'MA.norm' to the name of your MAList. I probably 
>should have specifically said something like: "if your normalized data is 
>in a MAList object named 'MA.norm', and your spot ID names are found in 
>MA.norm$genes$ID, then this code should work."
>
>Note that this code does not average duplicate spots. Instead, it arranges 
>them with spacing =1 so you can use the 'duplicateCorrelation' function 
>before lmFit, which is better than averaging the spots. See the 
>Within-Array replicate spot section of the limma vignette for an example of 
>how to do this.
>
>Cheers,
>Jenny
>
>
>
>
>At 01:33 PM 12/7/2006, Weiyin Zhou wrote:
>  
>
>>Hi Jenny,
>>
>>I have a related problem with an Agilent two-color array. All of the spots
>>are duplicated twice (they have the same "ProbeName"), except the positive
>>and negative controls, which are duplicated multiple times. The column
>>"ControlType" can identify their type. I use the limma package to input the
>>data (ProcessedSignal, which is already background corrected and loess
>>normalized), then I did between-array quantile normalization.
>>
>>Before I do lmFit and differential expression analysis, I think I should
>>remove those control spots and also average the duplicated spots, so I can
>>have a p-value for each unique ProbeName. I just tried your code, but got
>>an error message.
>>
>>    
>>
>>> MA.norm <- MA.norm[order(MA.norm$genes$ProbeName),]
>>Error: object "MA.norm" not found
>>
>>
>>Could you give me some advice?
>>
>>Thanks in advance,
>>
>>Weiyin Zhou
>>Statistics and Data Analyst
>>ExonHit Therapeutics, Inc.
>>217 Perry Parkway, Building # 5
>>Gaithersburg, MD 20877
>>
>>email: Weiyin.zhou at exonhit-usa.com
>>phone: 240.404.0184
>>fax: 240.683.7060
>>
>>
>>
>>-----Original Message-----
>>From: bioconductor-bounces at stat.math.ethz.ch
>>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny
>>Drnevich
>>Sent: Thursday, December 07, 2006 12:17 PM
>>To: Vinoy Kumar Ramachandran
>>Cc: bioconductor at stat.math.ethz.ch
>>Subject: Re: [BioC] Normalization between arrays for common reference,
>>time course and direct two color designs
>>
>>Hi Vinoy,
>>
>>It's better to keep the discussions on the list for other users that may
>>have the same question. If they are not evenly spaced, after the
>>normalizations you can rearrange the MA object so that they are evenly
>>spaced, at least the 90% that are spotted twice. The ones that are spotted
>>26 times are likely some sort of control spots, and you can probably safely
>>ignore them. Why are some spotted three times? If you want to keep these
>>genes in, a quick-and-dirty solution would be to just pick two of the three
>>spots. The following code *should* work to rearrange the order of the
>>genes, then pick out the first two spots for each unique ID.
>>
>>MA.norm <- MA.norm[order(MA.norm$genes$ID),]
>>
>>x <- unique(MA.norm$genes$ID)
>>
>># Start with an empty replicate counter for every spot
>>MA.norm$genes$spotrep <- NA_integer_
>>
>># I'm sure there's a better, faster way to do the following, but this is
>># the only way I know how:
>>
>>for (i in seq_along(x)) {
>>     y <- which( MA.norm$genes$ID == x[i] )
>>     MA.norm$genes$spotrep[y] <- seq_along(y)
>>     }
>>
>>MA.norm.2spot <- MA.norm[MA.norm$genes$spotrep <= 2 , ]
>># now your spacing=1 and ndups=2
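
The replicate counter built by the loop above can probably also be computed in one vectorized step with base R's ave(). This is only a sketch on a toy data frame: `genes` and its IDs are hypothetical stand-ins for MA.norm$genes, assumed to be already sorted by ID.

```r
# Toy stand-in for MA.norm$genes (hypothetical IDs, already sorted by ID)
genes <- data.frame(ID = c("a", "a", "b", "b", "b", "c", "c"),
                    stringsAsFactors = FALSE)

# ave() applies seq_along() within each ID group, so every spot gets its
# running replicate number (1, 2, 3, ...) inside its own ID
genes$spotrep <- ave(seq_along(genes$ID), genes$ID, FUN = seq_along)

# Keep only the first two spots of every ID
keep <- genes$spotrep <= 2
genes2 <- genes[keep, , drop = FALSE]
print(genes2)
```

On real data the same logical vector would be used to subset the MAList itself, e.g. MA.norm[keep, ].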
>>
>>HTH,
>>Jenny
>>
>>
>>
>>
>>At 10:36 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
>>>Hi Jenny,
>>>
>>>Thanks a lot for the valuable information. I will try to do loess first
>>>and then do a scale if necessary. Regarding the correlation in lmFit,
>>>the spots on my array are not evenly spaced and not evenly replicated:
>>>90% of spots are spotted twice, 8% three times and 2% 26 times. I found
>>>this code in a posting in the Limma user forum and tried to adapt it to
>>>my data. Is there a more elegant way to deal with this kind of
>>>replication?
>>>
>>>Once again, thanks for the information.
>>>
>>>With regards,
>>>Vinoy
>>>
>>>On 12/7/06, Jenny Drnevich <drnevich at uiuc.edu> wrote:
>>>Hi Vinoy,
>>>
>>>Using the 'Gquantile' between-array normalization is not appropriate in
>>>your case because your reference is not always in the Green channel. The
>>>values you are using for Exp3 and Exp6 in the linear model are actually
>>>from the reference, so it's no wonder your gene lists don't make sense. To
>>>clarify, the discussion we were having recently on the mailing list about
>>>using Gquantile is when your experimental samples are expected to be VERY
>>>different from the reference, such that the assumption of a within-array
>>>normalization may not be met. In your case (and in most reference designs)
>>>you probably meet the assumptions of most genes not changing, and so should
>>>first do a within-array loess-type normalization to help remove dye bias.
>>>Then check to see if the resulting distributions of M values are similar
>>>between arrays. If they are very different, and you would expect them not
>>>to be very different, do a between-array normalization on the M values -
>>>the scale method of 'normalizeBetweenArrays' is my favorite. The design
>>>matrix you have below will correctly adjust for dye swaps, assuming that
>>>the 'dye swaps' are all biological replicates and not technical replicates.
>>>
>>>I'm a little confused about the way you're calling the 'lmFit' function.
>>>Your arrays appear to have duplicate spots, but you have the correlation as
>>>zero. Something is very wrong with your arrays if there is zero correlation
>>>between the duplicate spots! I suggest you read the limma vignette very
>>>closely, especially the sections on common reference designs and
>>>within-array replicate spots.
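
To illustrate the between-array 'scale' step mentioned above: as far as I understand it, the idea is to rescale the M-values so that every array has the same median absolute value. The base-R sketch below reproduces that idea on simulated M-values (the matrix, array names and spreads are invented for illustration); it is not limma's own code, and with real data one would simply call normalizeBetweenArrays(MA, method="scale").

```r
set.seed(42)

# Hypothetical M-values: 1000 spots on 3 arrays with different spreads
M <- cbind(A1 = rnorm(1000, sd = 0.5),
           A2 = rnorm(1000, sd = 1.0),
           A3 = rnorm(1000, sd = 2.0))

# Median absolute M-value per array, before scaling
med.abs <- apply(abs(M), 2, median, na.rm = TRUE)

# Divide each array by its median |M| relative to the geometric mean over
# arrays, so that all arrays end up with the same median |M|
scale.factor <- med.abs / exp(mean(log(med.abs)))
M.scaled <- sweep(M, 2, scale.factor, "/")

print(round(med.abs, 3))                                        # differs by array
print(round(apply(abs(M.scaled), 2, median, na.rm = TRUE), 3))  # equal across arrays
```

A per-array boxplot of the M-values before and after is one simple way to judge whether this step is needed at all.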
>>>
>>>Good luck,
>>>Jenny
>>>
>>>At 12:58 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
>>>> Dear Limma users,
>>>>
>>>>I am working on custom spotted 70mer oligo arrays, and use Bluefuse to
>>>>analyse the images. With the help of the excellent user guide and the
>>>>Bioconductor user forum (GMANE), I have analysed my direct comparison
>>>>experiments. I also have common reference, time course and direct two color
>>>>design type experiments to analyse. I have read the recent posting on the
>>>>list about using Rquantile or Gquantile for normalizing between arrays in
>>>>common reference experiments. I tried to do a common reference analysis
>>>>using the discussed code, but the resulting gene list is different from the
>>>>expected list. I am also wondering how to account for dye swaps. I have
>>>>pasted the code which I used for the common reference.
>>>>
>>>>It would also be very useful if anyone could tell me how to use
>>>>normalization between arrays for direct two color designs.
>>>>
>>>>My experiment design is
>>>>        Cy3    Cy5
>>>>____________________
>>>>Exp1    Ref    CpdA
>>>>Exp2    Ref    CpdA
>>>>Exp3    CpdA   Ref
>>>>
>>>>Exp4    Ref    CpdB
>>>>Exp5    Ref    CpdB
>>>>Exp6    CpdB   Ref
>>>>
>>>>Code which I used for analysing the common reference:
>>>>------------------------------------------------------------
>>>>library(limma)
>>>>targets <- readTargets("commonref.txt", row.names= "Name")
>>>>RG <- read.maimages(targets$FileName, source="bluefuse")
>>>>RG$genes <- readGAL()
>>>>RG$printer <- getLayout(RG$genes)
>>>>spottypes <- readSpotTypes()
>>>>RG$genes$Status <- controlStatus(spottypes, RG)
>>>>isGene <- RG$genes$Status == "oligos"
>>>>MA.Gquantile <- normalizeBetweenArrays(RG[isGene,], method="Gquantile")
>>>>RG.Gquantile <- RG.MA(MA.Gquantile)
>>>>MA.dummy <- MA.Gquantile
>>>>MA.dummy$M <- log2(RG.Gquantile$R)
>>>>o <- order(MA.dummy$genes$ID)
>>>>MA.sorted <- MA.dummy[o,]
>>>>design <- modelMatrix(targets, ref="Ref")
>>>>fit <- lmFit(MA.sorted, design, ndups=2, spacing=1, correlation=0)
>>>>fit.eb <- eBayes(fit)
>>>>write.fit(fit.eb, file="data/commonref.xls", adjust="BH")
>>>>------------------------------------------------------------
>>>>
>>>>Thanks in advance,
>>>>
>>>>with regards,
>>>>Vinoy......
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>>_______________________________________________
>>>>Bioconductor mailing list
>>>>Bioconductor at stat.math.ethz.ch
>>>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>Search the archives:
>>>>http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>Jenny Drnevich, Ph.D.
>>>
>>>Functional Genomics Bioinformatics Specialist
>>>W.M. Keck Center for Comparative and Functional Genomics
>>>Roy J. Carver Biotechnology Center
>>>University of Illinois, Urbana-Champaign
>>>
>>>330 ERML
>>>1201 W. Gregory Dr.
>>>Urbana, IL 61801
>>>USA
>>>
>>>ph: 217-244-7355
>>>fax: 217-265-5066
>>>e-mail: drnevich at uiuc.edu
>>>
>>>
>>>
>>>
>>>--
>>>Vinoy......
>>>      
>>>
>