[BioC] Normalization between arrays for common reference, time course and direct two color designs
Jenny Drnevich
drnevich at uiuc.edu
Thu Dec 7 21:52:10 CET 2006
Hi Weiyin,
Sorry - the object name in the code is arbitrary, so 'MA.norm' is a MAList
object with your data in it. Besides changing $ID to $ProbeName as you did
below, you need to change 'MA.norm' to the name of your MAList. I probably
should have specifically said something like: "if your normalized data is
in a MAList object named 'MA.norm', and your spot ID names are found in
MA.norm$genes$ID, then this code should work."
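As a rough sketch of that adaptation for Agilent-style annotation (not run on real data): the toy data frame 'genes' below is a hypothetical stand-in for MA.norm$genes, and the assumption that ControlType is 0 for ordinary probes should be checked against your own files. It also numbers the replicate spots with ave() rather than a loop.

```r
# Toy stand-in for MA.norm$genes (hypothetical values, for illustration only).
genes <- data.frame(
  ProbeName   = c("A_02", "CTRL_POS", "A_01", "A_02", "A_01",
                  "CTRL_POS", "CTRL_POS"),
  ControlType = c(0, 1, 0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Assumption: ControlType == 0 marks ordinary (non-control) probes.
keep  <- genes$ControlType == 0
genes <- genes[keep, ]

# Sort so that replicate spots of the same probe sit next to each other.
genes <- genes[order(genes$ProbeName), ]

# Number the replicates within each probe (1, 2, ...) without a loop.
genes$spotrep <- ave(seq_len(nrow(genes)), genes$ProbeName, FUN = seq_along)

# Keep the first two spots per probe, which gives ndups=2 and spacing=1.
genes_2spot <- genes[genes$spotrep <= 2, ]
```

On the MAList itself the same row indexing would be applied to the whole object, e.g. MA.norm[keep, ] and MA.norm[order(MA.norm$genes$ProbeName), ], with your own object name in place of 'MA.norm'.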
Note that this code does not average duplicate spots. Instead, it arranges
them with spacing=1 so you can use the 'duplicateCorrelation' function
before lmFit, which is better than averaging the spots. See the
within-array replicate spots section of the limma vignette for an example of
how to do this.
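A minimal sketch of that workflow, on simulated numbers rather than real arrays: the matrix 'M' below is a made-up stand-in for the M-values of an MAList whose duplicate spots are already adjacent, and the intercept-only design is just a placeholder for your own design matrix. It assumes the limma and statmod packages are installed.

```r
library(limma)

set.seed(1)
n_genes  <- 200
n_arrays <- 6

# Simulated log-ratios: each gene appears twice in adjacent rows
# (ndups=2, spacing=1), and the two spots share a signal plus spot noise.
signal <- matrix(rnorm(n_genes * n_arrays), nrow = n_genes)
M <- signal[rep(seq_len(n_genes), each = 2), ] +
  matrix(rnorm(2 * n_genes * n_arrays, sd = 0.3), nrow = 2 * n_genes)

# Placeholder design: a single intercept column (use your real design here).
design <- matrix(1, nrow = n_arrays, ncol = 1,
                 dimnames = list(NULL, "Intercept"))

# Estimate the correlation between duplicate spots across all genes.
corfit <- duplicateCorrelation(M, design, ndups = 2, spacing = 1)

# Pass the consensus correlation to lmFit instead of averaging the spots.
fit <- lmFit(M, design, ndups = 2, spacing = 1,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
```

The fitted object has one row per gene rather than one per spot, so you still end up with a single p-value for each unique ID.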
Cheers,
Jenny
At 01:33 PM 12/7/2006, Weiyin Zhou wrote:
>Hi Jenny,
>
>I have a related problem with an Agilent two-color array. All of the spots
>are duplicated (they share the same "ProbeName"), except the positive and
>negative controls, which are replicated multiple times. The "ControlType"
>column identifies their type. I used the limma package to read in the
>data (ProcessedSignal, which is already background corrected and loess
>normalized), then did a between-array quantile normalization.
>
>Before I do lmFit and differential expression analysis, I think I should
>remove the control spots and also average the duplicated spots, so that I
>have a p-value for each unique ProbeName. I just tried your code, but got
>an error message.
>
> > MA.norm <- MA.norm[order(MA.norm$genes$ProbeName),]
>Error: object "MA.norm" not found
>
>
>Could you give me some advice?
>
>Thanks in advance,
>
>Weiyin Zhou
>Statistics and Data Analyst
>ExonHit Therapeutics, Inc.
>217 Perry Parkway, Building # 5
>Gaithersburg, MD 20877
>
>email: Weiyin.zhou at exonhit-usa.com
>phone: 240.404.0184
>fax: 240.683.7060
>
>
>
>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch
>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny
>Drnevich
>Sent: Thursday, December 07, 2006 12:17 PM
>To: Vinoy Kumar Ramachandran
>Cc: bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] Normalization between arrays for common reference,
>time course and direct two color designs
>
>Hi Vinoy,
>
>It's better to keep the discussions on the list for other users that may
>have the same question. If they are not evenly spaced, after the
>normalizations you can rearrange the MA object so that they are evenly
>spaced, at least the 90% that are spotted twice. The ones that are spotted
>26 times are likely some sort of control spots, and you can probably safely
>ignore them. Why are some spotted three times? If you want to keep these
>genes in, a quick-and-dirty solution would be to just pick two of the three
>spots. The following code *should* work to rearrange the order of the
>genes, then pick out the first two spots for each unique ID.
>
>MA.norm <- MA.norm[order(MA.norm$genes$ID),]
>
>x <- unique(MA.norm$genes$ID)
>
># start from a full-length column of NAs so the indexed assignment in the
># loop always has a column of the right length to fill in
>MA.norm$genes$spotrep <- NA_integer_
>
># I'm sure there's a better, faster way to do the following, but this is
># the only way I know how:
>
>for (i in seq_along(x)) {
>    y <- which( MA.norm$genes$ID == x[i] )
>    MA.norm$genes$spotrep[y] <- seq_along(y)
>}
>
>MA.norm.2spot <- MA.norm[MA.norm$genes$spotrep <= 2 , ]
># now your spacing=1 and ndups=2
>HTH,
>Jenny
>
>
>
>
>At 10:36 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
> >Hi Jenny,
> >
> >Thanks a lot for the valuable information. I will try to do loess first
> >and then do a scale normalization if necessary. Regarding the correlation
> >in lmFit, the spots on my arrays are not evenly spaced and not evenly
> >replicated: 90% of spots are spotted twice, 8% three times and 2% are
> >spotted 26 times. I found this code in a posting on the limma user forum
> >and tried to adapt it to my data. Is there a more elegant way to
> >deal with this kind of replication?
> >
> >Once again, thanks for the information.
> >
> >with regards,
> >vinoy
> >On 12/7/06, Jenny Drnevich <drnevich at uiuc.edu> wrote:
> >Hi Vinoy,
> >
> >Using the 'Gquantile' between-array normalization is not appropriate in
> >your case because your reference is not always in the Green channel. The
> >values you are using for Exp3 and Exp6 in the linear model are actually
> >from the reference, so it's no wonder your gene lists don't make sense. To
> >clarify, the discussion we were having recently on the mailing list about
> >using Gquantile applies when your experimental samples are expected to be
> >VERY different from the reference, such that the assumptions of a
> >within-array normalization may not be met. In your case (and in most
> >reference designs) you probably meet the assumption of most genes not
> >changing, and so should first do a within-array loess-type normalization
> >to help remove dye bias. Then check to see if the resulting distributions
> >of M values are similar between arrays. If they are very different, and
> >you would expect them not to be very different, do a between-array
> >normalization on the M values - the scale method of
> >'normalizeBetweenArrays' is my favorite. The design matrix you have below
> >will correctly adjust for dye swaps, assuming that the 'dye swaps' are all
> >biological replicates and not technical replicates.
> >
> >I'm a little confused about the way you're calling the 'lmFit' function.
> >Your arrays appear to have duplicate spots, but you have the correlation
> >as zero. Something is very wrong with your arrays if there is zero
> >correlation between the duplicate spots! I suggest you read the limma
> >vignette very closely, especially the sections on common reference designs
> >and within-array replicate spots.
> >
> >Good luck,
> >Jenny
> >
> >At 12:58 AM 12/7/2006, Vinoy Kumar Ramachandran wrote:
> > > Dear Limma users,
> > >
> > >I am working on custom spotted 70mer oligo arrays, and use Bluefuse to
> > >analyse the images. With the help of the excellent user guide and the
> > >Bioconductor user forum (GMANE), I have analysed my direct comparison
> > >experiments. I also have common reference, time course and direct two
> > >color design experiments to analyse. I have read the recent posting on
> > >the list about using Rquantile or Gquantile for normalizing between
> > >arrays in common reference experiments. I tried to do a common reference
> > >analysis using the discussed code, but the resulting gene list is
> > >different from the expected list. I am also wondering how to account for
> > >dye swaps. I have pasted the code which I used for the common reference
> > >analysis.
> > >
> > >It would also be very useful if anyone could tell me how to use
> > >normalization between arrays for direct two color designs.
> > >
> > >My experiment design is
> > >
> > >        Cy3     Cy5
> > >____________________
> > >Exp1    Ref     CpdA
> > >Exp2    Ref     CpdA
> > >Exp3    CpdA    Ref
> > >
> > >Exp4    Ref     CpdB
> > >Exp5    Ref     CpdB
> > >Exp6    CpdB    Ref
> > >
> > >Code which I used for analysing the common reference design:
> > >
> > >------------------------------------------------------------
> > >library(limma)
> > >targets <- readTargets("commonref.txt", row.names= "Name")
> > >RG <- read.maimages(targets$FileName, source="bluefuse")
> > >RG$genes <- readGAL()
> > >RG$printer <- getLayout(RG$genes)
> > >spottypes <- readSpotTypes()
> > >RG$genes$Status <- controlStatus(spottypes, RG)
> > >isGene <- RG$genes$Status == "oligos"
> > >MA.Gquantile <- normalizeBetweenArrays(RG[isGene,], method="Gquantile")
> > >RG.Gquantile <- RG.MA(MA.Gquantile)
> > >MA.dummy <- MA.Gquantile
> > >MA.dummy$M <- log2(RG.Gquantile$R)
> > >o <- order(MA.dummy$genes$ID)
> > >MA.sorted <- MA.dummy[o,]
> > >design <- modelMatrix(targets, ref="Ref")
> > >fit <- lmFit(MA.sorted, design, ndups=2, spacing=1, correlation=0)
> > >fit.eb <- eBayes(fit)
> > >write.fit(fit.eb, file="data/commonref.xls", adjust="BH")
> >
> > >------------------------------------------------------------
> > >
> > >Thanks in advance,
> > >
> > >with regards,
> > >Vinoy......
> > >
> > >
> > >_______________________________________________
> > >Bioconductor mailing list
> > >Bioconductor at stat.math.ethz.ch
> > >https://stat.ethz.ch/mailman/listinfo/bioconductor
> > >Search the archives:
> > >http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >Jenny Drnevich, Ph.D.
> >
> >Functional Genomics Bioinformatics Specialist
> >W.M. Keck Center for Comparative and Functional Genomics
> >Roy J. Carver Biotechnology Center
> >University of Illinois, Urbana-Champaign
> >
> >330 ERML
> >1201 W. Gregory Dr.
> >Urbana, IL 61801
> >USA
> >
> >ph: 217-244-7355
> >fax: 217-265-5066
> >e-mail: drnevich at uiuc.edu
> >
> >
> >
> >
> >--
> >Vinoy......
>