[BioC] gDNA Cy3, cDNA Cy5

Thu Sep 27 11:28:35 CEST 2007

Thanks Wolfgang, will explore some of your suggestions.

a

> -----Original Message-----
> From: Wolfgang Huber [mailto:huber at ebi.ac.uk] 
> Sent: 26 September 2007 21:47
> To: Al Ivens
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] gDNA Cy3, cDNA Cy5
> 
> 
> Dear Al,
> 
> what you have done looks very reasonable, but of course your outcome 
> doesn't seem to be so. Issuse that one might want to 
> investigate, if you 
> haven't already, are:
> - array quality
> - variance of the data at the low end, and the background 
> correction (if 
> you have many very small or even negative raw intensities)
> - do a single color analysis as if there were no Cy3
> - compute log(Cy5/Cy3) within each array, then normalize the 
> log-ratios 
> between arrays, and here I would go successively from least 
> aggressive 
> to most aggressive (scaling, affine linear, a stiff 
> loess-like smoother, 
> quantiles) and only go to most aggressive if the data really 
> ask for it.
> 
>   Best wishes
>    Wolfgang
> 
> ------------------------------------------------------------------
> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
> 
> > Hi,
> > 
> > I have a whole pile of 2-colour arrays done with Cy3 
> genomic DNA, and 
> > Cy5 cDNA.  There are 7 "test" samples, each done in 
> triplicate against 
> > the genomic DNA (so 21 slides).  I have seen a previous 
> posting from 
> > Jenny about how to analyse these kinds of hybes, and followed her 
> > suggestions.
> > 
> > The scans are unprocessed BlueFuse, but bg has already been 
> > subtracted.
> > 
> >> red_green <-
> > 
> read.maimages(datafiles,source="bluefuse",columns=list(G="AMPCH2",R="A
> > MP
> > CH1",
> > +                            
> weights="CONFIDENCE",quality="QUALITY"))
> > 
> > Although boxplots of the various slides for the two 
> channels indicate 
> > a pretty reasonable degree of similarity within a channel, the two 
> > channels as a whole were quite difft:
> > 
> >> summary(as.vector(log2(red_green$R)))
> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
> >   1.157   7.381   8.700   9.016  10.600  16.420 
> >> summary(as.vector(log2(red_green$G)))
> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
> >   1.820   7.666  10.380  10.020  12.470  16.370
> > 
> > Despite this, I proceeded anyway:
> > 
> >> ## as Cy3 is gDNA in the ref channel, don't do NWA
> >> NBA <- normalizeBetweenArrays(red_green,method="Gquantile")
> >>
> >> ## get back the R and G values after Gq normalisation NBA.RG.MA <- 
> >> RG.MA(NBA)
> > 
> > All Cy3 distributions were now identical, as expected, and the Cy5 
> > channel had also been modified:
> > 
> >> summary(as.vector(log2(NBA.RG.MA$G)))
> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
> >   3.038   7.580  10.400  10.020  12.440  15.600 
> >> summary(as.vector(log2(NBA.RG.MA$R)))
> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
> >  0.8795  7.3940  8.7480  9.0160 10.5700 17.4500
> > 
> >> ## create a MAList to manipulate
> >> NBA.fake <- NBA
> >>
> >> ## replace the M values with the log2(R) values
> >> NBA.fake$M <- log2(NBA.RG.MA$R)
> > 
> > I then did fitting, using a design matrix of the ref="gDNA" 
> kind, with 
> > a contrast matrix for the cDNAx vs cDNAy comparisons
> > 
> >> NBA.fakefit <- lmFit(NBA.fake$M, design=designMATRIX, method="ls") 
> >> NBA.fakecontrastsFit <- contrasts.fit(NBA.fakefit,contrasts)
> >> eBNBA.fakecontrastsFit <- eBayes(NBA.fakecontrastsFit)
> > 
> > I then looked at the coefficients of the eBayesed contrast 
> fit object:
> > 
> >> for(i in 1:length(colnames(contrasts)))
> > + {
> > + cat("Contrast",i," 
> ",summary(eBNBA.fakecontrastsFit$coeff[,i]),"\n")
> > + }
> > 
> > ## individual cDNAs (cDNA1, 2 etc)
> >              min   25%   median mean 75% max
> > Contrast 1   5.417 8.538 9.79 9.873 10.99 16.23 
> > Contrast 2   4.426 6.867 8.623 8.798 10.46 15.58 
> > Contrast 3   4.623 7.26 8.606 8.951 10.37 15.92 
> > Contrast 4   5.175 7.626 8.853 9.323 10.72 16.52 
> > Contrast 5   3.928 7.175 8.864 9.073 10.71 15.63 
> > Contrast 6   4.167 7.269 8.264 8.848 10.29 16.34 
> > Contrast 7   3.948 6.381 7.956 8.244 9.824 15.58 
> > ## the various comparisons (cDNA2-cDNA1 etc)
> > Contrast 8   -5.773 -1.626 -0.9606 -1.074 -0.3976 1.715 
> > Contrast 9   -5.591 -1.427 -0.8069 -0.9218 -0.338 3.098 
> > Contrast 10   -4.329 -1.008 -0.4586 -0.5496 -0.04331 2.093 
> > Contrast 11   -4.698 -1.427 -0.7426 -0.7994 -0.159 2.631 
> > Contrast 12   -6.275 -1.857 -0.9168 -1.025 -0.05517 5.511 
> > Contrast 13   -6.746 -2.3 -1.536 -1.629 -0.9041 4.385 
> > Contrast 14   -5.327 -0.209 0.1646 0.1526 0.5908 5.481 
> > Contrast 15   -2.688 0.1018 0.4627 0.5249 0.9155 6.033 
> > Contrast 16   -3.462 -0.1604 0.2428 0.275 0.7153 6.032 
> > .......
> > 
> > Many of the medians are well below zero, and of course, the output 
> > from topTable is almost exclusively negative logFC.  This 
> doesnt make 
> > sense biologically.  Contrast 8 is cDNA2-cDNA1, with cDNA1 the "RNA 
> > reference" biologically speaking (cDNA2-7 are all single gene 
> > mutants). "Unfortunately", the median value for the cDNA1 coeff is 
> > quite a bit larger than the others, so I think this is 
> skewing most of 
> > the contrasts (i.e. I dont think there is a wholesale reduction of 
> > transcription in the other mutants).  What could/should I 
> do about it?  
> > I have contemplated normalizeQuantiles of the red channel after the 
> > Gquantile has been done (i.e. before fitting), but not sure whether 
> > that is a valid thing to do.  Would a single channel 
> analysis be more 
> > appropriate here?
> > 
> > Thoughts/suggestions greatly appreciated.
> > 
> > Cheers,
> > 
> > al
> > 
> > 
> >> sessionInfo()
> > R version 2.5.1 (2007-06-27)
> > i386-pc-mingw32 
> > 
> > locale:
> > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> > Kingdom.1252;LC_MONETARY=English_United
> > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> > 
> > attached base packages:
> > [1] "tools"     "stats"     "graphics"  "grDevices" "utils"    
> > [6] "datasets"  "methods"   "base"     
> > 
> > other attached packages:
> > geneplotter    annotate     Biobase      gplots       gdata 
>      gtools 
> >    "1.14.0"    "1.14.1"    "1.14.1"     "2.3.2"     "2.3.1" 
>     "2.3.1" 
> >     lattice        MASS     statmod         sma       limma 
>       Hmisc 
> >    "0.16-2"    "7.2-35"     "1.3.0"    "0.5.15"    "2.10.0" 
>     "3.4-2" 
> > 
> 

-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.