[BioC] gDNA Cy3, cDNA Cy5

Wed Sep 26 22:46:36 CEST 2007

Dear Al,

what you have done looks very reasonable, but of course your outcome 
doesn't seem to be so. Issuse that one might want to investigate, if you 
haven't already, are:
- array quality
- variance of the data at the low end, and the background correction (if 
you have many very small or even negative raw intensities)
- do a single color analysis as if there were no Cy3
- compute log(Cy5/Cy3) within each array, then normalize the log-ratios 
between arrays, and here I would go successively from least aggressive 
to most aggressive (scaling, affine linear, a stiff loess-like smoother, 
quantiles) and only go to most aggressive if the data really ask for it.

  Best wishes
   Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

> Hi,
> 
> I have a whole pile of 2-colour arrays done with Cy3 genomic DNA, and
> Cy5 cDNA.  There are 7 "test" samples, each done in triplicate against
> the genomic DNA (so 21 slides).  I have seen a previous posting from
> Jenny about how to analyse these kinds of hybes, and followed her
> suggestions.  
> 
> The scans are unprocessed BlueFuse, but bg has already been subtracted.
> 
>> red_green <-
> read.maimages(datafiles,source="bluefuse",columns=list(G="AMPCH2",R="AMP
> CH1",
> +                            weights="CONFIDENCE",quality="QUALITY"))
> 
> Although boxplots of the various slides for the two channels indicate a
> pretty reasonable degree of similarity within a channel, the two
> channels as a whole were quite difft:
> 
>> summary(as.vector(log2(red_green$R)))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
>   1.157   7.381   8.700   9.016  10.600  16.420 
>> summary(as.vector(log2(red_green$G)))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
>   1.820   7.666  10.380  10.020  12.470  16.370
> 
> Despite this, I proceeded anyway:
> 
>> ## as Cy3 is gDNA in the ref channel, don't do NWA
>> NBA <- normalizeBetweenArrays(red_green,method="Gquantile")
>>
>> ## get back the R and G values after Gq normalisation
>> NBA.RG.MA <- RG.MA(NBA)
> 
> All Cy3 distributions were now identical, as expected, and the Cy5
> channel had also been modified:
> 
>> summary(as.vector(log2(NBA.RG.MA$G)))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
>   3.038   7.580  10.400  10.020  12.440  15.600 
>> summary(as.vector(log2(NBA.RG.MA$R)))
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
>  0.8795  7.3940  8.7480  9.0160 10.5700 17.4500 
> 
>> ## create a MAList to manipulate
>> NBA.fake <- NBA
>>
>> ## replace the M values with the log2(R) values
>> NBA.fake$M <- log2(NBA.RG.MA$R)
> 
> I then did fitting, using a design matrix of the ref="gDNA" kind, with a
> contrast matrix for the cDNAx vs cDNAy comparisons
> 
>> NBA.fakefit <- lmFit(NBA.fake$M, design=designMATRIX, method="ls")
>> NBA.fakecontrastsFit <- contrasts.fit(NBA.fakefit,contrasts)
>> eBNBA.fakecontrastsFit <- eBayes(NBA.fakecontrastsFit)
> 
> I then looked at the coefficients of the eBayesed contrast fit object:
> 
>> for(i in 1:length(colnames(contrasts)))
> + {
> + cat("Contrast",i," ",summary(eBNBA.fakecontrastsFit$coeff[,i]),"\n")
> + }
> 
> ## individual cDNAs (cDNA1, 2 etc)
>              min   25%   median mean 75% max
> Contrast 1   5.417 8.538 9.79 9.873 10.99 16.23 
> Contrast 2   4.426 6.867 8.623 8.798 10.46 15.58 
> Contrast 3   4.623 7.26 8.606 8.951 10.37 15.92 
> Contrast 4   5.175 7.626 8.853 9.323 10.72 16.52 
> Contrast 5   3.928 7.175 8.864 9.073 10.71 15.63 
> Contrast 6   4.167 7.269 8.264 8.848 10.29 16.34 
> Contrast 7   3.948 6.381 7.956 8.244 9.824 15.58 
> ## the various comparisons (cDNA2-cDNA1 etc)
> Contrast 8   -5.773 -1.626 -0.9606 -1.074 -0.3976 1.715 
> Contrast 9   -5.591 -1.427 -0.8069 -0.9218 -0.338 3.098 
> Contrast 10   -4.329 -1.008 -0.4586 -0.5496 -0.04331 2.093 
> Contrast 11   -4.698 -1.427 -0.7426 -0.7994 -0.159 2.631 
> Contrast 12   -6.275 -1.857 -0.9168 -1.025 -0.05517 5.511 
> Contrast 13   -6.746 -2.3 -1.536 -1.629 -0.9041 4.385 
> Contrast 14   -5.327 -0.209 0.1646 0.1526 0.5908 5.481 
> Contrast 15   -2.688 0.1018 0.4627 0.5249 0.9155 6.033 
> Contrast 16   -3.462 -0.1604 0.2428 0.275 0.7153 6.032 
> .......
> 
> Many of the medians are well below zero, and of course, the output from
> topTable is almost exclusively negative logFC.  This doesnt make sense
> biologically.  Contrast 8 is cDNA2-cDNA1, with cDNA1 the "RNA reference"
> biologically speaking (cDNA2-7 are all single gene mutants).
> "Unfortunately", the median value for the cDNA1 coeff is quite a bit
> larger than the others, so I think this is skewing most of the contrasts
> (i.e. I dont think there is a wholesale reduction of transcription in
> the other mutants).  What could/should I do about it?  I have
> contemplated normalizeQuantiles of the red channel after the Gquantile
> has been done (i.e. before fitting), but not sure whether that is a
> valid thing to do.  Would a single channel analysis be more appropriate
> here?
> 
> Thoughts/suggestions greatly appreciated.
> 
> Cheers,
> 
> al
> 
> 
>> sessionInfo()
> R version 2.5.1 (2007-06-27) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> 
> attached base packages:
> [1] "tools"     "stats"     "graphics"  "grDevices" "utils"    
> [6] "datasets"  "methods"   "base"     
> 
> other attached packages:
> geneplotter    annotate     Biobase      gplots       gdata      gtools 
>    "1.14.0"    "1.14.1"    "1.14.1"     "2.3.2"     "2.3.1"     "2.3.1" 
>     lattice        MASS     statmod         sma       limma       Hmisc 
>    "0.16-2"    "7.2-35"     "1.3.0"    "0.5.15"    "2.10.0"     "3.4-2" 
>