[BioC] gDNA Cy3, cDNA Cy5

Al Ivens alicat at sanger.ac.uk
Wed Sep 26 13:44:29 CEST 2007


Hi,

I have a whole pile of 2-colour arrays done with Cy3 genomic DNA, and
Cy5 cDNA.  There are 7 "test" samples, each done in triplicate against
the genomic DNA (so 21 slides).  I have seen a previous posting from
Jenny about how to analyse these kinds of hybes, and followed her
suggestions.  

The scans are unprocessed BlueFuse, but bg has already been subtracted.

> red_green <-
read.maimages(datafiles,source="bluefuse",columns=list(G="AMPCH2",R="AMP
CH1",
+                            weights="CONFIDENCE",quality="QUALITY"))

Although boxplots of the various slides for the two channels indicate a
pretty reasonable degree of similarity within a channel, the two
channels as a whole were quite difft:

> summary(as.vector(log2(red_green$R)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.157   7.381   8.700   9.016  10.600  16.420 
> summary(as.vector(log2(red_green$G)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.820   7.666  10.380  10.020  12.470  16.370

Despite this, I proceeded anyway:

> ## as Cy3 is gDNA in the ref channel, don't do NWA
> NBA <- normalizeBetweenArrays(red_green,method="Gquantile")
>
> ## get back the R and G values after Gq normalisation
> NBA.RG.MA <- RG.MA(NBA)

All Cy3 distributions were now identical, as expected, and the Cy5
channel had also been modified:

> summary(as.vector(log2(NBA.RG.MA$G)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.038   7.580  10.400  10.020  12.440  15.600 
> summary(as.vector(log2(NBA.RG.MA$R)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8795  7.3940  8.7480  9.0160 10.5700 17.4500 

> ## create a MAList to manipulate
> NBA.fake <- NBA
>
> ## replace the M values with the log2(R) values
> NBA.fake$M <- log2(NBA.RG.MA$R)

I then did fitting, using a design matrix of the ref="gDNA" kind, with a
contrast matrix for the cDNAx vs cDNAy comparisons

> NBA.fakefit <- lmFit(NBA.fake$M, design=designMATRIX, method="ls")
> NBA.fakecontrastsFit <- contrasts.fit(NBA.fakefit,contrasts)
> eBNBA.fakecontrastsFit <- eBayes(NBA.fakecontrastsFit)

I then looked at the coefficients of the eBayesed contrast fit object:

> for(i in 1:length(colnames(contrasts)))
+ {
+ cat("Contrast",i," ",summary(eBNBA.fakecontrastsFit$coeff[,i]),"\n")
+ }

## individual cDNAs (cDNA1, 2 etc)
             min   25%   median mean 75% max
Contrast 1   5.417 8.538 9.79 9.873 10.99 16.23 
Contrast 2   4.426 6.867 8.623 8.798 10.46 15.58 
Contrast 3   4.623 7.26 8.606 8.951 10.37 15.92 
Contrast 4   5.175 7.626 8.853 9.323 10.72 16.52 
Contrast 5   3.928 7.175 8.864 9.073 10.71 15.63 
Contrast 6   4.167 7.269 8.264 8.848 10.29 16.34 
Contrast 7   3.948 6.381 7.956 8.244 9.824 15.58 
## the various comparisons (cDNA2-cDNA1 etc)
Contrast 8   -5.773 -1.626 -0.9606 -1.074 -0.3976 1.715 
Contrast 9   -5.591 -1.427 -0.8069 -0.9218 -0.338 3.098 
Contrast 10   -4.329 -1.008 -0.4586 -0.5496 -0.04331 2.093 
Contrast 11   -4.698 -1.427 -0.7426 -0.7994 -0.159 2.631 
Contrast 12   -6.275 -1.857 -0.9168 -1.025 -0.05517 5.511 
Contrast 13   -6.746 -2.3 -1.536 -1.629 -0.9041 4.385 
Contrast 14   -5.327 -0.209 0.1646 0.1526 0.5908 5.481 
Contrast 15   -2.688 0.1018 0.4627 0.5249 0.9155 6.033 
Contrast 16   -3.462 -0.1604 0.2428 0.275 0.7153 6.032 
.......

Many of the medians are well below zero, and of course, the output from
topTable is almost exclusively negative logFC.  This doesnt make sense
biologically.  Contrast 8 is cDNA2-cDNA1, with cDNA1 the "RNA reference"
biologically speaking (cDNA2-7 are all single gene mutants).
"Unfortunately", the median value for the cDNA1 coeff is quite a bit
larger than the others, so I think this is skewing most of the contrasts
(i.e. I dont think there is a wholesale reduction of transcription in
the other mutants).  What could/should I do about it?  I have
contemplated normalizeQuantiles of the red channel after the Gquantile
has been done (i.e. before fitting), but not sure whether that is a
valid thing to do.  Would a single channel analysis be more appropriate
here?

Thoughts/suggestions greatly appreciated.

Cheers,

al


> sessionInfo()
R version 2.5.1 (2007-06-27) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
Kingdom.1252;LC_MONETARY=English_United
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] "tools"     "stats"     "graphics"  "grDevices" "utils"    
[6] "datasets"  "methods"   "base"     

other attached packages:
geneplotter    annotate     Biobase      gplots       gdata      gtools 
   "1.14.0"    "1.14.1"    "1.14.1"     "2.3.2"     "2.3.1"     "2.3.1" 
    lattice        MASS     statmod         sma       limma       Hmisc 
   "0.16-2"    "7.2-35"     "1.3.0"    "0.5.15"    "2.10.0"     "3.4-2" 
> 




-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.



More information about the Bioconductor mailing list