[BioC] gDNA Cy3, cDNA Cy5
Al Ivens
alicat at sanger.ac.uk
Thu Sep 27 11:28:35 CEST 2007
Thanks Wolfgang, will explore some of your suggestions.
a
> -----Original Message-----
> From: Wolfgang Huber [mailto:huber at ebi.ac.uk]
> Sent: 26 September 2007 21:47
> To: Al Ivens
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] gDNA Cy3, cDNA Cy5
>
>
> Dear Al,
>
> what you have done looks very reasonable, but of course your outcome
> doesn't seem to be so. Issuse that one might want to
> investigate, if you
> haven't already, are:
> - array quality
> - variance of the data at the low end, and the background
> correction (if
> you have many very small or even negative raw intensities)
> - do a single color analysis as if there were no Cy3
> - compute log(Cy5/Cy3) within each array, then normalize the
> log-ratios
> between arrays, and here I would go successively from least
> aggressive
> to most aggressive (scaling, affine linear, a stiff
> loess-like smoother,
> quantiles) and only go to most aggressive if the data really
> ask for it.
>
> Best wishes
> Wolfgang
>
> ------------------------------------------------------------------
> Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
>
> > Hi,
> >
> > I have a whole pile of 2-colour arrays done with Cy3
> genomic DNA, and
> > Cy5 cDNA. There are 7 "test" samples, each done in
> triplicate against
> > the genomic DNA (so 21 slides). I have seen a previous
> posting from
> > Jenny about how to analyse these kinds of hybes, and followed her
> > suggestions.
> >
> > The scans are unprocessed BlueFuse, but bg has already been
> > subtracted.
> >
> >> red_green <-
> >
> read.maimages(datafiles,source="bluefuse",columns=list(G="AMPCH2",R="A
> > MP
> > CH1",
> > +
> weights="CONFIDENCE",quality="QUALITY"))
> >
> > Although boxplots of the various slides for the two
> channels indicate
> > a pretty reasonable degree of similarity within a channel, the two
> > channels as a whole were quite difft:
> >
> >> summary(as.vector(log2(red_green$R)))
> > Min. 1st Qu. Median Mean 3rd Qu. Max.
> > 1.157 7.381 8.700 9.016 10.600 16.420
> >> summary(as.vector(log2(red_green$G)))
> > Min. 1st Qu. Median Mean 3rd Qu. Max.
> > 1.820 7.666 10.380 10.020 12.470 16.370
> >
> > Despite this, I proceeded anyway:
> >
> >> ## as Cy3 is gDNA in the ref channel, don't do NWA
> >> NBA <- normalizeBetweenArrays(red_green,method="Gquantile")
> >>
> >> ## get back the R and G values after Gq normalisation NBA.RG.MA <-
> >> RG.MA(NBA)
> >
> > All Cy3 distributions were now identical, as expected, and the Cy5
> > channel had also been modified:
> >
> >> summary(as.vector(log2(NBA.RG.MA$G)))
> > Min. 1st Qu. Median Mean 3rd Qu. Max.
> > 3.038 7.580 10.400 10.020 12.440 15.600
> >> summary(as.vector(log2(NBA.RG.MA$R)))
> > Min. 1st Qu. Median Mean 3rd Qu. Max.
> > 0.8795 7.3940 8.7480 9.0160 10.5700 17.4500
> >
> >> ## create a MAList to manipulate
> >> NBA.fake <- NBA
> >>
> >> ## replace the M values with the log2(R) values
> >> NBA.fake$M <- log2(NBA.RG.MA$R)
> >
> > I then did fitting, using a design matrix of the ref="gDNA"
> kind, with
> > a contrast matrix for the cDNAx vs cDNAy comparisons
> >
> >> NBA.fakefit <- lmFit(NBA.fake$M, design=designMATRIX, method="ls")
> >> NBA.fakecontrastsFit <- contrasts.fit(NBA.fakefit,contrasts)
> >> eBNBA.fakecontrastsFit <- eBayes(NBA.fakecontrastsFit)
> >
> > I then looked at the coefficients of the eBayesed contrast
> fit object:
> >
> >> for(i in 1:length(colnames(contrasts)))
> > + {
> > + cat("Contrast",i,"
> ",summary(eBNBA.fakecontrastsFit$coeff[,i]),"\n")
> > + }
> >
> > ## individual cDNAs (cDNA1, 2 etc)
> > min 25% median mean 75% max
> > Contrast 1 5.417 8.538 9.79 9.873 10.99 16.23
> > Contrast 2 4.426 6.867 8.623 8.798 10.46 15.58
> > Contrast 3 4.623 7.26 8.606 8.951 10.37 15.92
> > Contrast 4 5.175 7.626 8.853 9.323 10.72 16.52
> > Contrast 5 3.928 7.175 8.864 9.073 10.71 15.63
> > Contrast 6 4.167 7.269 8.264 8.848 10.29 16.34
> > Contrast 7 3.948 6.381 7.956 8.244 9.824 15.58
> > ## the various comparisons (cDNA2-cDNA1 etc)
> > Contrast 8 -5.773 -1.626 -0.9606 -1.074 -0.3976 1.715
> > Contrast 9 -5.591 -1.427 -0.8069 -0.9218 -0.338 3.098
> > Contrast 10 -4.329 -1.008 -0.4586 -0.5496 -0.04331 2.093
> > Contrast 11 -4.698 -1.427 -0.7426 -0.7994 -0.159 2.631
> > Contrast 12 -6.275 -1.857 -0.9168 -1.025 -0.05517 5.511
> > Contrast 13 -6.746 -2.3 -1.536 -1.629 -0.9041 4.385
> > Contrast 14 -5.327 -0.209 0.1646 0.1526 0.5908 5.481
> > Contrast 15 -2.688 0.1018 0.4627 0.5249 0.9155 6.033
> > Contrast 16 -3.462 -0.1604 0.2428 0.275 0.7153 6.032
> > .......
> >
> > Many of the medians are well below zero, and of course, the output
> > from topTable is almost exclusively negative logFC. This
> doesnt make
> > sense biologically. Contrast 8 is cDNA2-cDNA1, with cDNA1 the "RNA
> > reference" biologically speaking (cDNA2-7 are all single gene
> > mutants). "Unfortunately", the median value for the cDNA1 coeff is
> > quite a bit larger than the others, so I think this is
> skewing most of
> > the contrasts (i.e. I dont think there is a wholesale reduction of
> > transcription in the other mutants). What could/should I
> do about it?
> > I have contemplated normalizeQuantiles of the red channel after the
> > Gquantile has been done (i.e. before fitting), but not sure whether
> > that is a valid thing to do. Would a single channel
> analysis be more
> > appropriate here?
> >
> > Thoughts/suggestions greatly appreciated.
> >
> > Cheers,
> >
> > al
> >
> >
> >> sessionInfo()
> > R version 2.5.1 (2007-06-27)
> > i386-pc-mingw32
> >
> > locale:
> > LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> > Kingdom.1252;LC_MONETARY=English_United
> > Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
> >
> > attached base packages:
> > [1] "tools" "stats" "graphics" "grDevices" "utils"
> > [6] "datasets" "methods" "base"
> >
> > other attached packages:
> > geneplotter annotate Biobase gplots gdata
> gtools
> > "1.14.0" "1.14.1" "1.14.1" "2.3.2" "2.3.1"
> "2.3.1"
> > lattice MASS statmod sma limma
> Hmisc
> > "0.16-2" "7.2-35" "1.3.0" "0.5.15" "2.10.0"
> "3.4-2"
> >
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
More information about the Bioconductor
mailing list