[BioC] Statistical comparison of low replicate affy data
Naomi Altman
naomi at stat.psu.edu
Mon Mar 1 02:13:49 MET 2004
If you really have good replicability, you should be able to use a
gene-by-gene 2-sample t-test.
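
As a rough sketch of what that could look like in base R (the matrix `expr`, its simulated values, and the built-in shift for ten genes are made up for illustration; in practice the rows would be the RMA or GCRMA expression values):

```r
# Hypothetical stand-in for the expression matrix: 100 genes x 6 arrays
# (columns 1-3 untreated, 4-6 treated). Simulated values, not real data.
set.seed(1)
expr <- matrix(rnorm(600, mean = 8, sd = 0.2), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))
expr[1:10, 4:6] <- expr[1:10, 4:6] + 2   # build in a shift for 10 genes

cl <- c(0, 0, 0, 1, 1, 1)                # same coding as the unpaired SAM call

# One two-sample t-test per row (gene); t.test() defaults to the Welch version
pvals <- apply(expr, 1, function(x)
  t.test(x[cl == 0], x[cl == 1])$p.value)

# Benjamini-Hochberg adjustment across genes
padj <- p.adjust(pvals, method = "BH")
head(sort(padj))
```

Passing `var.equal = TRUE` to `t.test()` gives the pooled-variance form instead. With only 3 arrays per group the per-gene variance estimates are noisy either way, so this is a sketch under those assumptions rather than a recommendation of one variant.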
--Naomi
At 03:26 AM 2/19/2004, Matthew Hannah wrote:
>Thanks for the responses.
>
>I used SAM by calling it on the exprsSet as was suggested. (It also worked
>on a table made from a text file output from rma.) Anyway, when I did the
>analysis I got some 'interesting' results (see bottom of mail). Basically,
>unpaired, the data gave me 2500 genes delta > 2 with an FDR of 0.003, and
>paired (the U/T reps were conducted on the same plant batch) 4000 > 2 with
>an FDR of 0.001. However, if I permute the input data (see ex. 3 below) then
>it just returns zeros. I guess this could be due to the coarseness of the
>comparisons as suggested, but I'll just give a few more details of my data
>to see what people think.
>
>Basically the data is highly reproducible between biological replicates but
>there is a big 'treatment' effect (this is what we want!?). For example, Rsq
>for within-replicate x-y scatter plots of GCRMA data are 0.97 - 0.99, whilst
>for the 3 U-T comparisons the values are 0.92-0.93.
>
>So as I interpret things, as soon as you permute the data you get very
>different data sets being mixed, massively increasing the variance, and so few
>significant changes are detected, hence a very low FDR. If you input the data
>already permuted, then some of the permutations of this data have loads of
>sig changes (as they represent the correct data order) and so FDR is huge and
>SAM returns all 0's.
>
>So where does this leave us? Not using a test because the data is too 'good'
>seems a bit strange. But equally, not knowing how reliable it is is also not
>good.
>
>Also on a more general note, when you get to this stage with so many changes
>(1 rep U-T comparison with GCRMA - 5000 1.5x and 2500 2x changes) is the data
>violating the assumption for the normalisation that most genes remain
>unchanged?
>
>I'll investigate the limma package.
>
>Thanks
>Matt
>
> > cl = c(0,0,0,1,1,1)
> > rmasam <- sam(rma, cl)
>SAM Analysis for the two class unpaired case.
>
>s0 = 0.0695 (The 15 % quantile of the s values.)
>
>SAM Analysis for a set of delta:
>   delta    p0 false called   FDR
>1    0.2 0.723  9638  13270 0.525
>2    0.4 0.723  3951   9543 0.299
>3    0.6 0.723  1634   7480 0.158
>4    0.8 0.723   643   6068 0.077
>5    1.0 0.723   286   5155 0.040
>6    1.2 0.723   131   4394 0.022
>7    1.4 0.723    64   3764 0.012
>8    1.6 0.723    35   3259 0.008
>9    1.8 0.723    18   2846 0.005
>10   2.0 0.723    10   2478 0.003
>
> > cl = c(1,2,3,-1,-2,-3)
> > rmasamp <- sam(rma, cl)
>SAM Analysis for the two class paired case.
>
>s0 = 0.0733 (The 45 % quantile of the s values.)
>
>SAM Analysis for a set of delta:
>   delta   p0 false called   FDR
>1    0.2 0.52  9631  17275 0.290
>2    0.4 0.52  2378  13276 0.093
>3    0.6 0.52   695  10684 0.034
>4    0.8 0.52   257   8922 0.015
>5    1.0 0.52   127   7664 0.009
>6    1.2 0.52    53   6575 0.004
>7    1.4 0.52    28   5652 0.003
>8    1.6 0.52    14   4985 0.001
>9    1.8 0.52     9   4370 0.001
>10   2.0 0.52     5   3867 0.001
>
> > cl = c(0,1,0,1,0,1)
> > rmasamperm <- sam(rma, cl)
>SAM Analysis for the two class unpaired case.
>
>s0 = 0.0549 (The 5 % quantile of the s values.)
>
>SAM Analysis for a set of delta:
>   delta p0 false called FDR
>1    0.2  1     0      0   0
>2    0.4  1     0      0   0
>3    0.6  1     0      0   0
>4    0.8  1     0      0   0
>5    1.0  1     0      0   0
>6    1.2  1     0      0   0
>7    1.4  1     0      0   0
>8    1.6  1     0      0   0
>9    1.8  1     0      0   0
>10   2.0  1     0      0   0
>
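
On the coarseness of the comparisons mentioned above: with 3 vs 3 arrays there are very few distinct relabellings for a permutation method to work with. The base-R sketch below just counts them and checks that the true grouping is itself one of the relabellings of the scrambled `cl = c(0,1,0,1,0,1)` input. It does not reproduce SAM's internals, and the names `labelings`, `cls` and `true_found` are illustrative.

```r
# All ways to assign 3 of the 6 arrays to the second class
labelings <- combn(6, 3)
n_unpaired <- ncol(labelings)   # 20 unpaired relabellings
n_paired <- 2^3                 # 8 sign flips for 3 pairs

# Express each relabelling as a 0/1 class vector like `cl`
cls <- apply(labelings, 2, function(idx) {
  cl <- rep(0, 6)
  cl[idx] <- 1
  cl
})

# The correct grouping is one of the relabellings the permutation step can reach
true_cl <- c(0, 0, 0, 1, 1, 1)
true_found <- any(apply(cls, 2, function(v) all(v == true_cl)))

n_unpaired
n_paired
true_found
```

So any null distribution built from these arrays rests on at most 20 (unpaired) or 8 (paired) relabellings, and when the input order is scrambled, the relabelling with the real treatment split sits inside that null set.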
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111