[BioC] Statistical comparison of low replicate affy data
Matthew Hannah
Hannah at mpimp-golm.mpg.de
Thu Feb 19 09:26:40 MET 2004

Thanks for the responses.

I used SAM by calling it on the exprSet, as was suggested. (It also worked
on a table made from the text file output of rma.) When I did the analysis
I got some 'interesting' results (see bottom of mail). Unpaired, the data
gave me about 2500 genes called at delta = 2 with an FDR of 0.003, and
paired (the U/T reps were conducted on the same plant batch) about 3900 at
delta = 2 with an FDR of 0.001. However, if I permute the input (see the
third example below, where the class labels are interleaved), it just
returns zeros. I guess this could be due to the coarseness of the
comparisons, as suggested, but I'll give a few more details of my data to
see what people think.
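
As a sanity check on the unpaired output at the bottom of this mail, the FDR
column looks consistent with p0 * false / called. That relation is inferred
here from the printed numbers only (an assumption, not something taken from
the package documentation), and the variable names are made up for the
sketch:

```r
# Values copied from the unpaired SAM table at the bottom of this mail.
p0           <- 0.723
false_called <- c(9638, 3951, 1634, 643, 286, 131, 64, 35, 18, 10)
called       <- c(13270, 9543, 7480, 6068, 5155, 4394, 3764, 3259, 2846, 2478)
printed_fdr  <- c(0.525, 0.299, 0.158, 0.077, 0.040, 0.022, 0.012, 0.008,
                  0.005, 0.003)

# Assumed relation: estimated FDR = p0 * (expected false calls) / (genes called).
fdr <- p0 * false_called / called

# Side-by-side comparison with the FDR column that SAM printed.
print(data.frame(delta = seq(0.2, 2.0, by = 0.2),
                 recomputed = round(fdr, 3),
                 printed = printed_fdr))
```

If the relation holds, the FDR is only as trustworthy as the "false" column,
which comes from the permutations.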

Basically the data are highly reproducible between biological replicates, but
there is a big 'treatment' effect (this is what we want!?). For example, Rsq
for x-y scatter plots of replicates against each other (GCRMA data) is
0.97 - 0.99, whilst for the 3 U-T comparisons the values are 0.92 - 0.93.
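
For anyone wanting to make the same comparison, here is a minimal sketch of
how such Rsq values could be computed with base R. The matrix `expr` and its
column names are hypothetical, and the numbers are simulated toy data (not
the arrays discussed in this mail), so only the procedure carries over:

```r
set.seed(1)                                # toy data only
n_genes <- 1000
base    <- rnorm(n_genes, mean = 8, sd = 2)   # shared log2 expression level
effect  <- numeric(n_genes)
effect[1:300] <- 3                         # pretend 30% of genes respond
noise   <- function() rnorm(n_genes, sd = 0.2)

# Hypothetical genes x arrays matrix: 3 untreated (U) and 3 treated (T).
expr <- cbind(U1 = base + noise(), U2 = base + noise(), U3 = base + noise(),
              T1 = base + effect + noise(),
              T2 = base + effect + noise(),
              T3 = base + effect + noise())

# Squared Pearson correlation for every pair of arrays.
rsq <- cor(expr)^2

# Split the distinct pairs into within-treatment and U-vs-T pairs.
is_u       <- grepl("^U", colnames(expr))
same_group <- outer(is_u, is_u, "==")
distinct   <- upper.tri(rsq)
within_rsq  <- rsq[distinct & same_group]
between_rsq <- rsq[distinct & !same_group]

print(range(within_rsq))
print(range(between_rsq))
```

This compares every U array with every T array; restricting to the three
matched U-T pairs would just mean picking those entries out of `rsq`.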

So as I interpret things, as soon as you permute the data you mix very
different data sets, massively increasing the variance, so few significant
changes are detected in the permutations, hence a very low FDR. If you input
the data already permuted, then some of the permutations of this data have
loads of significant changes (as they represent the correct data order), so
the FDR is huge and SAM returns all 0's.
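
The coarseness can be made concrete with a toy example. With 3 vs 3 arrays
there are only choose(6, 3) = 20 ways to assign the labels. The sketch below
is a plain per-gene permutation test on a difference of means, not SAM's
actual procedure (which pools statistics across genes), and the single
"gene" `x` is invented for illustration:

```r
# One made-up gene: arrays 1-3 untreated, arrays 4-6 treated, big effect.
x <- c(5.0, 5.1, 4.9, 8.0, 8.1, 7.9)

# Every way of choosing which 3 of the 6 arrays get the "treated" label.
labelings <- combn(6, 3)
n_labelings <- ncol(labelings)             # choose(6, 3) = 20

# Difference of group means for a given set of "treated" arrays.
mean_diff <- function(idx) mean(x[idx]) - mean(x[-idx])
perm_stats <- apply(labelings, 2, mean_diff)

# Two-sided permutation p-value; the small tolerance guards against
# floating-point ties between the observed and permuted statistics.
perm_p <- function(obs) mean(abs(perm_stats) >= abs(obs) - 1e-12)

p_true        <- perm_p(mean_diff(4:6))         # correct labels 0,0,0,1,1,1
p_interleaved <- perm_p(mean_diff(c(2, 4, 6)))  # labels 0,1,0,1,0,1

print(c(n_labelings = n_labelings, p_true = p_true,
        p_interleaved = p_interleaved))
```

Even for this very clear gene the correct labels cannot do better than
2/20 = 0.1, and with the interleaved labels the "permutations" include the
correct grouping, which beats the observed statistic.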

So where does this leave us? Not using a test because the data is too 'good'
seems a bit strange. But equally, not knowing how reliable it is is also not
good.

Also, on a more general note, when you get to this stage with so many changes
(a single-replicate U-T comparison with GCRMA gives 5000 changes of 1.5x and
2500 of 2x), is the data violating the normalisation assumption that most
genes remain unchanged?
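
For reference, counts like these can be obtained along the following lines.
This is a sketch that assumes the expression values are on the log2 scale
(as rma/GCRMA output normally is); the function name and the five toy values
are made up:

```r
# Count genes whose fold change between two arrays reaches each threshold.
# u and t are log2 expression vectors for one untreated and one treated array.
count_fold_changes <- function(u, t, folds = c(1.5, 2)) {
  abs_lfc <- abs(t - u)                    # absolute log2 fold change
  sapply(folds, function(f) sum(abs_lfc >= log2(f)))
}

# Toy values only, not the data from this mail.
u <- c(7.0, 8.0, 9.0, 10.0, 6.0)
t <- c(7.1, 9.2, 8.3, 10.0, 7.0)

counts <- count_fold_changes(u, t)
print(counts)                              # genes at >= 1.5x and >= 2x
print(counts / length(u))                  # same, as a fraction of all genes
```

The fraction is the number to hold against the "most genes unchanged"
assumption.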
I'll investigate the limma package.
Thanks
Matt
> cl = c(0,0,0,1,1,1)
> rmasam <- sam(rma, cl)
SAM Analysis for the two class unpaired case.
s0 = 0.0695 (The 15 % quantile of the s values.)
SAM Analysis for a set of delta:
   delta    p0 false called   FDR
1    0.2 0.723  9638  13270 0.525
2    0.4 0.723  3951   9543 0.299
3    0.6 0.723  1634   7480 0.158
4    0.8 0.723   643   6068 0.077
5    1.0 0.723   286   5155 0.040
6    1.2 0.723   131   4394 0.022
7    1.4 0.723    64   3764 0.012
8    1.6 0.723    35   3259 0.008
9    1.8 0.723    18   2846 0.005
10   2.0 0.723    10   2478 0.003
> cl = c(1,2,3,-1,-2,-3)
> rmasamp <- sam(rma, cl)
SAM Analysis for the two class paired case.
s0 = 0.0733 (The 45 % quantile of the s values.)
SAM Analysis for a set of delta:
   delta   p0 false called   FDR
1    0.2 0.52  9631  17275 0.290
2    0.4 0.52  2378  13276 0.093
3    0.6 0.52   695  10684 0.034
4    0.8 0.52   257   8922 0.015
5    1.0 0.52   127   7664 0.009
6    1.2 0.52    53   6575 0.004
7    1.4 0.52    28   5652 0.003
8    1.6 0.52    14   4985 0.001
9    1.8 0.52     9   4370 0.001
10   2.0 0.52     5   3867 0.001
> cl = c(0,1,0,1,0,1)
> rmasamperm <- sam(rma, cl)
SAM Analysis for the two class unpaired case.
s0 = 0.0549 (The 5 % quantile of the s values.)
SAM Analysis for a set of delta:
   delta p0 false called FDR
1    0.2  1     0      0   0
2    0.4  1     0      0   0
3    0.6  1     0      0   0
4    0.8  1     0      0   0
5    1.0  1     0      0   0
6    1.2  1     0      0   0
7    1.4  1     0      0   0
8    1.6  1     0      0   0
9    1.8  1     0      0   0
10   2.0  1     0      0   0