[BioC] Statistical comparison of low replicate affy data

Matthew Hannah Hannah at mpimp-golm.mpg.de
Thu Feb 19 09:26:40 MET 2004


Thanks for the responses.

I used SAM by calling it on the exprSet as was suggested. (It also worked
on a table made from a text file output from rma.) Anyway, when I did the
analysis I got some 'interesting' results (see bottom of mail). Basically,
unpaired, the data gave me about 2500 genes at delta = 2 with an FDR of 0.003,
and paired (the U/T reps were conducted on the same plant batch) about 3900
at delta = 2 with an FDR of 0.001. However, if I permute the input data (see
example 3 below) then it just returns zeros. I guess this could be due to the
coarseness of the comparisons, as suggested, but I'll give a few more details
of my data to see what people think.

Basically the data are highly reproducible between biological replicates, but
there is a big 'treatment' effect (this is what we want!?). For example, the
R-squared values for within-replicate x-y scatter plots of GCRMA data are
0.97-0.99, whilst for the 3 U-T comparisons the values are 0.92-0.93.
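
For reference, a minimal sketch of how R-squared values like these can be
computed, assuming a matrix of log2 expression values with one column per
array. The data are simulated and the column names (U1, U2, T1) are
hypothetical; none of this is the real data set.

```r
# Simulated log2 expression values; NOT the real data from this experiment.
set.seed(1)
n_genes <- 1000
base <- rnorm(n_genes, mean = 8, sd = 2)      # shared expression level per gene

expr <- cbind(
  U1 = base + rnorm(n_genes, sd = 0.2),       # untreated replicate 1
  U2 = base + rnorm(n_genes, sd = 0.2),       # untreated replicate 2
  T1 = base + rnorm(n_genes, sd = 0.2) +      # treated replicate 1: array noise
       rnorm(n_genes, sd = 0.5)               # plus a simulated treatment effect
)

# Squared Pearson correlation for every pair of arrays
rsq <- cor(expr)^2
print(round(rsq, 3))
```

The within-replicate entry (U1 vs U2) should come out higher than the U-T
entries, which is the pattern described above.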

So, as I interpret things, as soon as you permute the data you get very
different data sets being mixed, massively increasing the variance, so few
significant changes are detected in the permutations, hence a very low FDR.
If you input the data already permuted, then some of the permutations of this
data have loads of sig changes (as they represent the correct data order), so
the FDR is huge and SAM returns all 0's.
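
To illustrate that reasoning, here is a rough sketch using a single made-up
gene and a plain difference in class means. This is not SAM's actual
statistic (there is no s0 and no pooling across genes), and the values in x
are hypothetical. It only shows how few relabellings a 3 vs 3 design has, and
that the correct labelling sits at the extreme of them.

```r
# All ways of assigning 3 of the 6 arrays to class 1 (unpaired design)
idx <- combn(6, 3)          # 3 x 20 matrix of array indices
n_perm <- ncol(idx)         # choose(6, 3) = 20 relabellings
# (the paired design has even fewer: 2^3 = 8 sign flips)

# One hypothetical gene with a strong treatment effect:
# arrays 1-3 untreated, arrays 4-6 treated
x <- c(7.0, 7.1, 6.9, 9.0, 9.1, 8.9)

# Absolute difference in class means for a given set of "class 1" arrays
mean_diff <- function(ones) abs(mean(x[ones]) - mean(x[-ones]))

# Statistic under every possible relabelling
null_stats <- apply(idx, 2, mean_diff)

correct   <- mean_diff(c(4, 5, 6))   # labels c(0,0,0,1,1,1)
scrambled <- mean_diff(c(2, 4, 6))   # labels c(0,1,0,1,0,1)

# Relabellings at least as extreme as the correct labelling:
# only itself and its mirror image (classes swapped)
n_extreme_correct <- sum(null_stats >= correct - 1e-9)

# Relabellings strictly more extreme than the scrambled input
n_beyond_scrambled <- sum(null_stats > scrambled + 1e-9)

print(c(n_perm = n_perm,
        n_extreme_correct = n_extreme_correct,
        n_beyond_scrambled = n_beyond_scrambled))
```

With the correct labels, nothing in the permutation set beats the observed
value. With the scrambled labels, the permutation set contains the true
grouping, which is far more extreme than what was observed.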

So where does this leave us? Not using a test because the data is too 'good'
seems a bit strange. But equally, not knowing how reliable it is is also not
good.

Also, on a more general note, when you get to this stage with so many changes
(1 rep U-T comparison with GCRMA: 5000 1.5x and 2500 2x changes), is the data
violating the normalisation assumption that most genes remain unchanged?
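
For anyone wanting to do the same kind of count, a minimal sketch, assuming
log2-scale values (as RMA and GCRMA return) for one untreated and one treated
array. The data are simulated and the number of probe sets is a hypothetical
placeholder, so the counts will not match the ones quoted above.

```r
# Simulated log2 values for one U array and one T array; NOT real data.
set.seed(2)
n_genes   <- 20000                                 # hypothetical probe set count
untreated <- rnorm(n_genes, mean = 8, sd = 2)
treated   <- untreated + rnorm(n_genes, sd = 0.6)  # simulated treatment effect

# On the log2 scale a fold change is a difference: log2(T/U) = T - U
log_ratio <- treated - untreated

# Count genes changed by at least 1.5x or 2x in either direction
n_15x <- sum(abs(log_ratio) >= log2(1.5))
n_2x  <- sum(abs(log_ratio) >= 1)                  # log2(2) = 1

print(c(changed_15x = n_15x,
        changed_2x  = n_2x,
        fraction_2x = n_2x / n_genes))
```

The last entry gives the fraction of genes beyond 2x, which is the number to
weigh against the "most genes unchanged" assumption.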

I'll investigate the limma package.

Thanks
Matt

> cl = c(0,0,0,1,1,1)
> rmasam <- sam(rma, cl)
SAM Analysis for the two class unpaired case. 
 
s0 = 0.0695  (The 15 % quantile of the s values.) 
 
SAM Analysis for a set of delta: 
   delta    p0 false called   FDR
1    0.2 0.723  9638  13270 0.525
2    0.4 0.723  3951   9543 0.299
3    0.6 0.723  1634   7480 0.158
4    0.8 0.723   643   6068 0.077
5    1.0 0.723   286   5155 0.040
6    1.2 0.723   131   4394 0.022
7    1.4 0.723    64   3764 0.012
8    1.6 0.723    35   3259 0.008
9    1.8 0.723    18   2846 0.005
10   2.0 0.723    10   2478 0.003

> cl = c(1,2,3,-1,-2,-3)
> rmasamp <- sam(rma, cl)
SAM Analysis for the two class paired case. 
 
s0 = 0.0733  (The 45 % quantile of the s values.) 
 
SAM Analysis for a set of delta: 
   delta   p0 false called   FDR
1    0.2 0.52  9631  17275 0.290
2    0.4 0.52  2378  13276 0.093
3    0.6 0.52   695  10684 0.034
4    0.8 0.52   257   8922 0.015
5    1.0 0.52   127   7664 0.009
6    1.2 0.52    53   6575 0.004
7    1.4 0.52    28   5652 0.003
8    1.6 0.52    14   4985 0.001
9    1.8 0.52     9   4370 0.001
10   2.0 0.52     5   3867 0.001

> cl = c(0,1,0,1,0,1)
> rmasamperm <- sam(rma, cl)
SAM Analysis for the two class unpaired case. 
 
s0 = 0.0549  (The 5 % quantile of the s values.) 
 
SAM Analysis for a set of delta: 
   delta p0 false called FDR
1    0.2  1     0      0   0
2    0.4  1     0      0   0
3    0.6  1     0      0   0
4    0.8  1     0      0   0
5    1.0  1     0      0   0
6    1.2  1     0      0   0
7    1.4  1     0      0   0
8    1.6  1     0      0   0
9    1.8  1     0      0   0
10   2.0  1     0      0   0


