[BioC] Siggenes SAM analysis: log2 transformation and Understanding output

Wed Feb 15 14:30:19 CET 2012

Hello,

I am currently working on a data set about kiwi consumption for my
bachelors project. The data is available at
http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-2030

I'm abit confused as to how to interpret the output parameters,
specifically p0. I've run the following code:

dataset <- read.table("OAS_RMA.txt",header=TRUE)
controls <- cbind(dataset$CEL12.1,dataset$CEL13.1,dataset$CEL23.1,dataset$CEL25.1,dataset$CEL37.1,dataset$CEL59.1,dataset$CEL61.1,dataset$CEL78.1,dataset$CEL9.1,dataset$CEL92.1)
experiments <- cbind(dataset$CEL18.1,dataset$CEL21.1,dataset$CEL3.1,dataset$CEL31.1,dataset$CEL46.1,dataset$CEL50.1,dataset$CEL56.1,dataset$CEL57.1,dataset$CEL7.1)

library('siggenes')
datamatrix <- matrix(cbind(controls,experiments),ncol=19)
y <- rep(0,19)
y[11:19] <- 1
gene_names <- as.character(dataset$Hybridization.REF)
sam.obj = sam(datamatrix,y,gene.names=gene_names,rand=12345)

Output:
AM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances

 s0 = 0

 Number of permutations: 100

 MEAN number of falsely called variables is computed.

   Delta    p0    False Called    FDR cutlow cutup   j2    j1
1    0.1 0.634 28335.89  37013 0.4851 -1.058 0.354 9709 27372
2    0.5 0.634 11200.82  21273 0.3336 -2.271 0.910 2447 35850
3    0.9 0.634   249.38   1522 0.1038 -3.374 3.088  541 53695
4    1.3 0.634     9.67    134 0.0457 -4.402 5.577  127 54669
5    1.7 0.634     0.69     20 0.0219 -5.596   Inf   20 54676
6    2.1 0.634        0      1      0 -9.072   Inf    1 54676
7    2.5 0.634        0      1      0 -9.072   Inf    1 54676
8    2.9 0.634        0      1      0 -9.072   Inf    1 54676
9    3.3 0.634        0      1      0 -9.072   Inf    1 54676
10   3.7 0.634        0      0      0   -Inf   Inf    0 54676

I'm using the rand parameter because results seems to vary a bit. p0
is in this case 0.634, and I'm not sure how to interpret this. From
literature, this is described as "Prior probability that a gene is not
differentially expressed" - What does this exactly mean? Does this
imply, that there is a ~63% percent chance, that the genes in
question, are actually NOT differentially expressed?
I've also found some varying sources saying that it is a good idea to
log2 transform data before inputting into SAM. Does this still apply,
and if so, why?

Best Regards,

David Westergaard
Undergraduate student
Technical University of Denmark