[BioC] Siggenes SAM analysis: log2 transformation and understanding output

David Westergaard david at harsk.dk
Wed Feb 15 14:30:19 CET 2012


Hello,

I am currently working on a data set about kiwi consumption for my
bachelor's project. The data is available at
http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-2030

I'm a bit confused about how to interpret the output parameters,
specifically p0. I've run the following code:

library(siggenes)

dataset <- read.table("OAS_RMA.txt", header = TRUE)

# 10 control arrays followed by 9 experiment arrays
control_cols    <- c("CEL12.1", "CEL13.1", "CEL23.1", "CEL25.1", "CEL37.1",
                     "CEL59.1", "CEL61.1", "CEL78.1", "CEL9.1", "CEL92.1")
experiment_cols <- c("CEL18.1", "CEL21.1", "CEL3.1", "CEL31.1", "CEL46.1",
                     "CEL50.1", "CEL56.1", "CEL57.1", "CEL7.1")
datamatrix <- as.matrix(dataset[, c(control_cols, experiment_cols)])

# Class labels: 0 = control (columns 1-10), 1 = experiment (columns 11-19)
y <- c(rep(0, 10), rep(1, 9))
gene_names <- as.character(dataset$Hybridization.REF)
sam.obj <- sam(datamatrix, y, gene.names = gene_names, rand = 12345)
sam.obj   # prints the table below

Output:
SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances

 s0 = 0

 Number of permutations: 100

 MEAN number of falsely called variables is computed.

   Delta    p0    False Called    FDR cutlow cutup   j2    j1
1    0.1 0.634 28335.89  37013 0.4851 -1.058 0.354 9709 27372
2    0.5 0.634 11200.82  21273 0.3336 -2.271 0.910 2447 35850
3    0.9 0.634   249.38   1522 0.1038 -3.374 3.088  541 53695
4    1.3 0.634     9.67    134 0.0457 -4.402 5.577  127 54669
5    1.7 0.634     0.69     20 0.0219 -5.596   Inf   20 54676
6    2.1 0.634        0      1      0 -9.072   Inf    1 54676
7    2.5 0.634        0      1      0 -9.072   Inf    1 54676
8    2.9 0.634        0      1      0 -9.072   Inf    1 54676
9    3.3 0.634        0      1      0 -9.072   Inf    1 54676
10   3.7 0.634        0      0      0   -Inf   Inf    0 54676

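As a quick check on how p0 enters the output, the FDR column can be recomputed from the other columns. This sketch assumes the standard SAM relation FDR = p0 * False / Called; the numbers are copied from rows 1 to 5 of the table above, and the small differences come from p0 being printed rounded to three decimals.

```r
# Values copied from rows 1 to 5 of the SAM table above
p0          <- 0.634
false       <- c(28335.89, 11200.82, 249.38, 9.67, 0.69)  # mean number falsely called
called      <- c(37013, 21273, 1522, 134, 20)             # number of genes called
fdr_printed <- c(0.4851, 0.3336, 0.1038, 0.0457, 0.0219)

# Assumed SAM relation: the permutation-based count of false positives is
# computed as if every gene were null, so it is scaled down by p0
fdr <- p0 * false / called
round(cbind(fdr, fdr_printed), 4)
```

In this table p0 is identical in every row and acts as one multiplicative factor applied to the whole gene list.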
I'm using the rand parameter because the results seem to vary a bit
between runs. p0 is 0.634 in this case, and I'm not sure how to
interpret it. In the literature it is described as the "prior
probability that a gene is not differentially expressed". What exactly
does this mean? Does it imply that there is a ~63% chance that the
genes in question are actually NOT differentially expressed?
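
One common way to estimate such a prior is the approach of Storey and Tibshirani: if p-values of null genes are uniform, the share of p-values above some cutoff `lambda` comes almost entirely from null genes. The sketch below is a simplified single-`lambda` version on simulated data; the function name `estimate_p0` is hypothetical, and siggenes' own estimator may differ in detail (for example by smoothing over several `lambda` values).

```r
# Simplified Storey-type estimate of p0, the proportion of null genes.
# Assumption: null p-values are uniform on [0, 1], while p-values of
# differentially expressed genes pile up near 0, so above `lambda`
# almost all p-values come from null genes.
estimate_p0 <- function(pvals, lambda = 0.5) {
  min(1, mean(pvals > lambda) / (1 - lambda))
}

# Simulated (not real) data: 7000 null genes and 3000 DE genes
set.seed(1)
p_null <- runif(7000)
p_de   <- rbeta(3000, shape1 = 0.2, shape2 = 5)
estimate_p0(c(p_null, p_de))  # should be close to the true 7000 / 10000 = 0.7
```

Under this reading, p0 describes the gene list as a whole (the estimated fraction of null genes), not any single gene.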

I've also found various sources saying that it is a good idea to
log2-transform the data before passing it to SAM. Does this still
apply, and if so, why?
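
A common motivation for log2 transformation is that it turns multiplicative fold changes into symmetric additive differences. Note that RMA (for example `rma()` in the affy package) already returns log2-scale values, so applying log2 again to RMA output would be wrong; that OAS_RMA.txt is RMA output on that scale is an assumption based on its name. The helpers `looks_log2` and `to_log2` below are hypothetical, and the cutoff of 20 is a rough heuristic, not a siggenes rule.

```r
# Fold changes become symmetric on the log2 scale:
# a 2-fold increase and a 2-fold decrease map to +1 and -1
log2(c(up = 2, down = 0.5))

# Heuristic: log2-scale array intensities rarely exceed ~20,
# whereas raw intensities reach the thousands
looks_log2 <- function(x, max_log = 20) {
  max(x, na.rm = TRUE) <= max_log
}

# Log2-transform only if the data do not already look log-scaled;
# assumes raw intensities are strictly positive
to_log2 <- function(x) {
  if (looks_log2(x)) x else log2(x)
}

# Toy intensities, made up for illustration
raw    <- matrix(c(120, 3400, 15000, 85, 980, 60000), nrow = 2)
logged <- to_log2(raw)
to_log2(logged)  # already log2-like, so returned unchanged
```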

Best Regards,

David Westergaard
Undergraduate student
Technical University of Denmark


