[BioC] Error in RankProd package

Sun Mar 25 23:15:11 CEST 2007

Error in RankProd package, specifically related to the RP and topGene
functions and the way RP, handles missing values for an entire row of
observations for a gene. Filtering out rows of missing values will get
around this.

The help(RP) documentation provides the following information
about NA handling;

   na.rm: if 'FALSE' (default), the NA value will not be used in
          computing rank. If 'TRUE', the missing  values will be
          replaced by the gene-wise mean of the non-missing values.
          Gene with all values missing  will be assigned "NA"

So, in the case where an entire row of observations for a gene are NAs the
rank product should be assigned NAs. To illustrate the errors I have
generated a 10 by 4 matrix from a normal distribution with two rows of
missing values;

library(RankProd)
set.seed(1)

nrows <- 10
ncols <- 4
X <- round(matrix(rnorm(nrows * ncols), nc=ncols),2)

X[2:3,] <- NA
X

RP.out <- RP(X, cl=rep(1,ncols), na.rm=TRUE)

RP.out$RPrank

RP.out$RPrank[2:3,] # I think the missing values should be ranked at the end
                    # in one case and at the beginning in the other. This
                    # can be controlled in the function 'rank' with the
                    # na.last argument

RP.out$Orirank[["class1 > class 2"]] # Additional space in list name
RankProd1(X, logged=TRUE, num.class=1, rev.sorting=TRUE)$rank.all

RP.out$Orirank[["class1 < class2"]]
RankProd1(X, logged=TRUE, num.class=1, rev.sorting=FALSE)$rank.all

RP.out$pval

tG <- topGene(RP.out, cutoff = 0.1, method="pval", logged=FALSE,
        logbase=2, gene.names=rownames(X)) # cutoff=0.1 for illustration

# First list from topGene is incorrect, filters genes with p-values > 0.1
>TG$Table1   # Outputs:

  gene.index RP/Rsum FC:(class1/class2)    pfp P.value
4          4  2.5149            -0.6625 0.2025  0.0405
2          2      NA                NaN 1.1089  0.9980
5          5  3.7417             0.1725 0.5050  0.2020
3          3      NA                NaN 0.9980  0.9980
6          6  2.1147            -0.3325 0.1700  0.0170

# List ok
TG$Table2

# A row of missing values should not be highly ranked in the first three
genes
>topGene(RP.out, logged=FALSE, num.gene=3,
        logbase=2, gene.names=rownames(X))   # Outputs;

Table1: Genes called significant under class1 < class2

Table2: Genes called significant under class1 > class2

$Table1
  gene.index RP/Rsum FC:(class1/class2)    pfp P.value
4          4  2.5149            -0.6625 0.2025  0.0405
2          2      NA                NaN 1.1089  0.9980
5          5  3.7417             0.1725 0.5050  0.2020

$Table2
  gene.index RP/Rsum FC:(class1/class2)   pfp P.value
1          1  1.6266               0.79 0.035  0.0035
7          7  5.1800              -0.02 0.555  0.3885
3          3      NA                NaN 0.998  0.9980

> sessionInfo()
R version 2.4.1 (2006-12-18)
powerpc-apple-darwin8.8.0

locale:
C

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"     

other attached packages:
   limma RankProd 
 "2.9.8"  "2.6.0" 

This was also tested on the BioC 2.0 development release of RankProd 2.7.0
and gives the same results.

Marcus

______________________________________________________

The contents of this e-mail are privileged and/or confidenti...{{dropped}}