[BioC] Error in RankProd package
Marcus Davy
mdavy at hortresearch.co.nz
Sun Mar 25 23:15:11 CEST 2007
Error in RankProd package, specifically related to the RP and topGene
functions and the way RP, handles missing values for an entire row of
observations for a gene. Filtering out rows of missing values will get
around this.
The help(RP) documentation provides the following information
about NA handling;
na.rm: if 'FALSE' (default), the NA value will not be used in
computing rank. If 'TRUE', the missing values will be
replaced by the gene-wise mean of the non-missing values.
Gene with all values missing will be assigned "NA"
So, in the case where an entire row of observations for a gene are NAs the
rank product should be assigned NAs. To illustrate the errors I have
generated a 10 by 4 matrix from a normal distribution with two rows of
missing values;
library(RankProd)
set.seed(1)
nrows <- 10
ncols <- 4
X <- round(matrix(rnorm(nrows * ncols), nc=ncols),2)
X[2:3,] <- NA
X
RP.out <- RP(X, cl=rep(1,ncols), na.rm=TRUE)
RP.out$RPrank
RP.out$RPrank[2:3,] # I think the missing values should be ranked at the end
# in one case and at the beginning in the other. This
# can be controlled in the function 'rank' with the
# na.last argument
RP.out$Orirank[["class1 > class 2"]] # Additional space in list name
RankProd1(X, logged=TRUE, num.class=1, rev.sorting=TRUE)$rank.all
RP.out$Orirank[["class1 < class2"]]
RankProd1(X, logged=TRUE, num.class=1, rev.sorting=FALSE)$rank.all
RP.out$pval
tG <- topGene(RP.out, cutoff = 0.1, method="pval", logged=FALSE,
logbase=2, gene.names=rownames(X)) # cutoff=0.1 for illustration
# First list from topGene is incorrect, filters genes with p-values > 0.1
>TG$Table1 # Outputs:
gene.index RP/Rsum FC:(class1/class2) pfp P.value
4 4 2.5149 -0.6625 0.2025 0.0405
2 2 NA NaN 1.1089 0.9980
5 5 3.7417 0.1725 0.5050 0.2020
3 3 NA NaN 0.9980 0.9980
6 6 2.1147 -0.3325 0.1700 0.0170
# List ok
TG$Table2
# A row of missing values should not be highly ranked in the first three
genes
>topGene(RP.out, logged=FALSE, num.gene=3,
logbase=2, gene.names=rownames(X)) # Outputs;
Table1: Genes called significant under class1 < class2
Table2: Genes called significant under class1 > class2
$Table1
gene.index RP/Rsum FC:(class1/class2) pfp P.value
4 4 2.5149 -0.6625 0.2025 0.0405
2 2 NA NaN 1.1089 0.9980
5 5 3.7417 0.1725 0.5050 0.2020
$Table2
gene.index RP/Rsum FC:(class1/class2) pfp P.value
1 1 1.6266 0.79 0.035 0.0035
7 7 5.1800 -0.02 0.555 0.3885
3 3 NA NaN 0.998 0.9980
> sessionInfo()
R version 2.4.1 (2006-12-18)
powerpc-apple-darwin8.8.0
locale:
C
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7] "base"
other attached packages:
limma RankProd
"2.9.8" "2.6.0"
This was also tested on the BioC 2.0 development release of RankProd 2.7.0
and gives the same results.
Marcus
______________________________________________________
The contents of this e-mail are privileged and/or confidenti...{{dropped}}
More information about the Bioconductor
mailing list