[BioC] strange results with edgeR::goodTuring
Francois Pepin
francois.pepin at sequentainc.com
Mon Aug 27 21:00:19 CEST 2012
Hi everyone,
I'm trying to use the goodTuring function in edgeR to estimate what kind of pseudocounts to use, and I'm getting strange results with a small number of categories:
library(edgeR)
x <- c(312, 14491, 16401, 65124, 129797, 323321, 366051, 368599, 405261, 604962)
y <- goodTuring(x)
y
$count
[1] 312 14491 16401 65124 129797 323321 366051 368599 405261 604962
$proportion
[1] 0 0 0 0 0 0 0 0 0 1
$P0
[1] 0
$n0
[1] 0
If I'm understanding this properly, y$proportion is telling me that I should expect all my counts to fall under the last category, which does not make sense. I would expect something pretty close to x/sum(x) instead.
This is a bit of a toy example, and I'm mostly interested in cases where I have more categories, but it would be nice if this could work in all cases.
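For comparison, the naive estimate x/sum(x) mentioned above can be computed in base R, without edgeR. This is only a sketch of the expectation, not a description of what goodTuring does internally; the names p_naive and freq_of_freq are made up for illustration. It also tabulates how often each count value occurs, since Good-Turing estimation works from those frequencies of frequencies. Whether that is related to the output above is a guess on my part.

```r
# Same toy counts as in the example above
x <- c(312, 14491, 16401, 65124, 129797, 323321, 366051, 368599, 405261, 604962)

# Naive (maximum-likelihood) proportions: each count divided by the total
p_naive <- x / sum(x)
print(round(p_naive, 4))

# Frequencies of frequencies: how many categories share each count value.
# Here every count value is distinct, so each one occurs exactly once.
freq_of_freq <- table(x)
print(freq_of_freq)
```

The proportions sum to 1 by construction and follow the same ordering as x, which is roughly what I expected y$proportion to look like.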
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_2.6.9 limma_3.12.1 dataframe_2.5
Thanks,
François