[BioC] strange results with edgeR::goodTuring
    Francois Pepin
    francois.pepin at sequentainc.com
    Mon Aug 27 21:00:19 CEST 2012

Hi everyone,
I'm trying to use the goodTuring function in edgeR to estimate what kind of pseudocounts to use, and I'm getting strange results with a small number of categories:
library(edgeR)
x <- c(312, 14491, 16401, 65124, 129797, 323321, 366051, 368599, 405261, 604962)
y <- goodTuring(x)
y
$count
 [1]    312  14491  16401  65124 129797 323321 366051 368599 405261 604962
$proportion
 [1] 0 0 0 0 0 0 0 0 0 1
$P0
[1] 0
$n0
[1] 0
If I'm understanding this properly, y$proportion is telling me that I should expect all my counts to fall under the last category, which does not make sense. I would expect something pretty close to x/sum(x) instead.
This is a bit of a toy example, and I'm mostly interested in cases where I have more categories, but it would be nice if this worked in all cases.
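For comparison, the naive estimate x/sum(x) mentioned above can be computed in base R without edgeR. This is only a sketch: the names naive_prop and freq_of_freq are introduced here for illustration, and the frequency-of-frequencies table is included on the assumption that it is relevant, since Good-Turing smoothing is built on how many categories share each observed count.

```r
# Counts from the example above
x <- c(312, 14491, 16401, 65124, 129797, 323321, 366051, 368599, 405261, 604962)

# Naive (maximum-likelihood) proportions: each count divided by the total
naive_prop <- x / sum(x)

# Frequency of frequencies: how many categories share each observed count
freq_of_freq <- table(x)

print(round(naive_prop, 4))
print(freq_of_freq)
```

Every count in x is distinct, so each observed count occurs exactly once in freq_of_freq. That may be part of why such a small input behaves oddly, but this is a guess rather than a diagnosis of goodTuring itself.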
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] edgeR_2.6.9   limma_3.12.1  dataframe_2.5
Thanks,
François