[BioC] Possible problem with featureCounts() and scipen

Wed Jun 18 02:34:35 CEST 2014

Dear Oscar,

I have fixed the bug and committed the fix to Bioconductor (Rsubread 1.14.2).

Also, I suggest revising your commands as follows:

regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand="+")
featureCounts("MyReadsfile.bam", annot.ext=regions, ignoreDup=TRUE)
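
For readers who want to see the formatting behaviour in isolation, here is a minimal base-R sketch. It assumes the problem came from the Start value being turned into text in scientific notation; the thread does not describe the internal cause, so treat this as an illustration rather than a description of what featureCounts() does. The variable names are hypothetical, and no BAM file or Rsubread call is needed.

```r
# Hypothetical illustration using base R only.
old <- options(scipen = 0, digits = 7)    # R's default settings

start <- 202000000                        # stored as a double

# With the defaults, the scientific form is shorter, so R prefers it.
sci_text <- format(start)                 # "2.02e+08"

# Three ways to get the full coordinate as text:
fixed_text <- format(start, scientific = FALSE)
int_text   <- format(as.integer(start))   # integers are never abbreviated

options(scipen = 10)                      # penalise scientific notation
scipen_text <- format(start)

options(old)                              # restore the previous settings

print(c(sci_text, fixed_text, int_text, scipen_text))

# An annotation table with integer coordinates (note the L suffix),
# which prints in full regardless of the scipen setting.
regions <- data.frame(GeneID = "G1", Chr = "chr2",
                      Start = 202000000L, End = 202002100L, Strand = "+")
print(regions)
```

Storing the coordinates as integers is one way to keep the output independent of the session's scipen setting. This works here because both values are below the 32-bit integer limit of 2147483647.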

Best wishes,

Wei

On Jun 17, 2014, at 9:00 PM, Oscar Rueda wrote:

> Dear list,
> I've noticed a problem counting reads using the Rsubread package. I cannot include a reproducible example for lack of a BAM file, but I hope the following code explains the problem:
> 
> Suppose I have this genomic region:
> 
>> regions <- data.frame(GeneID="G1", Chr="chr2", Start=202000000, End=202002100, Strand=1)
> 
> Then I want to count the reads in this region:
> 
>> featureCounts("MyReadsfile.bam", annot.ext=regions, annot.inbuilt="hg19", ignoreDup=T)
> 
> I get a huge number of reads, but if I look at the output, I see that the annotation is wrong:
> 
> $counts
>      X.MyReadsfile.bam
> G1                                                                       715291
> 
> $annotation
>  GeneID  Chr Start       End Strand    Length
> 1  Error chr2     2 202002100      + 202002099
> 
> That is, the start is 2 instead of 202000000. If I print my regions I see
> 
>> regions
>  GeneID  Chr    Start       End Strand
> 1  Error chr2 2.02e+08 202002100      1
> 
> That is, the scientific notation is not handled properly.
> I can fix this by doing
>> options(scipen=10)
> 
> But I wanted to ask whether anyone else has noticed this behaviour and whether it is expected from the function.
> 
> Thanks a lot for your comments,
> Oscar
> 
> Oscar M. Rueda, PhD.
> Postdoctoral Research Fellow, Caldas Lab, Breast Cancer Functional
> Genomics.
> University of Cambridge. Cancer Research UK Cambridge Institute.
> Li Ka Shing Centre, Robinson Way.
> Cambridge CB2 0RE
> England
> 
> 
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> 
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] Rsubread_1.14.1         GenomicAlignments_1.0.1 BSgenome_1.32.0
> [4] Rsamtools_1.16.1        Biostrings_2.32.0       XVector_0.4.0
> [7] GenomicRanges_1.16.3    GenomeInfoDb_1.0.2      IRanges_1.22.9
> [10] BiocGenerics_0.10.0
> 
> loaded via a namespace (and not attached):
> [1] BatchJobs_1.2      BBmisc_1.6         BiocParallel_0.6.1 bitops_1.0-6
> [5] brew_1.0-6         codetools_0.2-8    DBI_0.2-7          digest_0.6.4
> [9] fail_1.2           foreach_1.4.2      iterators_1.0.7    plyr_1.8.1
> [13] Rcpp_0.11.2        RSQLite_0.11.4     sendmailR_1.1-2    stats4_3.1.0
> [17] stringr_0.6.2      tools_3.1.0        zlibbioc_1.10.0
> 
