[BioC] easyRNAseq remove overlapping features

Nicolas Delhomme delhomme at embl.de
Wed Jan 9 15:35:53 CET 2013


Hi Vincent,

I'm currently doing a lot of refactoring on easyRNASeq, and I should come up with an example of how to achieve that soon in the developer version (1.5.x). I might also propagate it to the stable version (1.4.x) as a new function, provided that it does not affect stability in any way.
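
For reference, the idea from the vignette (keep only the parts of each gene that no other gene covers) can be sketched directly with GenomicRanges. This is a minimal sketch, not the easyRNASeq function mentioned above: the coordinates, gene names and variable names are made up for illustration, and it assumes a recent GenomicRanges. Note that ranges on opposite strands are not treated as overlapping by default; `disjoin()` and `countOverlaps()` both take `ignore.strand=TRUE` if that is not what you want.

```r
library(GenomicRanges)

# Two hypothetical genes that overlap on positions 80-100 (made-up coordinates).
genes <- GRanges(seqnames = "chr2L",
                 ranges = IRanges(start = c(1, 80), end = c(100, 200)),
                 strand = "*",
                 gene = c("geneA", "geneB"))

# Split the annotation into non-overlapping pieces: 1-79, 80-100, 101-200.
pieces <- disjoin(genes)

# Keep only the pieces covered by exactly one gene, i.e. drop the shared region.
unique_pieces <- pieces[countOverlaps(pieces, genes) == 1L]

# Re-attach the gene name to each remaining piece.
hits <- findOverlaps(unique_pieces, genes)
gene_of <- character(length(unique_pieces))
gene_of[queryHits(hits)] <- genes$gene[subjectHits(hits)]
mcols(unique_pieces)$gene <- gene_of

# Unique (non-overlapping) length per gene, usable later for RPKM/TPM.
unique_len <- tapply(width(unique_pieces), unique_pieces$gene, sum)
```

Here geneA keeps 1-79 and geneB keeps 101-200, so neither gene is dropped entirely; only the shared stretch is removed.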

I'm planning to add FPKM, and I can certainly add TPM as well. These modifications would probably only occur in the developer version (1.5.x), though. I can let you know once they are done, and also explain how to get the developer version of Bioconductor if you're not familiar with that.
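
Until then, both measures are easy to compute by hand from a count table and a vector of gene lengths. The following is only a base-R sketch under stated assumptions: the counts and lengths are toy numbers, and `rpkm()` and `tpm()` are hypothetical helper names defined here, not part of the easyRNASeq API. RPKM divides counts by gene length in kb and by library size in millions; TPM first divides counts by gene length and then rescales each sample to sum to one million.

```r
# Toy read counts for three genes in two samples (made-up numbers).
counts <- matrix(c(100, 200, 300,
                    50, 100, 850),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Non-overlapping length of each gene in bp (made-up numbers).
len <- c(geneA = 1000, geneB = 2000, geneC = 500)

# Hypothetical helper: reads per kilobase per million mapped reads.
rpkm <- function(counts, len) {
  per_kb <- counts / (len / 1e3)               # row i is divided by len[i]
  sweep(per_kb, 2, colSums(counts) / 1e6, "/") # divide by library size in millions
}

# Hypothetical helper: transcripts per million.
tpm <- function(counts, len) {
  rate <- counts / len                         # reads per base, per gene
  sweep(rate, 2, colSums(rate), "/") * 1e6     # rescale each sample to 1e6
}

rpkm_table <- rpkm(counts, len)
tpm_table  <- tpm(counts, len)
```

By construction every column of `tpm_table` sums to one million, which is what makes TPM values comparable across samples in a heatmap.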

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On Jan 9, 2013, at 3:23 PM, Vincent Schulz wrote:

> Hi Nico,
> 
> I would like to use easyRNASeq to count reads for RNA-seq. The vignette says that "The ideal solution is to provide an annotation object that contains no overlapping features. The disjoin function from the IRanges package offers a way to achieve this." I do not have much experience with IRanges, etc., and it was not obvious to me how to do this, so I would be grateful for any pointers. I do not want to remove the genes that overlap; instead, I would like to remove only the overlapping regions of those genes, leaving any unique regions.
> 
> One additional request: would it be possible for easyRNASeq to offer the option of calculating TPM as well as RPKM (using the non-overlapping gene length)? The reference for TPM is
> http://www.ncbi.nlm.nih.gov/pubmed/22872506
> TPM/RPKM would be useful for heatmaps and other display purposes.
> 
> Thanks,
> 
> Vince
> 
> 
> library(easyRNASeq)
> library(RnaSeqTutorial)
> library(BSgenome.Dmelanogaster.UCSC.dm3)
> 
> count.table <- easyRNASeq(system.file(
>                               "extdata",
>                               package="RnaSeqTutorial"),
>                           organism="Dmelanogaster",
>                           readLength=36L,
>                           annotationMethod="gff",
>                           annotationFile=system.file(
>                               "extdata",
>                               "annot.gff",
>                               package="RnaSeqTutorial"),
>                           gapped=TRUE,
>                           count="exons",
>                           filenames="gapped.bam")
> 
> > sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C                 LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] BSgenome.Dmelanogaster.UCSC.dm3_1.3.19
> [2] RnaSeqTutorial_0.0.11
> [3] easyRNASeq_1.4.2
> [4] ShortRead_1.16.3
> [5] latticeExtra_0.6-24
> [6] RColorBrewer_1.0-5
> [7] Rsamtools_1.10.2
> [8] DESeq_1.10.1
> [9] lattice_0.20-13
> [10] locfit_1.5-8
> [11] BSgenome_1.26.1
> [12] GenomicRanges_1.10.5
> [13] Biostrings_2.26.2
> [14] IRanges_1.16.4
> [15] edgeR_3.0.8
> [16] limma_3.14.3
> [17] biomaRt_2.14.0
> [18] Biobase_2.18.0
> [19] genomeIntervals_1.14.0
> [20] BiocGenerics_0.4.0
> [21] intervals_0.13.3
> [22] BiocInstaller_1.8.3
> 
> loaded via a namespace (and not attached):
> [1] annotate_1.36.0      AnnotationDbi_1.20.3 bitops_1.0-5
> [4] DBI_0.2-5            genefilter_1.40.0    geneplotter_1.36.0
> [7] grid_2.15.2          hwriter_1.3          RCurl_1.95-3
> [10] RSQLite_0.11.2       splines_2.15.2       stats4_2.15.2
> [13] survival_2.37-2      tools_2.15.2         XML_3.95-0.1
> [16] xtable_1.7-0         zlibbioc_1.4.0


