[Bioc-devel] plyranges group_by
Bhagwat, Aditya
Ad|ty@@Bh@gw@t @end|ng |rom mp|-bn@mpg@de
Wed Oct 16 11:48:25 CEST 2019
Hi Stuart, Michael,
Your plyranges package is really cool - now I am using it for left joining GRanges (I am facing a minor issue there <https://support.bioconductor.org/p/125623/>, but that is not the topic of this email - I have been asked by Lori not to double-post :-)).
This email is about the plyranges functionality for grouping GRanges.
That is cool, but I found it to be not so performant for large numbers of ranges.
My R session hangs when I do:
bedfile <- paste0('https://gitlab.gwdg.de/loosolab/software/multicrispr/wikis',
'/uploads/a51e98516c1e6b71441f5b5a5f741fa1/SRF.bed')
srfranges <- rtracklayer::import.bed(bedfile, genome = 'mm10')
txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
generanges <- GenomicFeatures::genes(txdb)
annotatedsrf <- plyranges::join_overlap_left(srfranges, generanges)
plyranges::group_by(annotatedsrf, seqnames, start, end, strand)
For my purposes, I worked around it by performing a groupby in data.table:
data.table::as.data.table(annotatedsrf)[
!is.na(gene_id),
gene_id := paste0(gene_id, collapse = ';'),
by = c('seqnames', 'start', 'end', 'strand')]
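As a minimal sketch of what this workaround does, here is the same idiom on a made-up toy table. The column values and the names `dt` and `collapsed` are invented for illustration and are not taken from the SRF data; it assumes data.table is installed.

```r
library(data.table)

# Hypothetical toy data: the first range overlaps two genes, the last one none
dt <- data.table(
  seqnames = c('chr1', 'chr1', 'chr2'),
  start    = c(100L, 100L, 500L),
  end      = c(120L, 120L, 520L),
  strand   = c('+', '+', '-'),
  gene_id  = c('geneA', 'geneB', NA)
)

# Collapse gene ids per range, updating dt by reference;
# rows with a missing gene_id are left untouched
dt[!is.na(gene_id),
   gene_id := paste0(gene_id, collapse = ';'),
   by = c('seqnames', 'start', 'end', 'strand')]

# Rows of the same range are now identical, so keep one row per range
collapsed <- unique(dt)
print(collapsed)
```

The grouping itself happens inside data.table, so no grouped GRanges object is ever built; converting back with `as(collapsed, 'GRanges')` would be a separate step.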
And I was wondering, in general, whether it would be useful to have a data.table-based backend for plyranges::group_by().
And whether all of this is actually a non-issue caused by my improper use of plyranges::group_by.
Thank you for your feedback :-)
Aditya