[R] Is there an efficient way to find the overlapping, upstream and downstream ranges for a bunch of ranges
Michael Lawrence
lawrence.michael at gene.com
Mon Apr 11 16:57:15 CEST 2016
For the sake of posterity, this question was asked and answered here:
https://support.bioconductor.org/p/80448
On Tue, Apr 5, 2016 at 10:27 AM, 何尧 <heyao at pku.edu.cn> wrote:
> I have a set of genes (about 50,000) from the whole genome, read in as genomic ranges (a GRanges object).
>
> Each range (gene) can be seen as an observation with three columns: chromosome, start, and end, like this:
>
>       seqnames start end width strand
> gene1     chr1     1   5     5      +
> gene2     chr1    10  15     6      +
> gene3     chr1    12  17     6      +
> gene4     chr1    20  25     6      +
> gene5     chr1    30  40    11      +
>
> I am wondering whether there is an efficient way to find the overlapping, upstream, and downstream genes for each gene in the GRanges object.
>
> For example, assuming all_genes_gr is a GRanges object of ~50,000 genes, the result I want looks like this:
>
> gene_name  upstream_gene  downstream_gene  overlapped_gene
> gene1      NA             gene2            NA
> gene2      gene1          gene4            gene3
> gene3      gene1          gene4            gene2
> gene4      gene3          gene5            NA
>
> Currently, the strategy I use is as follows:
>
> library(GenomicRanges)
>
> # For the gene at position idx, count and return the other genes that overlap it
> find_overlapped_gene <- function(idx, all_genes_gr) {
>   # cat(idx, "\n")
>   curr_gene <- all_genes_gr[idx]
>   other_genes <- all_genes_gr[-idx]
>   n <- countOverlaps(curr_gene, other_genes)
>   # Query is other_genes so the overlapping genes are returned, not curr_gene itself
>   gene <- subsetByOverlaps(other_genes, curr_gene)
>   return(list(n, gene))
> }
>
> system.time(lapply(1:100, function(idx) find_overlapped_gene(idx, all_genes_gr)))
> However, for 100 genes this takes about 8 s according to system.time(). That means that for 50,000 genes it would take roughly an hour just to find the overlapping genes.
>
> Is there an algorithm or strategy that does this efficiently, perhaps 50,000 genes in ~10 min or even less?
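
A minimal sketch of a vectorized approach, which is not necessarily what the linked answer recommends: instead of looping over genes, call findOverlaps(), follow(), and precede() from GenomicRanges once on the whole object. The toy `genes` object below rebuilds the five example ranges from the question; the names `genes`, `hits`, `by_query`, `overlapped`, and `res` are illustrative assumptions. Upstream and downstream are interpreted relative to strand, which is how follow() and precede() behave by default.

```r
library(GenomicRanges)

# Toy data: the five example genes from the question (an assumption standing in
# for all_genes_gr)
genes <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(1, 10, 12, 20, 30), end = c(5, 15, 17, 25, 40)),
  strand = "+"
)
names(genes) <- paste0("gene", 1:5)

# All pairwise overlaps in a single call, then drop each gene's hit against itself
hits <- findOverlaps(genes, genes)
hits <- hits[queryHits(hits) != subjectHits(hits)]

# Collapse the names of the overlapping genes per query gene; NA when there are none
overlapped <- rep(NA_character_, length(genes))
by_query <- split(names(genes)[subjectHits(hits)], queryHits(hits))
overlapped[as.integer(names(by_query))] <-
  vapply(by_query, paste, character(1), collapse = ",")

# follow(): nearest non-overlapping gene upstream of each gene
# precede(): nearest non-overlapping gene downstream of each gene
# Both return NA where no such gene exists
res <- data.frame(
  gene_name       = names(genes),
  upstream_gene   = names(genes)[follow(genes, genes)],
  downstream_gene = names(genes)[precede(genes, genes)],
  overlapped_gene = overlapped,
  stringsAsFactors = FALSE
)
print(res)
```

On the real data, all_genes_gr would take the place of the toy object. Because each function runs once over all ranges, this should scale to ~50,000 genes much better than a per-gene lapply(), although it has not been timed here. Passing ignore.strand = TRUE to follow() and precede() would make upstream and downstream follow coordinates only.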