[BioC] Retrieving gene name where given genomic region is included.
Vincent Carey
stvjc at channing.harvard.edu
Tue Jan 19 04:20:15 CET 2010
This looks like a job for GenomicFeatures and IRanges. There are
various approaches, but this seems to help with your specific example.
Get your software into a shape like the one indicated in the
sessionInfo below, and then do
library(GenomicFeatures)
library(GenomicFeatures.Hsapiens.UCSC.hg18)
genes = geneHuman()
g10 = genes[genes$chrom=="chr10",]
dim(g10)
g10i = RangedData(IRanges(start=g10$txStart, end=g10$txEnd),
space="chr10", name=g10$name)
g10i
boel = IRanges(start=17317394, end=17317851) # could be richer
findOverlaps(boel, ranges(g10i)[["chr10"]])
g10i[298:307,]
kgn = unique(g10i[298:307,]$name)
mget(kgn, revmap(org.Hs.egUCSCKG))
to find
> get("7431", org.Hs.egGENENAME)
[1] "vimentin"
> sessionInfo()
R version 2.10.1 RC (2009-12-10 r50697)
i386-apple-darwin9.8.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base
other attached packages:
[1] org.Hs.eg.db_2.3.6
[2] RSQLite_0.7-3
[3] DBI_0.2-5
[4] AnnotationDbi_1.7.20
[5] Biobase_2.5.8
[6] IRanges_1.5.16
[7] GenomicFeatures.Hsapiens.UCSC.hg18_0.1.0
[8] GenomicFeatures_0.1.4
[9] rtracklayer_1.7.2
[10] RCurl_1.2-1
[11] bitops_1.0-4.1
[12] weaver_1.11.1
[13] codetools_0.2-2
[14] digest_0.4.1
loaded via a namespace (and not attached):
[1] BSgenome_1.15.3 Biostrings_2.15.5 MASS_7.3-4 XML_2.6-0
[5] annotate_1.23.4 globaltest_5.1.1 multtest_2.3.0 splines_2.10.1
[9] survival_2.35-7 xtable_1.5-5
On Mon, Jan 18, 2010 at 12:28 PM, Boel Brynedal <Boel.Brynedal at ki.se> wrote:
> Dear List,
>
> I have long lists of genomic regions (chr;start;end) where a given event
> has taken place. These regions can be an exon, an intronic region, or
> similar. Most (all) of these events have taken place within the
> boundaries of genes, and I would like to retrieve the gene names
> (ensemble ID).
>
> I've tried to use biomaRt:
>>
> getBM(attributes=c("ensembl_gene_id"),filter=c("chromosome_name","start","end"),
> values=list(10,17317394,17317851), mart=ensembl)
> [1] ensembl_gene_id
> <0 rows> (or 0-length row.names)
>
> But since no whole GENE is within these boundaries, I get nothing. i've
> also tried asking for "ensembl_exon_id" when looking at exon events (not
> all of them are of that kind however), and this generally results in a
> long list of exon IDs (because one exon can be part of several transcripts).
>
> I would appreciate any ideas of how this could be done in a better way.
>
> Thank you!
>
> Boel
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list