[BioC] biomaRt-chromosomal positions

Sun Aug 19 01:17:04 CEST 2007

(Forwarded from Steffen Durinck on his request:)
-------------------------------------------------

Dear Peter-Bram,

Ensembl currently uses the transcript as its basic unit of annotation,
and any feature at a smaller level cannot be retrieved without the
transcript. What do you mean by small tag features? An alternative is
to use the getBM function to retrieve, for example, all
entrezgene ids on chromosome 1 together with their start and end
locations, and then check whether your start and end
positions fall between these positions. Would that help?

Try:

library(biomaRt)
ensembl = useMart("ensembl", dataset="mmusculus_gene_ensembl")
positions = getBM(attributes=c("entrezgene","start_position","end_position"),
                  filters=c("chromosome_name","with_entrezgene"),
                  values=list(1,TRUE), mart=ensembl)

you'll get:

> positions[1:5,]
   entrezgene start_position end_position
1     497097        3206103      3661429
2      19888        4334224      4350473
3      20671        4481009      4486494
4      18777        4797943      4836817
5     670320        4870130      4870732

and then you just use a vectorized comparison on the start and end
columns to see where your positions fit in.
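
A minimal sketch of such a comparison, for illustration only. It rebuilds
the five rows shown above as a data frame so that it runs without a mart
connection; the tag coordinates (tag_start, tag_end) are hypothetical and
were chosen to fall inside one of the listed genes. A gene counts as a hit
when its interval overlaps the tag interval at all, so partial overlap is
enough:

```r
# The five example rows from the getBM() output above (mouse chromosome 1)
positions <- data.frame(
  entrezgene     = c(497097, 19888, 20671, 18777, 670320),
  start_position = c(3206103, 4334224, 4481009, 4797943, 4870130),
  end_position   = c(3661429, 4350473, 4486494, 4836817, 4870732)
)

# Hypothetical tag coordinates (an assumption, not taken from the original post)
tag_start <- 4481100
tag_end   <- 4481120

# Two intervals overlap when each one starts before the other one ends.
# Both comparisons are vectorized over all rows of 'positions'.
hits <- positions$start_position <= tag_end &
        positions$end_position   >= tag_start

# Entrez gene ids whose annotated region overlaps the tag
overlapping <- positions$entrezgene[hits]
print(overlapping)
```

To require that the tag lies completely inside the gene instead, the test
would be start_position <= tag_start & end_position >= tag_end.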

Best regards,
Steffen

> -------- Original Message --------
> Subject: [BioC] biomaRt-chromosomal positions
> Date: Thu, 16 Aug 2007 12:14:12 +0200
> From: <P.A.C._t_Hoen at lumc.nl>
> To: <bioconductor at stat.math.ethz.ch>
>
> Dear BioC
>
> I would like to use biomaRt to get entrez gene (or other) identifiers
> for small tag sequences. I use the getFeature function for this. It
> seems that it will retrieve the identifiers only when the chromosomal
> region indicated spans at least the complete length of the transcript,
> but not if the indicated chromosomal region contains only part of the
> transcript sequence. Is there a way around this?
>
> Code and sessionInfo:
>
> library(RCurl)
> library(biomaRt)
> ensembl = useMart("ensembl")
> ensembl = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
> testgene <- getGene("66501", type = "entrezgene", mart = ensembl)
> testgene
> #  entrezgene  markersymbol
> #description chromosome_name band strand start_position end_position
> #1      66501 1700029H14Rik RIKEN cDNA 1700029H14 gene
> #[Source:MarkerSymbol;Acc:MGI:1913751]               8   A2     -1
> #13550710     13562382
> #     ensembl_gene_id ensembl_transcript_id
> #1 ENSMUSG00000031452    ENSMUST00000033830
>
> #this works fine:
> testfeatures = getFeature( type = "entrezgene", chromosome = "8", start
> = "13550710", end = "13562382",mart=ensembl)
> testfeatures
> #  chromosome_name start_position end_position entrezgene
> #1               8       13550710     13562382      66501
>
> #this does  not work anymore
> testfeatures = getFeature( type = "entrezgene", chromosome = "8", start
> = "13550711", end = "13562381",mart=ensembl)
> testfeatures
> #NULL
>
> #I would like to have a result from a small tag in a query like this:
> testfeatures = getFeature( type = "entrezgene", chromosome = "8", start
> = "13550741", end = "13550761",mart=ensembl)
>
>
> sessionInfo()
> ----------------------
> R version 2.5.0 (2007-04-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "datasets"  "utils"     "tcltk"
> "methods"   "base"
>
> other attached packages:
>   biomaRt    RCurl      XML     svIO   R2HTML   svMisc svSocket    svIDE
> "1.11.4"  "0.8-0"  "1.7-3"  "0.9-5"   "1.58"  "0.9-5"  "0.9-5"  "0.9-5"
>
>
>
> Cheers,
> Peter-Bram
>