[BioC] biomaRt-chromosomal positions
Wolfgang Huber
huber at ebi.ac.uk
Sun Aug 19 01:17:04 CEST 2007
(Forwarded at Steffen Durinck's request:)
-------------------------------------------------
Dear Peter-Bram,
Ensembl currently uses the transcript as its basic unit of annotation,
and any feature at a smaller scale cannot be retrieved without the
transcript. What do you mean by small tag features? An alternative is
to use the getBM function to retrieve, for example, all
entrezgene ids on chromosome 1 together with their start and end
positions, and then check whether your own start and end
positions fall in between these positions. Would that help?
Try:
library(biomaRt)
ensembl = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
positions = getBM(attributes = c("entrezgene", "start_position", "end_position"),
                  filters = c("chromosome_name", "with_entrezgene"),
                  values = list(1, TRUE),
                  mart = ensembl)
You'll get:
> positions[1:5,]
entrezgene start_position end_position
1 497097 3206103 3661429
2 19888 4334224 4350473
3 20671 4481009 4486494
4 18777 4797943 4836817
5 670320 4870130 4870732
Then you can use a vectorized comparison on this table to see which
genes your positions fall within.
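As a minimal sketch of that comparison, assuming the `positions` data frame has the three columns shown above, the hypothetical helper genes_for_tags() below returns, for each tag, the entrezgene ids whose gene interval overlaps it. The five gene rows are copied from the output above; the tag coordinates are made up for illustration. It tests overlap rather than full containment, so a tag covering only part of a gene (the case getFeature() misses) is still reported.

```r
# Gene coordinates copied from the first five rows of the getBM() output
# shown above (mouse chromosome 1); in practice use the full `positions`.
positions <- data.frame(
  entrezgene     = c(497097, 19888, 20671, 18777, 670320),
  start_position = c(3206103, 4334224, 4481009, 4797943, 4870130),
  end_position   = c(3661429, 4350473, 4486494, 4836817, 4870732)
)

# Hypothetical helper (not part of biomaRt): for each tag, return the
# entrezgene ids of genes whose interval overlaps the tag. Two closed
# intervals [a, b] and [c, d] overlap when a <= d and c <= b; a tag lying
# entirely inside a gene is just a special case of this test.
genes_for_tags <- function(tag_start, tag_end, genes) {
  lapply(seq_along(tag_start), function(i) {
    # Vectorized over all genes at once for tag i
    hit <- genes$start_position <= tag_end[i] &
           genes$end_position   >= tag_start[i]
    genes$entrezgene[hit]
  })
}

# Made-up example tags: inside a gene, straddling a gene start, intergenic
tags <- data.frame(start = c(4334300, 4480990, 5000000),
                   end   = c(4334320, 4481010, 5000020))
hits <- genes_for_tags(tags$start, tags$end, positions)
hits
```

Here the first tag falls inside gene 19888, the second only partially overlaps the start of gene 20671, and the third hits nothing. Each tag costs one pass over the gene table, which is fine for a modest number of tags.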
Best regards,
Steffen
> -------- Original Message --------
> Subject: [BioC] biomaRt-chromosomal positions
> Date: Thu, 16 Aug 2007 12:14:12 +0200
> From: <P.A.C._t_Hoen at lumc.nl>
> To: <bioconductor at stat.math.ethz.ch>
>
> Dear BioC
>
> I would like to use biomaRt to get Entrez Gene (or other) identifiers
> for small tag sequences, using the getFeature function. It seems
> to retrieve identifiers only when the specified chromosomal
> region spans at least the complete length of the transcript,
> not when the region contains only part of the transcript
> sequence. Is there a way around this?
>
> Code and sessionInfo:
>
> library(RCurl)
> library(biomaRt)
> ensembl = useMart("ensembl")
> ensembl = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
> testgene <- getGene("66501", type = "entrezgene", mart = ensembl)
> testgene
> #   entrezgene  markersymbol
> # 1      66501 1700029H14Rik
> #   description
> # 1 RIKEN cDNA 1700029H14 gene [Source:MarkerSymbol;Acc:MGI:1913751]
> #   chromosome_name band strand start_position end_position
> # 1               8   A2     -1       13550710     13562382
> #      ensembl_gene_id ensembl_transcript_id
> # 1 ENSMUSG00000031452    ENSMUST00000033830
>
> #this works fine:
> testfeatures = getFeature(type = "entrezgene", chromosome = "8",
>                           start = "13550710", end = "13562382", mart = ensembl)
> testfeatures
> # chromosome_name start_position end_position entrezgene
> #1 8 13550710 13562382 66501
>
> # this no longer works (region shortened by 1 bp at each end):
> testfeatures = getFeature(type = "entrezgene", chromosome = "8",
>                           start = "13550711", end = "13562381", mart = ensembl)
> testfeatures
> #NULL
>
> # I would like to get a result for a small tag with a query like this:
> testfeatures = getFeature(type = "entrezgene", chromosome = "8",
>                           start = "13550741", end = "13550761", mart = ensembl)
>
>
> sessionInfo()
> ----------------------
> R version 2.5.0 (2007-04-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "stats" "graphics" "grDevices" "datasets" "utils" "tcltk" "methods" "base"
>
> other attached packages:
> biomaRt RCurl XML svIO R2HTML svMisc svSocket svIDE
> "1.11.4" "0.8-0" "1.7-3" "0.9-5" "1.58" "0.9-5" "0.9-5" "0.9-5"
>
>
>
> Cheers,
> Peter-Bram
>