[Bioc-sig-seq] unexpected genes names list using getBM{biomaRt}

Mon Dec 7 15:57:55 CET 2009

On Mon, Dec 7, 2009 at 9:52 AM, Ramzi TEMANNI <ramzi.temanni at gmail.com> wrote:
> Hi,
> I want to extract the gene names, knowing the chromosome and the position of
> each gene:
>> t.cpd[1:10,1:2]
>      CHR.M1 POS.M1
>  [1,] "12"   "140059033"
>  [2,] "19"   "164634640"
>  [3,] "10"   "32347784"
>  [4,] "11"   "30576841"
>  [5,] "2"    "86479831"
>  [6,] "12"   "237019866"
>  [7,] "4"    "76487174"
>  [8,] "20"   "136121868"
>  [9,] "2"    "6255547"
> [10,] "1"    "67658137"
>
> I use the following commands:
> library(biomaRt)
> mart = useMart("ensembl")
> ensembl = useDataset("hsapiens_gene_ensembl", mart = mart)
> gn.m1<-getBM(attributes= c("hgnc_symbol"),
>       filters=c("chromosome_name","start"),
>       values=list(t.cpd[1:10,1],t.cpd[1:10,2]), mart=ensembl)
>
> I expected a list of 10 gene names, but instead I get 8652 genes:
> hgnc_symbol
> 1      OR2M1P
> 2      OR2L1P
> 3   HSD17B7P1
> 4     OR14L1P
> 5       OR2W5
> 6       VN1R5
> ......
> 8649        WFS1
> 8650    SNORD73A
> 8651     SNORA24
> 8652     SNORA26
>
> Did I miss something ?

Your query pulls out all genes on the listed chromosomes that lie in the
region STARTING at your start position.  In other words, you are getting
all of the genes to the right of the start positions, because the query
sets no upper bound.  You probably want to specify an end position (the
"end" filter) as well as a start position for your query.
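
A minimal sketch of that suggestion, assuming each position should be matched
to the gene(s) overlapping it, and assuming that passing the same coordinate
to both the "start" and "end" filters selects those genes.  The helper names
genes_at and genes_for_positions are hypothetical; t.cpd is the matrix from
the question.  The queries run one row at a time because, as far as I know,
values passed together in a single getBM call are not paired row by row.

```r
# Hypothetical helper: gene(s) found at one chromosome/position pair.
# Assumption: "start" and "end" set to the same coordinate bound the region
# to that single base.
genes_at <- function(chr, pos, mart) {
  biomaRt::getBM(
    attributes = c("chromosome_name", "start_position",
                   "end_position", "hgnc_symbol"),
    filters    = c("chromosome_name", "start", "end"),
    values     = list(chr, pos, pos),
    mart       = mart
  )
}

# Hypothetical helper: one query per row of a two-column table
# (column 1 = chromosome, column 2 = position), returned as a list
# with one data frame per row.
genes_for_positions <- function(pos_table, mart) {
  lapply(seq_len(nrow(pos_table)), function(i) {
    genes_at(pos_table[i, 1], pos_table[i, 2], mart)
  })
}

# Usage (needs network access to the Ensembl BioMart server):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# gn.m1   <- genes_for_positions(t.cpd[1:10, 1:2], ensembl)
```

A position that falls between genes returns an empty data frame, and a
position inside overlapping genes returns several rows, so the result will
not necessarily contain exactly 10 names.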

Sean