[BioC] Retrieving genes by their genome locations in biomart

Steffen Durinck SDurinck at lbl.gov
Mon Apr 14 17:07:33 CEST 2008


Hi Sergii,

Your query supplies only a start position. To retrieve all genes located in a specific region, biomaRt needs a chromosome name together with both a start and an end position.
An example would be:

getBM(c("entrezgene", "chromosome_name", "start_position", "end_position", "strand"),
      filters = c("chromosome_name", "start", "end", "with_entrezgene"),
      values = list(2, 200000, 600000, TRUE),
      mart = mart)

You would have to run this query for each position separately, which means constructing a loop, and that is not advisable when using biomaRt. An alternative is to retrieve the entrezgene ids, the gene start and end positions, and the chromosomes for all genes in a single query, and then check in this output which genes your positions fall into. The query you need for this is:

genes = getBM(c("entrezgene", "ensembl_gene_id", "chromosome_name", "start_position", "end_position", "strand"),
              filters = c("chromosome_name", "with_entrezgene"),
              values = list(c(1:22, "X", "Y"), TRUE),
              mart = mart)
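
The "check in this output" step could look roughly like the sketch below. It is only an illustration in base R: the small `genes` data frame stands in for the real getBM() result (the values are made up), and `query` and `find_genes` are hypothetical names that do not come from biomaRt.

```r
# Toy stand-in for the `genes` data frame returned by getBM(); values are made up.
genes <- data.frame(
  entrezgene      = c(101, 102, 103),
  chromosome_name = c("1", "1", "2"),
  start_position  = c(1000, 5000, 2000),
  end_position    = c(2000, 7000, 3000),
  stringsAsFactors = FALSE
)

# Hypothetical query positions, each paired with its own chromosome.
query <- data.frame(
  chromosome_name = c("1", "1", "2", "2"),
  position        = c(1500, 3000, 2500, 1500),
  stringsAsFactors = FALSE
)

# Return the entrezgene ids of all genes on chromosome `chr` whose
# [start_position, end_position] interval contains `pos`.
find_genes <- function(chr, pos, genes) {
  hit <- as.character(genes$chromosome_name) == as.character(chr) &
         genes$start_position <= pos &
         genes$end_position >= pos
  genes$entrezgene[hit]
}

# One list element per query position; an empty vector means the
# position does not fall inside any gene.
hits <- mapply(find_genes, query$chromosome_name, query$position,
               MoreArgs = list(genes = genes),
               SIMPLIFY = FALSE, USE.NAMES = FALSE)
```

Each element of `hits` corresponds to one row of `query`, so positions that fall between genes are easy to spot as empty entries. For a few hundred positions this simple scan should be fast enough; for much larger sets an interval-overlap tool would probably be the better choice.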

best,
Steffen

Hello All,

I was wondering how biomart retrieves data for a gene given
chromosome_name and position. In my query for the mouse genome I have
809 positions, but the number of returned entrezids is different.

res1=getBM(attributes="entrezgene", filters =
c("start","chromosome_name"), values = list(positions,Chr_positions),
mart=ensembl) 

dim(res1)

[1] 2711    1

length(unique(res1[,1]))

[1] 2205

Does this mean that for probes located midway between genes, the
two nearest genes are returned?

Thanks!

Best

Sergii Ivakhno


