[BioC] how to use biomaRt to retrieve probe's hgnc_symbol ... from a table of probe's chromosome coordinates?

James W. MacDonald jmacdon at uw.edu
Fri May 11 23:07:09 CEST 2012


Hi Ying,

I figured you could just extend your values list to include vectors of 
chr, start, and end, like this:

 > values
$chromosome_name
    1    2    3    4    5    6    7    8    9   10
"19" "12"  "8"  "8"  "8"  "8" "14"  "3"  "2" "17"

$start
           1           2           3           4           5           6
  "58858174"   "9220304"  "18027971"  "18067618"  "18079177"  "18248755"
           7           8           9          10
  "95078714" "151531861" "219128853"  "74449433"

$end
           1           2           3           4           5           6
  "58864865"   "9268558"  "18081197"  "18081197"  "18081197"  "18258723"
           7           8           9          10
  "95090389" "151546277" "219134893"  "74466198"

But if you then use your call to getBM(), you get this:

 > dim(genesymbol)
[1] 16376     6

Further exploration indicates that the three list items are not matched 
up row by row. What you get are genes that fulfill the criteria of all 
three list items, so you get anything from chr19 that is between 
58858174 and 219134893 (and the same for all of the other chromosomes). 
So not that helpful. There might be a nice way to get what you want 
directly from biomaRt, but I can't figure out how to do so. The 
GenomicRanges package gives us a way out, though.

 > library(GenomicRanges)
 > gr <- GRanges(seqnames = genesymbol$chromosome_name,
 +               ranges = IRanges(start = genesymbol$start_position,
 +                                end = genesymbol$end_position))
 > gr2 <- GRanges(seqnames = values[[1]],
 +                ranges = IRanges(start = as.numeric(values[[2]]),
 +                                 end = as.numeric(values[[3]])))

So here I have made a GRanges object based on all the sequences we get 
back from biomaRt (gr) and a GRanges object based on the original 10 
positions I queried on (gr2). We can now create an indicator that tells 
us which of the biomaRt sequences are in the original 10:

 > ind <- gr %in% gr2

and subset genesymbol using that indicator:

 > genesymbol <- genesymbol[ind,]
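
A small self-contained sketch of the same idea on toy data may help. The 
data frame genes_df and its geneA/geneB/geneC rows are made up for 
illustration and are not real biomaRt output; the two query regions are 
taken from the values list above. As far as I know, later GenomicRanges 
releases changed %in% on GRanges objects to test for exact range matches 
rather than overlaps, so the sketch uses overlapsAny(), which tests for 
overlap explicitly. It also uses findOverlaps() to record which query 
region each gene came from, which the plain indicator does not tell you.

```r
library(GenomicRanges)

# Toy stand-in for the data frame returned by getBM() (made-up rows).
genes_df <- data.frame(
  hgnc_symbol     = c("geneA", "geneB", "geneC"),
  chromosome_name = c("19", "19", "8"),
  start_position  = c(58856000, 100000, 18070000),
  end_position    = c(58860000, 200000, 18075000),
  stringsAsFactors = FALSE
)

# Two of the query regions from the values list.
query <- GRanges(seqnames = c("19", "8"),
                 ranges = IRanges(start = c(58858174, 18067618),
                                  end   = c(58864865, 18081197)))

genes <- GRanges(seqnames = genes_df$chromosome_name,
                 ranges = IRanges(start = genes_df$start_position,
                                  end   = genes_df$end_position))

# TRUE for each gene that overlaps at least one query region.
ind <- overlapsAny(genes, query)
hits <- genes_df[ind, ]

# Pair each query region (by row index) with the genes it overlaps.
ov <- findOverlaps(query, genes)
per_probe <- data.frame(
  probe       = queryHits(ov),
  hgnc_symbol = genes_df$hgnc_symbol[subjectHits(ov)],
  stringsAsFactors = FALSE
)
```

Here geneB sits on chr19 but far from the queried region, so it is dropped, 
which is exactly the kind of row the oversized biomaRt result contains.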

Best,

Jim

On 5/11/2012 2:59 PM, ying chen wrote:
> Hi,
>
> I have a list of probes and a table with the probes' chromosome coordinates. I want to retrieve each probe's gene symbol, the gene's chromosome coordinates, ..... I can use biomaRt to retrieve the info one probe at a time. For example:
>
> > genesymbol<-getBM(attributes=c('entrezgene','hgnc_symbol','description','chromosome_name','start_position','end_position'),filters=c('chromosome_name','start','end'),values=list(4,40354254,40354313),mart=ensembl)
> > genesymbol
>    entrezgene hgnc_symbol
> 1      55584      CHRNA9
>                                                                description
> 1 cholinergic receptor, nicotinic, alpha 9 [Source:HGNC Symbol;Acc:14079]
>    chromosome_name start_position end_position
> 1               4       40337346     40357234
>
> I just wonder if there is a quick way to retrieve the info for all the probes I have at once? I have the chromosome coordinates of my probes in a tab-delimited txt file like:
>
> 4       40354254     40354213
> 5 234567        3450404006 4736473      4897789............
>
> Any suggestion? Thanks a lot for the help!
>
> Ying

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


