[BioC] how to use biomaRt to retrieve probe's hgnc_symbol ... from a table of probe's chromosome coordinates?
James W. MacDonald
jmacdon at uw.edu
Fri May 11 23:07:09 CEST 2012
Hi Ying,
I figured you could just extend your values list to include vectors of
chr, start and end, like so:
> values
$chromosome_name
1 2 3 4 5 6 7 8 9 10
"19" "12" "8" "8" "8" "8" "14" "3" "2" "17"
$start
1 2 3 4 5 6
"58858174" "9220304" "18027971" "18067618" "18079177" "18248755"
7 8 9 10
"95078714" "151531861" "219128853" "74449433"
$end
1 2 3 4 5 6
"58864865" "9268558" "18081197" "18081197" "18081197" "18258723"
7 8 9 10
"95090389" "151546277" "219134893" "74466198"
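Here is a minimal base-R sketch of building that values list directly from a tab-delimited probe file like the one Ying describes. The file layout (chromosome, start, end, no header) is an assumption. To keep the snippet self-contained, it writes two of the positions above to a temporary file. In practice, point read.table() at your own file instead.

```r
# Write two example rows to a temporary file so the sketch runs as-is;
# replace probe_file with the path to your own tab-delimited file.
probe_file <- tempfile(fileext = ".txt")
writeLines(c("19\t58858174\t58864865",
             "12\t9220304\t9268558"), probe_file)

# Read chromosome, start, end; keep chromosome as character so "X"/"Y" survive.
probes <- read.table(probe_file, sep = "\t", header = FALSE,
                     col.names = c("chr", "start", "end"),
                     colClasses = c("character", "numeric", "numeric"))

# One vector per filter, in the same order as
# filters = c('chromosome_name', 'start', 'end') in the getBM() call.
values <- list(chromosome_name = probes$chr,
               start = probes$start,
               end = probes$end)
str(values)
```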
But if you then use your call to getBM(), you get this:
> dim(genesymbol)
[1] 16376 6
Further exploration indicates that biomaRt does not pair the three
vectors up row by row. Instead, each filter is matched against its whole
vector, so you get anything on chr19 that lies between 58858174 and
219134893 (and the same for all of the other chromosomes). That is not
very helpful. There might be a nice way to get what you want directly
from biomaRt, but I can't figure out how to do so. The GenomicRanges
package gives us a way out, though.
> library(GenomicRanges)
> gr <- GRanges(seqnames = genesymbol$chromosome_name,
+               ranges = IRanges(start = genesymbol$start_position,
+                                end = genesymbol$end_position))
> gr2 <- GRanges(seqnames = values[[1]],
+                ranges = IRanges(start = as.numeric(values[[2]]),
+                                 end = as.numeric(values[[3]])))
Here I have made a GRanges object based on all the genes we get back
from biomaRt (gr) and a GRanges object based on the original 10
positions I queried on (gr2). We can now create an indicator that tells
us which of the biomaRt genes overlap the original 10 positions:
> ind <- gr %in% gr2
and subset genesymbol using that indicator:
> genesymbol <- genesymbol[ind,]
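For anyone curious about what that indicator computes, here is a small base-R sketch of the same "overlaps at least one query position" test. It works on plain data frames rather than GRanges objects. The helper overlaps_any() is hypothetical and not part of any package. The gene table is a toy: CHRNA9's coordinates come from this thread, while GENE_A and GENE_B are invented for illustration. A caveat worth hedging: as far as I know, later GenomicRanges/IRanges releases changed `%in%` on ranges to test for identical ranges rather than overlaps, and `overlapsAny(gr, gr2)` is the function that gives the overlap behaviour relied on above.

```r
# Hypothetical helper (not from any package): for each gene, TRUE if it
# overlaps at least one probe on the same chromosome. Intervals are treated
# as closed, i.e. start and end positions are both inclusive.
overlaps_any <- function(g_chr, g_start, g_end, p_chr, p_start, p_end) {
  vapply(seq_along(g_chr), function(i) {
    any(p_chr == g_chr[i] & p_start <= g_end[i] & p_end >= g_start[i])
  }, logical(1))
}

# Toy gene table shaped like the getBM() result.
# CHRNA9 is from the thread; GENE_A and GENE_B are invented.
genesymbol <- data.frame(
  hgnc_symbol     = c("CHRNA9", "GENE_A", "GENE_B"),
  chromosome_name = c("4", "19", "19"),
  start_position  = c(40337346, 58860000, 100000000),
  end_position    = c(40357234, 58870000, 100010000),
  stringsAsFactors = FALSE
)

# Probe positions: Ying's chr4 probe and the first of the ten queried above.
probes <- data.frame(chr   = c("4", "19"),
                     start = c(40354254, 58858174),
                     end   = c(40354313, 58864865),
                     stringsAsFactors = FALSE)

ind <- overlaps_any(genesymbol$chromosome_name, genesymbol$start_position,
                    genesymbol$end_position,
                    probes$chr, probes$start, probes$end)
genesymbol[ind, ]
```

This loops over every gene/probe pair, which is fine for a handful of probes. For thousands of probes, the interval-tree based GenomicRanges functions are the better choice.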
Best,
Jim
On 5/11/2012 2:59 PM, ying chen wrote:
> Hi, I have a list of probes and a table with the probes' chromosome coordinates. I want to retrieve each probe's gene symbol, the gene's chromosome coordinates, ... I can use biomaRt to retrieve the info one probe at a time. For example:
>> genesymbol <- getBM(attributes = c('entrezgene', 'hgnc_symbol', 'description',
>+                                    'chromosome_name', 'start_position', 'end_position'),
>+                     filters = c('chromosome_name', 'start', 'end'),
>+                     values = list(4, 40354254, 40354313), mart = ensembl)
>> genesymbol
> entrezgene hgnc_symbol
> 1 55584 CHRNA9
> description
> 1 cholinergic receptor, nicotinic, alpha 9 [Source:HGNC Symbol;Acc:14079]
> chromosome_name start_position end_position
> 1 4 40337346 40357234
> I just wonder whether there is a quick way to retrieve the info for all of my probes at once. I have the chromosome coordinates of my probes in a tab-delimited txt file like:
> 4 40354254 40354213
> 5 234567 3450404006 4736473 4897789............
> Any suggestions? Thanks a lot for the help!
> Ying
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099