[BioC] markersymbol filter problem

Marc Carlson mcarlson at fhcrc.org
Thu Nov 1 16:51:39 CET 2007


Siarhei Manakou wrote:
> Thanks a lot to everybody who replied! I see that this is a problem
> for now, but it is being dealt with, which is good.
>
> Cheers,
> Sergei
>
> Steffen wrote:
>> Hi Sergei,
>>
>> The Ensembl team has been contacted about the loss of markersymbol as a
>> filter and attribute in the latest release of the Ensembl BioMart.
>> It looks like this was not intended, and they will put the
>> markersymbol attribute/filter back. They plan to restore them
>> next Wednesday; the Ensembl web team will then have to propagate
>> the fix across their machines.
>>
>> We'll have to use Marc's workaround in the meantime.
>>
>> Cheers,
>> Steffen
>>
>> Marc Carlson wrote:
>>> Siarhei Manakou wrote:
>>>  
>>>> Hello,
>>>>
>>>> I am accessing BioMarts through Bioconductor. It used to be
>>>> possible to take a list of gene symbols and retrieve pretty much
>>>> whatever you wanted with the getBM() function by specifying
>>>> "markersymbol" as the filter. Now this no longer works. I know that
>>>> to retrieve gene symbols from, say, Ensembl gene IDs, you can
>>>> specify "external_gene_id" as an attribute and you will get your
>>>> list of gene symbols. However, it doesn't work the other way around:
>>>> putting "external_gene_id" in the filters doesn't work. So do you
>>>> know how gene symbols can be used these days to retrieve
>>>> information from BioMarts?
>>>>
>>>> thanks,
>>>> Sergei
>>>>
>>>>
>>>>       
>>> It occurs to me that you might be able to use the latest Bioconductor
>>> annotation packages to work around this. You could use an
>>> organism-based package to map gene symbols to Entrez Gene IDs, for
>>> instance, and then feed those IDs into BioMart instead.
>>>
>>> So something like:
>>>
>>> library("org.Hs.eg.db")
>>> foo <- c("GLP-1","HOTAIR","SCARNA27")
>>> mget(foo,envir=org.Hs.egSYMBOL2EG)
>>>
>>>
>>>     Marc
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>   
>
I wanted to add one more thing that you might find useful. When dealing
with symbols, one thing that sometimes burns people is the use of
non-standard symbols. In the example above, suppose that you had
"FLJ39511" instead of the more standard "GLP-1". Well, in that case we
have you covered too. We now provide a new mapping called ALIAS2EG
that tries to map any known symbol or alias to the appropriate Entrez
Gene ID. So the code would look like this instead:

library("org.Hs.eg.db")
foo <- c("FLJ39511","HOTAIR","SCARNA27")
mget(foo,envir=org.Hs.egALIAS2EG)

If you run that, you will get the same Entrez Gene ID for "FLJ39511"
that you would get for "GLP-1", because they are just different symbols
for the same gene. So if you are certain that all of your symbols are
standard, use SYMBOL2EG; otherwise, use the newer ALIAS2EG.

Note that, unlike the standard pair of maps SYMBOL and SYMBOL2EG (which
are inverses of each other), ALIAS2EG has no inverse map. It is a
special mapping meant only for the case where you find yourself with
non-standard symbols and want to map them to something standardized.
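
The mapping step only gets you Entrez Gene IDs; to finish the workaround
those IDs still have to go into BioMart in place of "markersymbol". A
rough sketch of both steps together follows. It assumes the biomaRt
package (useMart(), getBM()) is installed and the Ensembl BioMart server
is reachable. The filter/attribute name "entrezgene_id" is an assumption
(older Ensembl releases called it "entrezgene"), so check listFilters(mart)
for the release you are actually querying.

```r
# Sketch only: assumes org.Hs.eg.db and biomaRt are installed and the
# Ensembl BioMart server is reachable.
library("org.Hs.eg.db")
library("biomaRt")

symbols <- c("FLJ39511", "HOTAIR", "SCARNA27")

# Step 1: map (possibly non-standard) symbols to Entrez Gene IDs.
# ifnotfound = NA returns NA for unknown symbols instead of an error.
eg <- mget(symbols, envir = org.Hs.egALIAS2EG, ifnotfound = NA)
egIDs <- unique(unlist(eg))
egIDs <- egIDs[!is.na(egIDs)]

# Step 2: query BioMart with the Entrez Gene IDs as the filter.
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# ASSUMPTION: "entrezgene_id" is the Entrez Gene filter/attribute name in
# the release being queried; verify with listFilters(mart).
res <- getBM(attributes = c("entrezgene_id", "ensembl_gene_id"),
             filters = "entrezgene_id",
             values = egIDs,
             mart = mart)
print(res)
```

Symbols with no known alias come back as NA from mget() and are dropped
before the query, so one unrecognized symbol does not stop the whole run.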

I know that you might not have needed this particular bit of exposition,
but I decided to mention it anyway because this is a common problem and
it might help someone else who reads this thread in the future.


    Marc


