[BioC] Is there a package or a way to convert "probesets" to "genes"

Marc Carlson mcarlson at fhcrc.org
Tue Nov 17 19:05:15 CET 2009


Hi Cheng-Yuan,

That expression removes any unmapped Entrez IDs (which are returned as
NA) and then removes any duplicated IDs from the list.  However, it only
does what you want when each of your probeset IDs maps to exactly one
Entrez gene ID (a many-to-one relationship between probes and genes).
By default, the annotation packages only display data for probesets
that map like this, and for most probes on most platforms that is
perfectly adequate.  But if you really want to explore the many-to-many
mappings between some genes and their more ambiguously designed
probesets, look at the help page for the toggleProbes() method in
AnnotationDbi.  That method lets you expose these relationships so that
you can see the more troublesome probes, some of which map to several
different genes (a many-to-many relationship).
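
As a small illustration of what the filter does, here is a base-R
sketch on a made-up named vector.  The probeset names and Entrez IDs
are invented for the example and do not come from any annotation
package.

```r
# Hypothetical probeset -> Entrez ID mapping (IDs invented for illustration).
# ps_b is unmapped; ps_a and ps_c target the same gene.
entrezIDs <- c(ps_a = "1001", ps_b = NA, ps_c = "1001", ps_d = "2002")

# TRUE only for entries that are mapped AND are the first occurrence of their ID.
keep <- !is.na(entrezIDs) & !duplicated(entrezIDs)
genes <- entrezIDs[keep]

print(genes)          # ps_a and ps_d survive; ps_b (NA) and ps_c (repeat) are dropped
print(length(genes))  # number of distinct mapped genes
```

Note that duplicated() keeps the first occurrence, so which probeset
ends up representing a gene depends only on the order of the input.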

library(AnnotationDbi)
?toggleProbes
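
A rough sketch of how that might look in practice, assuming the
hgu133plus2.db package from Yuan's example is installed (substitute the
package for your own chip).  The variable names are mine, and the
Bioconductor part is simply skipped if the package is missing.

```r
# Sketch only: the Bioconductor part runs when hgu133plus2.db is installed.
have_pkg <- requireNamespace("hgu133plus2.db", quietly = TRUE)

if (have_pkg) {
  library(hgu133plus2.db)  # also loads AnnotationDbi

  # Default view: only probesets that map to a single Entrez gene.
  single <- hgu133plus2ENTREZID
  # Expose every probeset, including those that map to several genes.
  all_probes <- toggleProbes(hgu133plus2ENTREZID, "all")

  # Compare how many probesets have a mapping under each view.
  print(count.mappedkeys(single))
  print(count.mappedkeys(all_probes))

  # Count Entrez IDs per probeset; more than one means an ambiguous probeset.
  n_genes <- sapply(as.list(all_probes), function(ids) sum(!is.na(ids)))
  print(table(n_genes > 1))
}
```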

That should help a bit if you really want to go there.  Let me know if
you have further questions,


  Marc



Cheng-Yuan Kao wrote:
> Hi,
>
> We have a C. elegans expression data set.
> Do you know what exactly [!is.na(entrezIDs) & !duplicated(entrezIDs)] does?
>
> Thanks.
>
> Cheng-Yuan
>
> On Tue, Nov 17, 2009 at 3:34 AM, Yuan Hao <yuan.hao at ucd.ie> wrote:
>
>   
>> Hi Richie,
>>
>> I am not sure which data set and annotation package you are working
>> with, so I will take the hgu133plus2 chip as an example. If you just
>> want to get the Entrez genes corresponding to your probesets, you can
>> simply do:
>>
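>> library("hgu133plus2.db")
>> # probesets: character vector of probeset IDs, e.g. from your limma results
>> entrezIDs <- unlist(mget(probesets, hgu133plus2ENTREZID))
>> # drop unmapped probesets (NA) and keep one entry per Entrez ID
>> entrezIDs <- entrezIDs[!is.na(entrezIDs) & !duplicated(entrezIDs)]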
>>
>> Hope it is what you want.
>>
>> Cheers,
>> Yuan
>>
>>
>>
>>
>> On 17 Nov 2009, at 08:15, Tobias Straub wrote:
>>
>>> hi richie,
>>>
>>> one easy way to handle the multiple probesets per gene problem is to keep
>>> only the one probeset with the highest variance across replicates. the
>>> 'nsFilter' function in the 'genefilter' package provides this operation for
>>> ExpressionSet objects.
>>> using this filter approach you might of course miss some differentially
>>> regulated splicing events.
>>>
>>> best regards
>>> tobias
>>>
>>>
>>> On Nov 17, 2009, at 12:01 AM, Cheng-Yuan Kao wrote:
>>>
>>>> Hi, there,
>>>>
>>>> I have a question regarding Affy chip data.
>>>>
>>>> We did many expression arrays and used LIMMA to get the differentially
>>>> expressed "genes" (control vs treatment).
>>>>
>>>> However, I found that some probesets map to multiple genes according
>>>> to the Affy annotation file.
>>>>
>>>> On the other hand, multiple probesets can match the same gene.
>>>> Moreover, probesets matched to the same gene can be regulated in
>>>> different ways.
>>>>
>>>> So actually what LIMMA gave us is differentially expressed "probesets".
>>>>
>>>> Say we have 500 probesets up-regulated but we indeed want to know how
>>>> many "genes" are up-regulated.
>>>>
>>>> I don't know how to reasonably convert the probesets to genes due to the
>>>> non-one-to-one relationship.
>>>>
>>>> What's the convention in the microarray field?
>>>>
>>>> Any suggestion would be greatly appreciated.
>>>>
>>>>
>>>> Richie
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>         
>>> ----------------------------------------------------------------------
>>> Dr. Tobias Straub ++4989218075439 Adolf-Butenandt-Institute, München D
>>>
>>>
>>>       
>>     
>


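For the approach Tobias describes in the quoted thread (keep one
probeset per gene, the one that varies most across arrays), here is a
minimal base-R sketch on invented data.  It uses plain variance as the
spread measure, which is an assumption of this example; nsFilter in
genefilter lets you choose the statistic and works on ExpressionSet
objects rather than a bare matrix.

```r
# Toy expression matrix: rows are probesets, columns are arrays (values invented).
expr <- rbind(
  ps_a = c(1, 2, 3, 4),
  ps_b = c(5, 5, 5, 6),
  ps_c = c(2, 8, 1, 9),
  ps_d = c(7, 7, 8, 8)
)
# Hypothetical probeset -> gene map: ps_a and ps_c share a gene, ps_b is unmapped.
gene <- c(ps_a = "1001", ps_b = NA, ps_c = "1001", ps_d = "2002")

v <- apply(expr, 1, var)             # per-probeset variance across arrays
mapped <- names(gene)[!is.na(gene)]  # drop unmapped probesets

# For each gene, keep the probeset with the largest variance.
best <- tapply(mapped, gene[mapped], function(ps) ps[which.max(v[ps])])
print(best)
```

The length of `best` is then the number of genes represented, at the
cost of ignoring probesets of the same gene that behave differently.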
