[BioC] How to handle the case where an Affymetrix probe set ID is mapped to multiple genes?

Yuan Hao yuan.x.hao at gmail.com
Tue Jul 30 17:07:33 CEST 2013


GSEA mostly uses Entrez Gene IDs for the test. Most "_x" probe sets end up with no Entrez ID mapped to them, so they are excluded automatically before the test and shouldn't be a problem for you.
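
To make the filtering step concrete, here is a minimal base-R sketch of dropping probe sets that have no Entrez ID before a gene-set test. The probe set IDs, Entrez IDs and statistics are made up for illustration (they are assumptions, not taken from any annotation package), and the snippet does not check how many "_x" probe sets actually lack an Entrez ID.

```r
# Toy annotation table; all IDs below are hypothetical.
annot <- data.frame(
  probe  = c("1001_at", "1002_x_at", "1003_s_at", "1004_x_at"),
  entrez = c("111", NA, "333", "444"),
  stringsAsFactors = FALSE
)

# Toy per-probe-set statistics (e.g. t-statistics), named by probe set ID.
stats <- c("1001_at" = 2.1, "1002_x_at" = -0.4,
           "1003_s_at" = 1.3, "1004_x_at" = 0.7)

# Keep only probe sets that have an Entrez Gene ID.
mapped <- annot[!is.na(annot$entrez), ]

# Re-key the statistics by Entrez ID for the gene-set test.
gene_stats <- stats[mapped$probe]
names(gene_stats) <- mapped$entrez
print(gene_stats)
```

With a real chip you would build `annot` from the annotation package or the Affymetrix CSV file instead of typing it in.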

Cheers,
Yuan

On Jul 30, 2013, at 9:43 AM, Levi Waldron <lwaldron.research at gmail.com> wrote:

> On Tue, Jul 30, 2013 at 9:14 AM, Feng Tian <fengtian at bu.edu> wrote:
> 
>> Hi Levi,
>> 
>> Thank you very much for your reply.
>> My purpose is to do GSEA analysis. So is there a general way to handle
>> these "_x" probe sets?
>> 
>> Regards,
>> Feng
>> 
> 
> After mapping, I would just drop anything with "///" for GSEA analysis. I
> suppose you could also choose one representative, or if you are using the
> Broad's tool, provide probe sets and let it deal with the mapping (although
> I don't know how it deals with non-specific probe sets).  I doubt such
> probe sets will have much effect on GSEA results, since most of those genes
> will have a more specific probe set available.  E.g.:
> 
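>> library(hgu133plus2.db)
>> x <- as.character(hgu133plus2SYMBOL)   # named vector: probe set ID -> gene symbol
>> length(x)
> [1] 41293   # probe sets with a gene symbol
>> length(unique(x))
> [1] 19944   # distinct gene symbols
>> ind <- grep("_x", names(x))            # positions of the "_x" probe sets
>> summary(x[ind] %in% x[-ind])           # is the symbol also covered by a non-"_x" probe set?
>    Mode   FALSE    TRUE    NA's
> logical     623    2469       0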
> 
> So for hgu133plus2 you would lose at most 623 out of 19944 genes (the "_x"
> probe sets whose symbol has no non-"_x" probe set) - IMO if that changes
> your GSEA in an important way, it probably wasn't a robust result anyway.
> 
>> 
>> On Tue, Jul 30, 2013 at 9:00 AM, Levi Waldron <lwaldron.research at gmail.com> wrote:
>> 
>>> Hi Feng,
>>> 
>>> Probe sets labelled with "_x" can cross-hybridize to multiple genes:
>>> 
>>> http://www.affymetrix.com/support/help/faqs/mouse_430/faq_8.jsp
>>> 
>>> Genecards gives more detail for this probe set:
>>> 
>>> 
>>> http://genecards.weizmann.ac.il/cgi-bin/geneannot/GA_search.pl?keyword_type=probe_set_id&array=HG-U133&target=genecards&keyword=200012_x_at
>>> 
>>> How to handle such a case depends on how interested you are in that probe
>>> set; at the extremes you could ignore it, or follow up with PCR to
>>> establish which transcript you are observing.
>>> 
>>> -Levi
>>> 
>>> 
>>> On Mon, Jul 29, 2013 at 6:06 PM, Feng Tian <fengtian at bu.edu> wrote:
>>> 
>>>> Dear all,
>>>> 
>>>> In the Affymetrix annotation file, I find that some probe set IDs are
>>>> mapped to multiple genes separated by '///'; for example, 200012_x_at
>>>> is mapped to RPL21P16///RPL21P119///RPL21. How should I handle this case?
>>>> 
>>>> Thank you!
>>>> 
>>>> Feng
>>>> 
>>>>        [[alternative HTML version deleted]]
>>>> 
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Levi Waldron
>>> Post-doctoral fellow
>>> Department of Biostatistics, Harvard School of Public Health
>>> Department of Biostatistics and Computational Biology, Dana-Farber Cancer
>>> Institute
>>> Building 1, room 412C
>>> 655 Huntington Avenue
>>> Boston, Massachusetts 02115
>>> mobile: (617) 851-6849
>>> fax: (617) 432-5619
>>> http://www.hsph.harvard.edu/research/levi-waldron/
>>> 
>> 
>> 
> 
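
Levi's suggestion of dropping anything with "///" (or choosing one representative) could be sketched in base R roughly as follows. Only 200012_x_at and its three symbols come from the thread; the other rows, and the variable names, are made up for illustration. Taking the first listed symbol as the representative is an arbitrary choice for the sketch, not a recommendation.

```r
# Toy annotation in the Affymetrix style, where "///" separates
# multiple gene symbols. Rows 2 and 3 are hypothetical.
annot <- data.frame(
  probe  = c("200012_x_at", "1001_at", "1002_s_at"),
  symbol = c("RPL21P16///RPL21P119///RPL21", "GENEA", "GENEB"),
  stringsAsFactors = FALSE
)

# Flag probe sets annotated with more than one gene.
ambiguous <- grepl("///", annot$symbol, fixed = TRUE)

# Option 1: drop the ambiguous probe sets entirely.
specific <- annot[!ambiguous, ]

# Option 2: keep one representative per probe set, here simply the
# first listed symbol (trimws() handles a " /// " separator with spaces).
first_symbol <- vapply(
  strsplit(annot$symbol, "///", fixed = TRUE),
  function(s) trimws(s[1]),
  character(1)
)

print(specific)
print(first_symbol)
```

Option 1 matches the "just drop it" advice; option 2 keeps the probe set but silently assigns it to one gene, so it is worth recording which rows were affected.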


