[BioC] probe expression profile to gene expression profile

Christos Hatzis christos at nuverabio.com
Mon Apr 2 20:55:28 CEST 2007


To add to Sean's comments, in general probe sets should be considered
independent entities (not necessarily multiple/replicate measurements of
the same entity, i.e. the underlying gene). So the question of which
probe-set-to-gene map should be used is rather ill-posed.

The answer will generally depend on the objective of the study.  For
example, if the objective is to develop a predictive (classification) model,
probe sets are the independent predictors and the question of gene-average
expression is not really relevant.  As another example, if the objective is
to compare the reproducibility of gene expression between two or more
platforms, then it is imperative to match data at the probe set level to
allow for a meaningful evaluation.  Different probe sets map to different
parts of the gene and thus tend to behave independently, in many cases
driven by allelic effects in the study population.

Finally, if the objective is to understand the biology behind differentially
expressed genes, then it is important to first double-check the validity of
the "official" probe-to-gene mappings.  Then spend some time trying to
understand the implications of where the probe set lies on the gene
sequence.
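As a first pass at double-checking a mapping, it can help to list the probe sets involved in one-to-many or many-to-one relationships, since those are the entries that most need a closer look. A minimal sketch, assuming the annotation has been exported as (probe_id, gene) pairs; the input format, probe IDs and gene names are hypothetical.

```python
from collections import defaultdict


def audit_mapping(pairs):
    """Flag ambiguous entries in a probe-to-gene annotation.

    pairs: iterable of (probe_id, gene) tuples; a probe may appear more than once.
    Returns (multi_gene_probes, multi_probe_genes):
      multi_gene_probes: probe -> sorted genes, for probes hitting more than one gene
      multi_probe_genes: gene -> sorted probes, for genes covered by more than one probe
    """
    genes_of = defaultdict(set)
    probes_of = defaultdict(set)
    for probe, gene in pairs:
        genes_of[probe].add(gene)
        probes_of[gene].add(probe)

    multi_gene_probes = {p: sorted(g) for p, g in genes_of.items() if len(g) > 1}
    multi_probe_genes = {g: sorted(p) for g, p in probes_of.items() if len(p) > 1}
    return multi_gene_probes, multi_probe_genes


# Hypothetical annotation rows.
pairs = [
    ("p1_at", "GENEA"),
    ("p2_at", "GENEA"),
    ("p3_at", "GENEB"),
    ("p3_at", "GENEC"),
    ("p4_at", "GENED"),
]
multi_gene, multi_probe = audit_mapping(pairs)
print("probes hitting several genes:", multi_gene)
print("genes with several probes:", multi_probe)
```

The function only flags; deciding which mapping is right for a flagged probe is still the manual, biology-driven step described above.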

The following two articles are informative in this respect:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16284200&query_hl=15&itool=pubmed_docsum

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=17224057&query_hl=13&itool=pubmed_docsum


So I would argue that this is more of a biology problem than a
bioinformatics problem and thus not amenable to an automated solution.

-Christos

Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.
400 West Cummings Park
Suite 5350
Woburn, MA 01801
Tel: 781-938-3830
www.nuverabio.com
 


> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch 
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
> Sean Davis
> Sent: Monday, April 02, 2007 2:24 PM
> To: Weiwei Shi
> Cc: bioconductor
> Subject: Re: [BioC] probe expression profile to gene expression profile
> 
> Weiwei Shi wrote:
> > Dear All:
> >
> > Here is a general question and I apologize if it is a little bit off
> > topic (but I believe Bioconductor must have some solution for that).
> >
> > Is there a guideline or good tool to get a "gene" expression profile
> > from a "probe" expression profile? In this process, I am concerned that
> > such a tool or guide should address the issues of "multiple probes to
> > one gene" and "one probe to multiple genes".
> >
> >   
> Don't deal with the first case.  Do all of your analyses at
> the probe level.  There probably is not a safe, totally
> general way to aggregate probes in a gene expression context.
> Instead, do your differential expression testing and then map
> probes to genes for downstream processing (looking up in
> PubMed, etc).
> 
> The second case can't be dealt with appropriately without 
> knowing why one probe maps to multiple genes.  In general, 
> unless you do your own annotation (using blast, for example), 
> it will be difficult to make a call in the general case.  
> However, in some cases, the answer is "obvious".  In the case 
> you emailed about earlier today (one probe hitting 3 genes), 
> it was fairly obvious what the answer was, since one of the 
> genes was a "RefSeq" gene while the other two were simply
> computationally predicted genes.  The most important point is 
> to know what sources of annotation are being used, what their 
> limitations are, and how they relate to other sources of 
> annotation--this knowledge is often not easy to come by, but 
> is invaluable.
> 
> > I believe it is a non-trivial process and automation of this process
> > might not be easy:
> >   
> Automation really isn't possible, since there is not a 
> general solution to every case.  My rule of thumb is to 
> maintain as much information as possible throughout the 
> process of data analysis and then do some biologic knowledge 
> curation when the gene lists are in.  Unfortunately, there 
> really isn't a fantastic substitute for this last step.
> 
> Just my two-cents worth.
> 
> Sean
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
>
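Sean's suggested workflow (test at the probe level, attach genes only afterwards, keep every mapping, and settle only the "obvious" multi-gene cases without manual review) might be sketched as below. This is a hypothetical illustration, not a Bioconductor function: the p-value dict, the probe-to-accession dict, the placeholder accessions and the 0.01 cutoff are all assumptions. The tie-break uses the NCBI RefSeq accession-prefix convention, in which NM_/NR_ records are known transcripts and XM_/XR_ records are computationally predicted models; any other multi-gene probe is left for manual curation.

```python
# Prefixes follow the NCBI RefSeq convention: NM_/NR_ are known transcripts,
# XM_/XR_ are computationally predicted model records.
KNOWN = ("NM_", "NR_")
PREDICTED = ("XM_", "XR_")


def annotate_hits(pvalues, probe_to_accessions, cutoff=0.01):
    """Annotate significant probes with genes without aggregating probes.

    pvalues: probe_id -> p-value from a probe-level test (hypothetical input)
    probe_to_accessions: probe_id -> list of RefSeq accessions
    Returns one dict per probe with p <= cutoff, sorted by p-value.
    """
    rows = []
    for probe, p in sorted(pvalues.items(), key=lambda kv: kv[1]):
        if p > cutoff:
            continue
        accs = probe_to_accessions.get(probe, [])
        known = [a for a in accs if a.startswith(KNOWN)]
        others = [a for a in accs if not a.startswith(KNOWN)]
        if not accs:
            call, status = None, "unannotated"
        elif len(accs) == 1:
            call, status = accs[0], "unique"
        elif len(known) == 1 and all(a.startswith(PREDICTED) for a in others):
            # The "obvious" case: one known transcript plus predicted models.
            call, status = known[0], "resolved: single known RefSeq"
        else:
            call, status = None, "needs manual curation"
        # Keep every accession so no information is discarded.
        rows.append({"probe": probe, "p": p, "all_accessions": accs,
                     "call": call, "status": status})
    return rows


# Placeholder probe IDs and accessions, for illustration only.
pvalues = {"p1_at": 0.001, "p2_at": 0.004, "p3_at": 0.2, "p4_at": 0.0005}
probe_to_accessions = {
    "p1_at": ["NM_TOY1"],
    "p2_at": ["NM_TOY2", "XM_TOY3", "XM_TOY4"],
    "p4_at": ["NM_TOY5", "NM_TOY6"],
}
rows = annotate_hits(pvalues, probe_to_accessions)
for row in rows:
    print(row["probe"], row["call"], row["status"])
```

Because every accession stays in all_accessions, making a call never throws anything away, and the rows marked "needs manual curation" are exactly the cases that call for the biological judgment discussed in this thread.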


