[BioC] probe expression profile to gene expression profile

Mon Apr 2 22:06:42 CEST 2007

Hi, there:

I think my first email was asking more about guidelines, or how people
generally deal with the probe2gene issue, rather than about full
automation (I mentioned it is "not easy").  But the discussion has
somehow turned to the stage at which we should do probe2gene, or
whether we should do it at all for some study objectives.

I agree in theory that analysis at the probe level keeps information
and avoids early aggregation at the gene level. However, at some point
you still need to perform further analysis at the gene or pathway level
to find the biological significance behind the results, if that is the
objective of your study. Then the question is: is an analysis like
differential testing at the probe level safe? (For example, because
some probes are removed at this step.) It is like a "maximum pick"
instead of an "average pick".
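
To make the "maximum pick" versus "average pick" contrast concrete,
here is a minimal Python sketch. The probe IDs, gene names, and values
are invented for illustration, and `collapse` and `strongest` are
hypothetical helpers, not functions from any Bioconductor package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical probe-level statistics (e.g. log2 fold changes); all IDs invented.
probe_stat = {"p1": 2.0, "p2": 0.5, "p3": -0.25, "p4": 1.0}
# Hypothetical probe-to-gene map; p1, p2 and p3 target the same gene.
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_A", "p4": "GENE_B"}


def collapse(stats, mapping, pick):
    """Group probe statistics by gene and reduce each group with `pick`."""
    by_gene = defaultdict(list)
    for probe, value in stats.items():
        by_gene[mapping[probe]].append(value)
    return {gene: pick(values) for gene, values in by_gene.items()}


def strongest(values):
    """'Maximum pick': keep the value with the largest absolute size."""
    return max(values, key=abs)


max_pick = collapse(probe_stat, probe_to_gene, strongest)
avg_pick = collapse(probe_stat, probe_to_gene, mean)

print("maximum pick:", max_pick)
print("average pick:", avg_pick)
```

With these made-up numbers GENE_A looks much stronger under the
maximum pick than under the average pick, which is the kind of
difference I am worried about.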

Moreover, probes mapped to one gene are supposed to be highly
correlated. Highly correlated predictors are not desirable in a
supervised learning process, IMO.
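
As a rough illustration of how one might check this before model
building, the sketch below flags probe pairs whose profiles are
strongly correlated. The profiles are invented, the 0.9 cutoff is an
arbitrary assumption, and `pearson` and `correlated_pairs` are
hypothetical helpers written only for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical expression profiles of three probes over four samples.
profiles = {
    "p1": [1.0, 2.0, 3.0, 4.0],
    "p2": [2.0, 4.0, 6.0, 8.0],  # proportional to p1
    "p3": [4.0, 1.0, 3.0, 2.0],
}


def correlated_pairs(profiles, threshold=0.9):
    """Return probe pairs whose absolute correlation reaches `threshold`."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if abs(pearson(profiles[a], profiles[b])) >= threshold
    ]


redundant = correlated_pairs(profiles)
print(redundant)
```

Whether such pairs should then be merged, or one probe dropped, is
exactly the open question in this thread; the sketch only finds them.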

Again, in theory, I agree that checking manually rather than
automatically is the way to make sure of the biological validity of
each mapping, and that the problem is more a biological one than a
bioinformatics one. However, again :), in practice that might not be
feasible for high-throughput technology, which IMHO tolerates a fairly
high level of noise or error but gives people more statistical
significance.

Just my 2 cents,

Weiwei

On 4/2/07, Christos Hatzis <christos at nuverabio.com> wrote:
> To add to Sean's comments, in general probe sets should be considered as
> independent entities (not necessarily as multiple/replicate measurements of
> the same entity, i.e. the underlying gene). So the question of which
> probeset-to-gene map should be used is rather ill posed.
>
> The answer will generally depend on the objective of the study.  For
> example, if the objective is to develop a predictive (classification) model,
> probe sets are the independent predictors and the question of gene-average
> expression is not really relevant.  As another example, if the objective is
> to compare the reproducibility of gene expression between two or more
> platforms, then it is imperative to match data at the probe set level to
> allow for a meaningful evaluation.  Different probe sets map to different
> parts of the gene and thus tend to behave independently, in many cases
> driven by allelic effects in the study population.
>
> Finally, if the objective is to understand the biology behind differentially
> expressed genes, then it is important to first double-check the validity of
> the "official" probe to gene mappings.  Then spend some time to try to
> understand the implications of the relative position of the probe set on the
> gene sequence.
>
> The following two articles are informative in this respect:
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=16284200&query_hl=15&itool=pubmed_docsum
>
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=17224057&query_hl=13&itool=pubmed_docsum
>
>
> So I would argue that this is more of a biology problem rather than a
> bioinformatics problem and thus not amenable to an automated solution.
>
> -Christos
>
> Christos Hatzis, Ph.D.
> Nuvera Biosciences, Inc.
> 400 West Cummings Park
> Suite 5350
> Woburn, MA 01801
> Tel: 781-938-3830
> www.nuverabio.com
>
>
>
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch
> > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
> > Sean Davis
> > Sent: Monday, April 02, 2007 2:24 PM
> > To: Weiwei Shi
> > Cc: bioconductor
> > Subject: Re: [BioC] probe expression profile to gene
> > expression profile
> >
> > Weiwei Shi wrote:
> > > Dear All:
> > >
> > > Here is a general question and I apologize if it is a little bit off
> > > topic (but I believe bioconductor must have some solution for that.)
> > >
> > > Is there a guideline or good tool to get "gene" expression profile
> > > from "probe" expression profile? In this process, I am concerned that
> > > such tool or guide should address the issues of "multiple probes to
> > > one gene" and "one probe to multiple genes".
> > >
> > >
> > Don't deal with the first case.  Do all of your analyses at
> > the probe level.  There probably is not a safe, totally
> > general way to aggregate probes in a gene expression context.
> >  Instead, do your differential expression testing and then map
> > probes to genes for downstream processing (looking up in
> > Pubmed, etc).
> >
> > The second case can't be dealt with appropriately without
> > knowing why one probe maps to multiple genes.  In general,
> > unless you do your own annotation (using blast, for example),
> > it will be difficult to make a call in the general case.
> > However, in some cases, the answer is "obvious".  In the case
> > you emailed about earlier today (one probe hitting 3 genes),
> > it was fairly obvious what the answer was, since one of the
> > genes was a "Refseq" gene while the other two were simply
> > computationally predicted genes.  The most important point is
> > to know what sources of annotation are being used, what their
> > limitations are, and how they relate to other sources of
> > annotation--this knowledge is often not easy to come by, but
> > is invaluable.
> >
> > > I believe it is a non-trivial process and automation of this process
> > > might not be easy:
> > >
> > Automation really isn't possible, since there is not a
> > general solution to every case.  My rule of thumb is to
> > maintain as much information as possible throughout the
> > process of data analysis and then do some biologic knowledge
> > curation when the gene lists are in.  Unfortunately, there
> > really isn't a fantastic substitute for this last step.
> >
> > Just my two-cents worth.
> >
> > Sean
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
>
>
>

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III