[BioC] probeset for a single gene

Francois Pepin fpepin at cs.mcgill.ca
Fri Nov 10 19:22:12 CET 2006


Hi Weiwei,

I'm pretty sure there's been some discussion on this not too long ago,
but I can't recall off the top of my head what the subject line was.

The standard answer is that it depends on why you might have different
probes for the gene and what you would expect from them.

In many cases, there are several probes because they give different
results (else they wouldn't waste the space). The canonical example for
this would be a splice variant or the use of an alternative poly-A site.
Depending on your amplification protocol, you might also be more
sensitive to the distance of the probe from the poly-A site.

If you have reason to believe that all probes should give the same
result, then using the average or median would make sense. This is the
case when the exact same probe sits at several places on the array.
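
As a rough illustration of the averaging case, here is a minimal Python
sketch that collapses replicate probes to one per-sample median per gene.
The probe IDs, gene symbols and expression values are made up for the
example, and the dictionary layout is only an assumption about how the
data might be held.

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: probe ID -> (gene symbol, expression values per sample)
probes = {
    "probe_a": ("GENE1", [7.0, 8.0, 9.0]),
    "probe_b": ("GENE1", [7.5, 8.5, 8.0]),
    "probe_c": ("GENE2", [5.0, 5.5, 6.0]),
}

def collapse_by_median(probes):
    """Return gene -> per-sample median over all probes mapped to that gene."""
    by_gene = defaultdict(list)
    for gene, values in probes.values():
        by_gene[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {
        gene: [median(column) for column in zip(*rows)]
        for gene, rows in by_gene.items()
    }

gene_expr = collapse_by_median(probes)
```

Swapping median for mean would give the average instead; with only two
probes per gene the two are identical.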

Otherwise, you might want to take the most interesting probe and say it
represents the whole gene. How you define the most interesting probe can
vary. You could pick the probe with the largest interquartile range, or
the one giving you the most differential expression. The most
interesting probe might change from one experiment to the next (if
we're talking about splice variants, for example).
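
One way to sketch the interquartile-range choice in Python is shown
below. The data are invented, and the IQR here uses the default
quantile method of the standard library, which is just one of several
conventions; treat it as an assumption rather than a recommendation.

```python
from statistics import quantiles

# Hypothetical data: probe ID -> (gene symbol, expression values per sample)
probes = {
    "probe_a": ("GENE1", [7.0, 7.1, 6.9, 7.0]),
    "probe_b": ("GENE1", [5.0, 9.0, 6.0, 10.0]),
    "probe_c": ("GENE2", [5.0, 5.5, 6.0, 6.5]),
}

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def most_variable_probe(probes):
    """Return gene -> ID of the probe with the largest IQR for that gene."""
    best = {}
    for probe_id, (gene, values) in probes.items():
        score = iqr(values)
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe_id, score)
    return {gene: probe_id for gene, (probe_id, _) in best.items()}

chosen = most_variable_probe(probes)
```

Replacing iqr with a differential-expression score would give the other
selection rule mentioned above.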

Another option is to keep them all around. I tend to prefer this option
if I'm not running statistical tests that depend on having a single
measurement per gene (GO and pathway analyses are the main examples that
come to mind). That way, whichever probe works well will come up, and if
several of them show up, you can believe that result some more.
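
If all probes are kept, a simple way to see which genes are supported by
more than one probe is to count the significant probes per gene. The
sketch below assumes a hypothetical probe-to-gene mapping and an already
computed list of significant probes; neither comes from a real
annotation package.

```python
from collections import Counter

# Hypothetical annotation and test results
probe_to_gene = {
    "probe_a": "GENE1",
    "probe_b": "GENE1",
    "probe_c": "GENE2",
    "probe_d": "GENE3",
}
significant_probes = ["probe_a", "probe_b", "probe_d"]

def support_per_gene(significant_probes, probe_to_gene):
    """Count how many significant probes map to each gene."""
    return Counter(probe_to_gene[p] for p in significant_probes)

support = support_per_gene(significant_probes, probe_to_gene)
# Genes backed by more than one significant probe
multi_probe_genes = sorted(g for g, n in support.items() if n > 1)
```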

As Sean mentioned, there is an extensive literature on this subject.

Francois

On Fri, 2006-11-10 at 13:01 -0500, Weiwei Shi wrote:
> Hi,
> I went through the archive for a while and still did not find the good
> answer for that. Sorry for the re-post :(
> 
> suppose i have some probes for the same gene, I am wondering which is
> the proper way to get a statistic for the expression for this gene?
> using mean, median or max or min? I think it might be affected by the
> research target but I wondering if there is some ref on it.
> 
> btw, is there some ref on the data pre-processing (gene selection,
> multiple comparison, better with case study) for microarray analysis
> other than bioconductor book?
> 
> thanks
>


