[BioC] ttest or fold change
michael watson (IAH-C)
michael.watson at bbsrc.ac.uk
Mon Dec 15 13:11:43 MET 2003
Why not try the non-parametric t-tests available?
I know all the arguments about a "loss of power" etc, but at the end of the day, as statisticians and bioinformaticians, sometimes biologists come to us with small numbers of replicates (for very understandable reasons) and it is our job to get some meaning out of that data. Trying to fit any kind of statistic involving a p-value to such data is a difficult and risky task, and trying to explain those results to the biologist is often very difficult.
So here's what happens with the non-parametric tests based on ranking. Those genes with the highest |t| are those where all the replicates of one condition are greater than all the replicates of the other condition. The next highest |t| is where all but one of the replicates of one condition are greater than all the replicates of the other condition, and so on.
OK, so some of these differences could occur by chance, but we're often dealing with millions of data points and I really don't think it's possible to make no mistakes. And curse me if you like, but if I have a gene expression measurement, replicated 5 times in two conditions, and in one condition all five replicates are higher than the five replicates of the other condition, then I believe that that gene is differentially expressed. And that's easy to find with non-parametric t, and it is easy to explain to a biologist, and at the end of the day, is it really wrong to do that?
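To put a number on the "all replicates higher" case, here is a minimal sketch of an exact two-sided rank-based permutation test, assuming no tied values. The function name rank_sum_exact_p and the expression values are made up for illustration; they are not from any package.

```python
from itertools import combinations
from math import comb


def rank_sum_exact_p(x, y):
    """Exact two-sided permutation p-value for the Mann-Whitney U statistic.

    Assumes there are no ties between measurements (illustrative sketch only).
    """
    pooled = list(x) + list(y)
    n, m = len(x), len(y)

    def u_stat(idx):
        # U = number of (first group, second group) pairs where the first is larger
        chosen = set(idx)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(n + m) if i not in chosen]
        return sum(a > b for a in first for b in second)

    centre = n * m / 2  # value of U expected under no difference
    observed = abs(u_stat(range(n)) - centre)
    # Count every relabelling of the samples that is at least as extreme
    extreme = sum(
        abs(u_stat(c) - centre) >= observed
        for c in combinations(range(n + m), n)
    )
    return extreme / comb(n + m, n)


# Hypothetical log2 expression values with complete separation
p_five = rank_sum_exact_p([9.1, 9.4, 9.2, 9.6, 9.3], [8.1, 8.0, 8.4, 8.2, 8.3])
p_three = rank_sum_exact_p([9.1, 9.4, 9.2], [8.1, 8.0, 8.4])
print(p_five, p_three)
```

With five replicates per condition, complete separation gives 2/252, about 0.008. With three per condition the smallest attainable two-sided p-value is 2/20 = 0.1, so a rank test on its own can rank genes in that design but cannot produce a small p-value.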
-----Original Message-----
From: Ramon Diaz-Uriarte [mailto:rdiaz at cnio.es]
Sent: 15 December 2003 11:54
To: Jason Hipp; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] ttest or fold change
Dear Jason,
First, I think you should recognize that three replicates are very few and
thus conclusions will not be particularly trustworthy. I assume this is a
first round of screening for relevant genes for subsequent studies. Second, I
think the fold-ratio vs. t-test issue can often muddle two different
questions: a) is there statistical evidence of differential expression; b) is
the expression of gene X altered in a biologically relevant way (where
biologically relevant means more than Z times). If you had a large number of
samples you might be able to detect as "statistically significant" very small
log ratio changes (which might, or might not, be biologically relevant);
conversely, what if the fold change is large but the variance is huge? For
reasons I don't understand, the two-fold change sometimes has a sacrosanct
status, but it is my understanding that other fold changes (say 1.3 or 3.5)
could, in certain cases, be much more biologically relevant; this, of course,
depends on the context.
In your case, the t-test has an additional potential problem with the
denominator. I would suggest using some procedure, such as the empirical
Bayes one in limma, that will use a modified expression for the
denominator, and save you from finding some very small p-values just because
that gene has, by chance, an artificially small variance.
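The denominator problem can be sketched as follows. This is only an illustration of the shrinkage idea, in which the gene's own variance is averaged with a prior variance, weighted by degrees of freedom. limma estimates the prior from all genes on the array; here prior_var and prior_df are hypothetical fixed inputs, and the function name and data are made up.

```python
from math import sqrt
from statistics import mean, variance


def t_statistics(x, y, prior_var, prior_df):
    """Return (ordinary t, moderated t) for two groups of log2 values.

    prior_var and prior_df are assumed given; a real analysis would
    estimate them from the whole set of genes.
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Equal-variance pooled estimate for this gene alone
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    # Shrink the gene-wise variance toward the prior variance
    shrunk_var = (prior_df * prior_var + df * pooled_var) / (prior_df + df)
    diff = mean(x) - mean(y)
    scale = sqrt(1 / n1 + 1 / n2)
    ordinary = diff / (sqrt(pooled_var) * scale)
    moderated = diff / (sqrt(shrunk_var) * scale)
    return ordinary, moderated


# Hypothetical gene: fold change near 1.2 with an unusually small variance
treated = [7.30, 7.31, 7.32]
control = [7.04, 7.05, 7.06]
ordinary_t, moderated_t = t_statistics(treated, control, prior_var=0.04, prior_df=4)
print(ordinary_t, moderated_t)
```

With these made-up numbers the ordinary t is above 30 while the moderated one is between 2 and 3, which is the effect described above: the tiny variance no longer dominates the statistic.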
So I would use limma (or something like it) and also filter by some criterion
that biologists tell you is relevant for them (say, we only want genes that
are overexpressed at least 5 times, or whatever).
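A minimal sketch of combining the two criteria, assuming the expression values are on the log2 scale (as RMA reports them). The thresholds, the function names and the example values are hypothetical; the statistic passed in could be a moderated t from limma or any other test statistic.

```python
from statistics import mean


def fold_change(treated, control):
    """Fold change from log2-scale values: 2 ** (difference in mean log2)."""
    return 2 ** (mean(treated) - mean(control))


def passes_filter(treated, control, statistic, min_abs_stat, min_fold):
    """Keep a gene only if it clears both the statistical and the fold cutoff."""
    fc = fold_change(treated, control)
    # Treat over- and under-expression symmetrically
    magnitude = max(fc, 1 / fc)
    return abs(statistic) >= min_abs_stat and magnitude >= min_fold


# Two hypothetical genes with the same statistic but different fold changes
big_change = passes_filter([9.0, 9.1, 8.9], [7.0, 7.1, 6.9],
                           statistic=6.0, min_abs_stat=4.0, min_fold=2.0)
small_change = passes_filter([7.30, 7.31, 7.32], [7.04, 7.05, 7.06],
                             statistic=6.0, min_abs_stat=4.0, min_fold=2.0)
print(big_change, small_change)
```

The first gene has a fold change of about 4 and is kept; the second has a fold change of about 1.2 and is dropped despite the large statistic.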
Best,
R.
On Saturday 13 December 2003 16:45, Jason Hipp wrote:
> I am comparing a relatively homogeneous cell culture to another that has
> been treated, and am using RMA.
>
> I only have 3 replicates of each. Would you recommend a 2 tailed equal
> variance t test? I also thought I read that with so few replicates, a
> fold change would be better than a t test? If I get a t test p-value of .0001, and
> a fold change of 1.2, is this a reliable change using RMA?
>
> Thanks,
> Jason
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)