[BioC] Separate Normalizations and expression plotting
Naomi Altman
naomi at stat.psu.edu
Fri Oct 6 17:25:48 CEST 2006
It is important to remember that statistical significance refers to
the investigator's ability to reproduce the result. A result can be
statistically significant without having biological
significance. The reported R-sq might be statistically significant
but biologically insignificant.
--Naomi
At 06:57 AM 10/6/2006, Sean Davis wrote:
>On Thursday 05 October 2006 21:14, Lana Schaffer wrote:
> > Hi,
> > This experiment involves the expression analysis between 2 batches of
> > AGE variable samples which were normalized separately because the
> > batches did not cluster together. The age groups of the second set
> > were in between the ages from the first set and all the data are desired
> > to be analyzed together. What happened is that the lab researchers graphed
> > the separately normalized expression values together, plotting expression
> > vs age. In the diseased samples there were genes which showed a "trend"
> > with age, where the R-squared values were between .16 and .4. From my
> > training I get that this trend explains only 16-40% of the variance in the
> > data and would not be significant. However, using Prism these
> > R-squares are called significantly different from zero. These researchers
> > explain to me that this is the way data is presented in their field and
> > that an R-squared of .16-.4 is considered excellent results. Indeed, with
> > non-diseased individuals the R-squared are zero for these genes. I
> > understand that in their field "any" trend is better than no trend,
> > especially since the samples are heterogeneous. However, this is not what
> > is taught in statistics. These graphs will be submitted to Journals under
> > my authorship and I am a bit shaken-up.
> > Would you please comment to me about your thoughts about the combination
> > of the 2 sets of expression values and the significance of the R-squared
> > values. Thanks,
> > Lana
>
>Lana,
>
>There are two issues it seems. The first is normalization of two separate
>batches, which is appropriate. What isn't clear is whether doing so
>introduces bias in downstream analyses--this you will need to judge for
>yourself.
>
>The second is the significance of the R-squared values. To convince
>yourself and your collaborators of their significance or lack thereof, one
>can test whether each R-squared is significantly different from zero. I
>would suggest using a permutation-based analysis, but the method is up to
>you. Judging an R-squared value by looking at the raw number is probably
>not a valid method for determining significance.
>
>Sean
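
A permutation-based check of the kind Sean suggests could look roughly like
the sketch below. It is only an illustration, not the method used by anyone
in this thread: the function names (r_squared, permutation_p_value), the
number of permutations, and the simulated age/expression data are all
assumptions made for the example. The idea is to shuffle the expression
values relative to age many times and count how often a shuffled data set
gives an R-squared at least as large as the observed one.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation; equals R-squared for a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no linear trend can be measured
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Return (observed R-squared, permutation p-value).

    y is shuffled relative to x, which breaks any real association while
    keeping both marginal distributions unchanged.
    """
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 30 samples with a weak linear age effect plus noise.
demo_rng = random.Random(1)
age = [demo_rng.uniform(20, 80) for _ in range(30)]
expression = [0.02 * a + demo_rng.gauss(0, 1) for a in age]

obs, p = permutation_p_value(age, expression)
print("observed R-squared = %.3f, permutation p-value = %.4f" % (obs, p))
```

With many genes tested this way, the resulting p-values would still need a
multiple-testing adjustment, and a small p-value says nothing about whether
the size of the trend matters biologically.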
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111