[BioC] Separate Normalizations and expression plotting

Naomi Altman naomi at stat.psu.edu
Fri Oct 6 17:25:48 CEST 2006


It is important to remember that statistical significance refers to 
the investigator's ability to reproduce the result.  A result can be 
statistically significant without having biological 
significance.  The reported R-sq might be statistically significant 
but biologically insignificant.
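
As a rough illustration of the permutation-based analysis Sean suggests in the quoted message below, here is a minimal Python sketch. It assumes a simple linear fit of expression on age for a single gene. The function names, the number of permutations, and the synthetic data are all made up for illustration; none of them come from Lana's experiment or from Prism.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R-squared of a simple linear fit."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Estimate how often shuffled data give an R-squared at least as large
    as the observed one. Shuffling y breaks any real link between x and y."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if r_squared(x, y_perm) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic data only (hypothetical): 40 samples, a weak linear age effect
# plus noise. The slope and noise level are arbitrary choices.
data_rng = random.Random(1)
age = [data_rng.uniform(20, 80) for _ in range(40)]
expr = [0.02 * a + data_rng.gauss(0, 0.6) for a in age]

obs_r2, p_value = permutation_p_value(age, expr)
print("R-squared = %.3f, permutation p-value = %.4f" % (obs_r2, p_value))
```

The p-value here depends on the number of samples as well as on the size of R-squared, which is why a modest R-squared can still be statistically significant. Whether a trend of that size matters biologically is a separate question that the test does not answer.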

--Naomi

At 06:57 AM 10/6/2006, Sean Davis wrote:
>On Thursday 05 October 2006 21:14, Lana Schaffer wrote:
> > Hi,
> > This experiment involves the expression analysis between 2 batches of
> > AGE variable samples which were normalized separately because the
> > batches did not cluster together.  The age groups of the second set
> > were in between the ages from the first set and all the data are desired
> > to be analyzed together.  What happened is that the lab researchers graphed
> > the separately normalized expression values together, plotting expression
> > vs age.  Now with the diseased samples there
> > were genes which showed a "trend" with age where the R-squared were between
> > .16 and .4.  From my training I get that this trend explains only 16-40%
> > of the data and would not be significant.  However, using Prism these
> > R-squares are called significantly different from zero.  These researchers
> > explain to me that this is the way data is presented in their field and
> > that an R-squared of .16-.4 is considered excellent results.  Indeed, with
> > non-diseased individuals the R-squared are zero for these genes. I
> > understand that in their field "any" trend is better than no trend,
> > especially since the samples are heterogeneous.  However, this is not what
> > is taught in statistics. These graphs will be submitted to Journals under
> > my authorship and I am a bit shaken-up.
> > Would you please comment to me about your thoughts about the combination
> > of the 2 sets of expression values and the significance of the R-squared
> > values. Thanks,
> > Lana
>
>Lana,
>
>There are two issues it seems.  The first is normalization of two separate
>batches, which is appropriate.  What isn't clear is whether doing so
>introduces bias in downstream analyses--this you will need to judge for
>yourself.
>
>The second is the significance of the R-squared values.  To
>convince yourself and your collaborators of the significance or lack thereof
>of the computed values, one can test whether the R-squared is significant or
>not.  I would suggest using a permutation-based analysis, but the method is
>up to you.  Judging an R-squared value by looking at the raw number is
>probably not a valid method for determining significance.
>
>Sean
>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111


