[BioC] Separate Normalizations and expression plotting

Fri Oct 6 12:57:25 CEST 2006

On Thursday 05 October 2006 21:14, Lana Schaffer wrote:
> Hi,
> This experiment involves the expression analysis between 2 batches of
> AGE variable samples which were normalized separately because the
> batches did not cluster together.  The age groups of the second set
> fall between the ages of the first set, and the goal is to analyze all
> the data together.  The lab researchers then graphed the separately
> normalized expression values together, plotting expression vs. age.  In
> the diseased samples there were genes which showed a "trend" with age,
> with R-squared values between .16 and .4.  From my training, I take this
> to mean that the trend explains only 16-40% of the variance and would not
> be significant.  However, Prism calls these R-squared values
> significantly different from zero.  The researchers explain to me that
> this is the way data is presented in their field and that an R-squared
> of .16-.4 is considered an excellent result.  Indeed, with non-diseased
> individuals the R-squared values are zero for these genes.  I understand
> that in their field "any" trend is better than no trend, especially
> since the samples are heterogeneous.  However, this is not what
> is taught in statistics. These graphs will be submitted to Journals under
> my authorship and I am a bit shaken-up.
> Would you please share your thoughts on the combination
> of the 2 sets of expression values and the significance of the R-squared
> values? Thanks,
> Lana

Lana,

There seem to be two issues.  The first is the separate normalization of the
two batches, which is appropriate.  What isn't clear is whether doing so
introduces bias in downstream analyses; you will need to judge this for
yourself.

The second is the significance of the R-squared values.  To convince
yourself and your collaborators of the significance, or lack thereof, of the
computed values, you can test directly whether each R-squared is
significant.  I would suggest using a permutation-based analysis, but the
method is up to you.  Judging an R-squared value by looking at the raw number
is probably not a valid method for determining significance, because
significance also depends on the number of samples.
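
As a rough illustration of the permutation idea, here is a minimal sketch in
plain Python.  It is only one way to set this up, not a prescribed method: the
function names, the default of 10000 permutations, and the fixed seed are all
assumptions made for the example.  The idea is to shuffle the expression
values relative to age many times and count how often a shuffled R-squared is
at least as large as the observed one.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R-squared of a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # No variation in one variable: define R-squared as 0.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Return (observed R-squared, permutation p-value).

    n_perm and seed are illustrative defaults (assumptions), not
    recommendations.
    """
    rng = random.Random(seed)
    observed = r_squared(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any real association between x and y.
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point round-off.
        if r_squared(x, shuffled) >= observed - 1e-12:
            hits += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)
```

Here x would be the ages and y the expression values for one gene, for
example `permutation_p_value(ages, expression)`.  If this is repeated over
many genes, the resulting p-values would still need a multiple-testing
correction, and the sketch does nothing about the separate normalization of
the two batches.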

Sean