# [R] How to normalize to a set of internal references

Waverley waverley.paloalto at gmail.com
Mon Mar 2 05:24:29 CET 2009

```
Thanks for the advice.  My question is more about how to actually do this.

Let me use a gene-analysis example from biology to illustrate.
In biology, there are always some housekeeping genes whose expression
differs little even under pathological conditions.

We know that external factors affect the measurements from batch to
batch.  For example, overall signal intensity might differ because of
lab reagents.
A simplified picture:
Day 1: Using control samples, I measured genes #1 to #110 and got data.
Day 2: Using disease samples, I measured genes #1 to #110 again and
got data.

In those two data sets, I noticed that the overall signal intensity for
each gene is higher on Day 1 than on Day 2.
I know from the biological literature that genes 101 to 110 are
"housekeeping" genes and should not change much between disease and
control.
So my question is, technically, how do I use the values of genes 101 to
110 to adjust the signals of genes 1 to 100 so that the batch effect is
corrected?  Then the differences revealed by the comparative analysis
of genes 1 to 100 between disease and control would be due to biology
rather than lab artifacts.

So the question is how to do that mathematically.  If I had only one
housekeeping gene, I could divide every gene's signal by that gene's
signal to normalize, and then compare.  But now I have 10 genes that
can be used for normalization.  I assume that, in this context, the
more reference genes used, the better.

Can you help again?

Waverley wrote:
> Hi,
>
> I have a methods question: how do I normalize data sets according to
> a set of internal measurements?
>
> For example, I have performed two batches of experiments contrasting
> two different conditions (positive versus negative conditions): one at
> a time.
>
> 1. In each experiment, I measure the signals of variables v1 to v100.
> I want to understand how v1 to v100 change under these two contrasting
> conditions.
>
> 2. I also know that variables v101 to v110, 10 in total, although
> they differ from each other, should each have the same or similar
> values under these two contrasting conditions.
>
> 3. How do I do the internal normalization?  How can I use the values
> of variables v101 to v110 to normalize the measurements of v1 to v100
> under either the positive or the negative condition to minimize the
> batch effect?  I hope the comparisons of values (v1 to v100) between
> the two conditions can be made more accurate and more robust to
> external noise.
>
> In general, I have a couple of matrices of the same dimensions and a
> matrix of reference values to normalize them against.  How should I
> do that?
>

I don't understand your problem well, but in general internal
normalization is by and large an attempt to avoid appropriate modeling
(e.g., incorporating block effects or certain covariates in a regression
model), and results in overstated confidence of the final estimates by
not taking into account the imprecision in the normalizing factors.

Frank
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
Department of Biostatistics   Vanderbilt University

--
Waverley @ Palo Alto

```
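
One common way to extend the single-gene idea in the question to several reference genes is to divide each signal by the geometric mean of the housekeeping genes measured in the same batch, which on the log scale means subtracting their mean log signal. The sketch below illustrates that arithmetic in plain Python. It is not taken from the thread: the function names, gene labels, and numbers are all made up for illustration, and it assumes positive signals and a purely multiplicative batch effect. It also treats the normalizing factor as exact, which is the overstated confidence Frank warns about.

```python
import math

def log_reference_level(sample, ref_genes):
    """Mean log2 signal of the reference genes (log2 of their geometric mean)."""
    return sum(math.log2(sample[g]) for g in ref_genes) / len(ref_genes)

def normalize(sample, ref_genes):
    """Log2 signal of each non-reference gene relative to the batch's reference level."""
    ref = log_reference_level(sample, ref_genes)
    return {g: math.log2(v) - ref for g, v in sample.items() if g not in ref_genes}

# Hypothetical toy data: 2 target genes and 3 housekeeping genes.
# Every Day 2 signal is halved (a pure batch effect); on top of that,
# gene1 is 4-fold higher in disease and gene2 is unchanged.
ref_genes = {"hk1", "hk2", "hk3"}
day1_control = {"gene1": 100.0, "gene2": 400.0, "hk1": 200.0, "hk2": 800.0, "hk3": 50.0}
day2_disease = {"gene1": 200.0, "gene2": 200.0, "hk1": 100.0, "hk2": 400.0, "hk3": 25.0}

control = normalize(day1_control, ref_genes)
disease = normalize(day2_disease, ref_genes)

# Difference of normalized log2 values = log2 fold change, disease vs control
log2_fold_change = {g: disease[g] - control[g] for g in control}
print(log2_fold_change)
```

With this toy data the halving cancels out: the log2 fold change comes out at about 2 for gene1 (a 4-fold increase) and about 0 for gene2. Working on the log scale keeps one very bright housekeeping gene from dominating the normalizing factor, which is the usual reason for preferring a geometric mean over an arithmetic one here.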