[BioC] how to normalize by columns
David Kipling
KiplingD at cardiff.ac.uk
Thu May 26 09:33:39 CEST 2005
On 26 May 2005, at 07:12, diego huck wrote:
>
> Hello
>
> I am a beginner at Bioconductor and R. I am confused about how
> to do a normalization that consists of obtaining the mean of a column
> and then subtracting that mean from each value in the column.
> x1(1) - mean(col x1)    x2(1) - mean(col x2)
> x1(2) - mean(col x1)    x2(2) - mean(col x2)
> x1(3) - mean(col x1)    x2(3) - mean(col x2)
> ...                     ...
>
>
> I have the genes in columns and the conditions in rows.
That is fine, although unusual. Be aware that many of the BioC (and
similar) microarray packages use a rows = genes, columns = samples
convention. Although this perhaps isn't the way a statistician
would arrange subjects and measurements in a table in R, I think it is
partly a historical carry-over from microarray data analysis in
spreadsheets and the like. Excel has a limit of 256 columns x 65,536
rows, so with thousands of genes you are pretty much stuck with genes
in rows!
If you ever need to swap rows and columns (transpose your data), this
is easy: use the t() function.
newArray <- t(oldArray)
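As a quick sketch of the t() call above, using made-up random data in place of a real experiment: transposing a samples-by-genes matrix gives the genes-by-samples layout most BioC packages expect, and centering each gene is then a column operation in the old layout but a row operation in the new one. The numbers come out the same, just transposed.

```r
# Made-up data: 20 samples (rows) x 100 genes (columns), as in the question
set.seed(1)
oldArray <- matrix(runif(2000), nrow = 20, ncol = 100)

# Transpose to the rows = genes, columns = samples convention
newArray <- t(oldArray)
dim(oldArray)  # 20 rows, 100 columns
dim(newArray)  # 100 rows, 20 columns

# Centre each gene: a column operation in the old layout,
# a row operation in the transposed one
centred.old <- sweep(oldArray, 2, colMeans(oldArray), "-")
centred.new <- sweep(newArray, 1, rowMeans(newArray), "-")

# Same values, just transposed
all.equal(centred.old, t(centred.new))  # TRUE
```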
> I don't want to stabilize the variance.
If you did, the vsn package would do this.
> As you can see, it is a very simple calculation.
> I am wondering whether I could use packages like vsn or affy to do
> that, or whether it is easier to write a script.
You can do this yourself very easily, as this code snippet shows:
# Make a spoof array of 100 genes (columns) and 20 samples (rows),
# matching your layout, to demonstrate
x <- matrix(runif(2000), ncol=100)
# Calculate the mean of each column (base R's colMeans(x) does the same,
# faster). Note: you could use median here to make it slightly more robust
col.means <- apply(x, 2, mean)
# Subtract the column means from each value in that column
x.centred <- sweep(x, 2, col.means, "-")
# You can do a similar version to subtract the row means; simply
# change the second argument (MARGIN) of both apply() and sweep() to 1.
# Alternatively, if you wanted to do division as opposed to subtraction,
# use the original (uncentred) x:
x.scaled <- sweep(x, 2, col.means, "/")
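The snippet above can be wrapped in a small function so that the margin (2 for columns, 1 for rows) and the summary statistic (mean, or median for the extra robustness mentioned in the comment) become arguments. The name centre.by is a hypothetical helper made up for this sketch, not a function from any BioC package; the checks at the end simply confirm that the chosen margin ends up centred at (numerically) zero.

```r
# Hypothetical helper (not from any package): centre a matrix by columns
# (MARGIN = 2) or by rows (MARGIN = 1), using mean or median.
centre.by <- function(x, MARGIN = 2, FUN = mean) {
  centres <- apply(x, MARGIN, FUN)   # one centre per column (or row)
  sweep(x, MARGIN, centres, "-")     # subtract it from that column (or row)
}

# Spoof data: 20 rows x 100 columns of random numbers
set.seed(42)
x <- matrix(runif(2000), ncol = 100)

xc.mean   <- centre.by(x, 2, mean)    # column means become ~0
xc.median <- centre.by(x, 2, median)  # column medians become ~0
xr.mean   <- centre.by(x, 1, mean)    # row means become ~0

# Largest leftover centre in each case: close to zero (rounding error only)
max(abs(colMeans(xc.mean)))
max(abs(apply(xc.median, 2, median)))
max(abs(rowMeans(xr.mean)))
```

Using median rather than mean means a few extreme values in one column shift that column's centre much less.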
> Furthermore, I am not sure whether such a simple normalization is
> conceptually correct for the objective of eliminating the effect
> between arrays.
> I would like to know whether I have to repeat the process of
> calculating the mean of each column and subtracting it several times.
>
Subtracting the mean from each column makes the new mean of each
column zero, so a second pass would subtract zero and leave the data
unchanged: one cycle is enough.
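A quick numerical check of the one-cycle argument, on random spoof data rather than real arrays: centering the columns a second time subtracts only rounding-level amounts, so the result is unchanged. The same reasoning applies with MARGIN = 1; assuming each row is one array, as the question describes, centring each array would be done over rows.

```r
# Spoof data: 20 rows x 100 columns of random numbers
set.seed(7)
x <- matrix(runif(2000), ncol = 100)

# First pass: subtract each column's mean from that column
once <- sweep(x, 2, colMeans(x), "-")

# Second pass: the column means are now (numerically) zero,
# so subtracting them again changes nothing beyond rounding error
twice <- sweep(once, 2, colMeans(once), "-")

all.equal(once, twice)  # TRUE
```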
Hope this helps.
David
Prof David Kipling
Department of Pathology
School of Medicine
Cardiff University
Heath Park
Cardiff CF14 4XN
Tel: 029 2074 4847
Email: KiplingD at cardiff.ac.uk