[BioC] how to normalize by columns
diego huck
diegolugro at yahoo.com.ar
Fri May 27 19:06:41 CEST 2005
Thank you David, these commands were very useful.
Thank you Gordon for your comments, I'll go back and review the
statistics theory.
best regards
diego
David Kipling wrote:
>
>
> On 26 May 2005, at 07:12, diego huck wrote:
>
>>
>> Hello
>>
>> I am a beginner at Bioconductor and R. I am confused about how
>> to do a normalization that consists of obtaining the mean of a column
>> and then subtracting that mean from each value in the column.
>> x1(1)- mean(col x1) x2(1)- mean(col x2)
>> x1(2)- mean(col x1) x2(2)- mean(col x2)
>> x1(3)- mean(col x1) x2(3)- mean(col x2)
>> .................... ...................
>>
>>
>> I have the genes in columns and the conditions in rows.
>
>
> That is fine, although unusual. Be aware that many of the BioC (and
> similar) microarray packages use a rows=genes, columns=samples
> convention. Although this perhaps wouldn't be the way a statistician
> would arrange subjects and measurements in a table in R, I think it is
> partly a historical carry-over from microarray data analysis in
> spreadsheets and the like. Excel has a 256 column x 65000(ish) row size
> limit, so you are pretty much stuck with one layout!
>
> If you ever need to rotate your data then this is easy: use the t()
> function.
>
> newArray <- t(oldArray)
>
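As a small illustration of the t() suggestion above, the sketch below builds a made-up matrix with conditions in rows and genes in columns, then rotates it into the rows=genes, columns=samples layout. The dimnames and the toy values are invented for the example; only t() itself comes from the message.

```r
# Hypothetical data: 3 conditions (rows) x 4 genes (columns)
oldArray <- matrix(1:12, nrow = 3,
                   dimnames = list(paste0("cond", 1:3), paste0("gene", 1:4)))

# t() swaps rows and columns and carries the dimnames along
newArray <- t(oldArray)

dim(oldArray)  # conditions x genes
dim(newArray)  # genes x conditions
```

After the transpose, newArray["gene2", "cond3"] holds the same value as oldArray["cond3", "gene2"].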
>
>> I don't want to stabilize the variance.
>
>
> If you did, the vsn package will do this.
>
>> As you can see, it is a very simple calculation.
>> I am wondering whether I could use packages like vsn or affy to do that,
>> or whether it is easier to write a script.
>
>
> You can do this yourself very easily, as this code snippet shows:
>
>
> # Make a spoof array of 100 genes (columns) and 20 samples (rows) to demonstrate
> x <- matrix(runif(2000), ncol=100)
>
> # Calculate the mean of each column. Note: you could use median here
> # to make it slightly more robust
> col_means <- apply(x, 2, mean)
>
> # Subtract the column means from each value in that column
> x_centred <- sweep(x, 2, col_means, "-")
>
> # You can do a similar version to subtract the row means; simply
> # change the second argument of both apply() and sweep() to 1.
> # Alternatively, if you wanted to do division as opposed to
> # subtraction, use
> x_scaled <- sweep(x, 2, col_means, "/")
>
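For comparison, here is a sketch of two variations on the snippet above: base R's scale() with scale = FALSE, which should give the same centred values as the sweep() call, and the median-based version mentioned in the comment. The variable names and the set.seed() call are assumptions added only to make the example repeatable.

```r
set.seed(1)                              # only so the example is repeatable
x <- matrix(runif(2000), ncol = 100)     # 20 samples (rows) x 100 genes (columns)

# Centre each column with sweep(), as in the snippet above
col_means <- apply(x, 2, mean)
centred_sweep <- sweep(x, 2, col_means, "-")

# scale() with scale = FALSE subtracts the column means and nothing else
centred_scale <- scale(x, center = TRUE, scale = FALSE)

# scale() attaches extra attributes, so compare the values only
all.equal(centred_sweep, centred_scale, check.attributes = FALSE)

# Median-centred variant: same sweep(), different summary statistic
col_medians <- apply(x, 2, median)
centred_median <- sweep(x, 2, col_medians, "-")
```

sweep() is the more general tool (any statistic, any operator), while scale() is a shortcut for the specific case of mean-centring columns.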
>
>> Furthermore, I am not sure whether such a simple normalization is
>> conceptually correct for the objective of eliminating the effect
>> between arrays.
>> I would like to know whether I have to repeat the process of
>> calculating the mean of each column and subtracting it several times.
>>
>
> Subtracting the mean from each column will make the new mean of each
> column zero, so one cycle is enough.
>
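The "one cycle is enough" point can be checked numerically. Subtracting a column's mean m from each of its n values leaves a sum of (sum of values) - n*m = 0, so the new mean is zero and a second pass subtracts nothing. A minimal sketch, using the same kind of random test matrix as above (colMeans() here is the base R function, and the names once and twice are made up for the example):

```r
set.seed(1)                              # only so the example is repeatable
x <- matrix(runif(2000), ncol = 100)

# First pass: subtract each column's mean
once <- sweep(x, 2, colMeans(x), "-")
max(abs(colMeans(once)))                 # zero up to floating-point rounding

# Second pass: the means are already (numerically) zero, so nothing changes
twice <- sweep(once, 2, colMeans(once), "-")
all.equal(once, twice)
```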
> Hope this helps.
>
> David
>
> Prof David Kipling
> Department of Pathology
> School of Medicine
> Cardiff University
> Heath Park
> Cardiff CF14 4XN
>
> Tel: 029 2074 4847
> Email: KiplingD at cardiff.ac.uk
>
>