[R] log2() and -min() very quick question
jim holtman
jholtman at gmail.com
Mon Jun 13 18:08:44 CEST 2011
The second line just rescales the data before taking log2. It
subtracts the minimum of the entire matrix (not of each row) and
adds 1, so the smallest value becomes exactly 1 and nothing is zero or
negative: log2(0) is -Inf, and log2() of a negative number is NaN.
Here is an example with sample data:
> x <- matrix(runif(25, -50, 50), 5)
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 29.730883 15.47239 -28.679186 47.617069 -48.692242
[2,] -4.472555 -14.68027 -37.062765 23.179251 21.556607
[3,] -8.991592 -22.97399 -2.188197 -14.327309 -39.681576
[4,] 31.087024 49.26841 42.407447 -6.852631 -5.371565
[5,] 10.493329 13.34933 9.876097 -35.178844 14.010105
> # scale to log2
> x <- log2(x - min(x) + 1)
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 6.311487 6.026017 4.393214 6.604506 0.000000
[2,] 5.498879 5.129776 3.658723 6.187283 6.154795
[3,] 5.346980 4.739754 5.569978 5.144248 3.323466
[4,] 6.335913 6.628783 6.525124 5.420873 5.469908
[5,] 5.911346 5.978232 5.896474 3.859313 5.993275
You should see a noticeable change between the data as read in and the
result of the second statement.
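To check both points directly (min() looks at the whole matrix rather than each row, and the +1 keeps log2() away from zero and negative values), here is a small sketch on a made-up matrix `m`. The data and the per-row `sweep()` variant at the end are illustrations only and are not part of the original code.

```r
# Small made-up matrix (hypothetical data, not from the original post)
m <- matrix(c(-3, 0, 5,
               2, 8, -1), nrow = 2, byrow = TRUE)

min(m)              # minimum of the WHOLE matrix: -3
apply(m, 1, min)    # per-row minima, for comparison: -3 -1

# Why the shift matters: log2() of zero or a negative number is not finite
log2(0)                      # -Inf
suppressWarnings(log2(-1))   # NaN (log2 warns on negative input)

# The transformation from the post: after subtracting min(m) the smallest
# value is 0, adding 1 makes it 1, and log2(1) is 0
scaled <- log2(m - min(m) + 1)
min(scaled)                  # 0
all(is.finite(scaled))       # TRUE

# Hypothetical alternative, NOT what the original code does: shift each
# row by its own minimum using sweep() before taking log2
per_row <- log2(sweep(m, 1, apply(m, 1, min)) + 1)
per_row
```

With the whole-matrix minimum, only one cell in the entire matrix ends up as 0; with the per-row variant, every row gets its own 0.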
On Mon, Jun 13, 2011 at 11:59 AM, Ben Ganzfried <ben.ganzfried at gmail.com> wrote:
> I'm looking over some good code a post-doc in my lab wrote and trying to learn
> how it works. I came across the following:
> rel.abundance <- as.matrix(read.delim("rel.abundance.csv", row.names=1, as.is=TRUE))
> rel.abundance <- log2(rel.abundance-min(rel.abundance)+1)
>
> I'm not sure what the second line is doing. I ran each line in R and
> couldn't see a noticeable difference in the output. I assume log2() takes
> the log base 2 of the values? I'm not clear what -min(rel.abundance) is
> doing either...my hunch would be that it would take the smallest value in
> each row?
> I'd really like to figure out:
> 1) What's actually going on?
> 2) Is there a good way to run a command over a large dataset in R and better
> be able to tell what is going on? More specifically, when I run each line
> in R it looks something like this (w/ dif. values per row):
> Archaea|Euryarchaeota|Methanobacteria|Methanobacteriales|Methanobacteriaceae|Methanobrevibacter,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,23,0,3,0,0,0
>
>
> There are a lot of cells w/ values per row, which is one reason why I think
> it is difficult to detect a pattern....
>
> Thanks in advance!
>
> Ben
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
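On the second question, printing whole rows of a wide matrix makes it hard to see what changed. A sketch of a few base R ways to compare before and after, using a hypothetical random count matrix `raw` in place of rel.abundance (whose file is not available here):

```r
# Hypothetical stand-in for rel.abundance: 100 rows x 250 mostly-zero counts
set.seed(1)
raw <- matrix(rpois(100 * 250, lambda = 0.1), nrow = 100)

scaled <- log2(raw - min(raw) + 1)

dim(raw)                 # rows and columns, instead of printing everything
raw[1:5, 1:5]            # look at a small corner of the matrix
scaled[1:5, 1:5]         # the same corner after the transformation

summary(as.vector(raw))      # overall distribution before
summary(as.vector(scaled))   # and after

# Side-by-side view of a few non-zero cells, where the change is visible
nz <- which(raw > 0)[1:5]
cbind(before = raw[nz], after = scaled[nz])
```

One likely reason no difference was visible: if the smallest value in the data is 0 (as with counts), the line reduces to log2(x + 1), so 0 stays 0 and 1 stays 1 (log2(2) is 1). A row made mostly of zeros and ones therefore looks unchanged; only larger values such as 3 (which becomes 2) or 23 (about 4.58) move noticeably.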
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?