[R] log2() and -min() very quick question

jim holtman jholtman at gmail.com
Mon Jun 13 18:08:44 CEST 2011


The second line is just rescaling the data onto a log2 scale.  It
subtracts the minimum of the entire matrix (not of each row) and
adds 1, so the smallest value becomes 1 and log2(1) = 0.  Without the
shift, log2(0) returns -Inf and log2() of a negative number is NaN.
Here is an example with sample data:

> x <- matrix(runif(25, -50, 50), 5)
> x
          [,1]      [,2]       [,3]       [,4]       [,5]
[1,] 29.730883  15.47239 -28.679186  47.617069 -48.692242
[2,] -4.472555 -14.68027 -37.062765  23.179251  21.556607
[3,] -8.991592 -22.97399  -2.188197 -14.327309 -39.681576
[4,] 31.087024  49.26841  42.407447  -6.852631  -5.371565
[5,] 10.493329  13.34933   9.876097 -35.178844  14.010105
> # scale to log2
> x <- log2(x - min(x) + 1)
> x
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 6.311487 6.026017 4.393214 6.604506 0.000000
[2,] 5.498879 5.129776 3.658723 6.187283 6.154795
[3,] 5.346980 4.739754 5.569978 5.144248 3.323466
[4,] 6.335913 6.628783 6.525124 5.420873 5.469908
[5,] 5.911346 5.978232 5.896474 3.859313 5.993275
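
To see that min() works on the whole matrix rather than row by row, here is
a minimal sketch on a small made-up matrix (the name m and its values are
only for illustration, not from the original data).  apply(m, 1, min) is
shown purely for comparison with the per-row guess.

```r
# Small fixed matrix so the result is reproducible (values are made up).
m <- matrix(c(-3, 0,  5,
               1, 4, 12), nrow = 2, byrow = TRUE)

min(m)              # a single number: the minimum of the whole matrix (-3)
apply(m, 1, min)    # per-row minima, for comparison only

# Shift so the smallest entry becomes 1, then take log base 2.
shifted <- m - min(m) + 1
scaled  <- log2(shifted)
scaled

# Without the shift, log2() cannot give finite values for zeros or negatives.
log2(0)                      # -Inf
suppressWarnings(log2(-3))   # NaN
```

Because the same constant is subtracted from every cell, the ordering of
the values is preserved; only the scale changes.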

You should see a noticeable change between the data read in and the
result of the second statement.
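
With a large matrix it is easier to compare summaries than to read whole
rows.  The sketch below uses a hypothetical stand-in called counts (mostly
zeros, like the row you posted); the object name, dimensions and values are
assumptions for illustration.  Note that if the minimum of the real data is
0, the second line reduces to log2(x + 1), which leaves every 0 and every 1
unchanged, so a mostly-zero table can look almost identical afterwards.

```r
# Hypothetical stand-in for the real data: counts that are mostly zero.
counts <- matrix(0, nrow = 4, ncol = 50,
                 dimnames = list(paste0("taxon", 1:4), paste0("s", 1:50)))
counts[1, c(10, 20)] <- c(1, 3)
counts[3, 45] <- 23

# The same transformation as in the original code.
scaled <- log2(counts - min(counts) + 1)

dim(counts)                 # size of the matrix
range(counts)               # smallest and largest value before scaling
range(scaled)               # and after
mean(counts == scaled)      # fraction of cells left unchanged
counts[1:3, 8:12]           # look at a small corner instead of whole rows

# Row/column positions of the cells that actually changed.
changed <- which(counts != scaled, arr.ind = TRUE)
changed
```

summary() and str() on the matrix are other standard ways to get an
overview before and after each step.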

On Mon, Jun 13, 2011 at 11:59 AM, Ben Ganzfried <ben.ganzfried at gmail.com> wrote:
> I'm looking over some good code a post-doc in my lab wrote and trying to learn
> how it works.  I came across the following:
> rel.abundance <- as.matrix(read.delim("rel.abundance.csv", row.names=1, as.is=TRUE))
> rel.abundance <- log2(rel.abundance-min(rel.abundance)+1)
>
> I'm not sure what the second line is doing.  I ran each line in R and
> couldn't see a noticeable difference in the output.  I assume log2() takes
> the log base 2 of the values?  I'm not clear what -min(rel.abundance) is
> doing either...my hunch would be that it would take the smallest value in
> each row?
> I'd really like to figure out:
> 1) What's actually going on?
> 2) Is there a good way to run a command over a large dataset in R and better
> be able to tell what is going on?  More specifically, when I run each line
> in R it looks something like this (w/ dif. values per row):
> Archaea|Euryarchaeota|Methanobacteria|Methanobacteriales|Methanobacteriaceae|Methanobrevibacter,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,23,0,3,0,0,0
>
>
> There are a lot of cells w/ values per row, which is one reason why I think
> it is difficult to detect a pattern....
>
> Thanks in advance!
>
> Ben



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


