[R] sapply puzzlement

Fri Jan 28 02:05:17 CET 2011

On Jan 27, 2011, at 7:16 PM, Ernest Adrogué i Calveras wrote:

> Hi,
>
> I have this data.frame with two variables in it,
>
>> z
>  V1 V2
> 1 10  8
> 2 NA 18
> 3  9  7
> 4  3 NA
> 5 NA 10
> 6 11 12
> 7 13  9
> 8 12 11
>
> and a vector of means,
>
>> means <- apply(z, 2, function (col) mean(na.omit(col)))
>> means
>       V1        V2
> 9.666667 10.714286

Two methods:

A) Use sweep(), whose default FUN is "-", so with MARGIN=2 it subtracts
means[j] from every entry of column j:

 > sweep(z, 2, means)
           V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143
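
For anyone who wants to run this end to end, here is a minimal sketch. It assumes z is rebuilt by hand from the values printed in the question, and it uses colMeans(na.rm = TRUE) in place of the apply() call (my substitution; it should give the same means). The name centered is only illustrative.

```r
# z rebuilt by hand from the printout in the question (an assumption;
# the original object was not posted in reproducible form)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))

# Column means ignoring NAs; stands in for the apply() call above
means <- colMeans(z, na.rm = TRUE)

# MARGIN = 2 lines up means[j] with column j; FUN = "-" is the default
# and is spelled out here only for clarity
centered <- sweep(z, 2, means, FUN = "-")
centered
```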

B) Use scale(), whose "whole purpose in life" is to subtract the column
means and, optionally, divide by the column standard deviations. The
division is suppressed here with scale=FALSE, and scale() ignores NAs
when it computes the centers:

 > scale(z, scale=FALSE)
           V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143
attr(,"scaled:center")
        V1        V2
  9.666667 10.714286
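
Note that scale() hands back a matrix rather than a data frame, with the centers attached as the "scaled:center" attribute. A small sketch of pulling both out, again assuming z is rebuilt from the printed values (centered and centered_df are names I made up):

```r
# z rebuilt by hand from the printout in the question (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))

# Center each column; scale = FALSE skips the division by the sd
centered <- scale(z, scale = FALSE)

# The column means that were subtracted (computed with NAs removed)
centers <- attr(centered, "scaled:center")

# Convert back if a data frame is needed downstream
centered_df <- as.data.frame(centered)
```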

-- 
David.

>
> My intention was subtracting means from z, so instinctively I tried
>
>> z-means
>          V1         V2
> 1  0.3333333 -1.6666667
> 2         NA  7.2857143
> 3 -0.6666667 -2.6666667
> 4 -7.7142857         NA
> 5         NA  0.3333333
> 6  0.2857143  1.2857143
> 7  3.3333333 -0.6666667
> 8  1.2857143  0.2857143
>
> But this is completely wrong. sapply() gives the same result:
>
>> sapply(z, function(row) row - means)
>             V1         V2
> [1,]  0.3333333 -1.6666667
> [2,]         NA  7.2857143
> [3,] -0.6666667 -2.6666667
> [4,] -7.7142857         NA
> [5,]         NA  0.3333333
> [6,]  0.2857143  1.2857143
> [7,]  3.3333333 -0.6666667
> [8,]  1.2857143  0.2857143
>
> So, what is going on here?
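
A sketch of what seems to be going on, assuming z is rebuilt from the values printed above: a length-2 vector is recycled down each column (means[1], means[2], means[1], ...) instead of being laid across the columns. sapply() hands the function one column at a time, so the argument called row is really a column, and the same recycling applies. The names recycled, wrong and via_sapply are only illustrative.

```r
# z rebuilt by hand from the printout in the question (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

# What the recycled operand effectively looks like: the two means
# alternate down the rows, and both columns get the same pattern
recycled <- matrix(rep_len(means, nrow(z) * ncol(z)), nrow = nrow(z))

# The "wrong" result from the question
wrong <- z - means

# sapply() passes whole columns, so each length-8 column is paired
# with the length-2 means vector and recycled the same way
via_sapply <- sapply(z, function(col) col - means)

# Both agree with subtracting the alternating pattern
all.equal(as.matrix(wrong), as.matrix(z) - recycled, check.attributes = FALSE)
all.equal(via_sapply, as.matrix(z) - recycled, check.attributes = FALSE)
```
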
> The following appears to work
>
>> z-matrix(means,ncol=2)[rep(1, dim(z)[1]),]
>          V1         V2
> 1  0.3333333 -2.7142857
> 2         NA  7.2857143
> 3 -0.6666667 -3.7142857
> 4 -6.6666667         NA
> 5         NA -0.7142857
> 6  1.3333333  1.2857143
> 7  3.3333333 -1.7142857
> 8  2.3333333  0.2857143
>
> but I think it's rather cumbersome; surely there must be a cleaner way
> to do it.
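
One shorter way to write the same expansion, offered as a sketch and not as the recommended answer (sweep() and scale() are tidier): repeat each mean nrow(z) times so the vector already has one entry per cell, in column order. z is again rebuilt from the printed values, and expanded and centered are illustrative names.

```r
# z rebuilt by hand from the printout in the question (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

# means[1] repeated for every row of V1, then means[2] for every row of V2
expanded <- rep(means, each = nrow(z))

# The operand now has one value per cell, so nothing is recycled
# and each column is paired with its own mean
centered <- z - expanded
centered
```
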
>
> -- 
> Ernest

David Winsemius, MD
West Hartford, CT