[R] standardize columns selectively within a dataframe
Olga Lyashevska
olga at herenstraat.nl
Wed Sep 1 18:59:40 CEST 2010
On Wed, 2010-09-01 at 12:42 -0400, David Winsemius wrote:
> I suspect you might have tried (df-mean(df))/sd(df) and gotten
> unsatisfactory results; I know I did.
Yes, indeed! A few times, but why is that?
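
One likely reason, sketched here on a hypothetical toy matrix (not the
data from the original post): R recycles a vector of column means down
the rows, column by column, so the means end up matched to rows rather
than to columns. The names m, naive and centered are illustrative
assumptions, and a plain matrix with colMeans() is used to keep the
behaviour easy to check.

```r
# Hypothetical toy matrix; not the data from the original post
m <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))

# colMeans(m) is c(a = 2, b = 20). In m - colMeans(m) the vector is
# recycled in column-major order (2, 20, 2, 20, 2, 20), so the means
# are matched to rows instead of columns.
naive <- m - colMeans(m)

# Transposing first makes the recycling line up with the columns;
# transposing back restores the original shape.
centered <- t(t(m) - colMeans(m))

print(naive)     # misaligned: column a is not centred on zero
print(centered)  # each column now has mean zero
```

The double transpose is only one way to align the recycling; sweep(),
discussed in the quoted text, expresses the same intent more directly.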
> If you had really wanted to
> persist and do it from first principles, so to speak, or perhaps as
> "homework", then consider the sweep operation. It takes an object of
> lower dimension and applies a function, ("-") by default, with the
> third argument repeatedly across the specified (in the second
> argument) dimension. You wanted to work on columns, so this would
> accomplish the subtraction of means() followed by division by sd():
>
> > sweep(as.matrix(df[ , 1:2]), 2L, colMeans(mm)) # using the default "-" operator
>       a  b
> [1,] -1 -1
> [2,]  0  0
> [3,]  1  1
> > sweep(sweep(df[ , 1:2], 2L, colMeans(mm)), 2, sd(mm), "/")
>    a  b
> 1 -1 -1
> 2  0  0
> 3  1  1
I am glad you brought up sweep here. I have also been trying to use it,
but never managed to fully understand exactly what it does, and so I
could not get it to work properly. Very clear explanation, thanks!
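
For reference, a minimal sketch of the two sweep() steps applied to
selected columns only. The data frame df, the column names and the
vector cols are hypothetical, made up for illustration. The column
standard deviations are computed with apply(m, 2, sd) because, in
current versions of R, sd() called on a matrix returns a single value
rather than one value per column.

```r
# Hypothetical data frame; column names are illustrative only
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), g = c("x", "y", "z"))
cols <- c("a", "b")   # the columns to standardize

m <- as.matrix(df[, cols])

# Step 1: subtract each column's mean (FUN defaults to "-")
centered <- sweep(m, 2, colMeans(m))

# Step 2: divide each column by its standard deviation
scaled <- sweep(centered, 2, apply(m, 2, sd), "/")

# Write the standardized columns back; column g is left untouched
df[, cols] <- scaled
print(df)
```

On this toy input the result should agree with scale(m), which is the
shorter route mentioned in the quoted text.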
> (Your test columns happened to be scaled already and only needed to be
> centered. This is how scale() does its work, and their help pages have
> links cross-referencing each other.)
>
> This is probably a good time to reference Burns' The R Inferno, which
> has an entry for sweep (p 57) as well as tips regarding the drop=FALSE
> maneuver (p 54) that I tried first for this problem but it "didn't
> work".
Thanks for the references! Your solution with scale() is nice and neat,
but for the sake of learning it is useful to persist.
Cheers,
Olga