[Rd] scale(x, center=FALSE) (PR#14219)
Ben Bolker
bolker at ufl.edu
Fri Mar 12 19:29:44 CET 2010
I'm resending this after a week ... I really don't want to nag, but
I also would not like to see this sink below the waves.
Is there a preferred protocol for requesting comments without nagging
too much? I would add a comment to 14219 (and was curious to see
whether it was rejected) ... I went to Bugzilla, and bug 14219 doesn't
seem to exist any more (either as open or as closed); I don't know if
it got lost, or thrown away, when the bug system migrated.
[re: behavior of scale() when center=FALSE and scale=TRUE]
> Again, I agree with you that the behavior is not optimal, but it is
> very hard to make changes in R when the behavior is suboptimal rather
> than actually wrong (by some definition). R-core is very conservative
> about changes that break backward compatibility; I would like it if they
> chose to change the function to use standard deviation rather than
> root-mean-square, but I doubt it will happen (and it would break things
> for any users who are relying on the current definition).
[snip]
> I have attached a patch
> file (and append the information below as well) that changes "standard
> deviation" back to "root mean square" and is much more explicit about
> this issue ... I hope R-core will jump in, critique it, and possibly use
> it in some form to improve (?) the documentation ...
>
> [PS: I have written that the scaling is equivalent to sd() "if and
> only if" centering was done. Technically it would also be equivalent if
> the column already had zero mean ...]
>
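A minimal sketch in R of the behavior under discussion, for anyone who wants to check it. The matrix values and the helper name `rms` are made up for illustration; `rms` follows the formula given in the patch below, sqrt(sum(x^2)/(n-1)) over the non-missing values, and is not a function in base R.

```r
# Small matrix with non-zero column means (illustrative values only).
x <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 50))

# Hypothetical helper: root-mean-square as described in the patch,
# sqrt(sum(v^2) / (n - 1)) over the non-missing values.
rms <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / (length(v) - 1))
}

# Without centering, scale() divides each column by its root-mean-square.
s_nocenter <- scale(x, center = FALSE, scale = TRUE)
used_nocenter <- unname(attr(s_nocenter, "scaled:scale"))
print(all.equal(used_nocenter, unname(apply(x, 2, rms))))

# With centering, the divisor coincides with sd().
s_center <- scale(x, center = TRUE, scale = TRUE)
used_center <- unname(attr(s_center, "scaled:scale"))
print(all.equal(used_center, unname(apply(x, 2, sd))))

# Scaling by the standard deviations without centering, as the patch suggests.
s_sd <- scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE))
used_sd <- unname(attr(s_sd, "scaled:scale"))
print(all.equal(used_sd, unname(apply(x, 2, sd))))
```

Because the columns here do not have zero mean, the first divisor differs from `sd()`, which is the source of the confusion with the current wording of the help page.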
===================================================================
--- scale.Rd (revision 51180)
+++ scale.Rd (working copy)
@@ -41,13 +41,18 @@
equal to the number of columns of \code{x}, then each column of
\code{x} is divided by the corresponding value from \code{scale}. If
\code{scale} is \code{TRUE} then scaling is done by dividing the
- (centered) columns of \code{x} by their standard deviations, and if
+ (centered) columns of \code{x} by their root-mean-squares, and if
\code{scale} is \code{FALSE}, no scaling is done.

- The standard deviation for a column is obtained by computing the
- square-root of the sum-of-squares of the non-missing values in the
- column divided by the number of non-missing values minus one (whether
- or not centering was done).
+
+ The root-mean-square for a (possibly centered)
+ column is defined as
+ \eqn{\sqrt{\sum(x^2)/(n-1)}}{sqrt(sum(x^2)/(n-1))},
+ where \eqn{x} is a vector of the non-missing values
+ and \eqn{n} is the number of non-missing values.
+ If (and only if) centering was done,
+ this is equivalent to \code{sd(x,na.rm=TRUE)}.
+ (To scale by the standard deviations without centering,
+ use \code{scale(x,center=FALSE,scale=apply(x,2,sd,na.rm=TRUE))}.)
}
\references{
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
(Bump re: suggested update to scale.Rd. Is this under
consideration? I'll stop pestering if it's considered
unacceptable, just don't want it to vanish without a trace ...)

