[Rd] scale(x, center=FALSE) (PR#14219)

Peter Dalgaard pdalgd at gmail.com
Sat Mar 13 09:51:36 CET 2010

Ben Bolker wrote:
>   Thanks Simon!
>   How irritating/wrong would it be if I opened a new bug to submit my
> suggested documentation patch?  As detailed below, I think the
> documentation is somewhat confusing (it depends on a highly non-standard
> definition of "standard deviation" ...)

Hum, yes. And the 2.9.x version had a strange definition of root mean
square, except in the case where it actually meant the SD...
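A quick way to see which definition scale() actually uses is to compare the divisors it stores with the candidates. This is a minimal sketch that assumes the current scale.default, which records its per-column divisors in the "scaled:scale" attribute. The matrix x is made-up example data.

```r
# Compare the divisor used by scale(x, center = FALSE) with sd()
# and with the textbook (n-denominator) root mean square.
x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)

s    <- scale(x, center = FALSE)        # scale = TRUE is the default
used <- attr(s, "scaled:scale")         # per-column divisors actually used

n      <- colSums(!is.na(x))                            # non-missing counts
rms_n1 <- sqrt(colSums(x^2, na.rm = TRUE) / (n - 1))    # divisor n - 1
rms_n  <- sqrt(colSums(x^2, na.rm = TRUE) / n)          # divisor n
sds    <- apply(x, 2, sd, na.rm = TRUE)                 # ordinary SD

print(rbind(used, rms_n1, rms_n, sds))
```

The `used` row should agree with `rms_n1`, not with `rms_n` or `sds`.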

It is unattractive to change the actual code in case someone is relying
on current behaviour, but if we could, I'd replace (n-1) with n in the
non-centered case, or maybe introduce a "df=if(center)n-1 else n" type
argument. (The original report wanted us to use the SD even in that
case, which is not obviously desirable, and probably the reason that the
whole report got binned.)
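The df idea can be sketched in a few lines of R. This is only an illustration: `scale_df` and its `df` argument are hypothetical and not part of base R, and missing values are simply dropped column by column. The sketch relies on R's lazy evaluation of default arguments, so the default for df can refer to n, which is computed inside the function body before df is first used.

```r
# Hypothetical scale() variant with a 'df' argument (NOT in base R).
scale_df <- function(x, center = TRUE, df = if (center) n - 1 else n) {
  x <- as.matrix(x)
  if (center) x <- sweep(x, 2, colMeans(x, na.rm = TRUE))  # subtract column means
  n <- colSums(!is.na(x))                         # non-missing count per column
  rms <- sqrt(colSums(x^2, na.rm = TRUE) / df)    # df is forced here, after n exists
  sweep(x, 2, rms, "/")                           # divide each column by its divisor
}

x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
scale_df(x)                                    # centered: divides by the SD
scale_df(x, center = FALSE)                    # uncentered: divides by sqrt(mean(x^2))
scale_df(x, center = FALSE, df = nrow(x) - 1)  # mimics current scale(x, center = FALSE)
```

With complete data, passing df = nrow(x) - 1 reproduces the current uncentered behaviour, so existing results would remain obtainable.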

Anyway, by all means, just submit a fresh report.

>   cheers
>     Ben Bolker
> Simon Urbanek wrote:
>> On Mar 12, 2010, at 1:29 PM, Ben Bolker wrote:
>>>  I'm resending this after a week ... I really don't want to nag, but
>>> I also would not like to see this sink below the waves.
>> It has been closed as feature/FAQ with the note:
>> "As documented on the help page!"
>>>  Is there a preferred protocol for requesting comments without nagging too much?   I would add a comment to 14219 (and was curious to see whether it was rejected), but when I went to Bugzilla, bug 14219 no longer seemed to exist, either as open or as closed. I don't know whether it got lost or thrown away when the bug system migrated.
>> Hmm.. there was apparently an error when importing the feature&FAQ box. Unfortunately Jitterbug left some duplicate bugs in different categories so the import was not as easy as it should be. I'll double check the IDs to see if any others are missing -- I ran import for 14219 manually now.
>> Thanks,
>> Simon
>>> [re: behavior of scale() when center=FALSE and scale=TRUE]
>>>>  Again, I agree with you that the behavior is not optimal, but it is
>>>> very hard to make changes in R when the behavior is sub-optimal rather
>>>> than actually wrong (by some definition).  R-core is very conservative
>>>> about changes that break backward compatibility; I would like it if they
>>>> chose to change the function to use standard deviation rather than
>>>> root-mean-square, but I doubt it will happen (and it would break things
>>>> for any users who are relying on the current definition).
>>> [snip]
>>>> I have attached a patch
>>>> file (and append the information below as well) that changes "standard
>>>> deviation" back to "root mean square" and is much more explicit about
>>>> this issue ... I hope R-core will jump in, critique it, and possibly use
>>>> it in some form to improve (?) the documentation ...
>>>>  [PS: I have written that the scaling is equivalent to sd() "if and
>>>> only if" centering was done.  Technically it would also be equivalent if
>>>> the column already had zero mean ...]
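Both the zero-mean caveat in the PS and the workaround in the proposed help text can be checked directly. This is a small sketch with made-up data in which column b has mean zero; it assumes scale() stores its divisors in the "scaled:scale" attribute, as current versions do.

```r
x <- cbind(a = c(1, 2, 3, 4), b = c(-3, -1, 1, 3))  # column b has mean 0
sds <- apply(x, 2, sd, na.rm = TRUE)

# Default uncentered scaling: divisor is sqrt(sum(x^2)/(n-1)).
s_rms <- scale(x, center = FALSE)
# Workaround from the proposed help text: pass the SDs explicitly.
s_sd  <- scale(x, center = FALSE, scale = sds)

# For the zero-mean column b the two divisors coincide; for column a they differ.
print(rbind(rms = attr(s_rms, "scaled:scale"), sd = attr(s_sd, "scaled:scale")))
```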
>>> ===================================================================
>>> --- scale.Rd	(revision 51180)
>>> +++ scale.Rd	(working copy)
>>> @@ -41,13 +41,18 @@
>>>   equal to the number of columns of \code{x}, then each column of
>>>   \code{x} is divided by the corresponding value from \code{scale}.  If
>>>   \code{scale} is \code{TRUE} then scaling is done by dividing the
>>> -  (centered) columns of \code{x} by their standard deviations, and if
>>> +  (centered) columns of \code{x} by their root-mean-squares, and if
>>>   \code{scale} is \code{FALSE}, no scaling is done.
>>> -
>>> -  The standard deviation for a column is obtained by computing the
>>> -  square-root of the sum-of-squares of the non-missing values in the
>>> -  column divided by the number of non-missing values minus one (whether
>>> -  or not centering was done).
>>> +
>>> +  The root-mean-square for a (possibly centered)
>>> +  column is defined as
>>> +  \eqn{\sqrt{\sum(x^2)/(n-1)}}{sqrt(sum(x^2)/(n-1))},
>>> +  where \eqn{x} is a vector of the non-missing values
>>> +  and \eqn{n} is the number of non-missing values.
>>> +  If (and only if) centering was done,
>>> +  this is equivalent to \code{sd(x,na.rm=TRUE)}.
>>> +  (To scale by the standard deviations without centering,
>>> +  use \code{scale(x,center=FALSE,scale=apply(x,2,sd,na.rm=TRUE))}.)
>>> }
>>> \references{
>>>   Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
>>> (Bump re: suggested update to scale.Rd .  Is this under
>>> consideration? I'll stop pestering if it's considered
>>> unacceptable, just don't want it to vanish without a trace ...)
>>> -- 
>>> Ben Bolker
>>> Associate professor, Biology Dep't, Univ. of Florida
>>> bolker at ufl.edu / people.biology.ufl.edu/bolker
>>> GPG key: people.biology.ufl.edu/bolker/benbolker-publickey.asc
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel

Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
