[R] scaling and optim

Fri Feb 8 02:49:18 CET 2008

?optim says, in describing the control parameter,
     'fnscale' An overall scaling to be applied to the value of 'fn'
          and 'gr' during optimization. If negative, turns the problem
          into a maximization problem. Optimization is performed on
          'fn(par)/fnscale'.

     'parscale' A vector of scaling values for the parameters.
          Optimization is performed on 'par/parscale' and these should
          be comparable in the sense that a unit change in any element
          produces about a unit change in the scaled value.

1. Does the final phrase 'produces about a unit change in the scaled
value' refer to the value of the objective function?  Substantively I
think it must, though grammatically it's less clear.

2. Does "Optimization is performed on 'par/parscale'" mean
a) if par is 3 and parscale is 10, then the objective function is
evaluated at 0.3?  This strikes me as the literal reading of the
clause; it also strikes me as extremely unlikely to be what really
happens.
or
b) if par is 3 and parscale is 10, then the objective function is
evaluated at 3, but the optimizer records this as if par were 30 and
subsequently, e.g. when computing deltas or making steps, works in
that space?  So a step of d becomes a step of d/parscale for the real
objective function.
or
c) about the same as b, only steps of d become d*parscale?
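
One way to probe the three readings empirically might be to log every
point at which optim actually calls the objective.  The sketch below is
only that: the quadratic objective, the vector 'evals' and the choice of
method = "BFGS" with no 'gr' supplied (so that finite differences are
used) are my assumptions for illustration, not anything taken from
?optim.

```r
# Sketch: log every point at which optim() really calls the objective.
evals <- numeric(0)
fn <- function(p) {
  evals <<- c(evals, p)   # copy of the argument actually passed to fn
  (p - 5)^2               # toy objective (an assumption), minimum at 5
}

fit <- optim(par = 3, fn = fn, method = "BFGS",
             control = list(parscale = 10))

# With BFGS and no 'gr', the first call should be at the start value and
# the next two should be the finite-difference points around it.
print(head(evals, 3))
print(diff(range(head(evals, 3))) / 2)  # half-width of the FD stencil
```

If reading (a) were right, the first recorded value would be 0.3 rather
than 3.  Under (b) the half-width would be ndeps/parscale = 1e-4, and
under (c) it would be ndeps*parscale = 1e-2.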

3. Does scaling affect any of the final results (including
log-likelihood, std errors, ...), assuming the scaled and unscaled
methods find the same untransformed point?
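
A direct check of question 3 might look like the sketch below, which
runs the same problem with and without parscale and prints what optim
reports.  The toy objective, the start values and the helper 'se' are
assumptions for illustration; 'se' treats fn as a negative
log-likelihood, which is only one possible use.

```r
# Toy objective (an assumption): much flatter in x[1] than in x[2].
fn <- function(x) 0.01 * (x[1] - 1)^2 + (x[2] - 2)^2

plain  <- optim(c(0, 0), fn, method = "BFGS", hessian = TRUE)
scaled <- optim(c(0, 0), fn, method = "BFGS", hessian = TRUE,
                control = list(parscale = c(10, 1)))

# Hypothetical helper: std errors if fn were a negative log-likelihood.
se <- function(fit) sqrt(diag(solve(fit$hessian)))

# Compare what optim() reports in each case.
print(rbind(plain = plain$par,   scaled = scaled$par))
print(c(plain = plain$value,     scaled = scaled$value))
print(rbind(plain = se(plain),   scaled = se(scaled)))
```

If scaling is transparent in the sense of question 3, the two rows
should agree up to optimizer tolerance.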

I assume that scaling is transparent in the sense of 3, i.e. does not
affect any of the reported results (unless it changes how well the
optimizer works or fnscale converts minimizing to maximizing).  Even
given that, suppose I think that
f(x)-f(x1) approx equals f(x)-f(x2) where
x1[1] = x[1] + 10 and 
x2[2] = x[2] + 1, and x, x1, and x2 are otherwise equal.
Does this mean I should have parscale = c(10, 1) or parscale = c(1/10, 1)?
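
To make the two candidates concrete, the sketch below takes the
documentation's "par/parscale" literally and computes how much the
objective changes when one element of par/parscale moves by one unit.
The objective and the helper 'unit_change' are hypothetical, chosen only
so that a +10 step in x[1] matters about as much as a +1 step in x[2].

```r
# Toy objective (an assumption): f changes about as much for a +10 step
# in x[1] as for a +1 step in x[2], measured from x = c(0, 0).
fn <- function(x) 0.01 * x[1]^2 + x[2]^2

# Change in fn when one element of par/parscale moves by one unit,
# taking "optimization is performed on par/parscale" literally.
unit_change <- function(parscale, x = c(0, 0)) {
  z <- x / parscale                     # scaled parameters
  sapply(seq_along(z), function(i) {
    z_new <- z
    z_new[i] <- z_new[i] + 1            # unit step in scaled space
    fn(z_new * parscale) - fn(x)        # effect on the objective
  })
}

print(unit_change(c(10, 1)))
print(unit_change(c(1/10, 1)))
```

The candidate whose two printed numbers come out about equal would be
the one that meets the "comparable" criterion quoted above, at least
under this literal reading.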

Since I'm not sure about parscale, I'm really not sure about
     'ndeps' A vector of step sizes for the finite-difference
          approximation to the gradient, on 'par/parscale' scale.
          Defaults to '1e-3'.
So, if I don't do any other rescaling, I might say
ndeps = c(1e-2, 1e-3)
in the previous example (the response to x[1] is 10 times flatter than
the response to x[2]).

I guess that if I do have parscale set, I leave the default ndeps (1e-3
for both) and get the same effect.  Right?
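
A possible way to test that guess is to compare the points at which
optim evaluates the objective under the two set-ups.  In the sketch
below, 'fn_raw', 'first_evals' and the start value c(3, 3) are my own
illustrative choices, and method = "BFGS" without 'gr' is assumed so
that the gradient comes from finite differences.  It compares only the
first function call and the first finite-difference stencil; it says
nothing about whether the later search path is the same.

```r
# Toy objective (an assumption): flatter in x[1] than in x[2].
fn_raw <- function(x) 0.01 * (x[1] - 1)^2 + (x[2] - 2)^2

# Run optim() and return the first k points at which fn was called.
first_evals <- function(control, k = 5) {
  pts <- NULL
  fn <- function(x) {
    # rbind copies the values, so later changes to x cannot alter the log
    pts <<- rbind(pts, x, deparse.level = 0)
    fn_raw(x)
  }
  optim(c(3, 3), fn, method = "BFGS", control = control)
  pts[seq_len(k), , drop = FALSE]
}

a <- first_evals(list(ndeps = c(1e-2, 1e-3)))   # rescale ndeps by hand
b <- first_evals(list(parscale = c(10, 1)))     # default ndeps
print(a)
print(b)
print(all.equal(a, b))
```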