[R] Excel can do what R can't?????

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Aug 4 11:24:35 CEST 2003


On Sun, 3 Aug 2003, Fan wrote:

> I've found that the discussions are interesting, generally speaking,
> peoples seem equally confident on R's optim/nlm and Excel's solver.
> 
> The authors of the algorithm GRG2 (Generalized Reduced Gradient)
> are not cited in the documentation of optim(), so I'm wondering if
> the optimization algorithm implemented in Excel is "fondamentally"
> the same than that in R ?

I don't suppose Excel cites the method*s* used in optim() either,
but GRG2 is not in the index of my copies of any of the standard texts on 
numerical optimization.

> 
> Thanks in advance
> --
> Fan
> 
> Damon Wischik wrote:
> > Michael Rennie wrote:
> > 
> >>Last, it's not even that I'm getting error messages anymore- I just
> >>can't get the solution that I get from Excel.  If I try to let R find
> >>the solution, and give it starting values of c(1,2), it gives me an
> >>optimization solution, but an extremely poor one.  However, if I give it
> >>the answer I got from excel, it comes right back with the same answer
> >>and solutions I get from excel. 
> >>
> >>Using the 'trace' function, I can see that R gets stuck in a specific
> >>region of parameter space in looking for the optimization and just
> >>appears to give up.  Even when it re-set itself, it keeps going back to
> >>this region, and thus doesn't even try a full range of the parameter
> >>space I've defined before it stops and gives me the wrong answer. 
> > 
> > 
> > 1. Either your function or the Excel solver is wrong. I executed your
> > source code (which defines f), then ran it over a grid of points, and
> > plotted the answer, using this code:
> > 
> > xvals <- seq(.2,2,by=.2)
> > yvals <- seq(1,3,by=.2)
> > z <- matrix(NA,nrow=length(xvals),ncol=length(yvals))
> > for (i in 1:length(xvals)) for (j in 1:length(yvals)) {
> >   x <- xvals[i]
> >   y <- yvals[j]
> >   z[i,j] <- f(c(x,y))
> >   }
> > filled.contour(x=xvals,y=yvals,z=log(z))
> > 
> > Your "solution" from Excel evaluates to
> >   f(c(.558626306252032,1.66764519286918)) == 0.3866079
> > while I easily found a point which was much better,
> >   f(c(.4,1)) = 7.83029e-05
> > 
> > You should have tried executing your function over a grid of points, and
> > plotting the results in a contour plot, to see if optim was working
> > sensibly. You could do the same grid in Excel and R to verify that the
> > function you've defined does the same thing in each.
> > 
> > Since your optimization is only over a 2D parameter space, it is easy for
> > you to plot the results, to see at a glance what the optimum is, and to
> > work out what is going wrong.
> > 
> > 2. Your code executes very slowly because it is programmed inefficiently.
> > You need to iterate a function to get your final solution, but you don't
> > need to keep track of all the states you visit on the way. The way R
> > works, whenever you assign a value to a certain index in a vector, as in 
> >   A[i] <- 10,
> > the system actually copies the entire vector. So, in every iteration, you
> > are copying very many vectors, and this is needlessly slowing down the
> > program. Also, at the end of each iteration, you define
> >   bio <- cbind(W, C, ASMR, SMR, A, F, U, SDA, Gr, Ed, GHg, EGK, Hg)
> > which creates a matrix. But you only ever use this matrix right at the
> > end, and so there is no need to create this 365*14 matrix at every single
> > iteration.
> > 
> > It looks to me as if you took some Excel code and translated it directly
> > into R. This will not produce efficient R code. Your iterative loop would
> > be more naturally expressed in R as
> > 
> > f <- function(q) {
> >   p <- q[[1]]
> >   ACT <- q[[2]]
> >   # cat(paste("Trying p=",p," ACT=",ACT,"\n",sep=""))
> >   state <- c(W=Wo,Hg=Hgo)
> >   numdays <- length(temps)
> >   for (i in 1:numdays)
> >     state <- updateState(state,
> >                          jday=temps$jday[i],temp=temps$Temp[i],M=numdays,
> >                          p=p,ACT=ACT)
> >   Wtmod <- state[["W"]]
> >   Hgtmod <- state[["Hg"]]
> >   (Wt-Wtmod)^2/Wt + (Hgt-Hgtmod)^2/Hgt
> >   }
> > 
> > updateState <- function(state,jday,temp,M,p,ACT) {
> >   # Given W[i-1] and Hg[i-1], want to compute W[i] and Hg[i]
> >   W <- state[["W"]]
> >   Hg <- state[["Hg"]]
> >   # First compute certain parameters: Vc[i-1] ... Expegk[i-1]
> >   Vc <- (CTM-temp)/(CTM-CTO)
> >   Vr <- (RTM-temp)/(RTM-RTO)
> >   C <-      p * CA * W^CB * Vc^Xc * exp(Xc*(1-Vc)) * Pc
> >   ASMR <- ACT * RA * W^RB * Vr^Xa * exp(Xa*(1-Vr))
> >   ...
> >   # Now find W[i] and Hg[i]
> >   Wnew <- if (!(jday==121 && Mat==1)) W+Gr/Ef
> >           else                        W * (1-GSI*1.2)
> >   Hgnew <- a*Hgp*C*(1-Expegk)/(Pc*W*EGK) + Hg*Expegk
> >   c(W=Wnew,Hg=Hgnew)
> >   }
> > 
> > In this code, I do not attempt to keep the entire array in memory. All I
> > need to know at each iteration is the value of state=(W,Hg) at time i-1,
> > and from this I compute the new value at time i.
> > 
> > 3. You use some thoroughly weird code to read in a table. You should add a
> > row to the top of your table with variable names, then just use
> >   temps <- read.table("TEMP.DAT", header=TRUE)
> >   temps$Vc <- (CTM-temps$temp)/(CTM-CTO)
> > This would also avoid leaving global variables (like Day) hanging around
> > the place. Global variables cause confusion: see the next point.
> > 
> > 4. Here are some lines taken from your code.
> > 
> > p <- NULL
> > ACT <- NULL
> > 
> > #starting values for p, ACT
> > p <- 1
> > ACT <- 2
> > 
> > f <- function (q)
> >   {
> >   F[i]<- (FA*((comp[i,3])^FB)*(exp(FG*p))*C[i])
> >   # (and ACT is never referred to)
> >   }
> > 
> > Why did you define p<-NULL and ACT<-NULL at the top? Those definitions are
> > irrelevant, because they are overridden by p<-1 and ACT<-2.
> > 
> > In the body of your function f, in defining F[i], you refer to the
> > variable p. The only assignment to p is in the line p<-1. I strongly
> > suspect this is an error. Probably you want to refer to q[1]. The best way
> > to do this (as you can see from my code above) is to define p and ACT at
> > the beginning of f.
> > 
> > 5. Some minor comments on code. It's unwise to use T or F as variable
> > names in R, because of the potential for confusion with S-Plus, which uses
> > them for TRUE and False. Also, you don't need all those brackets: A*(B*C)
> > is the same as A*B*C, and ((A/B)/C) is more transparently written as
> > A/(B*C). Also, you should indent your code, since otherwise you'll just
> > confuse yourself and other people.
> > 
> > 6. I've written a version of the code which takes all these comments into
> > account. It doesn't agree with your Excel solution. You haven't given us
> > enough real data for me to work out if there's a bug in my code or if the
> > Excel solution is wrong. Once you have worked out a function f which you
> > know to be correct (checked by drawing a contour plot), if you have any
> > more problems, share it with us and we may be able to help. 
> > 
> > Damon Wischik.
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list