[R] Problem storing lm() model in a list

William Dunlap wdunlap at tibco.com
Sun Dec 5 20:35:53 CET 2010


Your problem arises because predict() is using
the value d=10 whenever it evaluates poly(x,d,raw=TRUE).
The details are that this is the value that d in your
function polyModelSelection had when the function finished.
That value is stored in the environment
   environment(poly.fit$models[[i]]$terms)
for i in 1:10 (which is the same environment for all
10 entries).  It contains the following objects
  > objects(envir=environment(poly.fit$models[[1]]$terms))
   [1] "add.dim"   "bestD"     "bestError" "d"        
   [5] "dim.mult"  "formula"   "lm.mod"    "lm.models"
   [9] "loss"      "maxd"      "x"         "y"    
and predict will look there for objects named in
the formula but not found in its newdata= argument.
Use get("d", envir=...) to see that d is 10 in all
of them.

I don't know the best way to avoid the problem.
One way is to put a literal value for degree into your
call to poly(), which also has the advantage that your
printouts show what it is.  In the following I used
substitute() to put the explicit value of degree into
your formula and used do.call when calling lm so the
formula itself (not the name "formula") is put into
the call component of the lm object.  This seems kludgy
to me, but I think it gets the job done.

polyModelSelection <- function (x, y, maxd, add.dim = F) 
{
    dim.mult <- 0
    if (add.dim) {
        dim.mult = 1
    }
    bestD <- 1
    bestError <- 1
    loss <- 1
    lm.models <- vector("list", maxd)
    for (d in 1:maxd) {
        # next assignment added
        formula <- substitute(y ~ 0 + poly(x, degree = d, raw = TRUE), 
            list(d = d))
        # change lm(...) to do.call("lm", list(...))
        lm.mod <- do.call("lm", list(formula))
        lm.models[[d]] <- lm.mod
        loss[d] <- modelError(lm.mod, data.frame(x), y) + d * 
            dim.mult
        if (d == 1) {
            bestError <- loss[d]
        }
        if (loss[d] < bestError) {
            bestError <- loss[d]
            bestD <- d
        }
    }
    list(loss = loss, models = lm.models, best = c(bestD))
}

Another way around the problem might be to replace
the environment stored in those terms compononents
with emptyenv().  Then none of the variables in the
function polyModelSelection could be used.  You could
do that with
  for(d in seq_along(poly.fit$models)) {
    environment(poly.fit$models[[1]]$terms) <- emptyenv()
  }
I like the first was better because I can easily see what
d is.
 
This sort of issue arises whenever we use functions
that use nonstandard evaluation rules, like the modelling
functions, subset(), library(), and help().  They are
easy to call from the top level command line but harder
to deal with when called with variables for arguments.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Harold Pimentel
> Sent: Friday, December 03, 2010 11:31 PM
> To: r-help at r-project.org
> Subject: [R] Problem storing lm() model in a list
> 
> Hi all,
> 
> I recently wrote some code to store a number of polynomial 
> models in a list and return this list. The model is returned 
> fine, but when I make a subsequent call to predict(), I have 
> an error. The code for polyModel selection is listed at the 
> end of the email. Here is an example of the error:
> 
> > poly.fit <- polyModelSelection(x,y,10,F)
> > for (d in 1:4) {
> +     lm.model <- poly.fit$models[[d]]
> +     curve(predict(lm.model,data.frame(x)),add=T,lty=d+1)
> + }
> Error: variable 'poly(x, d, raw = T)' was fitted with type 
> "nmatrix.1" but type "nmatrix.10" was supplied
> 
> Does anyone have any ideas? 
> 
> 
> 
> Thanks,
> 
> Harold
> 
> 
> ##############################################################
> ##############
> 
> polyModelSelection <- function(x,y,maxd,add.dim=F) {
>     dim.mult <- 0
>     if (add.dim) {
>         dim.mult = 1
>     }
>     bestD <- 1
>     bestError <- 1
>     loss <- 1
>     lm.models <- vector("list",maxd)
>     for (d in 1:maxd) {
>         lm.mod <- lm(y ~ 0 + poly(x,d,raw=T))
>         lm.models[[d]] <- lm.mod
>         loss[d] <- modelError(lm.mod,data.frame(x),y)+d*dim.mult
>         if (d == 1){
>             bestError <- loss[d]
>         }
>         if (loss[d] < bestError) {
>             bestError <- loss[d]
>             bestD <- d
>         }
>         # print(loss[d])
>     }
>     list(loss=loss,models=lm.models,best=c(bestD))
> }
> 
> modelError <- function(model,x,y) {
>     sum((y-predict(model,x))^2)
> }
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list