[R] Call to glm inside a function

Thomas Lumley tlumley at u.washington.edu
Mon Feb 17 20:18:03 CET 2003


On Mon, 17 Feb 2003, Spencer Graves wrote:

>
> 	  Others know this subject much better than I do, and you should ignore
> these remarks if you get a more authoritative reply sooner, but in the
> interest of getting you an answer now, I'll expose my ignorance on this
> subject:  First, if you have access to a copy of Venables and Ripley, S
> Programming, I would read about "assign" there (and all other
> documentation you can find).

I would actually say that reading about 'assign' is the last resort --
most of the time you would be happier if you didn't know it existed.

> 	  I'm guessing that "glm" probably uses "get" to find the weights and
> can't find them because it does not look in the frame of the calling
> function.

Partially.  What is actually happening is complicated and since we weren't
given an example it's hard to be sure, but here is an attempt at
diagnosis.

Consider the following two functions:
> test1
function(df){
ff<-y~x
glm(ff,data=df,weights=eval(df$w))
}
> test2
function(df,ff){
glm(ff,data=df,weights=eval(df$w))
}

The first one `works', the second doesn't.  Now, the data and weights are
the same in both cases. What has changed is *where the formula was
defined*.  Formulas in R carry around an environment (usually where they
were created).
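
As a small sketch of what "carrying an environment" means (the helper
make_formula and the variable local_only are made up for illustration,
they are not part of the original problem):

```r
# A formula remembers the environment in which it was created.
top_env   <- environment()        # the environment this code is run in
ff_global <- y ~ x                # created here, so it remembers top_env

make_formula <- function() {      # hypothetical helper
  local_only <- 1                 # exists only in this call's frame
  y ~ x                           # created inside the function
}
ff_local <- make_formula()

# The first formula points back at the place where it was written ...
identical(environment(ff_global), top_env)

# ... while the second one points at the function's frame, which is the
# only place where local_only can be found.
exists("local_only", envir = environment(ff_local), inherits = FALSE)
```

Both formulas print as y ~ x; only environment() shows the difference.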

Now, glm() has to go through some complicated contortions in looking for
variables.  These are necessary to pick up variables that *aren't* in the
data= argument.  If everything is in the data argument then glm() has no
problems.  However, eval(df$w) isn't a reference to the data argument, so
it gets evaluated in the environment of the formula, where it doesn't
exist.  This would normally give an error, but if the data frame is called
`df' or `data' or `install.packages' or some other name that does exist in
the global environment you will get the $w component of that object,
almost certainly NULL, and end up not using any weights.
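
A minimal sketch of the failing case, assuming made-up data in a data
frame called myDataFrame.  One caveat that is an addition to the above:
in current versions of R, `$` applied to a function is an error rather
than NULL, so the call stops instead of silently dropping the weights.
The code below only records which of the two happened.

```r
set.seed(1)
# Made-up data for illustration
myDataFrame <- data.frame(x = 1:20, w = rep(1:2, 10))
myDataFrame$y <- 2 + 0.5 * myDataFrame$x + rnorm(20)

test2 <- function(df, ff) {
  glm(ff, data = df, weights = eval(df$w))
}

# The formula y ~ x is created here, outside test2(), so `df` in the
# weights expression is looked up from here and not in test2()'s frame.
# Unless you have your own object called df, that finds the function
# stats::df, not the data frame.
res <- tryCatch(test2(myDataFrame, y ~ x), error = function(e) e)

if (inherits(res, "error")) {
  message("glm() failed: ", conditionMessage(res))
} else {
  # Older behaviour: a fit is returned, but every prior weight is 1
  print(unique(res$prior.weights))
}
```

Either way, the w column of myDataFrame is not what glm() ends up using.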

This suggests a solution to the problem:

> test3
function(df,ff){
glm(ff,data=df,weights=w)
}

which works because `w' is now found in the data= argument.
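
To check that the weights really are used, here is a sketch with
made-up data (myDataFrame and its columns are assumptions for the
example).  It compares test3() with a direct call to glm(), and with an
unweighted fit.

```r
set.seed(1)
# Made-up data for illustration
myDataFrame <- data.frame(x = 1:20, w = rep(1:2, 10))
myDataFrame$y <- 2 + 0.5 * myDataFrame$x + rnorm(20)

test3 <- function(df, ff) {
  glm(ff, data = df, weights = w)   # w is a column of df
}

fit_fun    <- test3(myDataFrame, y ~ x)
fit_direct <- glm(y ~ x, data = myDataFrame, weights = w)
fit_unw    <- glm(y ~ x, data = myDataFrame)

# Same coefficients as the direct weighted call
all.equal(coef(fit_fun), coef(fit_direct))

# The weights stored in the fit are the w column
all(fit_fun$prior.weights == myDataFrame$w)

# For comparison, the coefficients without weights
rbind(weighted = coef(fit_fun), unweighted = coef(fit_unw))
```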

Now, some general lessons can be drawn from this (which is why it's worth
such a long response):

 --  Life is a lot simpler if everything that can be in the data argument
     is there.
 --  It's a real pity that glm() didn't come with syntax to distinguish
     explicitly between references to data= and actual variables (eg
     weights=~w vs weights=w). If you are writing functions of your own,
     please use some such syntax.
 --  A useful aid to debugging is to change variable names: data$w or
     df$w returns NULL, but myDataFrame$w is an error.
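
One way the weights=~w vs weights=w distinction could look in a
function of your own.  This is only a sketch: weighted_mean_of and its
arguments are hypothetical, and it computes a weighted mean rather than
fitting a model, to keep the example short.

```r
# Hypothetical function.  A one-sided formula (~w) means "a column of
# data"; anything else is taken as an ordinary vector of weights.
weighted_mean_of <- function(target, data, weights = NULL) {
  stopifnot(inherits(target, "formula"), length(target) == 2L)
  xv <- eval(target[[2L]], data, environment(target))

  if (is.null(weights)) {
    wv <- rep(1, length(xv))                 # no weights given
  } else if (inherits(weights, "formula")) {
    stopifnot(length(weights) == 2L)         # one-sided only
    # Look in data first, then where the formula was created
    wv <- eval(weights[[2L]], data, environment(weights))
  } else {
    wv <- weights                            # actual variable, used as is
  }

  stopifnot(length(wv) == length(xv))
  sum(wv * xv) / sum(wv)
}

myDataFrame <- data.frame(x = c(1, 2, 3), w = c(1, 1, 2))
k <- c(3, 0, 0)

from_data <- weighted_mean_of(~x, myDataFrame, weights = ~w)  # column w
from_var  <- weighted_mean_of(~x, myDataFrame, weights = k)   # vector k
```

Here from_data is (1 + 2 + 6)/4 = 2.25 and from_var is 3/3 = 1, and the
call itself says where each set of weights comes from.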


> 	  To get around this, try an "assign" something like the following
> before your call to "glm":
>
> 	  assign("data", data)
>
>    	  I'm not certain this will work, but I've had similar problems in
> S-Plus and worked around them using assign.

While the symptoms are similar the underlying problem is different in R.
This is like antibiotics for a cold: it might appear to work by accident,
but prolonged use will create highly resistant bugs.


> 	  I don't have time to work an example now,  but if this does not fix
> your problem, you might experiment with the "assign" arguments until you
> find something that works.  I don't like being too bold with "assign",
> because I don't understand what it does, and it could be just a little
> dangerous -- like overwriting a library or something.

The main danger is ending up with subtle bugs in your code.  It's not too
hard to understand what a particular call to assign() does, but it can be
very hard to work out what goes wrong when a use of assign() somewhere
else in your code changes a variable.

	-thomas



