[R] Categorical variable in a custom nonlin function with gnm

Brett.Murphy at csiro.au Brett.Murphy at csiro.au
Thu Apr 30 07:29:06 CEST 2009


Hi all

I want to construct a generalised nonlinear model (binomial family) using gnm, of the form:

Response = a + b variable1 + c variable2 + d variable3 - d b variable4 - d c variable5,
with the parameters b, c, and d appearing more than once.  Hence, I think I need to use a custom nonlin function with gnm.

One of my predictor variables is categorical, so I have created a dummy variable for each factor level (i.e. zeros and ones), and incorporated each of these variables into the model.  I have previously done something similar with nonlinear least squares models (nls) and it worked very well, however gnm doesn't seem to like these dummy variables, and doesn't estimate the parameters.

It works if I put the dummy variables into a linear model using gnm, like so:
(where LithologyB is a series of ones and zeros, representing the occurrence of Lithology B, etc.)

> summary(gnm(response~LithologyB+LithologyC+LithologyD+LithologyE+LithologyF, family=binomial))
Linear predictor - using glm.fit

Call:
gnm(formula = response ~ LithologyB + LithologyC + LithologyD +     LithologyE + LithologyF, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-17.676   -8.433   -1.181    7.825   23.415  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.026747   0.005900 -174.03   <2e-16 ***
LithologyB  -0.715508   0.020844  -34.33   <2e-16 ***
LithologyC   0.272983   0.008588   31.79   <2e-16 ***
LithologyD  -1.082133   0.020952  -51.65   <2e-16 ***
LithologyE   1.699259   0.007617  223.10   <2e-16 ***
LithologyF   1.505159   0.009932  151.54   <2e-16 ***



This gives the same result as putting in the original categorical Lithology variable.  But something goes wrong when I create the same model using a custom nonlin function:
> gnm_function<- function(B1, B2, B3, B4, B5){
+ list(
+ predictors = list(B1=1, B2=1, B3=1, B4=1, B5=1),
+ variables = list(substitute(LithologyB), substitute(LithologyC), substitute(LithologyD), substitute(LithologyE), substitute(LithologyF)),
+ term = function(predLabels, varLabels) {
+ paste(
+ predLabels[1], "*", varLabels[1], "+", predLabels[2], "*", varLabels[2], "+", predLabels[3], "*", varLabels[3], "+", predLabels[4], "*", varLabels[4], "+", predLabels[5], "*", varLabels[5]
+ )})}
> class(gnm_function) <- "nonlin"

> summary(gnm(response~gnm_function(B1, B2, B3, B4, B5), family=binomial))
Initialising
Running main iterations
Done

Call:
gnm(formula = response ~ gnm_function(B1, B2, B3, B4, B5),     family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-17.676   -9.787   -1.347    8.547   17.946  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.722412   0.003469  -208.2   <2e-16 ***
B1                 NA         NA      NA       NA    
B2                 NA         NA      NA       NA    
B3                 NA         NA      NA       NA    
B4           1.394925   0.005936   235.0   <2e-16 ***
B5                 NA         NA      NA       NA 



Can anyone suggest what is going wrong and a possible way around the problem?

Thanks


Brett Murphy
ARC Postdoctoral Research Fellow
School of Plant Science
University of Tasmania
 
c/o CSIRO Tropical Ecosystems Research Centre
PMB 44, Winnellie, NT 0822
 
Ph: (08) 8944 8447
Fax: (08) 8944 8444




More information about the R-help mailing list