[R] Categorical variable in a custom nonlin function with gnm
Brett.Murphy at csiro.au
Brett.Murphy at csiro.au
Thu Apr 30 07:29:06 CEST 2009
Hi all
I want to construct a generalised nonlinear model (binomial family) using gnm, of the form:
Response = a + b variable1 + c variable2 + d variable3 - d b variable4 - d c variable5,
with the parameters b, c, and d appearing more than once. Hence, I think I need to use a custom nonlin function with gnm.
One of my predictor variables is categorical, so I have created a dummy variable for each factor level (i.e. zeros and ones), and incorporated each of these variables into the model. I have previously done something similar with nonlinear least squares models (nls) and it worked very well, however gnm doesn't seem to like these dummy variables, and doesn't estimate the parameters.
It works if I put the dummy variables into a linear model using gnm, like so:
(where LithologyB is a series of ones and zeros, representing the occurrence of Lithology B, etc.)
> summary(gnm(response~LithologyB+LithologyC+LithologyD+LithologyE+LithologyF, family=binomial))
Linear predictor - using glm.fit
Call:
gnm(formula = response ~ LithologyB + LithologyC + LithologyD + LithologyE + LithologyF, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-17.676 -8.433 -1.181 7.825 23.415
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.026747 0.005900 -174.03 <2e-16 ***
LithologyB -0.715508 0.020844 -34.33 <2e-16 ***
LithologyC 0.272983 0.008588 31.79 <2e-16 ***
LithologyD -1.082133 0.020952 -51.65 <2e-16 ***
LithologyE 1.699259 0.007617 223.10 <2e-16 ***
LithologyF 1.505159 0.009932 151.54 <2e-16 ***
This gives the same result as putting in the original categorical Lithology variable. But something goes wrong when I create the same model using a custom nonlin function:
> gnm_function<- function(B1, B2, B3, B4, B5){
+ list(
+ predictors = list(B1=1, B2=1, B3=1, B4=1, B5=1),
+ variables = list(substitute(LithologyB), substitute(LithologyC), substitute(LithologyD), substitute(LithologyE), substitute(LithologyF)),
+ term = function(predLabels, varLabels) {
+ paste(
+ predLabels[1], "*", varLabels[1], "+", predLabels[2], "*", varLabels[2], "+", predLabels[3], "*", varLabels[3], "+", predLabels[4], "*", varLabels[4], "+", predLabels[5], "*", varLabels[5]
+ )})}
> class(gnm_function) <- "nonlin"
> summary(gnm(response~gnm_function(B1, B2, B3, B4, B5), family=binomial))
Initialising
Running main iterations
Done
Call:
gnm(formula = response ~ gnm_function(B1, B2, B3, B4, B5), family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-17.676 -9.787 -1.347 8.547 17.946
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.722412 0.003469 -208.2 <2e-16 ***
B1 NA NA NA NA
B2 NA NA NA NA
B3 NA NA NA NA
B4 1.394925 0.005936 235.0 <2e-16 ***
B5 NA NA NA NA
Can anyone suggest what is going wrong and a possible way around the problem?
Thanks
Brett Murphy
ARC Postdoctoral Research Fellow
School of Plant Science
University of Tasmania
c/o CSIRO Tropical Ecosystems Research Centre
PMB 44, Winnellie, NT 0822
Ph: (08) 8944 8447
Fax: (08) 8944 8444
More information about the R-help
mailing list