[R] nontabular logistic regression
Marc Schwartz
MSchwartz at mn.rr.com
Fri Oct 13 17:42:45 CEST 2006
On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:
> Hi. I'm attempting to fit a logistic/binomial model so I can determine
> the influence of landscape on the probability that a box gets used by a
> bird. I've looked at a few sources (MASS text, Dalgaard, Fox and
> google) and the examples are almost always based on tabular predictor
> variables. My data, however are not. I'm not sure if that is the
> source of the problems or not because the one example that includes a
> continuous predictor looks to be coded exactly the same way. Looking at
> the output, I get estimates for each case when I should get a single
> estimate for purbank. Any suggestions?
>
> Many thanks,
>
> Jeff
>
>
> THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
> are landscape variables).
>
> box use purbank purban2 purban1 pgrassk pgrass2 pgrass1 grassdist grasspatchk
> 1 1 0.003813435 0.02684564 0.06896552 0.3282487 0.6845638 0.7586207 0 3.73
> 2 1 0.04429451 0.1610738 0.1724138 0.1534174 0.3825503 0.6551724 0 1.023261
> 3 1 0.04458785 0.06040268 0 0.1628043 0.557047 0.7586207 0 0.9605769
> 4 1 0.06072162 0.2080537 0.06896552 0.01936052 0 0 323.1099 0.2284615
> 5 0 0.6080962 0.6979866 0.6896552 0.03168084 0.1275168 0.2413793 30 0.2627027
> 6 1 0.6060428 0.6107383 0.3448276 0.04077442 0.2885906 0.4482759 30 0.2978571
> 7 1 0.3807568 0.4362416 0.6896552 0.06864183 0.03355705 0 94.86833 0.468
> 8 0 0.3649164 0.3154362 0.4137931 0.06277501 0.1275168 0 120 0.4585714
>
> THE CODE:
>
> box.use<- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
> attach(box.use)
> box.use <- na.omit(box.use)
> use <- factor(use, levels=0:1)
> levels(use) <- c("unused", "used")
> glm1 <- glm(use ~ purbank, binomial)
>
> THE OUTPUT:
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -4.544e-16 1.414e+00 -3.21e-16 1.000
> purbank0 2.157e+01 2.923e+04 0.001 0.999
> purbank0.001173365 2.157e+01 2.067e+04 0.001 0.999
> purbank0.001466706 2.157e+01 2.923e+04 0.001 0.999
> purbank0.001760047 6.429e-16 2.000e+00 3.21e-16 1.000
> purbank0.002346729 2.157e+01 2.923e+04 0.001 0.999
> purbank0.003813435 2.157e+01 2.923e+04 0.001 0.999
> purbank0.004106776 2.157e+01 2.067e+04 0.001 0.999
> purbank0.004693458 2.157e+01 2.067e+04 0.001 0.999
It appears that the 'purbank' variable is being imported as a factor,
hence the multiple levels indicated in the left hand column.
Check:
str(box.use)
right after the read.csv() step and see what it shows.
>From the sample data above, it _should_ be along the lines of:
> str(box.use)
'data.frame': 8 obs. of 10 variables:
$ box : int 1 2 3 4 5 6 7 8
$ use : int 1 1 1 1 0 1 1 0
$ purbank : num 0.00381 0.04429 0.04459 0.06072 0.60810 ...
$ purban2 : num 0.0268 0.1611 0.0604 0.2081 0.6980 ...
$ purban1 : num 0.069 0.172 0.000 0.069 0.690 ...
$ pgrassk : num 0.3282 0.1534 0.1628 0.0194 0.0317 ...
$ pgrass2 : num 0.685 0.383 0.557 0.000 0.128 ...
$ pgrass1 : num 0.759 0.655 0.759 0.000 0.241 ...
$ grassdist : num 0 0 0 323 30 ...
$ grasspatchk: num 3.730 1.023 0.961 0.228 0.263 ...
Hence, you should be able to use:
> model <- glm(use ~ purbank, data = box.use, family = binomial)
> summary(model)
Call:
glm(formula = use ~ purbank, family = binomial, data = box.use)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.61450 -0.03098 0.31935 0.45888 1.39194
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.223 2.225 1.448 0.147
purbank -6.129 4.773 -1.284 0.199
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8.9974 on 7 degrees of freedom
Residual deviance: 6.5741 on 6 degrees of freedom
AIC: 10.574
Number of Fisher Scoring iterations: 5
Note that na.omit() is the default operation for most R models, so is
redundant. Also, I would not attach the data frame, as you can use the
'data' argument in model related functions. This avoids the confusion of
having multiple copies of the source data set and wondering why changes
made can become confusing and problematic.
HTH,
Marc Schwartz
More information about the R-help
mailing list