[R-sig-Geo] Problem with categorical variable coefficients and se in glm
WillM
annamac_80 at hotmail.com
Sun Nov 4 00:15:51 CET 2012
Hi all
I'm hoping that this is something that people deal with regularly and you
can help me out quickly, even though it is more of a stats question than an
R question.
I have a dataset where data$Resp is a count response variable (lots of 0s),
so I used a negative binomial glm with a categorical explanatory variable,
data$Cat. The categories are the vegetation types that I stratified my
sampling by, so they are not an arbitrary post hoc decision.
The UM category has only 0s for the response, which produces a very large
negative coefficient and a huge standard error (see output below). So I added
a small number (1) to one row of the UM category to explore what was happening
and get a better result. With a continuous response variable you can add a
very small number (say 0.001) so that it is still representative of 0, but
with count data, 1 is the minimum.
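One way to see why the UM estimate runs away is sketched below in base R. It is only an illustration, not part of the analysis itself, and the data frame name dat is made up for the example. With a single categorical predictor and a log link, each coefficient is the log of that category's mean count relative to the reference category D. UM has a mean of 0, so the ratio is 0 and its log is -Inf, which the fitting routine can only approximate with a large negative number and an enormous standard error.

```r
# Rebuild the posted data; values are copied from the listing below.
# "dat" is a hypothetical name used only for this sketch.
dat <- data.frame(
  Resp = c(0, 0, 0, 0, 3, 0, 0, 0,      # D  (rows 1-8)
           11, 11, 3, 14, 19, 41,       # F  (rows 9-14)
           12, 55, 3, 0, 0,             # S  (rows 15-19)
           30, 4, 10,                   # F  (rows 20-22)
           99, 3, 1, 7, 4, 0, 2, 1,     # DS (rows 23-30)
           0, 0, 0, 0, 1,               # UB (rows 31-35)
           0, 0, 0, 0, 0),              # UM (rows 36-40)
  Cat = factor(
    rep(c("D", "F", "S", "F", "DS", "UB", "UM"),
        times = c(8, 6, 5, 3, 8, 5, 5)),
    levels = c("D", "F", "S", "DS", "UB", "UM")  # D is the reference level
  )
)

# Mean count per category.
grp_mean <- tapply(dat$Resp, dat$Cat, mean)

# Log of each category mean relative to the reference category D.
# For a one-factor model with a log link this is what the
# coefficients are trying to estimate.
log_ratio <- log(grp_mean / grp_mean["D"])
print(round(log_ratio, 4))
```

The finite entries should line up with the Estimate column in the summary output, while the UM entry is -Inf, so there is no finite value for the fit to converge to.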
I get a better estimate, but is there some better way of dealing with this
type of situation? I could possibly combine UM and UB categories, but I did
want to keep them separate.
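If combining UM and UB turned out to be acceptable, a minimal sketch of that option might look like the following. It assumes the MASS package (which provides glm.nb) is installed; the names dat, Cat2, and the merged level U are invented for illustration only.

```r
# Rebuild the posted data; "dat" is a hypothetical name for this sketch.
dat <- data.frame(
  Resp = c(0, 0, 0, 0, 3, 0, 0, 0,      # D
           11, 11, 3, 14, 19, 41,       # F
           12, 55, 3, 0, 0,             # S
           30, 4, 10,                   # F
           99, 3, 1, 7, 4, 0, 2, 1,     # DS
           0, 0, 0, 0, 1,               # UB
           0, 0, 0, 0, 0),              # UM
  Cat = factor(
    rep(c("D", "F", "S", "F", "DS", "UB", "UM"),
        times = c(8, 6, 5, 3, 8, 5, 5)),
    levels = c("D", "F", "S", "DS", "UB", "UM")
  )
)

# Copy the factor and merge UB and UM into one level, "U"
# (a made-up label); the other levels are kept as they are.
dat$Cat2 <- dat$Cat
levels(dat$Cat2) <- list(D = "D", F = "F", S = "S", DS = "DS",
                         U = c("UB", "UM"))

# Refit the negative binomial model on the merged factor.
# Variables are referred to by name and looked up in "dat".
fit <- MASS::glm.nb(Resp ~ Cat2, data = dat)
print(summary(fit))
```

The merged category now contains one non-zero count, so its coefficient has a finite estimate. The cost is that UM and UB can no longer be distinguished in the model.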
Thanks a lot :)
>data
Resp Cat
1 0 D
2 0 D
3 0 D
4 0 D
5 3 D
6 0 D
7 0 D
8 0 D
9 11 F
10 11 F
11 3 F
12 14 F
13 19 F
14 41 F
15 12 S
16 55 S
17 3 S
18 0 S
19 0 S
20 30 F
21 4 F
22 10 F
23 99 DS
24 3 DS
25 1 DS
26 7 DS
27 4 DS
28 0 DS
29 2 DS
30 1 DS
31 0 UB
32 0 UB
33 0 UB
34 0 UB
35 1 UB
36 0 UM
37 0 UM
38 0 UM
39 0 UM
40 0 UM
> mod.nb <- glm.nb(data$Resp ~ data$Cat, data=data)
> summary(mod.nb)
Call:
glm.nb(formula = data$Resp ~ data$Cat, data = data, init.theta =
0.5087557508,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.85799 -0.75714 -0.58082 -0.00009 1.95946
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.9808     0.7609  -1.289 0.197409
data$CatF     3.7464     0.8969   4.177 2.95e-05 ***
data$CatS     3.6199     0.9932   3.645 0.000268 ***
data$CatDS    3.6636     0.9128   4.013 5.99e-05 ***
data$CatUB   -0.6286     1.4043  -0.448 0.654427
data$CatUM  -18.3218  4215.7113  -0.004 0.996532
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.5088) family taken to be 1)
Null deviance: 77.163 on 39 degrees of freedom
Residual deviance: 33.700 on 34 degrees of freedom
AIC: 191.12
Number of Fisher Scoring iterations: 1
Theta: 0.509
Std. Err.: 0.152
2 x log-likelihood: -177.120
> data1<-data
> data1[40,1] <- 1 # set Resp to 1 in one row of the UM category
> mod.nb1 <- glm.nb(data1$Resp ~ data1$Cat, data=data1) #run model again
> summary(mod.nb1)
Call:
glm.nb(formula = data1$Resp ~ data1$Cat, data = data1, init.theta =
0.515774723,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8671 -0.7593 -0.5814 -0.2098 1.9726
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.9808     0.7587  -1.293 0.196112
data1$CatF    3.7464     0.8934   4.194 2.75e-05 ***
data1$CatS    3.6199     0.9888   3.661 0.000251 ***
data1$CatDS   3.6636     0.9092   4.030 5.59e-05 ***
data1$CatUB  -0.6286     1.4012  -0.449 0.653712
data1$CatUM  -0.6286     1.4012  -0.449 0.653712
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.5158) family taken to be 1)
Null deviance: 76.133 on 39 degrees of freedom
Residual deviance: 36.328 on 34 degrees of freedom
AIC: 196.69
Number of Fisher Scoring iterations: 1
Theta: 0.516
Std. Err.: 0.152
2 x log-likelihood: -182.686
Thanks!