[R]: GLIM PROBLEMS
allan clark
allan at stats.uct.ac.za
Tue Dec 2 09:56:54 CET 2003
Hi all
I have another GLIM question.
I have been using R as well as Genstat (version 6) in order to fit
GLIM models to the data (displayed below).
The same models are fitted but the answers supplied by the two
packages are not the same.
Why? Can anyone help?
A description of the data and the models fitted can be found below.
Regards
Allan
The problem is taken from Bennet (1978). (I don't have any more of the
reference.)
In this example we wish to model the probability of a car insurance
policyholder claiming insurance on his/her car given that we know
certain information about him/her. The explanatory variables used in
this analysis are: the age of the policyholder (age), the year of
registration (reg) and a measure of the policyholder's claim history
called the no claim discount (ncd).
Define p(i,j,k) as the probability of a policyholder in level i of
age, level j of reg and level k of ncd making a claim, where
i = 1 (age 17-22), 2 (23-26), 3 (27-65), 4 (66-80);
j = 1 (registration after 1964), 2 (63-64), 3 (60-62), 4 (earlier than 1960);
k = 1 (ncd 0-1), 2 (2-3), 3 (4 or more).
Similarly define r(i,j,k) as the proportion of policyholders in level
i of age, level j of reg and level k of ncd making a claim.
Since r(i,j,k) is a proportion, the corresponding number of claims,
N(i,j,k) r(i,j,k), can be modelled by a binomial distribution with
parameters p(i,j,k) and N(i,j,k), where N(i,j,k) is the total number
of policyholders in group (i,j,k), such that
  log(p(i,j,k)/(1-p(i,j,k))) = some function of age, reg and ncd.
The baseline chosen is age (17-22), reg (65-) and ncd (0-1).
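To make the logit scale concrete, here is a small R sketch using the baseline cell of the data (475 claims among 1800 policyholders, the first row of the table). The variable names are illustrative only and do not appear elsewhere in this post.

```r
# Observed proportion in the baseline cell: age (17-22), reg (65-), ncd (0-1)
r111 <- 475 / 1800

# Logit written out by hand, and via qlogis(), the quantile function
# of the standard logistic distribution
logit.manual  <- log(r111 / (1 - r111))
logit.builtin <- qlogis(r111)

# plogis() is the inverse logit and recovers the proportion
p.back <- plogis(logit.builtin)

c(logit.manual, logit.builtin, p.back)
```

Both logit values should agree, and the back-transformed value should equal the observed proportion.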
age reg ncd claims.r exp.n
1 (17-22) (65-) (0-1) 475 1800
2 (17-22) (65-) (2-3) 150 700
3 (17-22) (65-) (4+) 35 200
4 (17-22) (63-64) (0-1) 680 2650
5 (17-22) (63-64) (2-3) 215 1000
6 (17-22) (63-64) (4+) 55 250
7 (17-22) (60-62) (0-1) 710 2950
8 (17-22) (60-62) (2-3) 220 1100
9 (17-22) (60-62) (4+) 60 250
10 (17-22) (-59) (0-1) 230 1050
11 (17-22) (-59) (2-3) 75 450
12 (17-22) (-59) (4+) 25 250
13 (23-26) (65-) (0-1) 240 1300
14 (23-26) (65-) (2-3) 140 900
15 (23-26) (65-) (4+) 150 900
16 (23-26) (63-64) (0-1) 310 1500
17 (23-26) (63-64) (2-3) 185 1050
18 (23-26) (63-64) (4+) 170 1050
19 (23-26) (60-62) (0-1) 240 1405
20 (23-26) (60-62) (2-3) 160 1000
21 (23-26) (60-62) (4+) 130 1050
22 (23-26) (-59) (0-1) 80 600
23 (23-26) (-59) (2-3) 60 500
24 (23-26) (-59) (4+) 70 550
25 (27-65) (65-) (0-1) 1650 10300
26 (27-65) (65-) (2-3) 1450 9900
27 (27-65) (65-) (4+) 3400 28900
28 (27-65) (63-64) (0-1) 1550 9900
29 (27-65) (63-64) (2-3) 1450 10000
30 (27-65) (63-64) (4+) 3200 27700
31 (27-65) (60-62) (0-1) 1250 9300
32 (27-65) (60-62) (2-3) 1250 9200
33 (27-65) (60-62) (4+) 2500 25600
34 (27-65) (-59) (0-1) 500 4700
35 (27-65) (-59) (2-3) 550 5300
36 (27-65) (-59) (4+) 1400 18100
37 (66-80) (65-) (0-1) 55 275
38 (66-80) (65-) (2-3) 40 250
39 (66-80) (65-) (4+) 180 1400
40 (66-80) (63-64) (0-1) 35 225
41 (66-80) (63-64) (2-3) 30 225
42 (66-80) (63-64) (4+) 155 1450
43 (66-80) (60-62) (0-1) 25 200
44 (66-80) (60-62) (2-3) 40 300
45 (66-80) (60-62) (4+) 130 1500
46 (66-80) (-59) (0-1) 25 175
47 (66-80) (-59) (2-3) 30 300
48 (66-80) (-59) (4+) 180 2400
claims.r = the number of claims made in a particular group.
exp.n = the total number of policyholders in a particular group.
EXAMPLE
As an example, if we use age as the only explanatory variable and fit
the glm model in R, we get the following results:
cars <- read.table("c:/a.dat", header = TRUE)
attach(cars)
y <- cbind(claims.r, exp.n)
cars.age <- glm(y ~ age, family = binomial)
summary(cars.age)
Call:
glm(formula = y ~ age, family = binomial)
Deviance Residuals:
      Min         1Q     Median         3Q        Max
-16.62952   -1.84073   -0.04282    2.07028   10.72502
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.46265    0.02050  -71.34   <2e-16 ***
age(23-26)  -0.34576    0.03197  -10.82   <2e-16 ***
age(27-65)  -0.66345    0.02182  -30.41   <2e-16 ***
age(66-80)  -0.77863    0.04020  -19.37   <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1837.4 on 47 degrees of freedom
Residual deviance: 883.8 on 44 degrees of freedom
AIC: 1228.4
Number of Fisher Scoring iterations: 4
Part of the Genstat output is displayed below.
***** Regression Analysis *****
Response variate: claims_r
Binomial totals: exp_n
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, age
*** Summary of analysis ***
                                    mean  deviance  approx
              d.f.     deviance deviance     ratio  chi pr
Regression       3        1298.   432.70    432.70   <.001
Residual        44        1136.    25.83
Total           47        2434.    51.80
* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
*** Estimates of parameters ***
                                                     antilog of
               estimate      s.e.     t(*)   t pr.     estimate
Constant        -1.1992    0.0211   -56.90   <.001       0.3014
age (23-26)     -0.4302    0.0326   -13.20   <.001       0.6504
age (27-65)     -0.7999    0.0224   -35.75   <.001       0.4494
age (66-80)     -0.9297    0.0407   -22.86   <.001       0.3947
* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor   Reference level
  age      (17-22)
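One possible explanation, offered as a sketch rather than a confirmed diagnosis: for a two-column binomial response, R's glm() reads the columns as (successes, failures), whereas the R call above supplies (claims.r, exp.n), so the group totals are treated as failure counts. The code below refits the age-only model both ways. The data frame cars.tot and the fit names are made up for illustration. The totals are the claims.r and exp.n columns of the data table summed within each age group; with age as the only term, these totals give the same coefficients as the 48 rows, although the deviances differ.

```r
# Age-group totals, summed by hand from the 48-row table in this post
cars.tot <- data.frame(
  age      = factor(c("(17-22)", "(23-26)", "(27-65)", "(66-80)")),
  claims.r = c(2930, 1935, 20150, 925),
  exp.n    = c(12650, 11805, 168900, 8700)
)

# Response as in the posted call: second column is the group total
fit.posted <- glm(cbind(claims.r, exp.n) ~ age,
                  family = binomial, data = cars.tot)

# Response as (successes, failures): second column is total minus claims
fit.fail <- glm(cbind(claims.r, exp.n - claims.r) ~ age,
                family = binomial, data = cars.tot)

# Compare the two sets of estimates side by side
round(cbind(posted = coef(fit.posted), failures = coef(fit.fail)), 4)
```

If this is the cause, the "posted" column should match the R summary above (intercept about -1.4626) and the "failures" column should match the Genstat estimates (constant about -1.1992). An equivalent way to specify the second fit is glm(claims.r/exp.n ~ age, weights = exp.n, family = binomial).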