[R]: GLIM PROBLEMS
Prof Brian D Ripley
ripley at stats.ox.ac.uk
Tue Dec 2 12:16:06 CET 2003
In R you use cbind(successes, failures), not cbind(successes, total) as you
appear to have done.
And GLIM is a program (with I for interactive): these are GLMs, not GLIMs
in th ecommonly accepted terminology (but not that of SAS).
On Tue, 2 Dec 2003, allan clark wrote:
>
> Hi all
>
> I have another GLIM question.
>
> I have been using R as well as Genstat (version 6) in order to fit
> GLIM models to the data (displayed below).
>
> The same models are fitted but the answers supplied by the two
> packages are not the same.
>
> Why? Can anyone help?
>
> A discription of the data and the type of model/s fitted can be found
> below.
>
> Regards
> Allan
>
>
> The problem is taken from Bennet (1978) (I dont have any more of the
> reference.)
>
> In this example we wish to model the probability of a car insurance
> policyholder claiming insurance on his/her car given that we know
> certain information about him/her. The explanatory variables used in
> this analysis are: the age of the policyholder (age), the year of
> registration (reg) and a measure of the policyholders claim history
> called the no claim discount (ncd).
>
> Define p(i,j,k) as the probability of a policyholder in level i of
> age, level j of reg and level k of ncd making a claim (for i= 1 (age=
> 17-22), 2 (23-26), 3 (27-65), 4 (66-80) ; j=1 (registration after
> 1964), 2 (63-64), 3 (60-62), 4 (earlier than 1960) and ncd= 1 (1-1
> claims), 2 (2-3), 3 (more than three).).
>
> Similarly define r(i,j,k) as the proportion of policyholders in level
> i of age, level j of reg and level k of ncd making a claim.
>
> We can thus model r(i,j,k) by means of a binomial distribution with
> parameters p(i,j,k) and N(i,j,k) where N(i,j,k) is the total number
> of policyholders that falls into group i, j, k such that
> log(p(i,j,k)/(1-p(i,j,k))) = some function of age, reg and ncd .
>
> The baseline chosen is age (17-22), reg (65) and ncd (0-1).
>
>
> age reg ncd claims.r exp.n
> 1 (17-22) (65-) (0-1) 475 1800
> 2 (17-22) (65-) (2-3) 150 700
> 3 (17-22) (65-) (4+) 35 200
> 4 (17-22) (63-64) (0-1) 680 2650
> 5 (17-22) (63-64) (2-3) 215 1000
> 6 (17-22) (63-64) (4+) 55 250
> 7 (17-22) (60-62) (0-1) 710 2950
> 8 (17-22) (60-62) (2-3) 220 1100
> 9 (17-22) (60-62) (4+) 60 250
> 10 (17-22) (-59) (0-1) 230 1050
> 11 (17-22) (-59) (2-3) 75 450
> 12 (17-22) (-59) (4+) 25 250
> 13 (23-26) (65-) (0-1) 240 1300
> 14 (23-26) (65-) (2-3) 140 900
> 15 (23-26) (65-) (4+) 150 900
> 16 (23-26) (63-64) (0-1) 310 1500
> 17 (23-26) (63-64) (2-3) 185 1050
> 18 (23-26) (63-64) (4+) 170 1050
> 19 (23-26) (60-62) (0-1) 240 1405
> 20 (23-26) (60-62) (2-3) 160 1000
> 21 (23-26) (60-62) (4+) 130 1050
> 22 (23-26) (-59) (0-1) 80 600
> 23 (23-26) (-59) (2-3) 60 500
> 24 (23-26) (-59) (4+) 70 550
> 25 (27-65) (65-) (0-1) 1650 10300
> 26 (27-65) (65-) (2-3) 1450 9900
> 27 (27-65) (65-) (4+) 3400 28900
> 28 (27-65) (63-64) (0-1) 1550 9900
> 29 (27-65) (63-64) (2-3) 1450 10000
> 30 (27-65) (63-64) (4+) 3200 27700
> 31 (27-65) (60-62) (0-1) 1250 9300
> 32 (27-65) (60-62) (2-3) 1250 9200
> 33 (27-65) (60-62) (4+) 2500 25600
> 34 (27-65) (-59) (0-1) 500 4700
> 35 (27-65) (-59) (2-3) 550 5300
> 36 (27-65) (-59) (4+) 1400 18100
> 37 (66-80) (65-) (0-1) 55 275
> 38 (66-80) (65-) (2-3) 40 250
> 39 (66-80) (65-) (4+) 180 1400
> 40 (66-80) (63-64) (0-1) 35 225
> 41 (66-80) (63-64) (2-3) 30 225
> 42 (66-80) (63-64) (4+) 155 1450
> 43 (66-80) (60-62) (0-1) 25 200
> 44 (66-80) (60-62) (2-3) 40 300
> 45 (66-80) (60-62) (4+) 130 1500
> 46 (66-80) (-59) (0-1) 25 175
> 47 (66-80) (-59) (2-3) 30 300
> 48 (66-80) (-59) (4+) 180 2400
>
> claims.r =the number of claims made in a particular group.
> exp.n=the total number of policyholders in a particular group.
>
> EXAMPLE
>
> As an example if we use age as an explanatory variable and fit the glm
> model we get the following results:
>
> cars<-read.table("c:/a.dat",header=T)
> attach(cars)
> y<-cbind(claims.r,exp.n)
> cars.age<-glm(y~age,family=binomial)
> summary(cars.age)
>
> Call:
> glm(formula = y ~ age, family = binomial)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -16.62952 -1.84073 -0.04282 2.07028 10.72502
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.46265 0.02050 -71.34 <2e-16 ***
> age(23-26) -0.34576 0.03197 -10.82 <2e-16 ***
> age(27-65) -0.66345 0.02182 -30.41 <2e-16 ***
> age(66-80) -0.77863 0.04020 -19.37 <2e-16 ***
> ---
> Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 1837.4 on 47 degrees of freedom
> Residual deviance: 883.8 on 44 degrees of freedom
> AIC: 1228.4
>
> Number of Fisher Scoring iterations: 4
>
> The Genstat output is displayed below. (well only some of it!!!)
>
>
> ***** Regression Analysis *****
>
> Response variate: claims_r
> Binomial totals: exp_n
> Distribution: Binomial
> Link function: Logit
> Fitted terms: Constant, age
>
>
> *** Summary of analysis ***
>
> mean deviance approx
> d.f. deviance deviance ratio chi pr
> Regression 3 1298. 432.70 432.70 <.001
> Residual 44 1136. 25.83
> Total 47 2434. 51.80
> * MESSAGE: ratios are based on dispersion parameter with value 1
>
> Dispersion parameter is fixed at 1.00
>
>
> *** Estimates of parameters ***
> antilog of
> estimate s.e. t(*) t pr. estimate
> Constant -1.1992 0.0211 -56.90 <.001 0.3014
> age (23-26) -0.4302 0.0326 -13.20 <.001 0.6504
> age (27-65) -0.7999 0.0224 -35.75 <.001 0.4494
> age (66-80) -0.9297 0.0407 -22.86 <.001 0.3947
> * MESSAGE: s.e.s are based on dispersion parameter with value 1
>
> Parameters for factors are differences compared with the reference
> level:
> Factor Reference level
> age (17-22)
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list