[R]: GLIM PROBLEMS

allan clark allan at stats.uct.ac.za
Tue Dec 2 09:56:54 CET 2003


   Hi all

   I have another GLIM question.

   I have been using R as well as Genstat (version 6) in order to fit
   GLIM models to the data (displayed below).

   The same models are fitted but the answers supplied by the two
   packages are not the same.

   Why? Can anyone help?

   A discription of the data and the type of model/s fitted can be found
   below.

   Regards
   Allan


   The problem is taken from Bennet (1978)  (I dont have any more of the
   reference.)

   In this example we wish to model the probability of a car insurance
   policyholder claiming insurance on his/her car given that we know
   certain information about him/her. The explanatory variables used in
   this analysis are: the age of the policyholder (age), the year of
   registration (reg) and a measure of the policyholders claim history
   called the no claim discount (ncd).

   Define p(i,j,k)  as the probability of a policyholder in level i of
   age, level j of reg and level k of ncd making a claim (for i= 1 (age=
   17-22), 2 (23-26), 3 (27-65), 4 (66-80) ; j=1 (registration after
   1964), 2 (63-64), 3 (60-62), 4 (earlier than 1960) and ncd= 1 (1-1
   claims), 2 (2-3), 3 (more than three).).

   Similarly define r(i,j,k)  as the proportion of policyholders in level
   i of age, level j of reg and level k of ncd making a claim.

   We can thus model r(i,j,k) by means of a binomial distribution with
   parameters p(i,j,k) and N(i,j,k)  where N(i,j,k)  is the total number
   of policyholders that falls into group i, j, k such that
   log(p(i,j,k)/(1-p(i,j,k))) = some function of age, reg and ncd .

   The baseline chosen is age (17-22), reg (65) and ncd (0-1).


         age     reg   ncd claims.r exp.n
   1  (17-22)   (65-) (0-1)      475  1800
   2  (17-22)   (65-) (2-3)      150   700
   3  (17-22)   (65-)  (4+)       35   200
   4  (17-22) (63-64) (0-1)      680  2650
   5  (17-22) (63-64) (2-3)      215  1000
   6  (17-22) (63-64)  (4+)       55   250
   7  (17-22) (60-62) (0-1)      710  2950
   8  (17-22) (60-62) (2-3)      220  1100
   9  (17-22) (60-62)  (4+)       60   250
   10 (17-22)   (-59) (0-1)      230  1050
   11 (17-22)   (-59) (2-3)       75   450
   12 (17-22)   (-59)  (4+)       25   250
   13 (23-26)   (65-) (0-1)      240  1300
   14 (23-26)   (65-) (2-3)      140   900
   15 (23-26)   (65-)  (4+)      150   900
   16 (23-26) (63-64) (0-1)      310  1500
   17 (23-26) (63-64) (2-3)      185  1050
   18 (23-26) (63-64)  (4+)      170  1050
   19 (23-26) (60-62) (0-1)      240  1405
   20 (23-26) (60-62) (2-3)      160  1000
   21 (23-26) (60-62)  (4+)      130  1050
   22 (23-26)   (-59) (0-1)       80   600
   23 (23-26)   (-59) (2-3)       60   500
   24 (23-26)   (-59)  (4+)       70   550
   25 (27-65)   (65-) (0-1)     1650 10300
   26 (27-65)   (65-) (2-3)     1450  9900
   27 (27-65)   (65-)  (4+)     3400 28900
   28 (27-65) (63-64) (0-1)     1550  9900
   29 (27-65) (63-64) (2-3)     1450 10000
   30 (27-65) (63-64)  (4+)     3200 27700
   31 (27-65) (60-62) (0-1)     1250  9300
   32 (27-65) (60-62) (2-3)     1250  9200
   33 (27-65) (60-62)  (4+)     2500 25600
   34 (27-65)   (-59) (0-1)      500  4700
   35 (27-65)   (-59) (2-3)      550  5300
   36 (27-65)   (-59)  (4+)     1400 18100
   37 (66-80)   (65-) (0-1)       55   275
   38 (66-80)   (65-) (2-3)       40   250
   39 (66-80)   (65-)  (4+)      180  1400
   40 (66-80) (63-64) (0-1)       35   225
   41 (66-80) (63-64) (2-3)       30   225
   42 (66-80) (63-64)  (4+)      155  1450
   43 (66-80) (60-62) (0-1)       25   200
   44 (66-80) (60-62) (2-3)       40   300
   45 (66-80) (60-62)  (4+)      130  1500
   46 (66-80)   (-59) (0-1)       25   175
   47 (66-80)   (-59) (2-3)       30   300
   48 (66-80)   (-59)  (4+)      180  2400

   claims.r =the number of claims made in a particular group.
   exp.n=the total number of policyholders in a particular group.

   EXAMPLE

   As an example if we use age as an explanatory variable and fit the glm
   model we get the following results:

   cars<-read.table("c:/a.dat",header=T)
   attach(cars)
   y<-cbind(claims.r,exp.n)
   cars.age<-glm(y~age,family=binomial)
   summary(cars.age)

   Call:
   glm(formula = y ~ age, family = binomial)

   Deviance Residuals:
         Min         1Q     Median         3Q        Max
   -16.62952   -1.84073   -0.04282    2.07028   10.72502

   Coefficients:
               Estimate Std. Error z value Pr(>|z|)
   (Intercept) -1.46265    0.02050  -71.34   <2e-16 ***
   age(23-26)  -0.34576    0.03197  -10.82   <2e-16 ***
   age(27-65)  -0.66345    0.02182  -30.41   <2e-16 ***
   age(66-80)  -0.77863    0.04020  -19.37   <2e-16 ***
   ---
   Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

   (Dispersion parameter for binomial family taken to be 1)

       Null deviance: 1837.4  on 47  degrees of freedom
   Residual deviance:  883.8  on 44  degrees of freedom
   AIC: 1228.4

   Number of Fisher Scoring iterations: 4

   The Genstat output is displayed below.  (well only some of it!!!)


   ***** Regression Analysis *****

   Response variate: claims_r
   Binomial totals: exp_n
   Distribution: Binomial
   Link function: Logit
   Fitted terms: Constant, age


   *** Summary of analysis ***

                                           mean  deviance approx
                 d.f.     deviance     deviance     ratio chi pr
   Regression       3        1298.       432.70    432.70  <.001
   Residual        44        1136.        25.83
   Total           47        2434.        51.80
   * MESSAGE: ratios are based on dispersion parameter with value 1

   Dispersion parameter is fixed at 1.00


   *** Estimates of parameters ***
                                                            antilog of
                     estimate         s.e.      t(*)  t pr.   estimate
   Constant           -1.1992       0.0211    -56.90  <.001     0.3014
   age (23-26)        -0.4302       0.0326    -13.20  <.001     0.6504
   age (27-65)        -0.7999       0.0224    -35.75  <.001     0.4494
   age (66-80)        -0.9297       0.0407    -22.86  <.001     0.3947
   * MESSAGE: s.e.s are based on dispersion parameter with value 1

   Parameters for factors are differences compared with the reference
   level:
                 Factor  Reference level
                    age  (17-22)



More information about the R-help mailing list