[R]: GLIM PROBLEMS

Prof Brian D Ripley ripley at stats.ox.ac.uk
Tue Dec 2 12:16:06 CET 2003


In R you use cbind(successes, failures), not cbind(successes, total) as you
appear to have done.

And GLIM is a program (with I for interactive): these are GLMs, not GLIMs
in th ecommonly accepted terminology (but not that of SAS).

On Tue, 2 Dec 2003, allan clark wrote:

>
>    Hi all
>
>    I have another GLIM question.
>
>    I have been using R as well as Genstat (version 6) in order to fit
>    GLIM models to the data (displayed below).
>
>    The same models are fitted but the answers supplied by the two
>    packages are not the same.
>
>    Why? Can anyone help?
>
>    A discription of the data and the type of model/s fitted can be found
>    below.
>
>    Regards
>    Allan
>
>
>    The problem is taken from Bennet (1978)  (I dont have any more of the
>    reference.)
>
>    In this example we wish to model the probability of a car insurance
>    policyholder claiming insurance on his/her car given that we know
>    certain information about him/her. The explanatory variables used in
>    this analysis are: the age of the policyholder (age), the year of
>    registration (reg) and a measure of the policyholders claim history
>    called the no claim discount (ncd).
>
>    Define p(i,j,k)  as the probability of a policyholder in level i of
>    age, level j of reg and level k of ncd making a claim (for i= 1 (age=
>    17-22), 2 (23-26), 3 (27-65), 4 (66-80) ; j=1 (registration after
>    1964), 2 (63-64), 3 (60-62), 4 (earlier than 1960) and ncd= 1 (1-1
>    claims), 2 (2-3), 3 (more than three).).
>
>    Similarly define r(i,j,k)  as the proportion of policyholders in level
>    i of age, level j of reg and level k of ncd making a claim.
>
>    We can thus model r(i,j,k) by means of a binomial distribution with
>    parameters p(i,j,k) and N(i,j,k)  where N(i,j,k)  is the total number
>    of policyholders that falls into group i, j, k such that
>    log(p(i,j,k)/(1-p(i,j,k))) = some function of age, reg and ncd .
>
>    The baseline chosen is age (17-22), reg (65) and ncd (0-1).
>
>
>          age     reg   ncd claims.r exp.n
>    1  (17-22)   (65-) (0-1)      475  1800
>    2  (17-22)   (65-) (2-3)      150   700
>    3  (17-22)   (65-)  (4+)       35   200
>    4  (17-22) (63-64) (0-1)      680  2650
>    5  (17-22) (63-64) (2-3)      215  1000
>    6  (17-22) (63-64)  (4+)       55   250
>    7  (17-22) (60-62) (0-1)      710  2950
>    8  (17-22) (60-62) (2-3)      220  1100
>    9  (17-22) (60-62)  (4+)       60   250
>    10 (17-22)   (-59) (0-1)      230  1050
>    11 (17-22)   (-59) (2-3)       75   450
>    12 (17-22)   (-59)  (4+)       25   250
>    13 (23-26)   (65-) (0-1)      240  1300
>    14 (23-26)   (65-) (2-3)      140   900
>    15 (23-26)   (65-)  (4+)      150   900
>    16 (23-26) (63-64) (0-1)      310  1500
>    17 (23-26) (63-64) (2-3)      185  1050
>    18 (23-26) (63-64)  (4+)      170  1050
>    19 (23-26) (60-62) (0-1)      240  1405
>    20 (23-26) (60-62) (2-3)      160  1000
>    21 (23-26) (60-62)  (4+)      130  1050
>    22 (23-26)   (-59) (0-1)       80   600
>    23 (23-26)   (-59) (2-3)       60   500
>    24 (23-26)   (-59)  (4+)       70   550
>    25 (27-65)   (65-) (0-1)     1650 10300
>    26 (27-65)   (65-) (2-3)     1450  9900
>    27 (27-65)   (65-)  (4+)     3400 28900
>    28 (27-65) (63-64) (0-1)     1550  9900
>    29 (27-65) (63-64) (2-3)     1450 10000
>    30 (27-65) (63-64)  (4+)     3200 27700
>    31 (27-65) (60-62) (0-1)     1250  9300
>    32 (27-65) (60-62) (2-3)     1250  9200
>    33 (27-65) (60-62)  (4+)     2500 25600
>    34 (27-65)   (-59) (0-1)      500  4700
>    35 (27-65)   (-59) (2-3)      550  5300
>    36 (27-65)   (-59)  (4+)     1400 18100
>    37 (66-80)   (65-) (0-1)       55   275
>    38 (66-80)   (65-) (2-3)       40   250
>    39 (66-80)   (65-)  (4+)      180  1400
>    40 (66-80) (63-64) (0-1)       35   225
>    41 (66-80) (63-64) (2-3)       30   225
>    42 (66-80) (63-64)  (4+)      155  1450
>    43 (66-80) (60-62) (0-1)       25   200
>    44 (66-80) (60-62) (2-3)       40   300
>    45 (66-80) (60-62)  (4+)      130  1500
>    46 (66-80)   (-59) (0-1)       25   175
>    47 (66-80)   (-59) (2-3)       30   300
>    48 (66-80)   (-59)  (4+)      180  2400
>
>    claims.r =the number of claims made in a particular group.
>    exp.n=the total number of policyholders in a particular group.
>
>    EXAMPLE
>
>    As an example if we use age as an explanatory variable and fit the glm
>    model we get the following results:
>
>    cars<-read.table("c:/a.dat",header=T)
>    attach(cars)
>    y<-cbind(claims.r,exp.n)
>    cars.age<-glm(y~age,family=binomial)
>    summary(cars.age)
>
>    Call:
>    glm(formula = y ~ age, family = binomial)
>
>    Deviance Residuals:
>          Min         1Q     Median         3Q        Max
>    -16.62952   -1.84073   -0.04282    2.07028   10.72502
>
>    Coefficients:
>                Estimate Std. Error z value Pr(>|z|)
>    (Intercept) -1.46265    0.02050  -71.34   <2e-16 ***
>    age(23-26)  -0.34576    0.03197  -10.82   <2e-16 ***
>    age(27-65)  -0.66345    0.02182  -30.41   <2e-16 ***
>    age(66-80)  -0.77863    0.04020  -19.37   <2e-16 ***
>    ---
>    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
>    (Dispersion parameter for binomial family taken to be 1)
>
>        Null deviance: 1837.4  on 47  degrees of freedom
>    Residual deviance:  883.8  on 44  degrees of freedom
>    AIC: 1228.4
>
>    Number of Fisher Scoring iterations: 4
>
>    The Genstat output is displayed below.  (well only some of it!!!)
>
>
>    ***** Regression Analysis *****
>
>    Response variate: claims_r
>    Binomial totals: exp_n
>    Distribution: Binomial
>    Link function: Logit
>    Fitted terms: Constant, age
>
>
>    *** Summary of analysis ***
>
>                                            mean  deviance approx
>                  d.f.     deviance     deviance     ratio chi pr
>    Regression       3        1298.       432.70    432.70  <.001
>    Residual        44        1136.        25.83
>    Total           47        2434.        51.80
>    * MESSAGE: ratios are based on dispersion parameter with value 1
>
>    Dispersion parameter is fixed at 1.00
>
>
>    *** Estimates of parameters ***
>                                                             antilog of
>                      estimate         s.e.      t(*)  t pr.   estimate
>    Constant           -1.1992       0.0211    -56.90  <.001     0.3014
>    age (23-26)        -0.4302       0.0326    -13.20  <.001     0.6504
>    age (27-65)        -0.7999       0.0224    -35.75  <.001     0.4494
>    age (66-80)        -0.9297       0.0407    -22.86  <.001     0.3947
>    * MESSAGE: s.e.s are based on dispersion parameter with value 1
>
>    Parameters for factors are differences compared with the reference
>    level:
>                  Factor  Reference level
>                     age  (17-22)
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list