[R] logit and polytomous data

Fri Mar 10 13:15:42 CET 2000

> Date: Fri, 10 Mar 2000 12:18:14 +0200 (EET)
> From: Kari Ruohonen <toimelat at saunalahti.fi>
> 
> I am new to generalized linear models and studying 
> McCullagh & Nelder (1989). Especially, I have a problem 
> resembling the \"cheese taste\" example (5.3.1. p. 109) of 
> the book. 

You confused me. It is on page 109 of McCullagh & Nelder (1983), the
first edition. That example is not using a glm: see later.

> I tried to analyse the cheese example with R but 
> failed to do so because R allowed me to use logit link 
> function only with binary family that supposes 0 <= y <= 1. 
> Do I need to scale the y\'s or is there another way?

You want the *binomial* family.  From Venables & Ripley
(1999, pp. 218-9)

1. If the response is a numeric vector it is assumed to hold
  the data in ratio form, $y_i = s_i/a_i$, in which case the
  $a_i$s must be given as a vector of weights using the
  \sfn{weights} argument.  (If the $a_i$ are all one the default
  \sfn{weights} suffices.)

2. If the response is a logical vector or a two-level factor
  it is treated as a 0/1 numeric vector and handled as previously.

  If the response is a multi-level factor, the first level is treated
  as 0 (failure) and all others as 1 (success).

3. If the response is a two-column matrix it is assumed that the
  first column holds the number of successes, $s_i$, and the second
  holds the number of failures, $a_i - s_i$, for each trial.  In
  this case no \sfn{weights} argument is required.

The less-intuitive third form allows the fitting function to select a
better starting value, so we tend to favour it.

This is using the standard notation for a GLM family: for a binomial
s_i is the number of successes out of a_i trial.

HOWEVER, the analysis in McCullagh and Nelder (1983) is via an ordinal
logistic regression.

Here's an R analysis:

library(MASS)
counts <- scan()
0  0  1  7  8  8 19  8  1
6  9 12 11  7  6  1  0  0
1  1  6  8 23  7  5  1  0
0  0  0  1  3  7 14 16 11

resp <- ordered(gl(9, 1, 36))
cheese <- gl(4, 9, 36, labels=LETTERS[1:4])
cheese <- relevel(cheese, "D")

fit <- polr(resp ~ cheese, weights=counts, Hess=TRUE)
fit

Call:
polr(formula = resp ~ cheese, weights = counts, Hess=TRUE)

Coefficients:
  cheeseA   cheeseB   cheeseC 
-1.612792 -4.964633 -3.322695 

Intercepts:
       1|2        2|3        3|4        4|5        5|6        6|7        7|8 
-7.0801140 -6.0249573 -4.9253989 -3.8568048 -2.5205479 -1.5685582 -0.0669085 
       8|9 
 1.4929198 

Residual Deviance: 711.35 
AIC: 733.35 

summary(fit)
...
Coefficients:
            Value Std. Error    t value
cheeseA -1.612792  0.3805424  -4.238139
cheeseB -4.964633  0.4767185 -10.414182
cheeseC -3.322695  0.4218289  -7.876879
...
vcov(fit)
...

(Summary on polr should be able to give the correlations, but the presnet
version has a small bug.)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._