[R] polr in MASS
array chip
arrayprofile at yahoo.com
Mon May 5 20:53:30 CEST 2003
--- Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Mon, 5 May 2003, array chip wrote:
>
> > I misunderstand something here? Or the the way I
> used to reproduce the
> > predicted probabilities is not correct? Thanks
>
> Yes:
>
> 1) How to wrap lines in emails
>
> 2) How to post plain text to mailing lists.
>
> 3) What the model actually is.
>
> --
> Brian D. Ripley,
Sorry that I didn't realize the formatting problem.
Hopefully this time it works:
Hi, I am trying to use the proportional-odds model
using the "polr" function in the MASS library. I tried
with the dataset of "housing" contained in the MASS
book ("Sat" (factor: low, medium, high) is the
dependent variable, "Infl" (low, medium, high), "Type"
(tower, apartment, atrium, terrace) and "Cont" (low,
high) are the predictor variables (factors). And I
have some questions, hope someone could help me out.
The following commands are from the MASS book as well.
>
house.plr<-polr(Sat~Infl+Type+Cont,data=housing,weights=Freq)
> summary(house.plr)
Call: polr(formula = Sat ~ Infl + Type + Cont, data =
housing, weights = Freq)
Coefficients:
Value Std. Error t value
InflMedium 0.5663922 0.10465276 5.412109
InflHigh 1.2888137 0.12715609 10.135682
TypeApartment -0.5723552 0.11923800 -4.800107
TypeAtrium -0.3661907 0.15517331 -2.359882
TypeTerrace -1.0910073 0.15148595 -7.202036
ContHigh 0.3602803 0.09553577 3.771156
Intercepts:
Value Std. Error t value
Low|Medium -0.4961 0.1248 -3.9740
Medium|High 0.6907 0.1255 5.5049
Residual Deviance: 3479.149
AIC: 3495.149
I also tried to predict the probabilities of the 3
categories of "Sat" using the predict function:
> hnames<-lapply(housing[,-5],levels)
>
house.pr1<-predict(house.plr,expand.grid(hnames[-1]),type="probs")>
cbind(expand.grid(hnames[-1]),round(house.pr1,2))
Infl Type Cont Low Medium High
1 Low Tower Low 0.38 0.29 0.33
2 Medium Tower Low 0.26 0.27 0.47
3 High Tower Low 0.14 0.21 0.65
4 Low Apartment Low 0.52 0.26 0.22
5 Medium Apartment Low 0.38 0.29 0.33
6 High Apartment Low 0.23 0.26 0.51
7 Low Atrium Low 0.47 0.27 0.26
8 Medium Atrium Low 0.33 0.29 0.38
9 High Atrium Low 0.19 0.25 0.56
10 Low Terrace Low 0.64 0.21 0.14
11 Medium Terrace Low 0.51 0.26 0.23
12 High Terrace Low 0.33 0.29 0.38
13 Low Tower High 0.30 0.28 0.42
14 Medium Tower High 0.19 0.25 0.56
15 High Tower High 0.10 0.17 0.72
16 Low Apartment High 0.43 0.28 0.29
17 Medium Apartment High 0.30 0.28 0.42
18 High Apartment High 0.17 0.23 0.60
19 Low Atrium High 0.38 0.29 0.33
20 Medium Atrium High 0.26 0.27 0.47
21 High Atrium High 0.14 0.21 0.64
22 Low Terrace High 0.56 0.25 0.19
23 Medium Terrace High 0.42 0.28 0.30
24 High Terrace High 0.26 0.27 0.47
what I am confused is that when I tried to reproduce
these predicted probabilities using the model
coefficients, I can sometimes get different results:
for example, for low Infl, Type tower, cont low,
logit(P(low))=P(low)/(1-P(low))=-0.4961, solve for
P(low)=exp(-0.4961)/(1+exp(-0.4961))=0.38;
and
logit(P(low,
medium))=P(low,medium)/(1-P(low,medium))=0.6907, solve
for P(low,medium)=exp(0.6907)/(1+exp(0.6907))=0.67,
which is the sum of 0.38 plus 0.29.
The above 2 examples showed that I can reproduce the
predicted probabilities when using the intercept
alone. However, for other combinations of the
predictor variables, I can NOT repreduce the results.
For example, for medium Infl, Type tower, cont low,
logit(P(low))=P(low)/(1-P(low))=-0.4961+0.56639=0.07029,
solve for P(low)=exp(0.07029)/(1+exp(0.07029))=0.52;
but the probability using "predict" function is 0.26.
and
logit(P(low,
medium))=P(low,medium)/(1-P(low,medium))=0.6907+0.56639=1.25709,
solve for
P(low,medium)=exp(1.25709)/(1+exp(1.25709))=0.78,
which certainly is NOT the sum of 0.26 plus 0.27, and
is not the sum of 0.52 and 0.27.
Did I misunderstand something here? Or the the way I
used to reproduce the predicted probabilities is not
correct?
Thanks very much!
More information about the R-help
mailing list