[R] Probit predictions outside (0,1) interval

John Fox jfox at mcmaster.ca
Fri Mar 5 15:39:24 CET 2004


Dear Arnab,

Several people have already noted that you're getting predicted values on
the scale of the linear predictor rather than the probability scale;
predict(g, dat, type="response") returns fitted probabilities. Note, as
well, that you fit a logit model rather than a probit model: the logit
link is the canonical (and default) link for the binomial family, so for
a probit model you need family=binomial(probit).
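
As a minimal sketch of both points (the simulated data, seed, coefficients,
and the object names g_logit, g_probit, eta and p are illustrative
assumptions, not taken from the original program), something like the
following shows the default link and the two prediction scales:

```r
# Illustrative simulated data; the coefficients here are assumptions.
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
# Binary outcome generated from a probit model
y   <- rbinom(n, 1, pnorm(0.5 + 0.33 * x1 + 0.66 * x2 + x3))
dat <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)

# family = binomial uses the logit link by default;
# the probit link has to be requested explicitly.
g_logit  <- glm(y ~ ., data = dat, family = binomial)
g_probit <- glm(y ~ ., data = dat, family = binomial(link = "probit"))

# predict() defaults to type = "link": the linear predictor, which is unbounded.
eta <- predict(g_probit, dat)
# type = "response" applies the inverse link and gives fitted probabilities.
p <- predict(g_probit, dat, type = "response")

range(eta)
range(p)

# The same conversion by hand: the inverse probit link is pnorm()
# (for a logit fit it would be plogis()).
all.equal(unname(p), unname(pnorm(eta)))
```

The range of eta can fall well outside (0,1), while the range of p should
not, since p is pnorm() applied to eta.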

John

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Arnab mukherji
> Sent: Friday, March 05, 2004 2:48 AM
> To: r-help at stat.math.ethz.ch
> Cc: r-help at stat.math.ethz.ch
> Subject: [R] Probit predictions outside (0,1) interval
> 
> Hi!
> 
> I was trying to implement a probit model on a dichotomous 
> outcome variable and found that the predictions were outside 
> the (0,1) interval that one should get. I later tried it with 
> some simulated data with a similar result. 
> 
> Here is a toy program I wrote, and I can't figure out why I should 
> be getting such odd predictions.
> 
> x1<-rnorm(1000)
> x2<-rnorm(1000)
> x3<-rnorm(1000)
> x4<-rnorm(1000)
> x5<-rnorm(1000)
> x6<-rnorm(1000)
> e1<-rnorm(1000)/3
> e2<-rnorm(1000)/3
> e3<-rnorm(1000)/3
> y<-1-(1-pnorm(-2+0.33*x1+0.66*x2+1*x3+e1)*1-
>   (pnorm(1+1.5*x4-0.25*x5+e2)*pnorm(1+0.2*x6+e3)))
> y <- y>runif(1000)
> dat<-data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
> g<-glm(y~., data = dat, family = binomial)
> summary(g)
> yhat<-predict(g, dat)
> 
> 
> Call:
> glm(formula = y ~ ., family = binomial, data = dat)
> 
> Deviance Residuals: 
>     Min       1Q   Median       3Q      Max  
> -1.8383  -1.3519   0.7638   0.9249   1.3698  
> 
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)    
> (Intercept)  0.71749    0.06901  10.397  < 2e-16 ***
> x1           0.10211    0.07057   1.447  0.14791    
> x2           0.21068    0.07177   2.936  0.00333 ** 
> x3           0.35162    0.07070   4.974 6.57e-07 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 1275.3  on 999  degrees of freedom 
> Residual deviance: 1239.4  on 996  degrees of freedom
> AIC: 1247.4
> 
> Number of Fisher Scoring iterations: 4
> 
> > yhat<-predict(g, dat)
> > 
> > range(yhat)
> [1] -0.4416826  2.0056527
> > range(y)
> [1] 0 1
> 
> Any advice would be really helpful.
>



