[R] vglm(), t values and p values

Wed Nov 4 19:11:14 CET 2009

On Wed, 2009-11-04 at 14:00 +0000, Federico Calboli wrote:
> Hi All,

I can't answer your questions myself - try Thomas Yee, the author and
maintainer of the VGAM package - but:

> I'm fitting an proportional odds model using vglm() from VGAM.

Is there a particular reason for choosing a VGLM here? My reading of
your post suggests the response is an univariate, ordered factor and
VGLMs are especially for multivariate responses. In which case, can you
not use polr() in package MASS that comes with R or the lms() function
in the rms package (available from CRAN).

I haven't really used either of these functions in earnest, but one or
both may provide the p-values you desire, out of the box.

HTH

G

> 
> My response variable is the severity of diseases, going from 0 to 5 (the 
> severity is actually an ordered factor).
> 
> The independent variables are: 1 genetic marker, time of medical observation, 
> age, sex. What I *need* is a p-value for the genetic marker. Because I have ~1.5 
> million markers I'd rather not faffing around too much.
> 
> My model is:
> 
>  > mod.vglm = vglm(disease.status ~ x + time + age + sex, family = 
> cumulative(par = T))
> 
> where x is my genetic marker, coded as 0/1/2, time is days of medical observation.
> 
>  > summary(mod.vglm) works:
> 
> Call:
> vglm(formula = disease.status ~ x + time + age + sex, family = cumulative(par = T))
> 
> Pearson Residuals:
>                     Min       1Q   Median       3Q     Max
> logit(P[Y<=1]) -0.6642 -0.28704 -0.18329 -0.11681  3.8919
> logit(P[Y<=2]) -2.5580 -0.48080 -0.23315  0.47388  2.5983
> logit(P[Y<=3]) -2.1565 -0.56961  0.22089  0.44349 10.7964
> logit(P[Y<=4]) -3.3175  0.13064  0.20117  0.43176 12.5233
> 
> Coefficients:
>                      Value Std. Error  t value
> (Intercept):1 -2.4460e+00 4.2791e-01  -5.7162
> (Intercept):2 -7.1078e-01 4.1628e-01  -1.7074
> (Intercept):3  3.7619e-01 4.1545e-01   0.9055
> (Intercept):4  1.7467e+00 4.2092e-01   4.1496
> x              4.1421e-01 1.9762e-01   2.0959
> time          -3.6021e-04 3.0387e-05 -11.8540
> age           -2.6115e-05 9.2504e-06  -2.8232
> sexM           1.0188e-01 1.2491e-01   0.8156
> 
> Number of linear predictors:  4
> 
> Names of linear predictors:
> logit(P[Y<=1]), logit(P[Y<=2]), logit(P[Y<=3]), logit(P[Y<=4])
> 
> Dispersion Parameter for cumulative family:   1
> 
> Residual Deviance: 2475.937 on 3460 degrees of freedom
> 
> Log-likelihood: -1237.969 on 3460 degrees of freedom
> 
> #######################
> 
> So here are my questions:
> 
> 1) I need to get the t value for x, so I can use "1 - pt(tvalue,1)" to find some 
> sort of probability value for x. That's not trivial. Additionally, I assume df 
> for x is 1, hence I plan to use  "1 - pt(tvalue,1)", though I might well be 
> wrong. In any case getting the darned t value seems impossible
> 
> 2) because of the difficulty of getting (1), it there a way of getting vglm() to 
> spit out a p-value for x please?
> 
> I do recon many people might scoff at my crass desire for a p-value, but I'm 
> dealing with some dire phenotype in a whole genome analysis where the *only* 
> thing that matters are p-values. I have to be quite unsophysticated I'm afraid.
> 
> Best,
> 
> Federico
> 
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%