[R] optim() for ordered logit model with parallel regression assumption

Wed Aug 1 23:44:19 CEST 2012

Disclaimer: I have not followed this thread at all, but only wish to note:

1) Indicator variables are (almost?) never needed in R -- that you are
fooling with them suggests that there is probably a better approach.

2) Your bols is just least regression, no? -- If so, there are far
better ways to do this in R.

3) ?model.frame and ?model.matrix may be relevant to what you're trying to do.

While you may get the help you need on this list (though clearly not
from me!), I suspect that you would benefit from consulting with a
local statistician/R expert who would help you with a major rewrite of
your code.

But ... note my disclaimer again at the top.

Cheers,
Bert

On Wed, Aug 1, 2012 at 2:34 PM, Xu Jun <junxu.r at gmail.com> wrote:
> Thanks Michael. Now I switched my approach after doing some google.
> Following are my new codes:
>
> ###################################################
>  library(foreign)
>  readin <- read.dta("ordfile.dta", convert.factors=FALSE)
>  myvars <- c("depvar", "x1", "x2", "x3")
>  mydta <- readin[myvars]
>  # remove all missings
> mydta <- na.omit(mydta)
>
> ologit.lf <- function(theta, y, X) {
>   X <- as.matrix(X)
>   y <- as.matrix(y)
>   n <- nrow(X)
>   k <- ncol(X)
>   b <- theta[1:k]
>   tau1 <- as.numeric(theta [k+1])
>   tau2 <- as.numeric(theta [k+2])
>   tau3 <- as.numeric(theta [k+3])
>
>   p1 <- (1/(1+exp( - tau1 + X %*% b)))
>   p2 <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b)))
>   p3 <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b)))
>   p4 <- 1 - (1/(1+exp( - tau3 + X %*% b)))
>
>   # create indicator variables
>   i  <- rep(1, nrow(X))
>   i1 <- as.numeric(  i == y)
>   i2 <- as.numeric(2*i == y)
>   i3 <- as.numeric(3*i == y)
>   i4 <- as.numeric(4*i == y)
>
>   llik <- sum(i1*log(p1) + i2*log(p2) + i3*log(p3) + i4*log(p4))
>   return(-llik)
> }
>
> y <- (mydta$depvar)
> X <- cbind(mydta$x1, mydta$x2, mydta$x3)
>
> if (FALSE) { # providing start values in optim
>   xint <-  cbind(rep(1,nrow(X)),X)
>   bols <- (solve(t(xint) %*% xint)) %*% (t(xint) %*% y)
>   startval <- rbind(as.matrix(bols[2:nrow(bols)]),0,0,0)
> }
>
> runopt <- optim(rep(0, ncol(X)+3), ologit.lf, method="BFGS",
> hessian=T, y=y, X=X)
> #########################################################################
>
> But then I got the following error message;
> Error in optim(rep(0, ncol(X) + 3), ologit.lf, method = "BFGS", hessian = T,  :
>   initial value in 'vmmin' is not finite
>
> I know there is something wrong with the way that I set up my initial
> values, but just couldn't figure out how. Any help would be greatly
> appreciated!
>
> Jun
>
> On Tue, Jul 31, 2012 at 10:07 PM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>> On Tue, Jul 31, 2012 at 7:57 PM, Xu Jun <junxu.r at gmail.com> wrote:
>>> Dear R listers,
>>>
>>> I am learning the MLE utility optim() in R to program ordered logit
>>> models just as an exercise. See below I have three independent
>>> variables, x1, x2, and x3. Y is coded as ordinal from 1 to 4. Y is not
>>> yet a factor variable here. The ordered logit model satisfies the
>>> parallel regression assumption. The following codes can run through,
>>> but results were totally different from what I got using the polr
>>> function from the MASS package. I think it might be due to the way how
>>> the p is constructed in the ologit.lf function. I am relatively new to
>>> R, and here I would guess probably something related to the class type
>>> (numeric vs. matrix) or something along that line among those if
>>> conditions. Thanks in advance for any suggestion.
>>>
>>> Jun Xu, PhD
>>> Assistant Professor
>>> Department of Sociology
>>> Ball State University
>>> Muncie, IN 47306
>>>
>>>
>>> ####################################################################
>>>
>>> library(foreign)
>>> readin <- read.dta("ordfile.dta", convert.factors=FALSE)
>>> myvars <- c("depvar", "x1", "x2", "x3")
>>> mydta <- readin[myvars]
>>> # remove all missings
>>> mydta <- na.omit(mydta)
>>>
>>> # theta is the parameter vector
>>> ologit.lf <- function(theta, y, X) {
>>>   n <- nrow(X)
>>>   k <- ncol(X)
>>> # b is the coefficient vector for independent variables
>>>   b <- theta[1:k]
>>> # tau1 is cut-point 1
>>>   tau1 <- theta [k+1]
>>> # tau2 is cut-point 2
>>>   tau2 <- theta [k+2]
>>> # tau3 is cut-point 1
>>>   tau3 <- theta [k+3]
>>>
>>>   if (y == 1){
>>>     p <- (1/(1+exp( - tau1 + X %*% b)))
>>>   }
>>>   if (y == 2) {
>>>     p <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b)))
>>>   }
>>>   if (y == 3) {
>>>     p <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b)))
>>>   }
>>>   if (y == 4) {
>>>     p <- 1 - (1/(1+exp( - tau3 + X %*% b)))
>>>   }
>>>   - sum(p)
>>> }
>>>
>>> y <- as.numeric(mydta$depvar)
>>> X <- cbind (mydta$x1, mydta$x2, mydta$x3)
>>> runopt <- optim(rep(0, ncol(X)+4), ologit.lf, method="BFGS",
>>> hessian=T, y=y, X=X)
>>>
>>>
>>> There were 50 or more warnings (use warnings() to see the first 50)
>>>> warnings()
>>>
>>> 1: In if (y == 1) { ... :
>>>   the condition has length > 1 and only the first element will be used
>>> 2: In if (y == 2) { ... :
>>>   the condition has length > 1 and only the first element will be used
>>
>> It looks like you've got a fundamental problem in your if/else
>> statements. if and else are control structures and so they operate on
>> the whole program flow -- I think you want the ifelse() function here
>> instead.
>>
>> Take a look at this example:
>>
>> x <- c(1, 5, 9)
>>
>> if(x < 3) {y <- x^2} else {y <- 2}
>>
>> z <- ifelse(x < 3, x^2, 2)
>>
>> print(x)
>> print(y)
>> print(z)
>>
>> See also ?ifelse
>>
>> Best,
>> Michael
>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm