[R] Estimating regression with constraints in model coefficients

Gregg Powell g.a.powell using protonmail.com
Thu Apr 24 21:01:09 CEST 2025


Christofer,
That was a detailed follow-up; you clarified the requirements precisely enough to give me a clear starting point to proceed from.


To implement this, I need to set up a constrained optimization procedure that estimates an ordinal logistic regression (cumulative logit) model, based on:

1. Estimated Cutpoints
– No intercept in the linear predictor
– Cutpoints (thresholds) will be estimated directly from the data

2. Coefficient Constraints
– Box constraints on each coefficient
– For now:
lower = c(1, -1, 0)
upper = c(2, 1, 1)
– These apply to: pared, public, gpa (in that order)

3. Sum Constraint
– The sum of coefficients must equal 1.60

4. Data to use...
– Use the IDRE ologit.dta dataset from UCLA (for now)
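
For concreteness, a minimal data-setup sketch for that dataset might look like the block below (assuming the UCLA URL is still reachable and that the variables pared, public, gpa and the ordered outcome apply are as in the IDRE example; the object names here are just placeholders that I will reuse further down):

library(foreign)

# IDRE/UCLA example data; outcome 'apply' is ordered: unlikely < somewhat likely < very likely
dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")

# Design matrix with no intercept, in the agreed order: pared, public, gpa
X <- as.matrix(dat[, c("pared", "public", "gpa")])
y <- as.integer(dat$apply)      # assumes the factor levels are already in the natural order, coded 1..K
K <- length(unique(y))

# Box constraints and the sum target discussed above
lower_bounds <- c(1, -1, 0)
upper_bounds <- c(2,  1, 1)
C <- 1.60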


Technical Approach

• Attempt to write a custom negative log-likelihood function using the cumulative logit formulation:

P(Y ≤ k | X) = 1 / (1 + exp[-(θ_k - Xβ)])


and derive P(Y = k) from adjacent differences of these cumulative probabilities.

• Cutpoints θ_k will be estimated as separate parameters, with constraints to ensure they’re strictly increasing for identifiability.

• The optimization will use nloptr::nloptr(), which supports:
- Lower/upper bounds (box constraints)
- Equality constraints (for sum of β)
- Nonlinear inequality constraints (to keep cutpoints ordered)
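
On that last point, one way to express "strictly increasing cutpoints" in nloptr's g(x) <= 0 convention is a small helper along these lines (a sketch only; par is assumed to be the full parameter vector c(beta, theta), p is the number of coefficients, and eps is an arbitrary small gap I picked):

# Inequality constraints g(par) <= 0 enforcing theta_1 < theta_2 < ... < theta_{K-1}
order_constraint <- function(par, p, K, eps = 1e-6) {
  theta <- par[(p + 1):(p + K - 1)]
  # require theta[k] + eps <= theta[k + 1], i.e. eps - diff(theta) <= 0 elementwise
  eps - diff(theta)
}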


I’ll have some time in the next day or two to attempt a script with the following pieces (a rough skeleton is sketched after this list):
- Custom negative log-likelihood
- Parameter vector split into β and cutpoints
- Constraint functions: sum(β) = 1.60 and increasing cutpoints
- Optimization via nloptr()
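
Roughly, I expect that skeleton to look something like the following. This is only an untested sketch: it reuses X, y, K, lower_bounds, upper_bounds, C, and order_constraint() from the snippets above, picks arbitrary starting values, and leans on nloptr's numerical-gradient helpers nl.grad()/nl.jacobian() because SLSQP is a derivative-based algorithm.

library(nloptr)

p <- ncol(X)

# Negative log-likelihood with par = c(beta, theta); theta holds the K - 1 cutpoints
negLL <- function(par) {
  beta  <- par[1:p]
  theta <- c(-Inf, par[(p + 1):(p + K - 1)], Inf)  # pad so adjacent differences cover k = 1..K
  eta   <- as.vector(X %*% beta)
  ll <- 0
  for (k in 1:K) {
    pk <- plogis(theta[k + 1] - eta) - plogis(theta[k] - eta)
    ll <- ll + sum(log(pmax(pk[y == k], 1e-12)))   # guard against log(0)
  }
  -ll
}

# Equality constraint: only the beta part must sum to C (= 1.60)
eq_sum <- function(par) sum(par[1:p]) - C

# Ordering constraint on the cutpoints, reusing the helper above
ineq_order <- function(par) order_constraint(par, p, K)

# Starting values: beta within the bounds and summing to 1.60 (p = 3 here); increasing cutpoints
x0 <- c(1.2, 0, 0.4, seq(-1, 1, length.out = K - 1))
lb <- c(lower_bounds, rep(-Inf, K - 1))  # cutpoints only constrained through the ordering
ub <- c(upper_bounds, rep( Inf, K - 1))

res <- nloptr(
  x0              = x0,
  eval_f          = negLL,
  eval_grad_f     = function(par) nl.grad(par, negLL),
  lb              = lb,
  ub              = ub,
  eval_g_eq       = eq_sum,
  eval_jac_g_eq   = function(par) nl.jacobian(par, eq_sum),
  eval_g_ineq     = ineq_order,
  eval_jac_g_ineq = function(par) nl.jacobian(par, ineq_order),
  opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-8, maxeval = 5000)
)

res$solution  # first p entries are the constrained beta, the rest are the estimated cutpoints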

Hopefully, you’ll be able to run it locally with only the VGAM, foreign, and nloptr packages.

I’ll send the .R file along with the next email. A best attempt, anyway.


r/
Gregg


“Oh, what fun it is!”
—Alice, Alice’s Adventures in Wonderland by Lewis Carroll





On Thursday, April 24th, 2025 at 1:56 AM, Christofer Bogaso <bogaso.christofer using gmail.com> wrote:

> 

> 

> Hi Gregg,
> 

> Many thanks for your detailed feedback; those points are really helpful.
> 

> I have ordered a copy of Agresti from our local library; however, it
> appears that I will need to wait a few days.
> 

> Regarding my queries, it would be super helpful if we could build an
> optimization algorithm based on the criteria below:
> 

> 1. Whether you want the cutpoints fixed (and to what values), or if
> you want them estimated separately (with identifiability managed some
> other way); I would like to have cut-points directly estimated from
> the data
> 2. What your bounds on the coefficients are (lower/upper vectors); For
> this question, I have different bounds for each of the coefficients,
> so I would have one vector of lower bounds and another vector of
> upper bounds. However, to start with, let us consider lower = c(1, -1, 0)
> and upper = c(2, 1, 1) for the variables pared, public, and gpa. In my
> model there will not be any intercept, hence no question about its
> bounds.
> 3. What value the sum of coefficients should equal (e.g., sum(β) = 1,
> or something else); Let the sum be 1.60
> 4. And whether you're working with the IDRE example data, or something
> else. My original data actually resides on a computer without any
> access to the internet (for data security). However, we can start with
> any suitable data for this experiment, and one such dataset is
> https://stats.idre.ucla.edu/stat/data/ologit.dta. I am still exploring
> whether there is any possibility of extracting a randomized copy of
> the actual data; if I can, I will share it once available.
> 

> Again, many thanks for your time and insights.
> 

> Thanks and regards,
> 

> On Wed, Apr 23, 2025 at 9:54 PM Gregg Powell g.a.powell using protonmail.com wrote:
> 

> > Hello again Christofer,
> > Thanks for your thoughtful note — I’m glad the outline was helpful. Apologies for the long delay getting back to you. Had to do a small bit of research…
> > 

> > Recommended Text on Ordinal Log-Likelihoods:
> > You're right that most online sources jump straight to code or canned functions. For a solid theoretical treatment of ordinal models and their likelihoods, consider:
> > "Categorical Data Analysis" by Alan Agresti
> > – Especially Chapters 7 and 8 on ordinal logistic regression.
> > – Covers proportional odds models, cumulative logits, adjacent-category logits, and the derivation of likelihood functions.
> > – Provides not only equations but also intuition behind the model structure.
> > It’s a standard reference in the field and explains how to build log-likelihoods from first principles — including how the cumulative probabilities arise and why the difference of CDFs represents a category-specific probability.
> > Also helpful:
> > "An Introduction to Categorical Data Analysis" by Agresti (2nd ed) – A bit more accessible, and still covers the basics of ordinal models.
> > ________________________________________
> > 

> > If You Want to Proceed Practically:
> > In parallel with theory, if you'd like a working R example coded from scratch — with:
> > • a custom likelihood for an ordinal (cumulative logit) model,
> > • fixed thresholds / no intercept,
> > • coefficient bounds,
> > • and a sum constraint on β
> > 

> > I’d be happy to attempt that using nloptr() or constrOptim(). You’d be able to walk through it and tweak it as necessary (no guarantee that I will get it right 😊)
> > 

> > Just let me know:
> > 1. Whether you want the cutpoints fixed (and to what values), or if you want them estimated separately (with identifiability managed some other way);
> > 2. What your bounds on the coefficients are (lower/upper vectors),
> > 3. What value the sum of coefficients should equal (e.g., sum(β) = 1, or something else);
> > 4. And whether you're working with the IDRE example data, or something else.
> > 

> > I can use the UCLA ologit.dta dataset as a basis if that's easiest to demo on, or if you have another dataset you’d prefer – again, let me know.
> > 

> > All the best!
> > 

> > gregg
> > 

> > On Monday, April 21st, 2025 at 11:25 AM, Christofer Bogaso bogaso.christofer using gmail.com wrote:
> > 

> > > Hi Gregg,
> > 

> > > I am sincerely thankful for this workout.
> > 

> > > Could you please suggest any text book on how to create log-likelihood
> > > for an ordinal model like this? Most of my online search point me
> > > directly to some R function etc, but a theoretical discussion on this
> > > subject would be really helpful to construct the same.
> > 

> > > Thanks and regards,
> > 

> > > On Mon, Apr 21, 2025 at 9:55 PM Gregg Powell g.a.powell using protonmail.com wrote:
> > 

> > > > Christofer,
> > 

> > > > Given the constraints you mentioned—bounded parameters, no intercept, and a sum constraint—you're outside the capabilities of most off-the-shelf ordinal logistic regression functions in R like vglm or polr.
> > 

> > > > The most flexible recommendation at this point is to implement custom likelihood optimization using constrOptim() or nloptr::nloptr() with a manually coded log-likelihood function for the cumulative logit model.
> > 

> > > > Since you need:
> > 

> > > > Coefficient bounds (e.g., lb ≤ β ≤ ub),
> > 

> > > > No intercept,
> > 

> > > > And a constraint like sum(β) = C,
> > 

> > > > …you'll need to code your own objective function. Here's something of a high-level outline of the approach:
> > 

> > > > A. Model Setup
> > > > Let your design matrix X be n x p, and let Y take ordered values 1, 2, ..., K.
> > 

> > > > B. Parameters
> > > > Assume the thresholds (θ_k) are fixed (or removed entirely), and you’re estimating only the coefficient vector β. Your constraints are:
> > 

> > > > Box constraints: lb ≤ β ≤ ub
> > 

> > > > Equality constraint: sum(β) = C
> > 

> > > > C. Likelihood
> > > > The probability of category k is given by:
> > 

> > > > P(Y = k) = logistic(θ_k - Xβ) - logistic(θ_{k-1} - Xβ)
> > 

> > > > Without intercepts, this becomes more like:
> > 

> > > > P(Y ≤ k) = 1 / (1 + exp(-(c_k - Xβ)))
> > 

> > > > …where c_k are fixed thresholds.
> > 

> > > > Implementation using nloptr
> > > > This example shows a rough sketch using nloptr, which allows both equality and bound constraints:
> > 

> > > > > library(nloptr)
> > 

> > > > > # Custom negative log-likelihood function
> > > > > # 'cutpoints' holds the K - 1 interior thresholds; padding with -Inf and
> > > > > # Inf makes the adjacent-difference formula valid for k = 1 and k = K
> > > > > negLL <- function(beta, X, y, K, cutpoints) {
> > > > >   eta <- as.vector(X %*% beta)
> > > > >   th  <- c(-Inf, cutpoints, Inf)
> > > > >   loglik <- 0
> > > > >   for (k in 1:K) {
> > > > >     pk <- plogis(th[k + 1] - eta) - plogis(th[k] - eta)
> > > > >     loglik <- loglik + sum(log(pk[y == k]))
> > > > >   }
> > > > >   return(-loglik)
> > > > > }
> > 

> > > > > # Constraint: sum(beta) = C
> > > > > sum_constraint <- function(beta, C) {
> > > > >   return(sum(beta) - C)
> > > > > }
> > 

> > > > > # Define objective and constraint wrappers
> > > > > objective <- function(beta) negLL(beta, X, y, K, cutpoints)
> > > > > eq_constraint <- function(beta) sum_constraint(beta, C = 2)  # example value of C
> > 

> > > > > # Run optimization (SLSQP is derivative-based, so supply numerical
> > > > > # gradients via nloptr's nl.grad() / nl.jacobian() helpers)
> > > > > res <- nloptr(
> > > > >   x0 = rep(0, ncol(X)),
> > > > >   eval_f = objective,
> > > > >   eval_grad_f = function(b) nl.grad(b, objective),
> > > > >   lb = lower_bounds,
> > > > >   ub = upper_bounds,
> > > > >   eval_g_eq = eq_constraint,
> > > > >   eval_jac_g_eq = function(b) nl.jacobian(b, eq_constraint),
> > > > >   opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-8)
> > > > > )
> > 

> > > > The next step would be writing the actual log-likelihood for your specific problem and verifying the constraint implementation.
> > 

> > > > Manually coding the log-likelihood for an ordinal model is nontrivial... so a bit of a challenge :)
> > 

> > > > All the best,
> > > > gregg powell
> > > > Sierra Vista, AZ