[R] Propensity score modeling using machine learning methods. WAS: RE: LARS for generalized linear models

Ridgeway, Greg gregr at rand.org
Mon Sep 18 22:04:55 CEST 2006

There may be benefits to having a machine learning method that
explicitly targets covariate balance. We have experimented with
optimizing the weights directly to obtain the best covariate balance,
but got some strange solutions for simple cases that made us wary of
such methods.

Machine learning methods that yield calibrated probability estimates
should do well (e.g. those that optimize the logistic log-likelihood).
Methods that only seek a decision boundary (SVM comes to mind) can
give great classifiers but offer poor probability estimates, and then the
propensity score weights are a mess. We've had a lot of success in
practice using gbm and selecting the number of iterations to optimize
balance. You can try the ps() function in the twang package which wraps
up gbm and balance optimization in a single function. It's slow for
large datasets but it gets the job done.
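As a minimal sketch of the workflow described above (with simulated data and hypothetical variable names — adjust n.trees, shrinkage, and the stopping rule to your problem):

```r
# Sketch: estimate propensity scores with twang::ps(), which wraps gbm and
# picks the number of boosting iterations to optimize covariate balance.
library(twang)

set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))  # binary exposure
dat <- data.frame(treat, x1, x2)

ps.fit <- ps(treat ~ x1 + x2, data = dat,
             n.trees = 3000, interaction.depth = 2, shrinkage = 0.01,
             estimand = "ATT", stop.method = "es.mean", verbose = FALSE)

w <- get.weights(ps.fit, stop.method = "es.mean")  # balance-optimized weights
bal.table(ps.fit)                                  # inspect covariate balance
```

The stop.method argument is what selects the iteration count by balance (here, mean standardized effect size) rather than by prediction error alone.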

Including additional variables in a weighted regression is a great
protective step. It can reduce both bias and variance and can produce
"doubly robust" estimates of the treatment effect (see Bang & Robins
2005 for an example).
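A base-R sketch of that doubly robust idea (simulated data; the true effect here is 2 by construction): weight by the inverse of the estimated propensity score, but still include the covariate in the weighted outcome regression, so the estimate remains consistent if either model is correct.

```r
# Doubly robust sketch: propensity weighting plus covariate adjustment.
set.seed(2)
n  <- 2000
x  <- rnorm(n)
tr <- rbinom(n, 1, plogis(0.8 * x))           # treatment depends on x
y  <- 1 + 2 * tr + 1.5 * x + rnorm(n)         # true treatment effect = 2

p  <- fitted(glm(tr ~ x, family = binomial))  # estimated propensity score
w  <- ifelse(tr == 1, 1 / p, 1 / (1 - p))     # ATE-style inverse-propensity weights

fit <- lm(y ~ tr + x, weights = w)            # weighted regression, covariate included
coef(fit)["tr"]                               # point estimate of the treatment effect
```

Note that lm() treats these as precision weights, so the point estimate is fine for illustration but the standard errors need a robust (sandwich or survey) correction in real applications.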


-----Original Message-----
From: Ravi Varadhan [mailto:rvaradhan at jhmi.edu] 
Sent: Monday, September 18, 2006 12:38 PM
To: Ridgeway, Greg; r-help at stat.math.ethz.ch
Subject: Propensity score modeling using machine learning methods. WAS:
RE: [R] LARS for generalized linear models

Thanks very much, Greg.  I will certainly look at glmpath.

My goal is to develop (nearly) automatic and flexible procedures for
estimating causal effects of risk factors in observational
studies.  A major part of this is the development of a propensity score
model (when the exposure is binary).  I would like to use a method
that can do this semi-automatically so that the resulting model has both
low prediction error and good covariate balance.

I have read your paper (McCaffrey, Ridgeway and Morral 2004), which uses
the gradient boosting machine (gbm) to build a logistic regression model
for the propensity score.  I was wondering whether there are other tools that
also address this problem, for example, glmpath or MARS? 

An important question is whether these "machine learning" methods,
focused on a good prediction rule, can also achieve good covariate
balance between the treatment groups, since "balance" is not explicitly
built into the cost function.  If there is significant imbalance,
incorporating the covariates into the regression model for outcomes, and
performing a weighted least squares analysis (with the estimated
propensity scores as weights), would be reasonable.  Am I right?

I would appreciate comments on these points.

Thanks very much.



Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu




-----Original Message-----
From: Ridgeway, Greg [mailto:gregr at rand.org] 
Sent: Monday, September 18, 2006 2:17 PM
To: r-help at stat.math.ethz.ch
Cc: Ravi Varadhan
Subject: Re: [R] LARS for generalized linear models

Check out Park & Hastie's glmpath package. They have a really clever
analysis and implementation of a generalized least angle regression.

> On Fri, 2006-09-15 at 18:49 -0400, Ravi Varadhan wrote:
> > Is there an R implementation of least angle regression for binary
> > modeling?  I know that this question has been asked before, and I am
> > aware of the "lasso2" package, but that only implements an L1 penalty,
> > i.e. the Lasso approach.
> > Madigan and Ridgeway in their discussion of Efron et al (2004)
> > describe a LARS-type algorithm for generalized linear models.  Has
> > anyone implemented this in R?

