[R] Propensity score modeling using machine learning methods. WAS: RE: LARS for generalized linear models
Ridgeway, Greg
gregr at rand.org
Mon Sep 18 22:04:55 CEST 2006
There may be benefits to having a machine learning method that
explicitly targets covariate balance. We have experimented with
optimizing the weights directly to obtain the best covariate balance,
but got some strange solutions for simple cases that made us wary of
such methods.
Machine learning methods that yield calibrated probability estimates
should do well (e.g. those that optimize the logistic log-likelihood).
Methods that only seek a decision boundary (SVM comes to mind) can
give great classifiers but offer poor probability estimates, and then the
propensity score weights are a mess. We've had a lot of success in
practice using gbm and selecting the number of iterations to optimize
balance. You can try the ps() function in the twang package which wraps
up gbm and balance optimization in a single function. It's slow for
large datasets but it gets the job done.
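For illustration only, here is a minimal sketch of the idea of choosing the number of iterations by balance. It is written in Python and is not twang's implementation; twang's ps() uses its own balance statistics and stopping rules. The sketch assumes ATT weights (1 for treated units, p/(1-p) for controls). It measures balance as the largest absolute standardized mean difference across covariates, scaled by the treated-group SD. It also assumes the per-iteration propensity scores are already available, e.g. predictions from a boosted logistic model at several iteration counts. All function names and the toy data are hypothetical.

```python
import math


def att_weights(ps, treat):
    """ATT weights: 1 for treated units, p/(1-p) for control units."""
    return [1.0 if t == 1 else p / (1.0 - p) for p, t in zip(ps, treat)]


def std_diff(x, treat, w):
    """Absolute standardized difference in means of covariate x,
    treated minus weighted control, scaled by the treated-group SD."""
    xt = [xi for xi, t in zip(x, treat) if t == 1]
    # Treated units all have weight 1 under ATT weighting, so a plain mean suffices.
    m_t = sum(xt) / len(xt)
    sd_t = math.sqrt(sum((xi - m_t) ** 2 for xi in xt) / (len(xt) - 1))
    ctrl = [(xi, wi) for xi, wi, t in zip(x, w, treat) if t == 0]
    m_c = sum(xi * wi for xi, wi in ctrl) / sum(wi for _, wi in ctrl)
    return abs(m_t - m_c) / sd_t


def max_imbalance(covariates, treat, ps):
    """Worst-case (largest) standardized difference across all covariates."""
    w = att_weights(ps, treat)
    return max(std_diff(x, treat, w) for x in covariates)


def best_iteration(ps_path, covariates, treat):
    """Index k of the propensity-score vector ps_path[k] giving the best balance."""
    scores = [max_imbalance(covariates, treat, ps) for ps in ps_path]
    return min(range(len(scores)), key=lambda k: scores[k])


# Toy example: one covariate, three candidate iterations.
treat = [1, 1, 1, 0, 0, 0]
x = [2.0, 3.0, 4.0, 1.0, 2.0, 4.0]
ps_path = [
    [0.5] * 6,                        # control weights 1, 1, 1
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.75],  # control weights 1, 1, 3
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.9],   # control weights 1, 1, ~9
]
print(best_iteration(ps_path, [x], treat))  # -> 1
```

The third candidate over-corrects. The control with p = 0.9 gets a weight of about 9, which drags the weighted control mean past the treated mean. A classifier that pushes probabilities toward 0 or 1 without calibration produces even more extreme p/(1-p) weights, which is one way to see why SVM-style scores make a mess of the weights.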
Including additional variables in a weighted regression is a great
protective step. It can reduce both bias and variance and can produce
"doubly robust" estimates of the treatment effect (see Bang & Robins
2005 for an example).
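One common doubly robust construction is the augmented inverse probability weighted (AIPW) estimator of the average treatment effect, sketched below in Python. This is a swapped-in illustration and not necessarily the estimator used in Bang & Robins (2005). It assumes the fitted propensity scores ps and the outcome-model predictions m1 (under treatment) and m0 (under control) have already been computed. The function name and toy data are hypothetical. The estimate is consistent if either the propensity model or the outcome model is correctly specified.

```python
def aipw_ate(y, treat, ps, m1, m0):
    """AIPW estimate of the average treatment effect.

    For each unit i the contribution is
      m1_i - m0_i + T_i (Y_i - m1_i) / e_i - (1 - T_i)(Y_i - m0_i) / (1 - e_i),
    and the estimate is the average over all n units.
    """
    total = 0.0
    for yi, t, e, a, b in zip(y, treat, ps, m1, m0):
        total += (a - b) + t * (yi - a) / e - (1 - t) * (yi - b) / (1 - e)
    return total / len(y)


y = [5.0, 6.0, 1.0, 2.0]
treat = [1, 1, 0, 0]
ps = [0.5, 0.5, 0.5, 0.5]

# With an outcome model that fits the observed outcomes, the residual terms vanish.
print(aipw_ate(y, treat, ps, m1=[5.0, 6.0, 4.0, 4.0], m0=[2.0, 2.0, 1.0, 2.0]))  # -> 3.0

# With a useless outcome model (all zeros), it reduces to plain inverse probability weighting.
print(aipw_ate(y, treat, ps, m1=[0.0] * 4, m0=[0.0] * 4))  # -> 4.0
```

Propensity scores near 0 or 1 inflate the residual terms, so the same concern about poorly calibrated probabilities applies here.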
Greg
-----Original Message-----
From: Ravi Varadhan [mailto:rvaradhan at jhmi.edu]
Sent: Monday, September 18, 2006 12:38 PM
To: Ridgeway, Greg; r-help at stat.math.ethz.ch
Subject: Propensity score modeling using machine learning methods. WAS:
RE: [R] LARS for generalized linear models
Thanks very much, Greg. I will certainly look at glmpath.
My goal is to develop (nearly) automatic and flexible procedures for
estimating causal effects of risk factors in observational epidemiological
studies. A major part of this is the development of a propensity score
model (when the exposure is binary). I would like to use tools/approaches
that can do this semi-automatically so that the resulting model has both
low prediction error and good covariate balance.
I have read your paper (McCaffrey, Ridgeway and Morral 2004), which uses a
gradient boosting machine (gbm) to build a logistic regression model for the
propensity score. I was wondering whether there are other tools that can
also address this problem, for example, glmpath or MARS?
An important question is whether these "machine learning" methods, mainly
focused on a good prediction rule, can also achieve good covariate balance
between the treatment groups, since "balance" is not explicitly built into
the cost function. If there is significant imbalance, incorporating such
covariates into the regression model for outcomes and performing a
weighted least squares analysis (with weights based on the estimated
propensity scores) should be reasonable. Am I right?
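The following is a minimal sketch of the weighted regression described above, in Python with no third-party libraries. The weights are functions of the estimated propensity score rather than the score itself. This sketch assumes ATE-style inverse probability weights (1/p for exposed units, 1/(1-p) for unexposed units) and solves the weighted normal equations X'WX b = X'Wy directly. Function names and toy data are hypothetical. The sketch returns point estimates only and computes no standard errors.

```python
def ipw_ate_weights(ps, treat):
    """ATE-style inverse probability weights: 1/p (exposed), 1/(1-p) (unexposed)."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t in zip(ps, treat)]


def wls(rows, y, w):
    """Weighted least squares via the normal equations X'WX b = X'Wy.

    rows: design-matrix rows, each a list of floats.
    """
    k = len(rows[0])
    # Build the augmented matrix [X'WX | X'Wy].
    a = [[0.0] * (k + 1) for _ in range(k)]
    for r, yi, wi in zip(rows, y, w):
        for i in range(k):
            for j in range(k):
                a[i][j] += wi * r[i] * r[j]
            a[i][k] += wi * r[i] * yi
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(k):
            if i != col:
                f = a[i][col] / a[col][col]
                for j in range(col, k + 1):
                    a[i][j] -= f * a[col][j]
    return [a[i][k] / a[i][i] for i in range(k)]


# Toy data generated exactly from y = 1 + 2*T + 3*X.
treat = [1, 1, 1, 0, 0, 0]
x = [0.5, 1.0, 2.0, 0.0, 1.5, 3.0]
ps = [0.6, 0.7, 0.8, 0.3, 0.5, 0.6]
y = [1.0 + 2.0 * t + 3.0 * xi for t, xi in zip(treat, x)]

rows = [[1.0, float(t), xi] for t, xi in zip(treat, x)]  # intercept, exposure, covariate
coef = wls(rows, y, ipw_ate_weights(ps, treat))
print([round(c, 6) for c in coef])  # -> [1.0, 2.0, 3.0]
```

Here coef[1] is the covariate-adjusted, weighted estimate of the exposure effect.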
I would appreciate comments on these points.
Thanks very much.
Best,
Ravi.
------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
------------------------------------------------------------------------
-----Original Message-----
From: Ridgeway, Greg [mailto:gregr at rand.org]
Sent: Monday, September 18, 2006 2:17 PM
To: r-help at stat.math.ethz.ch
Cc: Ravi Varadhan
Subject: Re: [R] LARS for generalized linear models
Check out Park & Hastie's glmpath package. They have a really clever
analysis and implementation of a generalized least angle regression.
Greg
>On Fri, 2006-09-15 at 18:49 -0400, Ravi Varadhan wrote:
> > Is there an R implementation of least angle regression for binary response
> > modeling? I know that this question has been asked before, and I am also
> > aware of the "lasso2" package, but that only implements an L1 penalty, i.e.
> > the Lasso approach.
>
> > Madigan and Ridgeway in their discussion of Efron et al (2004) describe a
> > LARS-type algorithm for generalized linear models. Has anyone implemented
> > this in R?