[R] relative risk regression with survey data

Wed Sep 15 15:37:59 CEST 2010

Dear Thomas,

You said, "the log-binomial model is very non-robust when the fitted values
get close to 1, and there is some controversy over the best approach."
Could you please point me to a paper that discusses the issues?

I have written some code to do maximum likelihood estimation for relative,
additive, and mixed risk regression models with a binomial model.  I have been
able to obtain good convergence, and I have used the bootstrap to get standard
errors.  However, I am not sure whether these standard errors are valid when
the fitted values are close to 0 or 1.  It seems to me that when the fitted
probabilities are close to 0 or 1, there is no good way to estimate
standard errors.
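
A minimal sketch of this kind of bootstrap, for the relative risk case only.
It assumes simulated data and base R's glm() in place of the custom
likelihood code described above; the data, the names (dat, fit_rr, boot_se)
and the number of replicates are all hypothetical.

```r
set.seed(3)
n <- 500
# Simulated data: true probabilities stay well below 1 (at most about 0.45)
dat <- data.frame(x = rbinom(n, 1, 0.5), z = runif(n))
dat$y <- rbinom(n, 1, exp(-1.5 + 0.4 * dat$x + 0.3 * dat$z))

# Log-binomial fit; the starting values give every observation the same
# valid initial probability, mean(y)
fit_rr <- function(d) {
  coef(glm(y ~ x + z, data = d, family = binomial(link = "log"),
           start = c(log(mean(d$y)), 0, 0)))
}

# Nonparametric bootstrap: resample rows with replacement and refit
B <- 200
boot <- t(replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  tryCatch(suppressWarnings(fit_rr(dat[idx, ])),
           error = function(e) rep(NA_real_, 3))  # record failed fits as NA
}))

boot_se  <- apply(boot, 2, sd, na.rm = TRUE)  # bootstrap standard errors
n_failed <- sum(!complete.cases(boot))        # replicates that did not fit
```

Here the fitted probabilities are far from the boundary, so the sketch does
not address the case I am worried about; it only fixes ideas about the
procedure.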

Thanks,
Ravi.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Thomas Lumley
Sent: Monday, September 13, 2010 10:41 PM
To: Daniel Nordlund
Cc: r-help at r-project.org
Subject: Re: [R] relative risk regression with survey data

On Mon, 13 Sep 2010, Daniel Nordlund wrote:

> I have been asked to look at options for doing relative risk regression on
> some survey data.  I have a binary DV and several predictor / adjustment
> variables.  In R, would this be as "simple" as using the survey package to
> set up an appropriate design object and then running svyglm with
> family=binomial(log)?  Any other suggestions for covariate adjustment of
> relative risk estimates?  Any and all suggestions welcomed.

If the fitted values don't get too close to 1 then
svyglm(  ,family=quasibinomial(log)) will do it.

The log-binomial model is very non-robust when the fitted values get close
to 1, and there is some controversy over the best approach.  You can still
use svyglm(  ,family=quasibinomial(log)) but you will probably need to set
the number of iterations much higher (perhaps 200).
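
A minimal sketch of such a call, assuming the survey package is installed.
The data are simulated and the design is a simple weighted one with no
clusters or strata; the names dat, des and fit_lb and the starting values
are hypothetical, not part of the original question.

```r
library(survey)

set.seed(1)
n <- 400
# Hypothetical survey data: binary outcome y, exposure x, covariate z,
# sampling weight w
dat <- data.frame(x = rbinom(n, 1, 0.5), z = runif(n), w = runif(n, 1, 3))
dat$y <- rbinom(n, 1, exp(-1.5 + 0.5 * dat$x + 0.4 * dat$z))

# Assumed design: independent sampling with unequal weights
des <- svydesign(ids = ~1, weights = ~w, data = dat)

# Log-link quasibinomial fit. Starting values are given as literals so that
# every initial fitted value is exp(-1), safely below 1; maxit is raised to
# 200 as suggested above.
fit_lb <- svyglm(y ~ x + z, design = des,
                 family = quasibinomial(link = "log"),
                 start = c(-1, 0, 0),
                 control = glm.control(maxit = 200))

rr    <- exp(coef(fit_lb))     # exponentiated slopes are relative risks
rr_ci <- exp(confint(fit_lb))  # design-based Wald intervals
```

The exponentiated intercept is a baseline risk rather than a relative risk,
so only the rows for x and z are of interest in rr and rr_ci.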

Alternatively, you can use nonlinear least squares
[svyglm(  ,family=gaussian(log))] or other quasilikelihood approaches, such as
family=quasipoisson(log).  These are all consistent for the same parameter
if the model is correctly specified and are much more robust to x-outliers.
I rather like nonlinear least squares, because it's easy to explain.
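
A minimal sketch comparing the two alternatives on the same design, again
with simulated data and hypothetical names (dat, des, fit_nls, fit_qp). The
gaussian(log) fit is given explicit starting values because glm cannot
derive its own from a log link when the outcome contains zeros.

```r
library(survey)

set.seed(2)
n <- 400
# Hypothetical survey data: binary outcome y, exposure x, covariate z,
# sampling weight w
dat <- data.frame(x = rbinom(n, 1, 0.5), z = runif(n), w = runif(n, 1, 3))
dat$y <- rbinom(n, 1, exp(-1.5 + 0.5 * dat$x + 0.4 * dat$z))

des <- svydesign(ids = ~1, weights = ~w, data = dat)

# Nonlinear least squares: Gaussian family with a log link
fit_nls <- svyglm(y ~ x + z, design = des,
                  family = gaussian(link = "log"),
                  start = c(-1, 0, 0))

# Quasi-Poisson with a log link
fit_qp <- svyglm(y ~ x + z, design = des,
                 family = quasipoisson(link = "log"))

# Both estimate the same log relative risks if the mean model is right
rr <- cbind(nls = exp(coef(fit_nls)), quasipoisson = exp(coef(fit_qp)))
```

The two columns of rr should be close to each other when the mean model is
correctly specified; a large discrepancy would be a hint to look at the
model or at influential observations.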

      -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle
