[R] Advice on picking a regression method

Liaw, Andy andy_liaw at merck.com
Wed Aug 11 16:49:43 CEST 2004


Just my $0.02...

Depending on what you are going to do with the model, heteroscedasticity
might be low on the list of things you should worry about.  I'd say that the
assumption that the model is a straight line might be high, if not the
highest, on that list.  That might be a reasonable assumption in your case,
but you definitely should investigate.

If a straight line is a reasonable model for the data, then OLS may not be
such a bad thing, provided you don't have skewed data or outliers.  You should try
several methods and see which looks most reasonable.  (I don't think there's
anything wrong with trying different methods of fitting the same model, at
least it seems less dangerous than choosing among many models fitted with
the same method.)
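
As a rough illustration of fitting the same straight-line model by two
methods, here is a minimal sketch in plain Python (standard library only).
It compares OLS with the Theil-Sen estimator (median of pairwise slopes),
which is just one robust choice picked for brevity; it is not the method
behind R's robust fitters such as rlm() in MASS.  The data are made up,
with one gross outlier, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def theil_sen_line(x, y):
    """Theil-Sen fit: slope is the median of all pairwise slopes,
    intercept is the median of y - slope*x; returns (a, b)."""
    points = list(zip(x, y))
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(points, 2)
        if xj != xi
    ]
    b = median(slopes)
    a = median(yi - b * xi for xi, yi in points)
    return a, b


# Made-up data (not the poster's): exact line y = 1 + 2x, one outlier.
x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[9] = 100  # gross outlier at the largest x

print("OLS       (intercept, slope):", ols_line(x, y))
print("Theil-Sen (intercept, slope):", theil_sen_line(x, y))
```

If the two fits disagree badly, as they do here, that is a hint that a few
points are driving the OLS line and the data deserve a closer look.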

Non-constant variance does not bias the OLS estimates; it only affects the
efficiency of the estimator and the inference (CIs, hypothesis tests).  If
you need to do inference, you need to address that, and the two most popular
ways are weighted least squares and transformation.
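
A minimal sketch of weighted least squares for a straight line, in plain
Python (standard library only).  It assumes, purely for illustration, that
the residual standard deviation is proportional to x, so the weights are
1/x^2; the right weights for real data have to come from the data or from
subject-matter knowledge.  The data are simulated and the function name is
hypothetical.  In R, the weights argument of lm() does the same job.

```python
import random


def weighted_line_fit(x, y, w):
    """Fit y = a + b*x by minimising sum(w_i * (y_i - a - b*x_i)**2).

    Returns (a, b).  Equal weights reproduce ordinary least squares.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


# Simulated data (an assumption, not the poster's): noise SD grows with x.
rng = random.Random(1)
x = [1 + 0.1 * i for i in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 0.2 * xi) for xi in x]

ols = weighted_line_fit(x, y, [1.0] * len(x))               # equal weights
wls = weighted_line_fit(x, y, [1.0 / xi ** 2 for xi in x])  # weights 1/x^2

print("OLS (intercept, slope):", ols)
print("WLS (intercept, slope):", wls)
```

Both fits estimate the same line; the point of the weights is to
down-weight the noisier observations, so that the estimates are more
efficient and the usual standard errors make sense.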

HTH,
Andy

> From: Dewez Thomas
> 
> Dear R-users,
> 
> There are tons of methods out there for fitting independent variables to a
> dependent variable. All stats books tell you about the assumptions behind
> OLS (ordinary least squares) and warn against abusive use of the method
> (a warning many of us disregard for lack of better knowledge). Most
> introductory textbooks stop there and don't tell you what the next best
> option might be. I am aware that there might be many, depending on the type
> of study, so here are the details needed to sort this question out.
> 
> In this instance, I am performing a regression on observations whose
> residuals show heteroscedasticity (the variance of the residuals is small
> for small values of the dependent variable and increases for larger ones),
> which violates one assumption of the OLS method. Which of the numerous
> options should I choose? glm, robust lm, ...
> 
> The problem is kept simple for now. I only try to explain the log of local
> topographic slope (dependent variable) with regard to the distance to the
> outlet of a catchment (independent variable) for a fixed drained area. Both
> variables are continuous.
> 
> I ordered Venables and Ripley 2002, which I suspect is a sound reading for
> advanced stats with R, but it has not arrived yet and I need to move on
> asap. Any advice or pointer to the appropriate literature is greatly
> appreciated.
> 
> Thomas
> 
> Dr Thomas Dewez
> ENTEC Post-Doctoral Fellow 
> ARN - MAS
> BRGM (French Geological Survey)
> 3 Av. C. Guillemin
> 45000 Orleans - France
> 
> Phone: +33 (0)2 38644606
> Fax: +33 (0)2 38643361