[RsR] Questions about interpreting lmRob output

Wed Nov 14 18:10:21 CET 2007

Good, I understand that answer. What I still don't understand is whether the diagnostic plots produced by the robust package show ALL the data or just the good part. For example, when I look at an overlaid Q-Q plot of my data, there is not much difference between the robust and LS lines. Both are linear in the middle but form an S-curve at the ends (maybe 7 points out of 37 are in the ends). So if I have a breakdown point of around 0.5 and the middle part of my Q-Q plot looks linear, can I rest easy? Or is plot.lmRob plotting ONLY the good part of the data, in which case I should be concerned because the Q-Q plot does not look normal? (By the way, this data was already log transformed to try to help with the heteroskedasticity.)
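
For reference, a normal Q-Q plot built from all n residuals can be sketched as below. This is a minimal Python illustration with made-up residuals, not the code behind plot.lmRob, and it does not settle which points plot.lmRob actually draws. The function name normal_qq, the median/MAD standardization, and the plotting positions (i + 0.5)/n are all assumptions chosen for the example.

```python
from statistics import NormalDist, median

def normal_qq(residuals):
    """Return (theoretical, sample) quantile pairs computed from ALL residuals."""
    n = len(residuals)
    med = median(residuals)
    # Robust scale: normalized median absolute deviation (MAD).
    scale = 1.4826 * median([abs(r - med) for r in residuals])
    sample = sorted((r - med) / scale for r in residuals)
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theo, sample))

# Hypothetical residuals: 37 ideal normal scores, with the 4 smallest and
# the 3 largest stretched by a factor of 2.5 to mimic heavy tails.
n = 37
scores = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
residuals = [2.5 * z if (i < 4 or i >= n - 3) else z
             for i, z in enumerate(scores)]

pairs = normal_qq(residuals)
for theo, samp in pairs:
    print(f"{theo:7.3f} {samp:7.3f}")
```

With 7 of the 37 residuals stretched, the middle 30 points stay close to the reference line while the ends bend away from it, which is the S-shape described above.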

>>> Kjell Konis <konis using stats.ox.ac.uk> 11/14/07 10:08 AM >>>
The basic idea underlying the robust linear model is that some  
fraction (1-alpha > 0.5) of the data is distributed conditionally  
normal and the remaining fraction (alpha) comes from some arbitrary  
distribution (i.e., the outliers).  The goal of a robust method is to  
estimate the parameters (beta and sigma^2) of this conditional normal  
distribution without giving the outliers too much influence.  If the  
bulk of the data (aka the good data) is not distributed conditionally  
normal then a linear model is not appropriate regardless of whether it  
is fit robustly or not.  Of course you can still use all of the  
standard linear modeling tricks.  For instance a log transformation of  
the response sometimes helps with heteroskedasticity.
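
The contamination idea can be illustrated with a small simulation. The sketch below is only an illustration in plain Python and is not meant to reproduce lmRob: it uses a Huber M-estimate fitted by iteratively reweighted least squares as a simpler stand-in. The function names fit_line and fit_huber, the tuning constant 1.345, the MAD scale, and the simulated data (alpha = 0.1, true intercept 1, true slope 2) are all assumptions made for the example.

```python
import random
import statistics

def fit_line(x, y, w=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_huber(x, y, c=1.345, iters=50):
    """Huber M-estimate of (a, b) via iteratively reweighted least squares."""
    a, b = fit_line(x, y)  # start from the ordinary least-squares fit
    for _ in range(iters):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(r)
        # Robust residual scale: normalized median absolute deviation.
        s = 1.4826 * statistics.median([abs(ri - med) for ri in r])
        # Huber weights: 1 for small residuals, downweighted beyond c*s.
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        a, b = fit_line(x, y, w)
    return a, b

# Simulated data: 90% follow y = 1 + 2x + N(0, 0.5^2); every 10th point
# (alpha = 0.1) is an outlier shifted upward by 20.
rng = random.Random(2007)
n = 200
x = [10.0 * i / (n - 1) for i in range(n)]
y = []
for i, xi in enumerate(x):
    yi = 1.0 + 2.0 * xi + rng.gauss(0.0, 0.5)
    if i % 10 == 0:
        yi += 20.0
    y.append(yi)

a_ls, b_ls = fit_line(x, y)
a_rob, b_rob = fit_huber(x, y)
# Residuals from the robust fit exist for every observation, outliers included.
resid_rob = [yi - (a_rob + b_rob * xi) for xi, yi in zip(x, y)]

print("LS:    ", round(a_ls, 3), round(b_ls, 3))
print("robust:", round(a_rob, 3), round(b_rob, 3))
```

In this setup the outliers pull the least-squares intercept well away from 1, while the downweighted fit stays close to the parameters of the good data. Note that the robust fit still yields a residual for each of the n points, so the outliers remain visible as large residuals.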

Kjell

On 14 Nov 2007, at 15:24, Jenifer Larson-Hall wrote:

> Thanks so much Kjell. Your response answers most of my questions.
> Actually, I figured out the overlaid plots (and the cool
> fit.models function) by looking through the archives and finding
> your pdf presentation that showed it
> (www.stats.ox.ac.uk/~konis/robust/ROBCLA2006-konis.pdf).
> That was very helpful!
>
> The documentation you sent me privately (Robust.pdf, documentation  
> for S-PLUS library) was helpful in clearing up a few more lingering  
> questions (I guess if others want it they can email you).

The Robust Library Users Guide (Robust.pdf) is included in the source  
version of the Robust Library.

> Just one more question now:
>
> My sense of robust methods was that they returned estimates which did
> not rely on strict normality and homogeneity of variance assumptions.
> In the data set I gave in my previous email, there is
> heteroskedasticity and the data are not normally distributed. So from
> what I understand from my reading, robust methods will give me a
> better sense of what's going on in the bulk of my data than
> least-squares estimates. If this is true, then what is the reason for
> looking at diagnostic plots? If I find the data is still
> heteroskedastic and non-normal in the plots after the robust
> analysis, is this cause for worry?
>
>
> Dr. Jenifer Larson-Hall
> Assistant Professor of Linguistics
> University of North Texas
> (940)369-8950