[RsR] Questions about interpreting lmRob output

Kjell Konis konis at stats.ox.ac.uk
Wed Nov 14 18:32:05 CET 2007


The plotting methods plot all of the data, so if outliers are present,
for instance, the tails of the QQ-plots will not be straight.  If the
middle half of the plot looks linear, then the half of the data that
fits the model best has approximately normally distributed residuals.
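
To make the point about the middle half concrete, here is a minimal
sketch in plain Python (standard library only, not the robust library).
The residuals are made up for illustration: 30 "good" values placed
exactly at normal quantiles plus 7 hypothetical outliers, echoing the
7-out-of-37 figure mentioned in the question. The helper names
(qq_points, middle_half, pearson) are assumptions of this sketch, and
the straightness of the QQ-plot is summarised by a plain correlation
between theoretical and sample quantiles.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Return (theoretical, sample) quantiles for a normal QQ-plot of ALL points."""
    n = len(residuals)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, sorted(residuals)


def middle_half(theoretical, ordered):
    """Keep only the central half of the QQ-plot (drop one quarter at each end)."""
    n = len(ordered)
    lo, hi = n // 4, n - n // 4
    return theoretical[lo:hi], ordered[lo:hi]


def pearson(xs, ys):
    """Pearson correlation, used here as a crude measure of straightness."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals: 30 well-behaved values and 7 gross outliers.
good = [NormalDist().inv_cdf((i + 0.5) / 30) for i in range(30)]
outliers = [-9.0, -8.0, -7.0, 7.0, 8.0, 9.0, 10.0]
residuals = good + outliers

t_all, r_all = qq_points(residuals)
t_mid, r_mid = middle_half(t_all, r_all)

corr_all = pearson(t_all, r_all)  # pulled down by the S-shaped tails
corr_mid = pearson(t_mid, r_mid)  # close to 1: the middle is nearly a straight line

print(f"all {len(r_all)} points:    r = {corr_all:.3f}")
print(f"middle {len(r_mid)} points: r = {corr_mid:.3f}")
```

In this toy data the full plot bends at both ends while the central
part stays close to a line, which is the pattern described above.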

Kjell


On 14 Nov 2007, at 17:10, Jenifer Larson-Hall wrote:

> Good, I understand that answer. What I still don't understand is  
> whether the diagnostic plot that robust calls is plotting ALL the  
> data, or just the good part. For example, when I look at an overlaid  
> Q-Q plot of my data, there is not too much difference between the
> robust and ls lines. They are both linear in the middle, but form
> an S-curve on the ends (maybe 7 points out of 37 are in the ends).
> So if I have a breakdown point of around .5, and then I look at my
> Q-Q plot and see that the middle part looks linear, can I rest easy?
> Or is plot.lmRob plotting ONLY the good part of the data, so now I
> should be concerned because I don't have a normally distributed Q-Q
> plot? (By the way, this data was log transformed already to try to
> help with the heteroskedasticity.)
>
>>>> Kjell Konis <konis at stats.ox.ac.uk> 11/14/07 10:08 AM >>>
> The basic idea underlying the robust linear model is that some
> fraction (1-alpha > 0.5) of the data is distributed conditionally
> normal and the remaining fraction (alpha) comes from some arbitrary
> distribution (i.e., the outliers).  The goal of a robust method is to
> estimate the parameters (beta and sigma^2) of this conditional normal
> distribution without giving the outliers too much influence.  If the
> bulk of the data (aka the good data) is not distributed conditionally
> normal then a linear model is not appropriate regardless of whether it
> is fit robustly or not.  Of course you can still use all of the
> standard linear modeling tricks.  For instance a log transformation of
> the response sometimes helps with heteroskedasticity.
>
> Kjell
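
The contamination idea quoted above can be sketched numerically. The
following is only an illustration under stated assumptions: the data
are simulated (30 points from y = 1 + 2x plus normal noise, and 7
hypothetical outliers), and the robust fit is a Theil-Sen line with a
MAD-based scale, swapped in because it needs nothing beyond the Python
standard library. It is not the estimator used by lmRob. The point is
only that a robust fit recovers beta and sigma for the bulk of the
data, while least squares is pulled toward the outliers.

```python
import random
from itertools import combinations
from statistics import median


def least_squares(x, y):
    """Ordinary least-squares slope and intercept for simple regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen(x, y):
    """Theil-Sen fit: median of pairwise slopes, then median intercept."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = median(slopes)
    intercept = median([b - slope * a for a, b in zip(x, y)])
    return slope, intercept


def residuals(x, y, slope, intercept):
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def sd_scale(res):
    """Classical residual standard error (n - 2 degrees of freedom)."""
    return (sum(r * r for r in res) / (len(res) - 2)) ** 0.5


def mad_scale(res):
    """Median absolute deviation, rescaled to estimate sigma under normality."""
    m = median(res)
    return 1.4826 * median([abs(r - m) for r in res])


# Simulated data (an assumption of this sketch, not the poster's data).
rng = random.Random(1)
x_good = list(range(30))
y_good = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.3) for xi in x_good]  # the "good" bulk
x_out = list(range(30, 37))
y_out = [0.0] * len(x_out)  # 7 outliers from an arbitrary distribution
x, y = x_good + x_out, y_good + y_out

ls_slope, ls_int = least_squares(x, y)
ts_slope, ts_int = theil_sen(x, y)
ls_scale = sd_scale(residuals(x, y, ls_slope, ls_int))
ts_scale = mad_scale(residuals(x, y, ts_slope, ts_int))

print(f"least squares: slope = {ls_slope:.2f}, scale = {ls_scale:.2f}")
print(f"Theil-Sen:     slope = {ts_slope:.2f}, scale = {ts_scale:.2f}")
```

Here roughly 19% of the points are outliers, well inside what
Theil-Sen tolerates; a higher-breakdown estimator would be needed as
the contaminated fraction approaches one half.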
>
> On 14 Nov 2007, at 15:24, Jenifer Larson-Hall wrote:
>
>> Thanks so much Kjell. Your response answers most of my questions.
>> Actually, I figured out the overlaid plots (and the cool
>> fit.models function) by looking through the archives and finding
>> your pdf presentation that showed it
>> (www.stats.ox.ac.uk/~konis/robust/ROBCLA2006-konis.pdf).
>> That was very helpful!
>>
>> The documentation you sent me privately (Robust.pdf, documentation
>> for S-PLUS library) was helpful in clearing up a few more lingering
>> questions (I guess if others want it they can email you).
>
> The Robust Library Users Guide (Robust.pdf) is included in the source
> version of the Robust Library.
>
>> Just one more question now:
>>
>> My sense of robust methods was that they returned estimates that did
>> not rely on strict normality and homogeneity of variance assumptions.
>> In the data set I gave in my previous email, there is
>> heteroskedasticity and a non-normal distribution of the data. So from
>> what I understand from my reading, robust methods will give me a
>> better sense of what's going on in the bulk of my data than
>> least-squares estimates. If this is true, then what is the reason for
>> looking at diagnostic plots? If I find the data is still
>> heteroskedastic and non-normal in the plots after the robust
>> analysis, is this cause for worry?
>>
>>
>> Dr. Jenifer Larson-Hall
>> Assistant Professor of Linguistics
>> University of North Texas
>> (940)369-8950
>>
>>
>>
>
>
>



