[RsR] Questions about interpreting lmRob output
Jenifer Larson-Hall
jen||er @end|ng |rom unt@edu
Wed Nov 14 18:10:21 CET 2007
Good, I understand that answer. What I still don't understand is whether the diagnostic plot that robust produces shows ALL the data or just the good part. For example, when I look at an overlaid Q-Q plot of my data, there is not much difference between the robust and LS lines. Both are linear in the middle but form an S-curve at the ends (maybe 7 points out of 37 are in the ends). So if I have a breakdown point of around 0.5, and the middle part of my Q-Q plot looks linear, can I rest easy? Or is plot.lmRob plotting ONLY the good part of the data, in which case I should be concerned because the Q-Q plot does not look normal? (By the way, this data was already log transformed to try to help with the heteroskedasticity.)
>>> Kjell Konis <konis using stats.ox.ac.uk> 11/14/07 10:08 AM >>>
The basic idea underlying the robust linear model is that some
fraction (1-alpha > 0.5) of the data is distributed conditionally
normal and the remaining fraction (alpha) comes from some arbitrary
distribution (i.e., the outliers). The goal of a robust method is to
estimate the parameters (beta and sigma^2) of this conditional normal
distribution without giving the outliers too much influence. If the
bulk of the data (aka the good data) is not distributed conditionally
normal then a linear model is not appropriate regardless of whether it
is fit robustly or not. Of course you can still use all of the
standard linear modeling tricks. For instance a log transformation of
the response sometimes helps with heteroskedasticity.
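
To make the model above concrete, here is a minimal sketch in plain Python (not R, and not what lmRob does internally). Everything in it is an assumption chosen for illustration: the sample size, alpha = 0.2, the "true" beta and sigma, and the way the outliers are generated. The robust fit uses the Theil-Sen median-of-slopes estimator with a MAD residual scale, swapped in only because it is short to write down. Note that residuals are computed for every observation, good or bad, and the outliers are then flagged by their size relative to the robust scale.

```python
import random
import statistics
from itertools import combinations

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed setup: 40 points, 8 of them (alpha = 0.2) are outliers.
n, n_bad = 40, 8
beta0, beta1, sigma = 1.0, 2.0, 0.5  # assumed "true" parameters of the good data

x = [10 * i / (n - 1) for i in range(n)]
y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]

# Replace the last n_bad responses with draws from an unrelated distribution.
bad = list(range(n - n_bad, n))
for i in bad:
    y[i] = random.gauss(0, 1)


def least_squares(x, y):
    """Ordinary least-squares intercept and slope for simple regression."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def theil_sen(x, y):
    """Theil-Sen fit: median of pairwise slopes, then median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


ls_b0, ls_b1 = least_squares(x, y)
rb_b0, rb_b1 = theil_sen(x, y)

# Residuals from the robust fit, computed for ALL observations.
resid = [yi - (rb_b0 + rb_b1 * xi) for xi, yi in zip(x, y)]

# Robust scale: MAD rescaled to be consistent with sigma under normality.
med = statistics.median(resid)
scale = 1.4826 * statistics.median(abs(r - med) for r in resid)

# Flag observations whose residual is large relative to the robust scale.
flagged = [i for i, r in enumerate(resid) if abs(r) > 2.5 * scale]

print(f"LS slope:     {ls_b1:.2f}")
print(f"robust slope: {rb_b1:.2f}")
print(f"robust scale: {scale:.2f}")
print(f"flagged observations: {flagged}")
```

In a setup like this the least-squares slope is pulled toward the outliers while the robust slope stays near the value used to generate the good data, so the outliers show up as large residuals from the robust fit.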
Kjell
On 14 Nov 2007, at 15:24, Jenifer Larson-Hall wrote:
> Thanks so much Kjell. Your response answers most of my questions.
> Actually, I figured out the overlaid plots (and the cool
> fit.models function) by looking through the archives and finding
> your pdf presentation that showed them (www.stats.ox.ac.uk/~konis/robust/ROBCLA2006-konis.pdf).
> That was very helpful!
>
> The documentation you sent me privately (Robust.pdf, documentation
> for S-PLUS library) was helpful in clearing up a few more lingering
> questions (I guess if others want it they can email you).
The Robust Library Users Guide (Robust.pdf) is included in the source
version of the Robust Library.
> Just one more question now:
>
> My sense of robust methods was that they return estimates that do
> not rely on strict normality and homogeneity of variance assumptions.
> In the data set I gave in my previous email, there is
> heteroskedasticity and the data are not normally distributed. So from
> what I understand from my reading, robust methods will give me a
> better sense of what's going on in the bulk of my data than
> least-squares estimates. If this is true, then what is the reason for
> looking at diagnostic plots? If I find the data is still
> heteroskedastic and non-normal in the plots after the robust
> analysis, is this cause for worry?
>
>
> Dr. Jenifer Larson-Hall
> Assistant Professor of Linguistics
> University of North Texas
> (940)369-8950