[R] outlier
kan Liu
kan_liu1 at yahoo.com
Tue Jun 17 23:24:15 CEST 2003
Hi, many thanks for your advice; I appreciate it very
much. Maybe I can make the question clearer: I want
to evaluate the correlation between two variables: one
is the actual output of a system, the other is the
value of that output predicted by a neural network.
When I make scatterplots in Excel, I can get the
linear equation and the corresponding R-squared. The
bottom of the page
http://www.statsoftinc.com/textbook/stathome.html
mentions that outliers can sometimes bias the
correlation coefficient. So I thought it might be
worth removing outliers before calculating R-squared
in R. According to your comments, that seems to be a
bad idea. Could you now comment on how to evaluate
the performance of the neural network model in
predicting the actual outputs?
Kan
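
For reference, the Excel trendline and R-squared described above can be reproduced in R. This is only a minimal sketch: the vectors `actual` and `predicted` are simulated placeholders (not data from this thread), and it relies on the fact that with a single predictor the R-squared from lm() equals the squared Pearson correlation.

```r
# Simulated stand-ins for the system outputs and the neural-network
# predictions; replace them with the real vectors.
set.seed(1)
actual    <- rnorm(50, mean = 10, sd = 2)
predicted <- actual + rnorm(50, sd = 1)

# Straight-line fit, as Excel's linear trendline does on a scatterplot
fit       <- lm(actual ~ predicted)
r_squared <- summary(fit)$r.squared

# With one predictor, R-squared is the squared Pearson correlation
pearson_r <- cor(actual, predicted)
print(c(r_squared = r_squared, pearson_r_squared = pearson_r^2))

# Always look at the scatterplot as well as the number
plot(predicted, actual)
abline(fit)
```

coef(fit) gives the intercept and slope that Excel prints as the linear equation.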
--- Spencer Graves <spencer.graves at PDF.COM> wrote:
> It is also wise to make scatterplots, as shown by the famous example
> of 4 scatterplots with the same R^2, where the first shows the
> standard ellipsoid pattern implied by the assumptions while the other
> three indicate very clearly that the assumptions are incorrect. See
> Anscombe (1973) "Graphs in Statistical Analysis", The American
> Statistician, 27: 17-22, reproduced in, e.g., du Toit, Steyn and Stumpf
> (1986) Graphical Exploratory Data Analysis (Springer).
>
> hth. spencer graves
>
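
The Anscombe example cited above can be inspected directly, since R ships the four data sets as `anscombe` in the datasets package (columns x1..x4 and y1..y4). A minimal sketch, not part of the original reply, that fits each pair and draws the four scatterplots:

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
r2 <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  summary(lm(y ~ x))$r.squared
})
print(round(r2, 3))   # the four values are nearly identical

# The plots, however, look nothing alike
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```

The R^2 values agree to about two decimal places, so only the plots reveal the curvature and the single influential points.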
> Prof Brian Ripley wrote:
> > On Tue, 17 Jun 2003, kan Liu wrote:
> >
> >
> >> I want to calculate the R-squared between two variables. Can you advise
> >> me how to identify and remove the outliers before performing the
> >> R-squared calculation?
> >
> >
> > Easy: you don't.  It makes no sense to consider R^2 after arbitrary outlier
> > removal: if I remove all but two points I get R^2 = 1!
> >
> > R^2 is normally used to measure the success of a multiple regression, but
> > as you mention two variables, did you just mean the Pearson
> > product-moment correlation?  It makes more sense to use a robust measure
> > of correlation, as in cov.rob (package lqs) or even Spearman or Kendall
> > measures (cor.test in package ctest).
> >
> > If you intended to do this for a multiple regression, you need to do some
> > sort of robust regression and use a robust measure of fit.
> >
>
>
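
The two points made in the quoted reply can be tried on simulated data. A minimal sketch under stated assumptions: the data and the planted outlier are made up, and in current R releases cov.rob is found in the recommended package MASS and cor.test in stats (the lqs and ctest packages named in the reply were later merged into those).

```r
library(MASS)   # provides cov.rob; assumed to be installed

set.seed(42)
x <- rnorm(30)
y <- x + rnorm(30, sd = 0.5)
y[30] <- 15                      # one gross outlier, made up for illustration

pearson  <- cor(x, y)            # sensitive to the outlier
spearman <- unname(cor.test(x, y, method = "spearman")$estimate)
kendall  <- unname(cor.test(x, y, method = "kendall")$estimate)
robust   <- cov.rob(cbind(x, y), cor = TRUE)$cor[1, 2]   # default method "mve"

print(c(pearson = pearson, spearman = spearman,
        kendall = kendall, robust = robust))

# Keeping only two points forces a perfect straight-line fit, so R^2 is 1
r2_two <- cor(x[1:2], y[1:2])^2
print(r2_two)
```

The rank-based and cov.rob estimates work on the full data set, so no points have to be picked out and deleted by hand.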