[R] question about linear regression and leverage

David Winsemius dwinsemius at comcast.net
Tue Jun 21 13:08:12 CEST 2011


On Jun 21, 2011, at 3:49 AM, George Markomanolis wrote:

> Dear all,
>
> I am new to this field and I have a question about linear regression.
> I have a dataset of around 31000 points and I want to fit a linear
> regression. The R-squared is 0.9; however, when I check the diagnostic
> plots I can see that there are around 250 points with high leverage.
> As far as I know, points with high leverage strongly influence the fit.
> If I remove these points in order to check their influence, the
> R-squared for the remaining points is 0.71. So I removed less than 1%
> of my data and the fit is no longer as good. Could you please give me
> any advice about this? Is it right to leave these 250 points in my
> dataset or not? Could I do something else? The data are measured in an
> experiment, so even these 250 points are real values.

You could start by looking at the descriptive statistics for those
points. Perhaps they sit at one end of a variable's range, or perhaps
they share some other feature that is scientifically interesting. So
far you have examined only one set of simple linear hypotheses and
have (presumably) not looked at non-linear possibilities or at the
potential for interactions to affect the outcome. The prior science
of your (so far undescribed) domain should be carefully considered,
but your message shows no evidence of that.
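
As a rough illustration of these checks, here is a minimal sketch in
base R. The data are simulated, because the original post does not
describe the variables; the names dat, x and y, the simulated model,
and the 2p/n leverage cutoff (a common rule of thumb, not a strict
test) are all assumptions made for the example.

```r
# Simulated stand-in for the real data (hypothetical; the post gives no variables).
set.seed(1)
n <- 1000
x <- c(runif(n - 10, 0, 10), runif(10, 40, 50))  # a few points far out in x
y <- 2 + 0.5 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = dat)

# Leverage (hat values); flag points above the 2p/n rule of thumb
h <- hatvalues(fit)
p <- length(coef(fit))
high <- h > 2 * p / n

# Descriptive statistics: where do the high-leverage points sit in x?
print(summary(dat$x[high]))
print(summary(dat$x[!high]))

# R-squared with and without the flagged points
r2_all <- summary(fit)$r.squared
r2_sub <- summary(lm(y ~ x, data = dat[!high, ]))$r.squared
print(c(all = r2_all, without_high_leverage = r2_sub))

# One simple non-linear alternative: allow curvature in x
fit2 <- lm(y ~ poly(x, 2), data = dat)
print(anova(fit, fit2))
```

In this simulation the flagged points are simply those at the far end
of the x range, and dropping them narrows the spread of x. That alone
can lower R-squared even when the same straight line describes all of
the data, so a drop in R-squared after removing such points does not by
itself show that they distort the fit.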

-- 
David Winsemius, MD
West Hartford, CT


