[R] question about linear regression and leverage

George Markomanolis george at markomanolis.com
Tue Jun 21 13:49:11 CEST 2011


Dear David,

Thanks for your answer. Yes, now that you mention it, these points are at
the beginning of one variable's range. The residual plot seems to show
non-constant variance, which is resolved by a transformation. I also
checked for interactions by using the : operator between two
variables, and the change in the results was small. I
work in computer science, but I wanted to do the analysis from
scratch because some previous results that I have seen are not good for
such cases. Moreover, the data are not the same, of course.
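
A minimal sketch of the checks described above, run on simulated data
because the real dataset is not shown here. The variable names (x1, x2,
y), the log transformation, and the 2p/n leverage cutoff are assumptions
made for illustration only, not the actual variables or choices from the
analysis.

```r
# Hypothetical simulated data; x1, x2 and y are illustrative names only.
set.seed(1)
n  <- 1000
x1 <- rexp(n)    # skewed predictor, so a few points lie far from the bulk
x2 <- runif(n)
# Multiplicative noise: the spread of y grows with its mean.
y  <- exp(0.5 + 0.8 * x1 + 0.3 * x2 + rnorm(n, sd = 0.2))

fit <- lm(y ~ x1 + x2)

# Leverage (hat values); a common rule of thumb flags h > 2p/n.
h    <- hatvalues(fit)
p    <- length(coef(fit))
high <- h > 2 * p / n

# Refit without the flagged points and compare R-squared values.
fit_drop <- lm(y ~ x1 + x2, subset = !high)
r2 <- c(all = summary(fit)$r.squared, dropped = summary(fit_drop)$r.squared)
print(r2)

# Where do the flagged points sit in the range of each predictor?
print(summary(x1[high]))
print(summary(x2[high]))

# Log-transform the response to stabilise the variance (assumed transformation).
fit_log <- lm(log(y) ~ x1 + x2)

# Add the x1:x2 interaction and test whether it improves the fit.
fit_int <- update(fit_log, . ~ . + x1:x2)
print(anova(fit_log, fit_int))
```

Calling plot(fit, which = c(1, 5)) on any of these models draws the
residuals-versus-fitted and residuals-versus-leverage plots, which makes
it easy to compare the diagnostics before and after the transformation.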

Thanks,
George

On 06/21/2011 01:08 PM, David Winsemius wrote:
>
> On Jun 21, 2011, at 3:49 AM, George Markomanolis wrote:
>
>> Dear all,
>>
>> I am new to this field and I have a question about linear regression.
>> I have a dataset of around 31000 points and I want to fit a linear
>> regression. The R-squared is 0.9; however, when I check the diagnostic
>> plots I can see that there are around 250 points with high leverage.
>> As far as I know, points with high leverage strongly influence the fit.
>> If I remove these points in order to check their influence, the
>> R-squared for the remaining points is 0.71. So I removed less than 1% of
>> my data and the fit is noticeably worse. Could you please give me any
>> advice about this? Is it right to keep these 250 points in my dataset or
>> not? Could I do something else? The data are measured through an
>> experiment, so even these 250 points are real values.
>
> You could be looking at the descriptive statistics on the points.
> Perhaps they are at one end of a variable range, or you perhaps have
> some other feature that is scientifically interesting. So far you have
> only been examining one set of simple linear hypotheses and have not
> (presumably) been looking at any non-linear possibilities or the
> potential that interactions are affecting the outcome. The prior 
> science of your (so far undescribed) domain should be carefully
> considered, but in your message we see no evidence of such.
>


