[R] Plot different regression models on one graph
Peter Ehlers
ehlers at ucalgary.ca
Mon Feb 15 14:27:45 CET 2010
Keith, we seem to agree, more or less.
I take a simple approach to model fitting: absent any
compelling reason for a particular form (here: cubic),
I consider that form to be speculative, sometimes
reasonably well supported by the data, sometimes not.
For the OP's data, the evidence for a cubic model is
all but nonexistent. Based on those data, I would never
recommend a cubic model to a client. As to how one
chooses to assess the (in)adequacy of the model, I'm
well aware of the various more formal techniques, but
they're often not needed.
(The use of poly() in this example saves typing.)
-Peter Ehlers
kMan wrote:
> Dear Peter,
>
> Ah, I see your point, Professor. The point at x=23.5 is carrying the model.
> Allow me to clarify. I was making a similar point.
>
> I was suggesting that the cube term could be left in the model, as you did,
> but rather than dropping the data-point, another model is fit with an
> additional parameter added to control for the suspected outlier -
> essentially a point discontinuity.
>
> Note that the call to poly() is not necessary for the graphical
> representation, so I'll continue the example without it.
> fm3<-lm(y~x+I(x^2)+I(x^3), data=dat)
> fm3.c<-coef(fm3)
>
> # drop data point alternative subseting using logical indexes.
> index<-dat$x!=23.5
> x2<-x[index]
> x2<-dat$x[index]
> y2<-dat$y[index]
> fm4<-lm(y2~x2+I(x2^2)+I(x2^3), data=dat)
>
> # controlling for the potential outlier in the model, without throwing out
> the data.
> ctrl<-rep(0,length=dim(dat)[1])
> ctrl[dat$x==23.5]<-resid(fm3)[[dat$x==23.5]
> fm5<-lm(y~x+I(x^2)+I(x^3)+ctrl, data=dat)
>
> # avoids the predict & lines calls, but requires knowledge of the model.
> curve(fm3.c[1]+fm3.c[2]*x+fm3.c[3]*x^2+fm3.c[4]*x^3, col="green", add=TRUE)
> curve(fm4.c[1]+fm4.c[2]*x+fm4.c[3]*x^2+fm4.c[4]*x^3, col="green", add=TRUE)
> # plot dropped outlier
> curve(fm5.c[1]+fm5.c[2]*x+fm5.c[3]*x^2+fm5.c[4]*x^3, col="purple", add=TRUE)
> # plot without ctrl variable
>
> anova(fm5)
>
> Tells us how much difference one point made, sacrificing a df just to ensure
> we're not throwing out information haphazardly. F>1 or whatever cutoff you
> want to use. If it turns out that the point is important, then at least its
> existence & effect was documented.
>
> There is some irony, I suppose, that the graphical representation of
> dropping the point vs. adding a control variable, is equivalent for this
> example. I have not decided how I feel about that yet, but I do have a
> splitting headache.
>
> Sincerely,
> KeithC.
>
> -----Original Message-----
> From: Peter Ehlers [mailto:ehlers at ucalgary.ca]
> Sent: Sunday, February 14, 2010 10:04 PM
> To: kMan
> Cc: r-help at r-project.org
> Subject: Re: [R] Plot different regression models on one graph
>
> kMan wrote:
>> Peter wrote:
>>> You like to live dangerously.
>> Clue me in, Professor.
>>
>> Sincerely,
>> KeithC.
>>
>
> Okay, Keith, here goes:
>
> dat <-
> structure(list(x = c(62.5, 68.5, 0, 52, 0, 52, 0, 52, 23.5, 86, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0), y = c(0.054, 0.055, 0.017, 0.021, 0.02, 0.028, 0.032,
> 0.073, 0.076, 0.087, 0.042, 0.042, 0.041, 0.045, 0.021, 0.018, 0.017, 0.018,
> 0.028, 0.022)), .Names = c("x", "y"), row.names = c(NA, -20L), class =
> "data.frame")
>
> fm1 <- lm(y ~ poly(x,3), data = dat)
> fm2 <- lm(y ~ poly(x,3), data = dat, subset = -9)
>
> xx <- 0:86
> yhat1 <- predict(fm1, list(x = xx))
> yhat2 <- predict(fm2, list(x = xx))
>
> plot(x,y)
> lines(xx, yhat1, col="blue", lwd=2)
> lines(xx, yhat2, col="red", lwd=2)
>
>
> That's how much difference *one* point makes in a cubic fit!
> I'm not much of a gambler, so I wouldn't base any important decisions on the
> results of the fit.
>
> Cheers,
>
> -Peter Ehlers
>
>> -----Original Message-----
>> From: Peter Ehlers [mailto:ehlers at ucalgary.ca]
>> Sent: Sunday, February 14, 2010 6:49 PM
>> To: kMan
>> Cc: 'David Winsemius'; 'Rhonda Reidy'; r-help at r-project.org
>> Subject: Re: [R] Plot different regression models on one graph
>>
>> kMan wrote:
>>> I would use all of the data. If you want to "drop" one, control for
>>> it in the model & sacrifice a degree of freedom.
>> You like to live dangerously.
>>
>> -Peter Ehlers
>>
>>> Why the call to poly() by the way?
>>>
>>> KeithC.
>>>
>>> -----Original Message-----
>>> From: Peter Ehlers [mailto:ehlers at ucalgary.ca]
>>> Sent: Saturday, February 13, 2010 1:35 PM
>>> To: David Winsemius
>>> Cc: Rhonda Reidy; r-help at r-project.org
>>> Subject: Re: [R] Plot different regression models on one graph
>>>
>>> Rhonda:
>>>
>>> As David points out, a cubic fit is rather speculative. I think that
>>> one needs to distinguish two situations: 1) theoretical justification
>>> for a cubic model is available, or 2) you're dredging the data for the
> "best"
>> fit.
>>> Your case is the second. That puts you in the realm of EDA
>>> (exploratory
>> data
>>> analysis). You're free to fit any model you wish, but you should
>>> assess
>> the
>>> value of the model. One convenient way is to use the
>>> influence.measures() function, which will show that points 9 and 10
>>> in your data have a strong influence on your cubic fit (as, of
>>> course, your eyes would tell you). A good thing to do at this point is to
> fit 3 more cubic models:
>>> 1) omitting point 9, 2) omitting point 10, 3) omitting both.
>>>
>>> Try it and plot the resulting fits. You may be surprised.
>>>
>>> Conclusion: Without more data, you can conclude nothing about a
>>> non-straightline fit.
>>>
>>> (And this hasn't even addressed the relative abundance of x=0 data.)
>>>
>>> -Peter Ehlers
>>>
>>> David Winsemius wrote:
>>>> On Feb 13, 2010, at 1:35 PM, Rhonda Reidy wrote:
>>>>
>>>>> The following variables have the following significant
>>>>> relationships (x is the explanatory variable): linear, cubic,
> exponential, logistic.
>>>>> The linear relationship plots without any trouble.
>>>>>
>>>>> Cubic is the 'best' model, but it is not plotting as a smooth curve
>>>>> using the following code:
>>>>>
>>>>> cubic.lm<- lm(y~poly(x,3))
>>>> Try:
>>>>
>>>> lines(0:80, predict(cubic.lm, data.frame(x=0:80)),lwd=2)
>>>>
>>>> But I really must say the superiority of that relationship over a
>>>> linear one is far from convincing to my eyes. Seems to be caused by
>>>> one data point. I hope the quotes around "best" mean tha tyou have
>>>> the
>>> same qualms.
>>>>> lines(x,predict(cubic.lm),lwd=2)
>>>>>
>>>>> How do I plot the data and the estimated curves for all of these
>>>>> regression models in the same plot?
>>>>>
>>>>> x <- c(62.5,68.5,0,52,0,52,0,52,23.5,86,0,0,0,0,0,0,0,0,0,0)
>>>>>
>>>>> y <-
>>>>> c(0.054,0.055,0.017,0.021,0.020,0.028,0.032,0.073,0.076,0.087,0.042
>>>>> ,0
>>>>> .042,0.041,0.045,0.021,0.018,0.017,0.018,0.028,0.022)
>>>>>
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Rhonda Reidy
>>>>>
>>> --
>>> Peter Ehlers
>>> University of Calgary
>>>
>>>
>>>
>
>
>
--
Peter Ehlers
University of Calgary
More information about the R-help
mailing list