[R-sig-ME] Assessing linearity

Andrew Dolman andydolman at gmail.com
Mon Oct 25 17:59:39 CEST 2010


Hi Mike,

Couple of things.

poly() is used like this:

a <- rnorm(100)
b <- rnorm(100)

m2 <- lm(a ~ poly(b, 2))   # quadratic (orthogonal polynomial) in b
summary(m2)

To compare a model with a linear predictor against one with a quadratic
polynomial, you can do this:

m0 <- lm(a ~ 1)            # intercept-only null model
m1 <- lm(a ~ poly(b, 1))   # linear in b
m2 <- lm(a ~ poly(b, 2))   # quadratic in b
anova(m1, m2)

For models fitted by maximum likelihood, anova() will give you a
likelihood ratio test.
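The same test can also be computed by hand from the log-likelihoods (a
minimal sketch using the toy models above):

lr <- 2 * (logLik(m2) - logLik(m1))   # LR statistic: twice the log-likelihood difference
# compare to a chi-squared with 1 df, since m2 has one extra parameter
pchisq(as.numeric(lr), df = 1, lower.tail = FALSE)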


You wouldn't normally fit a quadratic term instead of the linear term,
but rather in addition to it, so m2 has one extra parameter relative to
m1. You can compare multiple models at once with anova(m0, m1, m2), or
you can go the AIC route and calculate the delta-AIC for your set of
candidate models.
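The AIC route might look like this (again just a sketch on the toy
models above):

aics <- AIC(m0, m1, m2)   # returns a data frame with df and AIC columns
aics$deltaAIC <- aics$AIC - min(aics$AIC)
aics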

Unless you expect the non-linearity to be polynomial-shaped, you might
be better off fitting splines with a varying number of knots/df.
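For example, with natural cubic splines via ns() from the splines
package (the df values here are just illustrative choices):

library(splines)
m_s3 <- lm(a ~ ns(b, df = 3))   # natural cubic spline, 3 df
m_s5 <- lm(a ~ ns(b, df = 5))   # a more flexible fit, 5 df
AIC(m2, m_s3, m_s5)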



andydolman at gmail.com



On 24 October 2010 18:43, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:
> Ah, this looks promising! So how does this sound:
>
> I typically assess the evidence for a relationship between the
> predictor and response variables by comparing the AIC values for a
> model including the predictor to a model without it. In the case of
> grade_as_numeric, I'd do:
>
> fit_null = lmer(
>    formula = response ~ (1|individual),
>    data = my_data
> )
> fit_null_AIC = AIC(fit_null)
>
> fit_alt = lmer(
>    formula = response ~ (1|individual) + grade_as_numeric,
>    data = my_data
> )
> fit_alt_AIC = AIC(fit_alt)
>
> grade_loglikratio = fit_null_AIC - fit_alt_AIC
>
> Now, if I wanted to check whether there is a quadratic component to
> the grade effect, I'd first compute an analogous likelihood ratio for
> the quadratic fit compared to the null:
> fit_alt_quad = lmer(
>    formula = response ~ (1|individual) + poly(grade_as_numeric, 2),
>    data = my_data
> )
> fit_alt_quad_AIC = AIC(fit_alt_quad)
> grade_quad_loglikratio = fit_null_AIC - fit_alt_quad_AIC
>
> Then compute a final log likelihood ratio between the improvement over
> the null caused by grade and the improvement over the null caused by
> grade as a quadratic:
>
> grade_lin_vs_quad_LLR = grade_loglikratio - grade_quad_loglikratio
>
> I could repeat this for higher-order polynomials of grade, each
> compared to the order directly below it, to develop a series of
> likelihood ratios that describe the relative improvement of the fit.
>
> Does this sound appropriate?
>
> Cheers,
>
> Mike
>
> --
> Mike Lawrence
> Graduate Student
> Department of Psychology
> Dalhousie University
>
> Looking to arrange a meeting? Check my public calendar:
> http://tr.im/mikes_public_calendar
>
> ~ Certainty is folly... I think. ~
>
>
>
> On Sun, Oct 24, 2010 at 6:41 AM, Jarrod Hadfield <j.hadfield at ed.ac.uk> wrote:
>> Hi Mike,
>>
>> You would be better off trying out something like polynomials or splines.
>> For example:
>>
>>  fit1 = lmer(
>>     formula = response ~ (1|individual) + poly(grade_as_numeric, n)
>>     , data = my_data
>>     , family = gaussian
>>  )
>>
>> where n is the order of the polynomial. n=1 would fit the same model as your
>> original fit1, although the covariate (and the regression parameter) would
>> be scaled by some number. With six grades, n=5 would be a reparameterised
>> version of your model fit2, since a degree-5 polynomial saturates six
>> distinct points. When 1<n<5 you would be working with a non-linear
>> relationship in between these two extremes, although the model is still
>> linear in the parameters.
>>
>> Cheers,
>>
>> Jarrod
>>
>> Quoting Mike Lawrence <Mike.Lawrence at dal.ca>:
>>
>>> Hi folks,
>>>
>>> I have developmental data collected across several grades (1-6). I
>>> would like to be able to assess whether there are any linear or
>>> non-linear trends across grade. Does it make sense to run a first lmer
>>> treating grade as continuous, obtain the residuals, then run a second
>>> lmer treating grade as a factor? That is:
>>>
>>> fit1 = lmer(
>>>    formula = response ~ (1|individual)+grade_as_numeric
>>>    , data = my_data
>>>    , family = gaussian
>>> )
>>> my_data$resid = residuals(fit1)
>>> fit2 = lmer(
>>>    formula = resid ~ (1|individual)+grade_as_factor
>>>    , data = my_data
>>>    , family = gaussian
>>> )
>>>
>>>
>>> As I understand it, fit1 will tell me if there are any linear trends
>>> in the data, while fit2 will tell me if there are any non-linear
>>> trends in the data in addition to the linear trends obtained in fit1.
>>>
>>> If this is sensible, how might I apply it to a second binomial
>>> response variable given that the residuals from a binomial model are
>>> not 0/1?
>>>
>>> Cheers,
>>>
>>> Mike
>>>
>>> --
>>> Mike Lawrence
>>> Graduate Student
>>> Department of Psychology
>>> Dalhousie University
>>>
>>> Looking to arrange a meeting? Check my public calendar:
>>> http://tr.im/mikes_public_calendar
>>>
>>> ~ Certainty is folly... I think. ~
>>>