[R-sig-ME] Comparing Model Performance Across Data Sets: report p values?
Phillip Alday
phillip.alday at mpi.nl
Thu Aug 3 10:40:39 CEST 2017
Dear Karista,
as Thierry said, knowing more about the inferences you want to make will
get you better advice here. That said, I do have two suggestions in the
meantime:
1. Focus less on significance, especially of individual predictors, and more
on estimates and overall model fit / predictive ability. (cf. Cumming, "The
New Statistics"; Gelman & Stern, "The Difference Between 'Significant' and
'Not Significant' is not Itself Statistically Significant"; Yarkoni &
Westfall, "Choosing Prediction over Explanation in Psychology"; etc.)
2. Put all your data into one model and include time period as a fixed
effect. Such pooling will generally improve all your estimates, since both
periods contribute information; moreover, it gives you a more principled
way to compare time periods (via both the main effect of time period and
its interactions with individual predictors).
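To make both points concrete, here is a rough back-of-the-envelope check in Python using the estimates and standard errors from Karista's two outputs quoted below. It is only a sketch under several assumptions: Sp_MST (period 1) and Spring_MST (period 2) are taken to be the same predictor, the two separate fits are treated as independent, and each difference in estimates gets a Wald z with a normal approximation instead of the t tests with estimated df shown in the output. The helper names (PERIOD1, PERIOD2, compare, two_sided_p) are hypothetical. Each SE ratio is compared against sqrt(322/113) ≈ 1.69, roughly the shrinkage you would expect if only n changed and everything else stayed equal.

```python
import math

# Sample sizes and fixed effects copied from the two outputs in this thread.
# Assumption: "Sp_MST" (period 1) and "Spring_MST" (period 2) are the same predictor.
N1, N2 = 113, 322

# predictor name -> (estimate, std. error)
PERIOD1 = {
    "Length": (0.024371, 0.003536),
    "Res_Sea_Ice_Dur": (-0.002408, 0.002623),
    "Spring_MST": (0.014259, 0.024197),
    "Summer_Rain": (-0.005015, 0.003536),
}
PERIOD2 = {
    "Length": (1.804e-02, 1.623e-03),
    "Res_Sea_Ice_Dur": (2.206e-03, 5.929e-04),
    "Spring_MST": (1.022e-02, 7.277e-03),
    "Summer_Rain": (-1.853e-03, 5.544e-04),
}


def two_sided_p(z):
    """Two-sided p value from a standard normal (large-sample approximation)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def compare(name):
    """Return (SE ratio period1/period2, Wald z of the difference, p)."""
    b1, se1 = PERIOD1[name]
    b2, se2 = PERIOD2[name]
    # All else equal, SE shrinks roughly by sqrt(N2 / N1) when only n grows.
    se_ratio = se1 / se2
    # Wald z for (b2 - b1), assuming the two separate fits are independent.
    z = (b2 - b1) / math.sqrt(se1 ** 2 + se2 ** 2)
    return se_ratio, z, two_sided_p(z)


if __name__ == "__main__":
    print(f"SE shrinkage expected from n alone: {math.sqrt(N2 / N1):.2f}")
    for name in PERIOD1:
        se_ratio, z, p = compare(name)
        print(f"{name:16s} SE ratio {se_ratio:5.2f}  diff z {z:6.2f}  p {p:.3f}")
```

With these numbers, every observed SE ratio (roughly 2.2 to 6.4) exceeds the 1.69 expected from the change in n alone, and none of the four differences in estimates reaches an approximate p below 0.05, which is exactly the "difference between significant and insignificant" issue. The pooled model with time period interactions gives a proper version of this comparison.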
Best,
Phillip
On 08/03/2017 10:20 AM, Thierry Onkelinx wrote:
> Dear Karista,
>
> Much depends on what you want to compare between the models. The parameter
> estimates? The predicted values? The goodness of fit? You'll need to make
> that clear.
>
> Best regards,
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2017-08-02 19:54 GMT+02:00 Karista Hudelson <karistaeh at gmail.com>:
>
>> Hello All,
>>
>> I am comparing the fit of a mixed model across different time periods of a
>> data set. For the first time period I have 113 observations and only one of
>> the fixed effects is significant. For the second time period I have 322
>> observations and all of the fixed effects except Spring_MST are significant.
>> Because n is important in the calculation of p, I'm not sure how, or even
>> whether, to interpret the differences in p values for the model terms in the
>> two time periods.
>> Does anyone have advice on how to compare the fit of the variables in the
>> mixed model for the two data sets in a way that is less impacted by the
>> difference in the number of observations? Or is a difference of 209
>> observations enough to drive these differences in p values?
>>
>> Time period 1 output:
>> Fixed effects:
>>                  Estimate Std. Error         df t value Pr(>|t|)
>> (Intercept)     -0.354795   0.811871  82.140000  -0.437    0.663
>> Length           0.024371   0.003536 106.650000   6.892 4.01e-10 ***
>> Res_Sea_Ice_Dur -0.002408   0.002623 107.970000  -0.918    0.361
>> Sp_MST           0.014259   0.024197 106.310000   0.589    0.557
>> Summer_Rain     -0.005015   0.003536 107.970000  -1.418    0.159
>>
>>
>> Time period 2 output:
>> Fixed effects:
>>                   Estimate Std. Error        df t value Pr(>|t|)
>> (Intercept)     -1.183e+00  3.103e-01 6.650e+00  -3.812 0.007281 **
>> Length           1.804e-02  1.623e-03 3.151e+02  11.120  < 2e-16 ***
>> Res_Sea_Ice_Dur  2.206e-03  5.929e-04 3.153e+02   3.721 0.000235 ***
>> Spring_MST       1.022e-02  7.277e-03 3.150e+02   1.404 0.161319
>> Summer_Rain     -1.853e-03  5.544e-04 3.150e+02  -3.343 0.000929 ***
>>
>>
>>
>>
>> Thanks in advance for your time and consideration of this question.
>> Karista
>>
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>