[R-sig-ME] Using nlme package in an information-theoretic framework, AIC
Corey VanStratt
csv5 at sfu.ca
Tue Jan 4 00:51:41 CET 2011
Dear Listers,
I am studying the foraging effort of surf scoters (a sea duck) during the winter period @ a site in southeast Alaska. This particular species only dives for its food, and therefore the response of interest is both minutes underwater per hour and minutes underwater per day. These 2 metrics provide a reliable estimate of how often these birds are feeding during the winter, and they can be compared to estimates collected in the same manner at different winter sites. To collect these data, we attached VHF transmitters to 110 individuals; we monitored individuals during 1-hour foraging observations; we knew a bird was diving when a signal was lost, as the VHF radio signal does not transmit through water. For each observation, we calculated total minutes underwater/hour and extrapolated to minutes underwater/day (these ducks are diurnal foragers, so based daily rates on amount of daylight hours). Of the 110 individuals marked, we were able to get at least 2 observations on 65 individuals. Over 2 seasons, there were a total of 959 1-hour observations on these 65 individuals.
The data set was unbalanced and we observed the same individuals over time (repeated measures, therefore requiring mixed models); the number of observation periods was not equal among all individuals and time between observations within and among individuals was not equal. Therefore, we required a mixed modeling procedure to account for these repeated measures of individuals. We used the nlme package in R to specify individual as a random effect.
Foraging effort variation at all sites was evaluated using an information-theoretic approach. Therefore, rather than 'model building', as is described in much of the nlme documentation and resources, I am testing several pre-determined, biologically plausible models. However, I have read in prior posts that AIC values may not be reliable from this package, and I have been using these values to compare the models. So, I have three main questions:
1.) Does the process I describe below to analyze these data make sense?
2.) In the case of this data set and the candidate model set, is it appropriate to use the given AIC values from the nlme output for determining which model is most parsimonious?
AND
3.) If it is not appropriate to use these AIC values, what is the best alternative (e.g. how could I obtain AIC values that are reliable)?
So, moving onto the process:
We evaluated the variation in response of both minutes underwater per hr and per day in relation to several covariates. We considered 16 biologically plausible a priori candidate models (see below). We explored date as an explanatory variable, and we included it both as a date and date squared term (referred to as DATE), to allow variation in foraging effort to vary non-linearly over time; we adjusted all date values in both years to correspond to 2 December = day 1. We always considered sex (male or female) and age [hatch year (HY) or after hatch year (AHY)] together as COHORT, as these four groups often behave differently. We included a YEAR term (2008/09 or 2009/10) to account for variation between years. We considered the following environmental factors (ENVIRON) in each model except for the true null: tide height (m), temperature (°C), time of day of observation, and wind speed (categorical with three levels—low, mid, high). We did this to both limit the number of models in our set, and because we merely wanted to account for environmental effects. Finally, we included the interaction effect terms COHORT*DATE and COHORT*YEAR, because we believed it quite feasible that the different cohorts would respond to environmental pressures differently both across time within a winter season and between the winter seasons.
MODEL EXPLANATORY VARIABLES
fmN Null
fm1 Environ
fm2 Date + environ
fm3 Cohort + environ
fm4 Year + environ
fm5 Date + cohort + environ
fm6 Date + year + environ
fm7 Cohort + year + environ
fm8 Date + cohort + year + environ
fm9 Date + environ + cohort x date
fm10 Year + environ + cohort x date
fm11 Date + cohort + environ + cohort x date
fm12 Date + cohort + year + environ + cohort x date
fm13 Cohort + environ + cohort x year
fm14 Date + cohort + environ + cohort x year
fm15 Date + cohort + year + environ + cohort x date + cohort x year
For each response variable, we first ran the most global model (fm15) in our set using a model with no random effects, a model with a random intercept (allowed intercept to vary among individuals), and a model with a random intercept and slope (allowed both intercept to vary among individuals and slope to vary within each individual over DATE). 'ID' in the random effects refers to each of the 65 uniquely identified ducks. Using an AIC approach, we found that the random intercept and slope model was the most parsimonious. Proceeding with this global model and continuing with an AIC information-theoretic approach, we ran the global model (inc. the random intercept/slope) against several different feasible covariance structures, and found that for both response variables (minutes underwater hr-1 and day-1), an exponential covariance structure was the best-fitting based on AIC values. Therefore, we applied this covariance structure to all of the models in the candidate model set and proceeded with the overall best-fitting model selection process. That final most parameterized model specification as written for R is as follows:
fm15 <- lme(fixed=MINh~((Date+DateSq)*(Cohort))+((Cohort)*Year)+Tide+Temp+factor(Sea)+Time.start,data=data1,random=~Date+DateSq|ID,corr=exp2,method="ML")
We calculated Akaike’s Information Criterion (AICc), adjusted for small sample size, for all of the models in the candidate model set from the AIC values given in the lme output. We then examined relative support for each model using Δ AICc and AICc weights (wi).
So, again, being relatively new to both R and mixed modeling, I would be much appreciative if those with a greater understanding of the process could inform me if the process I have used indeed is appropriate/makes sense (as I have not used a model building exercise that is often described when using this package), and if the AIC values can in fact be used with relative confidence.
Thanks so much for any insight,
Corey VanStratt
Example of output from fm15:
Linear mixed-effects model fit by maximum likelihood
Data: data1
AIC BIC logLik
5723.309 5835.224 -2838.654
Random effects:
Formula: ~DateB + DateSq | ID
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 3.500360472 (Intr) DateB
DateB 0.175541462 -0.680
DateSq 0.001982993 0.576 -0.967
Residual 4.185115132
Correlation Structure: Exponential spatial correlation
Formula: ~DateB | ID
Parameter estimate(s):
range
0.674106
Fixed effects: MINh ~ DateB + DateSq + Cohort + (Cohort):(Season) + Tide + Temp + factor(Sea) + Time.start
Value Std.Error DF t-value p-value
(Intercept) 9.972296 2.624219 887 3.800100 0.0002
DateB 0.025724 0.031821 887 0.808423 0.4191
DateSq 0.000128 0.000336 887 0.379635 0.7043
CohortAHY M -7.337342 3.060351 57 -2.397549 0.0198
CohortHY F 4.634536 4.220465 57 1.098110 0.2768
CohortHY M -7.870449 3.346530 57 -2.351824 0.0222
Tide 0.157180 0.088498 887 1.776074 0.0761
Temp -0.009083 0.039280 887 -0.231239 0.8172
factor(Sea)6 0.111870 2.194239 887 0.050983 0.9594
factor(Sea)7-8 3.294262 1.961388 887 1.679556 0.0934
Time.start 0.087318 0.068847 887 1.268298 0.2050
CohortAHY F:Season -1.925349 1.412226 57 -1.363344 0.1781
CohortAHY M:Season 1.804896 1.235439 57 1.460935 0.1495
CohortHY F:Season -4.522203 1.914241 57 -2.362400 0.0216
CohortHY M:Season 2.323518 1.469479 57 1.581185 0.1194
Correlation:
(Intr) DateB DateSq ChAHYM ChrHYF ChrHYM Tide Temp
DateB -0.207
DateSq 0.156 -0.955
CohortAHY M -0.705 -0.002 0.007
CohortHY F -0.517 -0.004 0.012 0.442
CohortHY M -0.651 -0.022 0.029 0.557 0.405
Tide -0.141 0.027 -0.006 0.038 0.019 0.042
Temp 0.146 -0.072 0.013 0.011 -0.020 0.005 -0.086
factor(Sea)6 -0.006 -0.023 0.026 0.022 0.016 0.023 -0.008 0.024
factor(Sea)7-8 -0.027 -0.019 0.009 0.015 0.013 0.000 0.069 0.000
Time.start -0.277 0.021 -0.034 -0.032 -0.020 -0.015 -0.080 -0.014
CohortAHY F:Season -0.879 0.002 0.015 0.740 0.541 0.680 0.057 -0.088
CohortAHY M:Season -0.032 0.010 0.003 -0.584 0.005 0.003 0.013 -0.116
CohortHY F:Season -0.013 0.012 -0.008 0.000 -0.796 0.002 0.023 -0.034
CohortHY M:Season -0.015 0.032 -0.027 -0.002 0.002 -0.661 -0.001 -0.091
fc(S)6 f(S)7- Tm.str CAHYF: CAHYM: CHYF:S
DateB
DateSq
CohortAHY M
CohortHY F
CohortHY M
Tide
Temp
factor(Sea)6
factor(Sea)7-8 0.021
Time.start -0.041 0.014
CohortAHY F:Season 0.022 0.024 -0.028
CohortAHY M:Season -0.003 0.004 0.029 0.014
CohortHY F:Season -0.001 0.001 0.009 0.004 0.005
CohortHY M:Season -0.023 0.018 -0.002 0.007 0.010 0.001
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.5466867 -0.6254431 -0.1261256 0.5150277 3.4233514
Number of Observations: 959
Number of Groups: 65
>
***************************************
Corey VanStratt
M.Sc. Candidate
Centre for Wildlife Ecology
Department of Biological Sciences
Simon Fraser University
Burnaby, BC
CANADA V5A 1S6
More information about the R-sig-mixed-models
mailing list