[R-sig-ME] Using nlme package in an information-theoretic framework, AIC

Tue Jan 4 00:51:41 CET 2011

Dear Listers,

I am studying the foraging effort of surf scoters (a sea duck) during the winter period @ a site in southeast Alaska. This particular species only dives for its food, and therefore the response of interest is both minutes underwater per hour and minutes underwater per day.  These 2 metrics provide a reliable estimate of how often these birds are feeding during the winter, and they can be compared to estimates collected in the same manner at different winter sites. To collect these data, we attached VHF transmitters to 110 individuals; we monitored individuals during 1-hour foraging observations; we knew a bird was diving when a signal was lost, as the VHF radio signal does not transmit through water.  For each observation, we calculated total minutes underwater/hour and extrapolated to minutes underwater/day (these ducks are diurnal foragers, so based daily rates on amount of daylight hours).  Of the 110 individuals marked, we were able to get at least 2 observations on 65 individuals. Over 2 seasons, there were a total of 959 1-hour observations on these 65 individuals.

The data set was unbalanced and we observed the same individuals over time (repeated measures, therefore requiring mixed models); the number of observation periods was not equal among all individuals and time between observations within and among individuals was not equal.  Therefore, we required a mixed modeling procedure to account for these repeated measures of individuals.  We used the nlme package in R to specify individual as a random effect.

Foraging effort variation at all sites was evaluated using an information-theoretic approach.  Therefore, rather than 'model building', as is described in much of the nlme documentation and resources, I am testing several pre-determined, biologically plausible models. However, I have read in prior posts that AIC values may not be reliable from this package, and I have been using these values to compare the models. So, I have three main questions:

1.) Does the process I describe below to analyze these data make sense?

2.) In the case of this data set and the candidate model set, is it appropriate to use the given AIC values from the nlme output for determining which model is most parsimonious?

AND

3.) If it is not appropriate to use these AIC values, what is the best alternative (e.g. how could I obtain AIC values that are reliable)?

So, moving onto the process:

We evaluated the variation in response of both minutes underwater per hr and per day in relation to several covariates.  We considered 16 biologically plausible a priori candidate models (see below).  We explored date as an explanatory variable, and we included it both as a date and date squared term (referred to as DATE), to allow variation in foraging effort to vary non-linearly over time; we adjusted all date values in both years to correspond to 2 December = day 1.  We always considered sex (male or female) and age [hatch year (HY) or after hatch year (AHY)] together as COHORT, as these four groups often behave differently.  We included a YEAR term (2008/09 or 2009/10) to account for variation between years.  We considered the following environmental factors (ENVIRON) in each model except for the true null: tide height (m), temperature (°C), time of day of observation, and wind speed (categorical with three levels—low, mid, high).  We did this to both limit the number of models in our set, and because we merely wanted to account for environmental effects.  Finally, we included the interaction effect terms COHORT*DATE and COHORT*YEAR, because we believed it quite feasible that the different cohorts would respond to environmental pressures differently both across time within a winter season and between the winter seasons.

MODEL   EXPLANATORY VARIABLES
fmN	Null
fm1	Environ 
fm2	Date  + environ
fm3	Cohort  + environ
fm4	Year  + environ
fm5	Date + cohort + environ
fm6	Date + year + environ
fm7	Cohort + year + environ
fm8	Date + cohort + year + environ	
fm9	Date + environ + cohort x date 
fm10	Year + environ + cohort x date
fm11	Date + cohort + environ + cohort x date
fm12	Date + cohort + year + environ + cohort x date
fm13	Cohort + environ + cohort x year 
fm14	Date + cohort + environ + cohort x year
fm15	Date + cohort + year + environ + cohort x date + cohort x year 

For each response variable, we first ran the most global model (fm15) in our set using a model with no random effects, a model with a random intercept (allowed intercept to vary among individuals), and a model with a random intercept and slope (allowed both intercept to vary among individuals and slope to vary within each individual over DATE).  'ID' in the random effects refers to each of the 65 uniquely identified ducks.  Using an AIC approach, we found that the random intercept and slope model was the most parsimonious.  Proceeding with this global model and continuing with an AIC information-theoretic approach, we ran the global model (inc. the random intercept/slope) against several different feasible covariance structures, and found that for both response variables (minutes underwater hr-1 and day-1), an exponential covariance structure was the best-fitting based on AIC values.  Therefore, we applied this covariance structure to all of the models in the candidate model set and proceeded with the overall best-fitting model selection process. That final most parameterized model specification as written for R is as follows:

fm15 <- lme(fixed=MINh~((Date+DateSq)*(Cohort))+((Cohort)*Year)+Tide+Temp+factor(Sea)+Time.start,data=data1,random=~Date+DateSq|ID,corr=exp2,method="ML")

We calculated Akaike’s Information Criterion (AICc), adjusted for small sample size, for all of the models in the candidate model set from the AIC values given in the lme output.  We then examined relative support for each model using Δ AICc and AICc weights (wi).  

So, again, being relatively new to both R and mixed modeling, I would be much appreciative if those with a greater understanding of the process could inform me if the process I have used indeed is appropriate/makes sense (as I have not used a model building exercise that is often described when using this package), and if the AIC values can in fact be used with relative confidence. 

Thanks so much for any insight,
Corey VanStratt

Example of output from fm15:

Linear mixed-effects model fit by maximum likelihood
 Data: data1 
       AIC      BIC    logLik
  5723.309 5835.224 -2838.654

Random effects:
 Formula: ~DateB + DateSq | ID
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev      Corr         
(Intercept) 3.500360472 (Intr) DateB 
DateB       0.175541462 -0.680       
DateSq      0.001982993  0.576 -0.967
Residual    4.185115132              

Correlation Structure: Exponential spatial correlation
 Formula: ~DateB | ID 
 Parameter estimate(s):
   range 
0.674106 
Fixed effects: MINh ~ DateB + DateSq + Cohort + (Cohort):(Season) + Tide + Temp +      factor(Sea) + Time.start 
                       Value Std.Error  DF   t-value p-value
(Intercept)         9.972296  2.624219 887  3.800100  0.0002
DateB               0.025724  0.031821 887  0.808423  0.4191
DateSq              0.000128  0.000336 887  0.379635  0.7043
CohortAHY M        -7.337342  3.060351  57 -2.397549  0.0198
CohortHY F          4.634536  4.220465  57  1.098110  0.2768
CohortHY M         -7.870449  3.346530  57 -2.351824  0.0222
Tide                0.157180  0.088498 887  1.776074  0.0761
Temp               -0.009083  0.039280 887 -0.231239  0.8172
factor(Sea)6        0.111870  2.194239 887  0.050983  0.9594
factor(Sea)7-8      3.294262  1.961388 887  1.679556  0.0934
Time.start          0.087318  0.068847 887  1.268298  0.2050
CohortAHY F:Season -1.925349  1.412226  57 -1.363344  0.1781
CohortAHY M:Season  1.804896  1.235439  57  1.460935  0.1495
CohortHY F:Season  -4.522203  1.914241  57 -2.362400  0.0216
CohortHY M:Season   2.323518  1.469479  57  1.581185  0.1194
 Correlation: 
                   (Intr) DateB  DateSq ChAHYM ChrHYF ChrHYM Tide   Temp  
DateB              -0.207                                                 
DateSq              0.156 -0.955                                          
CohortAHY M        -0.705 -0.002  0.007                                   
CohortHY F         -0.517 -0.004  0.012  0.442                            
CohortHY M         -0.651 -0.022  0.029  0.557  0.405                     
Tide               -0.141  0.027 -0.006  0.038  0.019  0.042              
Temp                0.146 -0.072  0.013  0.011 -0.020  0.005 -0.086       
factor(Sea)6       -0.006 -0.023  0.026  0.022  0.016  0.023 -0.008  0.024
factor(Sea)7-8     -0.027 -0.019  0.009  0.015  0.013  0.000  0.069  0.000
Time.start         -0.277  0.021 -0.034 -0.032 -0.020 -0.015 -0.080 -0.014
CohortAHY F:Season -0.879  0.002  0.015  0.740  0.541  0.680  0.057 -0.088
CohortAHY M:Season -0.032  0.010  0.003 -0.584  0.005  0.003  0.013 -0.116
CohortHY F:Season  -0.013  0.012 -0.008  0.000 -0.796  0.002  0.023 -0.034
CohortHY M:Season  -0.015  0.032 -0.027 -0.002  0.002 -0.661 -0.001 -0.091
                   fc(S)6 f(S)7- Tm.str CAHYF: CAHYM: CHYF:S
DateB                                                       
DateSq                                                      
CohortAHY M                                                 
CohortHY F                                                  
CohortHY M                                                  
Tide                                                        
Temp                                                        
factor(Sea)6                                                
factor(Sea)7-8      0.021                                   
Time.start         -0.041  0.014                            
CohortAHY F:Season  0.022  0.024 -0.028                     
CohortAHY M:Season -0.003  0.004  0.029  0.014              
CohortHY F:Season  -0.001  0.001  0.009  0.004  0.005       
CohortHY M:Season  -0.023  0.018 -0.002  0.007  0.010  0.001

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.5466867 -0.6254431 -0.1261256  0.5150277  3.4233514 

Number of Observations: 959
Number of Groups: 65 
> 

***************************************
Corey VanStratt
M.Sc. Candidate
Centre for Wildlife Ecology
Department of Biological Sciences
Simon Fraser University
Burnaby, BC 
CANADA V5A 1S6