[R-sig-ME] Advice on GLS model specification

Sun Jan 15 18:10:43 CET 2012

Hello mixed modellers:
I'm helping a coworker analyze some toxicology data and would
appreciate any feedback on my proposed analysis, particularly in
regards to how I've specified a contrast.

He collected 1 blood sample from each of 79 animals at 7 sites in 2
regions. He would like to know the difference in the response variable
(concentration of a chemical in the blood) between regions after
accounting for two covariates (mass and temperature) which were
measured for each animal. The data are unbalanced and the variance of
the response variable also differs between sites.

My plan is to fit a GLS model with fixed effects of the two covariates
and site, specify the correlation and variance structures, and test
for the difference between regions using a contrast between the sites
in two regions. Reproducible code with data from dput() is at the end
of the message.

> str(data) #  each record is a blood sample
'data.frame':   79 obs. of  5 variables:
 $ region: Factor w/ 2 levels "north","south": 1 1 1 1 1 1 1 1 1 1 ...
 $ site  : Factor w/ 7 levels "a","b","c","d",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ resp  : num  0.1672 0.0733 0.1637 0.0836 0.1195 ...
 $ mass  : num  185 450 718 388 250 410 602 315 462 637 ...
 $ temp  : num  25.2 24.2 23.8 25.2 23.4 24.8 24 23.5 26.2 27.1 ...

> xtabs(~region+site, data=data) # unequal number of observations between sites
       site
region   a  b  c  d  e  f  g
  north 11 20  8  0  0  0  0
  south  0  0  0 10 24  4  2

> library(nlme)
> contrasts(data$site) <- c(-1/3, -1/3, -1/3, 1/4, 1/4, 1/4, 1/4) # test the difference between 3 northern and 4 southern sites, other contrasts may be of interest but I'm starting with this one.

> gls1 <- gls(resp ~ mass + temp + site, weights = varIdent(site), correlation = corCompSymm(form = ~ 1|site), method = "REML", data = data)

> summary(gls1)
Generalized least squares fit by REML
  Model: resp ~ mass + temp + site
  Data: data
   AIC  BIC logLik
  -231 -207    127

Correlation Structure: Compound symmetry
 Formula: ~1 | site
 Parameter estimate(s):
    Rho
3.8e-08

Coefficients:
              Value Std.Error t-value p-value
(Intercept)  0.1143    0.0442   2.587  0.0118
mass         0.0000    0.0000   1.072  0.2875
temp        -0.0004    0.0018  -0.215  0.8305
site1       -0.0538    0.0179  -3.008  0.0037
site2       -0.0061    0.0108  -0.563  0.5755
site3       -0.0122    0.0108  -1.127  0.2638
site4       -0.0039    0.0097  -0.403  0.6882
site5        0.0171    0.0153   1.122  0.2658
site6       -0.0038    0.0189  -0.204  0.8391

 Correlation:
      (Intr) mass   temp   site1  site2  site3  site4  site5
mass  -0.221
temp  -0.969 -0.001
site1 -0.318  0.443  0.267
site2 -0.350  0.199  0.329  0.051
site3 -0.065  0.110  0.003 -0.167  0.073
site4 -0.237 -0.112  0.207 -0.371  0.102  0.329
site5  0.399  0.121 -0.437  0.004 -0.084 -0.067 -0.100
site6 -0.077 -0.336  0.199  0.255  0.028 -0.387 -0.239 -0.497

Standardized residuals:
    Min      Q1     Med      Q3     Max
-2.4083 -0.5903 -0.0519  0.5830  2.0042

Residual standard error: 0.0305
Degrees of freedom: 79 total; 70 residual

I would interpret this as a difference between the two regions
(contrast = "site1") which is significantly different from 0.  The
effects of mass and temperature don't appear to be significanly
different from zero. Spread of the residuals is still somewhat greater
for some groups. I would welcome any thoughts on the suitability of
this approach. Thanks, Erik

########### reproducible code
data <-

structure(list(region = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("north", "south"), class =
"factor"),
    site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
    5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L), .Label = c("a", "b",
    "c", "d", "e", "f", "g"), class = "factor"), resp = c(0.1672,
    0.0733, 0.1637, 0.0836, 0.1195, 0.1855, 0.1502, 0.12, 0.1985,
    0.1083, 0.1522, 0.1387, 0.109, 0.0712, 0.1724, 0.1775, 0.1071,
    0.1641, 0.13, 0.1845, 0.1348, 0.1808, 0.1841, 0.0739, 0.0888,
    0.1421, 0.1883, 0.1305, 0.1173, 0.1875, 0.114, 0.1365, 0.1443,
    0.1601, 0.1272, 0.1313, 0.1269, 0.0903, 0.1218, 0.0533, 0.0963,
    0.0868, 0.1259, 0.086, 0.0828, 0.0792, 0.0818, 0.0878, 0.0843,
    0.0963, 0.1138, 0.084, 0.0698, 0.1098, 0.0848, 0.1077, 0.1458,
    0.0896, 0.1045, 0.0719, 0.0771, 0.0868, 0.0743, 0.0868, 0.092,
    0.0774, 0.1591, 0.0858, 0.0734, 0.1256, 0.1027, 0.0988, 0.1234,
    0.0896, 0.1202, 0.1134, 0.1311, 0.1398, 0.0674), mass = c(185,
    450, 718, 388, 250, 410, 602, 315, 462, 637, 861, 423, 764,
    806, 892, 630, 622, 595, 314, 586, 796, 502, 803, 440, 256,
    695, 476, 664, 530, 646, 745, 144, 449, 566, 398, 542, 568,
    198, 649, 160, 248, 244, 211, 259, 178, 230, 216, 254, 269,
    232, 282, 309, 504, 294, 544, 296, 378, 309, 364, 313, 352,
    507, 163, 422, 126, 345, 397, 288, 95, 384, 336, 192, 240,
    359, 133, 208, 101, 560, 509), temp = c(25.2, 24.2, 23.8,
    25.2, 23.4, 24.8, 24, 23.5, 26.2, 27.1, 23.7, 23.4, 25, 25.6,
    25.8, 27.2, 25.4, 25.6, 24.8, 23.6, 23, 24.6, 24.8, 23.2,
    23.8, 25.4, 24.8, 28, 26.6, 26.4, 23.9, 23.2, 23, 20.8, 21.4,
    21.2, 23.2, 23.4, 23.8, 23.8, 22.8, 23.6, 22.2, 21.6, 21.8,
    21.6, 22.8, 21.6, 22.4, 24.4, 20.4, 21, 22.4, 22.2, 22.4,
    22.8, 14.4, 14.2, 15.4, 21, 21.6, 21.4, 20.8, 21.4, 23, 21.3,
    22.6, 22.8, 22.4, 23.6, 22.8, 24.4, 23.8, 28.6, 28.6, 24.4,
    22.6, 21.2, 19.6)), .Names = c("region", "site", "resp",
"mass", "temp"), row.names = c(NA, 79L), class = "data.frame")

str(data)
xtabs(~region+site, data=data) # very unequal number of observations
between sites
library(nlme)
contrasts(data$site) <- c(-1/3, -1/3, -1/3, 1/4, 1/4, 1/4, 1/4) # test
the difference between 3 northern and 4 southern sites
gls1 <- gls(resp ~ mass + temp + site, weights = varIdent(site),
correlation = corCompSymm(form = ~ 1|site), method = "REML", data =
data)
summary(gls1)