[R-sig-ME] Advice on GLS model specification
Erik Frenzel
erikfrenzel at gmail.com
Sun Jan 15 18:10:43 CET 2012
Hello mixed modellers:
I'm helping a coworker analyze some toxicology data and would
appreciate any feedback on my proposed analysis, particularly in
regards to how I've specified a contrast.
He collected 1 blood sample from each of 79 animals at 7 sites in 2
regions. He would like to know the difference in the response variable
(concentration of a chemical in the blood) between regions after
accounting for two covariates (mass and temperature) which were
measured for each animal. The data are unbalanced and the variance of
the response variable also differs between sites.
My plan is to fit a GLS model with fixed effects of the two covariates
and site, specify the correlation and variance structures, and test
for the difference between regions using a contrast between the sites
in two regions. Reproducible code with data from dput() is at the end
of the message.
> str(data) # each record is a blood sample
'data.frame': 79 obs. of 5 variables:
$ region: Factor w/ 2 levels "north","south": 1 1 1 1 1 1 1 1 1 1 ...
$ site : Factor w/ 7 levels "a","b","c","d",..: 1 1 1 1 1 1 1 1 1 1 ...
$ resp : num 0.1672 0.0733 0.1637 0.0836 0.1195 ...
$ mass : num 185 450 718 388 250 410 602 315 462 637 ...
$ temp : num 25.2 24.2 23.8 25.2 23.4 24.8 24 23.5 26.2 27.1 ...
> xtabs(~region+site, data=data) # unequal number of observations between sites
site
region a b c d e f g
north 11 20 8 0 0 0 0
south 0 0 0 10 24 4 2
> library(nlme)
> contrasts(data$site) <- c(-1/3, -1/3, -1/3, 1/4, 1/4, 1/4, 1/4) # test the difference between 3 northern and 4 southern sites, other contrasts may be of interest but I'm starting with this one.
> gls1 <- gls(resp ~ mass + temp + site, weights = varIdent(site), correlation = corCompSymm(form = ~ 1|site), method = "REML", data = data)
> summary(gls1)
Generalized least squares fit by REML
Model: resp ~ mass + temp + site
Data: data
AIC BIC logLik
-231 -207 127
Correlation Structure: Compound symmetry
Formula: ~1 | site
Parameter estimate(s):
Rho
3.8e-08
Coefficients:
Value Std.Error t-value p-value
(Intercept) 0.1143 0.0442 2.587 0.0118
mass 0.0000 0.0000 1.072 0.2875
temp -0.0004 0.0018 -0.215 0.8305
site1 -0.0538 0.0179 -3.008 0.0037
site2 -0.0061 0.0108 -0.563 0.5755
site3 -0.0122 0.0108 -1.127 0.2638
site4 -0.0039 0.0097 -0.403 0.6882
site5 0.0171 0.0153 1.122 0.2658
site6 -0.0038 0.0189 -0.204 0.8391
Correlation:
(Intr) mass temp site1 site2 site3 site4 site5
mass -0.221
temp -0.969 -0.001
site1 -0.318 0.443 0.267
site2 -0.350 0.199 0.329 0.051
site3 -0.065 0.110 0.003 -0.167 0.073
site4 -0.237 -0.112 0.207 -0.371 0.102 0.329
site5 0.399 0.121 -0.437 0.004 -0.084 -0.067 -0.100
site6 -0.077 -0.336 0.199 0.255 0.028 -0.387 -0.239 -0.497
Standardized residuals:
Min Q1 Med Q3 Max
-2.4083 -0.5903 -0.0519 0.5830 2.0042
Residual standard error: 0.0305
Degrees of freedom: 79 total; 70 residual
I would interpret this as a difference between the two regions
(contrast = "site1") which is significantly different from 0. The
effects of mass and temperature don't appear to be significanly
different from zero. Spread of the residuals is still somewhat greater
for some groups. I would welcome any thoughts on the suitability of
this approach. Thanks, Erik
########### reproducible code
data <-
structure(list(region = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("north", "south"), class =
"factor"),
site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L), .Label = c("a", "b",
"c", "d", "e", "f", "g"), class = "factor"), resp = c(0.1672,
0.0733, 0.1637, 0.0836, 0.1195, 0.1855, 0.1502, 0.12, 0.1985,
0.1083, 0.1522, 0.1387, 0.109, 0.0712, 0.1724, 0.1775, 0.1071,
0.1641, 0.13, 0.1845, 0.1348, 0.1808, 0.1841, 0.0739, 0.0888,
0.1421, 0.1883, 0.1305, 0.1173, 0.1875, 0.114, 0.1365, 0.1443,
0.1601, 0.1272, 0.1313, 0.1269, 0.0903, 0.1218, 0.0533, 0.0963,
0.0868, 0.1259, 0.086, 0.0828, 0.0792, 0.0818, 0.0878, 0.0843,
0.0963, 0.1138, 0.084, 0.0698, 0.1098, 0.0848, 0.1077, 0.1458,
0.0896, 0.1045, 0.0719, 0.0771, 0.0868, 0.0743, 0.0868, 0.092,
0.0774, 0.1591, 0.0858, 0.0734, 0.1256, 0.1027, 0.0988, 0.1234,
0.0896, 0.1202, 0.1134, 0.1311, 0.1398, 0.0674), mass = c(185,
450, 718, 388, 250, 410, 602, 315, 462, 637, 861, 423, 764,
806, 892, 630, 622, 595, 314, 586, 796, 502, 803, 440, 256,
695, 476, 664, 530, 646, 745, 144, 449, 566, 398, 542, 568,
198, 649, 160, 248, 244, 211, 259, 178, 230, 216, 254, 269,
232, 282, 309, 504, 294, 544, 296, 378, 309, 364, 313, 352,
507, 163, 422, 126, 345, 397, 288, 95, 384, 336, 192, 240,
359, 133, 208, 101, 560, 509), temp = c(25.2, 24.2, 23.8,
25.2, 23.4, 24.8, 24, 23.5, 26.2, 27.1, 23.7, 23.4, 25, 25.6,
25.8, 27.2, 25.4, 25.6, 24.8, 23.6, 23, 24.6, 24.8, 23.2,
23.8, 25.4, 24.8, 28, 26.6, 26.4, 23.9, 23.2, 23, 20.8, 21.4,
21.2, 23.2, 23.4, 23.8, 23.8, 22.8, 23.6, 22.2, 21.6, 21.8,
21.6, 22.8, 21.6, 22.4, 24.4, 20.4, 21, 22.4, 22.2, 22.4,
22.8, 14.4, 14.2, 15.4, 21, 21.6, 21.4, 20.8, 21.4, 23, 21.3,
22.6, 22.8, 22.4, 23.6, 22.8, 24.4, 23.8, 28.6, 28.6, 24.4,
22.6, 21.2, 19.6)), .Names = c("region", "site", "resp",
"mass", "temp"), row.names = c(NA, 79L), class = "data.frame")
str(data)
xtabs(~region+site, data=data) # very unequal number of observations
between sites
library(nlme)
contrasts(data$site) <- c(-1/3, -1/3, -1/3, 1/4, 1/4, 1/4, 1/4) # test
the difference between 3 northern and 4 southern sites
gls1 <- gls(resp ~ mass + temp + site, weights = varIdent(site),
correlation = corCompSymm(form = ~ 1|site), method = "REML", data =
data)
summary(gls1)
More information about the R-sig-mixed-models
mailing list