[R-sig-ME] ICCs with Multiple Variables

Mon Aug 29 14:51:32 CEST 2011

Dear All,

Suppose I have measured some variable in siblings, which are nested within families. The data structure looks like this:

family sibling value
------ ------- -----
1      1       y_11
1      2       y_12
2      1       y_21
2      2       y_22
2      3       y_23
...    ...     ...

I want to know the correlation between measurements taken on siblings within the same family. The usual way of doing that is to calculate the ICC based on a random-intercept model:

library(nlme)
res <- lme(yij ~ 1, random = ~ 1 | family, data=dat)
getVarCov(res)[[1]] / (getVarCov(res)[[1]] + res$sigma^2)
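As a quick sanity check of this ICC calculation, here is a minimal sketch on simulated data. The seed, the number of families, the family sizes, and the variance values are all arbitrary assumptions made for illustration; `dat_sim` and `res_sim` are hypothetical names, not the actual data.

```r
library(nlme)

set.seed(42)                                   # arbitrary seed (assumption)
n_fam  <- 200                                  # hypothetical number of families
n_sib  <- sample(2:4, n_fam, replace = TRUE)   # unbalanced family sizes (2 to 4 siblings)
family <- rep(seq_len(n_fam), times = n_sib)
u      <- rnorm(n_fam, sd = 1)                 # family effect, variance 1 (assumption)

dat_sim <- data.frame(
  family = factor(family),
  yij    = u[family] + rnorm(length(family), sd = 1)  # residual variance 1 (assumption)
)

res_sim <- lme(yij ~ 1, random = ~ 1 | family, data = dat_sim)

tau2   <- as.numeric(getVarCov(res_sim)[1, 1])  # between-family variance
sigma2 <- res_sim$sigma^2                       # within-family (residual) variance
icc    <- tau2 / (tau2 + sigma2)
icc                                             # true value in this simulation is 0.5
```

Since both variance components are estimated as non-negative quantities, the ICC obtained this way cannot fall below zero.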

If I only had pairs of siblings within families, this would be equivalent to:

res <- gls(yij ~ 1, correlation = corCompSymm(form = ~ 1 | family), data=dat)

except that the latter approach also allows for a negative ICC.
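To illustrate the point about negative ICCs, here is a minimal sketch with simulated sibling pairs whose true correlation is negative. The value -0.3, the seed, and the sample size are assumptions chosen for illustration only, and `dat_sim` and `res_cs` are hypothetical names.

```r
library(nlme)

set.seed(2)                                  # arbitrary seed (assumption)
n_fam <- 300                                 # hypothetical number of sibling pairs
rho   <- -0.3                                # assumed true within-pair correlation

z1 <- rnorm(n_fam)
z2 <- rnorm(n_fam)
y1 <- z1                                     # sibling 1
y2 <- rho * z1 + sqrt(1 - rho^2) * z2        # sibling 2, cor(y1, y2) = rho

dat_sim <- data.frame(
  family = factor(rep(seq_len(n_fam), each = 2)),
  yij    = as.vector(rbind(y1, y2))          # interleave: sibling 1, sibling 2 per family
)

res_cs <- gls(yij ~ 1,
              correlation = corCompSymm(form = ~ 1 | family),
              data = dat_sim)

# Estimated compound-symmetry correlation on its natural (constrained) scale
rho_hat <- coef(res_cs$modelStruct$corStruct, unconstrained = FALSE)
rho_hat
```

The estimate returned by `coef()` here plays the role of the ICC and is free to be negative, which the variance-ratio formula from the random-intercept model does not allow.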

Now suppose I have measured three items in siblings nested within families. So, the data structure looks like this:

family sibling item value
------ ------- ---- -----
1      1       1    y_111
1      1       2    y_112
1      1       3    y_113
1      2       1    y_121
1      2       2    y_122
1      2       3    y_123
2      1       1    y_211
2      1       2    y_212
2      1       3    y_213
2      2       1    y_221
2      2       2    y_222
2      2       3    y_223
2      3       1    y_231
2      3       2    y_232
2      3       3    y_233
...    ...     ...  ...

Now, I want to find out about:

1) the correlation between measurements taken on siblings within the same family for the same item
2) the correlation between measurements taken on siblings within the same family for different items

If I only had pairs of siblings within families, I would just do:

res <- gls(yijk ~ item, correlation = corSymm(form = ~ 1 | family), weights = varIdent(form = ~ 1 | item), data=dat)

which gives me a 6x6 var-cov matrix on the residuals of the form: 

[s^2_1         r_12 s_1 s_2  r_13 s_1 s_3  |  ICC_11 s^2_1    ICC_12 s_1 s_2  ICC_13 s_1 s_3]
[r_12 s_1 s_2  s^2_2         r_23 s_2 s_3  |  ICC_12 s_1 s_2  ICC_22 s^2_2    ICC_23 s_2 s_3]
[r_13 s_1 s_3  r_23 s_2 s_3  s^2_3         |  ICC_13 s_1 s_3  ICC_23 s_2 s_3  ICC_33 s^2_3  ]
[------------------------------------------+------------------------------------------------]
[                                          |  s^2_1           r_12 s_1 s_2    r_13 s_1 s_3  ]
[                                          |  r_12 s_1 s_2    s^2_2           r_23 s_2 s_3  ]
[                                          |  r_13 s_1 s_3    r_23 s_2 s_3    s^2_3         ]

based on which I could easily estimate those cross-sibling correlations. However, as shown above, for some families, I have only two siblings, but for other families more than two. So, that makes me think that I need to go back to a variance-components type of model. However, the correlation between items may be negative, so I do not want to use a model that constrains the correlations to be positive.
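For the pairs-only case, the extraction of the cross-sibling block can be sketched as follows. This is only an illustration on simulated data: the standard deviations, the within-sibling correlations `R`, and the cross-sibling correlations `C` are made-up values (chosen so that the 6x6 matrix is positive definite), and `dat_sim`, `res_sim`, `P_hat`, and `icc_hat` are hypothetical names.

```r
library(nlme)

set.seed(1)                                     # arbitrary seed (assumption)

# Hypothetical true values, not taken from any real data
s <- c(1, 1.5, 2)                               # item SDs s_1, s_2, s_3
R <- matrix(c( 1.0, 0.5, -0.3,
               0.5, 1.0,  0.2,
              -0.3, 0.2,  1.0), 3, 3)           # within-sibling correlations r_kl
C <- matrix(c( 0.4, 0.2, -0.1,
               0.2, 0.3,  0.1,
              -0.1, 0.1,  0.5), 3, 3)           # cross-sibling correlations ICC_kl
P <- rbind(cbind(R, C), cbind(C, R))            # 6x6 correlation matrix
D <- diag(rep(s, 2))
V <- D %*% P %*% D                              # 6x6 var-cov matrix with the block form above

# Simulate sibling pairs; each row of Y is one family
# (sibling 1 items 1-3, then sibling 2 items 1-3)
n_fam <- 300
Y <- matrix(rnorm(n_fam * 6), n_fam, 6) %*% chol(V)

dat_sim <- data.frame(
  family  = factor(rep(seq_len(n_fam), each = 6)),
  sibling = rep(rep(1:2, each = 3), times = n_fam),
  item    = factor(rep(1:3, times = 2 * n_fam)),
  yijk    = as.vector(t(Y))                     # family by family, in row order
)

res_sim <- gls(yijk ~ item,
               correlation = corSymm(form = ~ 1 | family),
               weights = varIdent(form = ~ 1 | item),
               data = dat_sim)

# Estimated 6x6 correlation matrix for the first family
# (all families have 6 rows here, so every family has the same matrix)
P_hat   <- corMatrix(res_sim$modelStruct$corStruct)[[1]]
icc_hat <- P_hat[1:3, 4:6]                      # cross-sibling block
round(icc_hat, 2)
```

Note that corSymm estimates all 15 correlations freely, so in this sketch the two diagonal blocks of `P_hat` (and `icc_hat` versus its transpose) agree only up to sampling error rather than exactly. The sketch also relies on every family contributing exactly six rows in the same order, which is precisely what breaks down once some families have more than two siblings.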

Any ideas/suggestions of how I could approach this?

Thanks in advance for any help!

Best,

Wolfgang