[R-sig-ME] Question regarding MCMCglmm model structure for correlated responses

Mon Jul 18 19:08:08 CEST 2016

Dear List,

I need to compare two systems ("Bulk" and "DVS"), each consisting of a number of "Cultures" (e.g. Bulk: B1 - B5 and DVS: A - E). I have repeated measurements of each culture in each system (more from the Bulk system than the DVS system) with regards to pH and composition variables. The composition variables: Moi, Fat, NFS, are highly correlated because they need to add to 100%. pH may also be correlated to the composition variables.

Furthermore, I assume that the residual variance of all composition variables can be estimated together within each system (sigma comp). And I assume, that the residual variance will be the same for each culture within a system, but not between systems.  

I would like to answer questions such as:
1) Are the performance of the systems the same? (i.e. do Bulk and DVS systems have the same mean and culture variance within each response variable)?
2) Are the performance of the cultures within each system the same? (i.e. do they have the same mean, assuming the variance to be the same) 
3) Are the responses correlated and by how much?

I assume that I can use MCMCglmm to do the analysis, as it handles multiple responses and marginal contrasts (means and variance) are straight forward to calculate from the MCMC chains. But I’am unsure how to specify the fixed, random and residual covariance in the function interface (e.g. how do I tell the function to estimate the same variance for the composition variables?).

To make my request more specific, I add a synthetic data set that can be used for the analysis:

# Packages
library(dplyr)

# Design
df_sim <- data.frame(System = c(rep("Bulk", 75), rep("DVS", 25)), Culture = c(rep(c("B1", "B2", "B3", "B4", "B5"), each = 15), rep(c("A", "B", "C", "D", "E"), each = 5)))
df_sim$Culture <- ordered(df_sim$Culture, levels = c("B1", "B2", "B3", "B4", "B5", "A", "B", "C", "D", "E"))

# Responses
set.seed(1905) # start from the same place everytime
df_sim <- df_sim %>%
  mutate(pH = c(rep(rnorm(n = 5, mean = 5.3, sd = 0.1), each = 15), # Bulk 0.1 pHU higher than DVS on average
                rep(rnorm(n = 5, mean = 5.2, sd = 0.05), each = 5)), # DVS between cultures more consistent
         pH = c(rnorm(n = 75, mean = pH, sd = 0.1), # Bulk have larger within culture variation
                rnorm(n = 25, mean = pH, sd = 0.05)), # DVS have smaller within culture variation
         Moi = rep(rnorm(n = 5, mean = 40, sd = 0.1), each = 20), # all cultures, regardless of system, are on average identical
         Moi = rnorm(n = 100, mean = Moi, sd = 0.05), # and with same noise
         Fat = rep(rnorm(n = 5, mean = 25, sd = 0.01), each = 20), # all cultures are on average identical
         Fat = rnorm(n = 100, mean = Fat, sd = 0.05), # and with same noise
         NFS = 100 - Moi - Fat # all must sum to 100%
  )

My first attempt has been the following:

MCMCglmm(fixed = cbind(pH, Moi, Fat, NFS) ~ trait*Culture - 1,
                 family = rep("gaussian", 4),
                 rcov = ~us(System:trait):units,
                 data = df_sim,
                 nitt = 10000,
                 thin = 10,
                 burnin = 3000,
                 verbose = FALSE)

But this does not take into account the Moi, Fat, NFS used to estimate a common composition residual. 
Is there a way to specify this in the rcov formula? Am I using the right family for the residuals, when i know the are correlated 100 = Moi + Fat + NFS? Is MCMCglmm the right tool or am I asking to much of the package?

Any help would be appreciated.

Thanks

Jannik