[R-sig-ME] MCMCglmm model with 2 datasets

Wed May 8 17:15:03 CEST 2013

Hi everyone,

I posted a similar problem a few weeks ago, so I apologise for some repetition. In response to my previous email, someone helped me simplify my problem a bit, but unfortunately I'm still having difficulty so I'd really appreciate any further help anyone can offer.

I have two datasets, one with four traits measured in male and female individuals from a set of families, and the second dataset with another trait measured in males and females from the same set of families, but using different individuals (and different sample sizes) from those families.

I started with this model, using only the data from the larger dataset with four traits:

prior <- list( R=list(V=diag(4)/4,nu=0.5), G=list(G1=list(V=diag(8)/8,nu=0.5)) )
model <- MCMCglmm( cbind(trait1, trait2,trait3,trait4) ~ sex:trait-1,random=~us(sex:trait):family,
             rcov=~us(trait):units,prior=prior,data=data,family=rep("gaussian",4),
             nitt=400000,burnin=20000,thin=25,pr=T)

So far, pretty straightforward. My problem is that I would like to be able, if possible, to analyse this data along with the fifth trait from the second dataset. So I'm thinking along these lines:

prior <- list( R=list(V=diag(5)/5,nu=0.5), G=list(G1=list(V=diag(10)/10,nu=0.5)) )
model <- MCMCglmm( cbind(trait1, trait2,trait3,trait4, TRAIT5) ~ sex:trait-1,random=~us(sex:trait):family,
             rcov=~us(trait):units,prior=prior,data=data,family=rep("gaussian",5),
             nitt=400000,burnin=20000,thin=25,pr=T)

Which of course wouldn't be a problem if (a) the datasets were the same length, and (b) the data for the 5 traits had been measured on the same individuals. Since they are not, I'm left with two problems. 

I don't want to estimate the individual level covariances between traits, since they are measured on different individuals. But if I try to specify a model without the individual-level ('units') term, then I get an error message stating that unique residuals have not been defined for each datapoint. Is there a way of doing this?

My second problem is of course the different lengths of datasets. I tried to get around this by filling in the shorter dataset with NA missing data values within each family, but I'm not sure that this is a good idea, as the missing data values are read as if they are associated with the corresponding row of the bigger dataset and I'm worried this will bias the results. If anyone can think of a different way of going about this then I would be very grateful.

Thanks again,

Fiona