[R-sig-ME] MCMCglmm model with 2 datasets

Ingleby, Fiona fci201 at exeter.ac.uk
Thu May 9 10:49:55 CEST 2013


Hi David,

This is Drosophila data, so I'm using 'family' to refer to a line, such that it's just a factor of line IDs and all individuals within a line have been bred to share most of their genes with each other.

I suppose my worry with adding in the NA values is that I don't understand how MCMCglmm deals with them. For example, suppose I start with the first dataset, with 4 traits measured on 10 males and 10 females from each line, and then take the second dataset, with 1 trait measured on 5 different males and 5 different females from each of these lines. To make the two the same size, I have to add 5 male and 5 female NAs for trait 5 into each line. Does R read these NA values as being associated with a particular row of data for traits 1-4? Strictly speaking, that association isn't real, since it's completely arbitrary which individuals within each line get 'assigned' (in terms of how the rows of the two datasets match up with each other) either a trait 5 value or an NA value. I don't really understand how the model deals with the NA values, so I can't see if or how this matters. Am I worrying about nothing?!

One of the other approaches I tried, in order to avoid adding NAs, was to take line averages of all 5 traits and run the same model. This gave very different results from the analysis of the individual data with the NAs, so I'm confused as to which is the right approach and why. If you (or anyone else) have any thoughts on this, I'd be really grateful.
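
To make the layout in question concrete, here is a minimal sketch in base R of the padded data frame described above. The values are simulated, there are only three lines, and the column names and the positional index `id` are assumptions made for illustration (they are not taken from the real data). The point it shows is that the pairing of trait 5 values with rows of traits 1-4 is purely positional within each line and sex.

```r
set.seed(1)

# Dataset 1 (hypothetical): 10 females + 10 males per line, 4 traits each
d1 <- expand.grid(id = 1:10, sex = c("F", "M"),
                  family = paste0("line", 1:3))
for (k in 1:4) d1[[paste0("trait", k)]] <- rnorm(nrow(d1))

# Dataset 2 (hypothetical): 5 different females + 5 different males per line
d2 <- expand.grid(id = 1:5, sex = c("F", "M"),
                  family = paste0("line", 1:3))
d2$TRAIT5 <- rnorm(nrow(d2))

# Pad dataset 2 up to the size of dataset 1. Rows are matched only by
# position (id) within line and sex, so the pairing is arbitrary;
# all.x = TRUE keeps every row of d1 and fills unmatched TRAIT5 with NA.
data <- merge(d1, d2, by = c("family", "sex", "id"), all.x = TRUE)

# Count observed vs missing TRAIT5 values in each line-by-sex cell
table(data$family, data$sex, is.na(data$TRAIT5))
```

Reshuffling `id` within a line and sex before the merge would give an equally valid padded table, which is exactly the arbitrariness the question is about.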

Thanks again for your help,

Fiona


Dr Fiona C Ingleby
Postdoctoral Research Fellow 
University of Sussex
Email: F.Ingleby at sussex.ac.uk
Website: fionaingleby.weebly.com


On 8 May 2013, at 22:11, David Duffy <David.Duffy at qimr.edu.au> wrote:

On Thu, 9 May 2013, Ingleby, Fiona wrote:

I have two datasets, one with four traits measured in male and female individuals from a set of families, and the second dataset with another trait measured in males and females from the same set of families, but using different individuals (and different sample sizes) from those families.

prior <- list(R = list(V = diag(5)/5, nu = 0.5),
              G = list(G1 = list(V = diag(10)/10, nu = 0.5)))
model <- MCMCglmm(cbind(trait1, trait2, trait3, trait4, TRAIT5) ~ sex:trait - 1,
                  random = ~us(sex:trait):family,
                  rcov = ~us(trait):units,
                  prior = prior, data = data, family = rep("gaussian", 5),
                  nitt = 400000, burnin = 20000, thin = 25, pr = TRUE)

Which of course wouldn't be a problem if (a) the datasets were the same length, and (b) the data for the 5 traits had been measured on the same individuals. Since they are not, I'm left with two problems.

I don't want to estimate the individual level covariances between traits, since they are measured on different individuals.

You actually do want to estimate these, otherwise you can't adjust T1-T4 for T5.

My second problem is of course the different lengths of datasets. I tried to get around this by filling in the shorter dataset with NA missing data values within each family

Yes, this is what you need to do.  Consider two traits that are sex-specific, e.g. human prostate and breast cancer.  Each individual will be missing one or the other, but between-family differences in co-occurrence of the two diseases allow us to assess whether there is a shared genetic susceptibility.  We model this as a within-individual correlation in susceptibility to each trait, even if both cannot simultaneously be instantiated. Mechanistically, the same gene has sex-specific expression.
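
The idea that between-family information carries the association, even when no individual has both traits, can be illustrated with a small base R simulation. This is only a sketch under stated assumptions: the number of families, the family-level correlation of 0.6, and all variable names are invented for illustration, and it uses simple family means rather than the MCMCglmm model discussed in this thread.

```r
set.seed(42)
n_fam <- 200   # hypothetical number of families
n_per <- 10    # individuals measured per family, per trait
rho   <- 0.6   # assumed family-level correlation between the two traits

# Correlated family-level effects for trait A and trait B
u1 <- rnorm(n_fam)
u2 <- rho * u1 + sqrt(1 - rho^2) * rnorm(n_fam)

# Different individuals are measured for each trait, so no individual
# has both values; only the family label links the two sets
fam <- rep(seq_len(n_fam), each = n_per)
a <- u1[fam] + rnorm(n_fam * n_per)   # trait A, individuals set 1
b <- u2[fam] + rnorm(n_fam * n_per)   # trait B, individuals set 2

# Correlation between family means of the two traits
r <- cor(as.vector(tapply(a, fam, mean)),
         as.vector(tapply(b, fam, mean)))
r
```

The family-mean correlation is clearly positive here although the two traits never occur together in one individual. It is somewhat smaller than the assumed family-level value, because within-family noise in the means attenuates it.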

What structure do your families have?

Cheers, David.

| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v


