[R-sig-ME] Differing variable lengths (missing data) and Model errors in lmer()

Fri Jul 10 15:31:24 CEST 2009

Dear Aditi,

Would'nt it be simpler then to remove the family effect (by using e.g. 
the residuals of lmer(peg.no~1+(1|Family))) and to look at the effect of 
individual loci on the residuals (I think this is what was suggested by 
Aulchenko etal (Genetics 2007  177:577)? But I may be wrong and missing 
something...

Best wishes,

A Singh wrote:
> Dear Jerome,
>
> Thank you very much for your suggestion, and apologies for the late 
> response..
>
> I have tried exactly the same approach when I was still going through 
> this in my preliminary stages, and I've tried to run lmer() using them 
> as fixed effects; however the catch here, as my supervisors explained 
> to me is this-
> - the alleles are not fixed effects because we don't just want to see 
> if they differ significantly in their presence/absence across all 60 
> families.
>
> We want to see if they show associations with phenotype-measurements 
> within each family so that we can see if they account for the 
> unexplained variation within each family, if that makes sense? And 
> this makes them random effects.
>
> We are trying to (as a larger goal) look for marker-trait 
> associations, map QTL's and  identify fragments of genes responsible 
> for speciation/genes under selection.
> Nesting the markers within each family is a sort of mini-within-family 
> ANOVA with random effects to see whether particular families have a 
> phenotypic trait more accounted for by a particular marker/combination 
> of markers, than the trait may be accounted for in other families 
> based on the same markers.
>
> Working on this principle, I need to run my lmer model they way I was 
> trying to, using the for loops to nest markers within families so that 
> for each phenotype, I would have (no of families*no of markers) runs..
>
> Hope this makes it clearer...
>
> Thanks again,
>
> Aditi
>
>
>
>
> --On 09 July 2009 21:21 +0200 Jerome Goudet <jerome.goudet at unil.ch> 
> wrote:
>
>> Dear Aditi,
>>
>> I'm puzzled with the nesting of loci within families (the presence of an
>> allele at a locus means the same whatever the family, I would think). I
>> would treat loci as fixed effect and hence use something like that:
>>
>>  (fm1<-lmer(peg.no~P1L55+P1L73+P1L74+P1L77+P1L91+P1L96+P1L98+P1L100+P1L11 
>>
>> 4+P1L118+(1|family),dat))
>> Linear mixed model fit by REML
>> Formula: peg.no ~ P1L55 + P1L73 + P1L74 + P1L77 + P1L91 + P1L96 + 
>> P1L98 +
>> P1L100 + P1L114 + P1L118 + (1 | family)
>>    Data: dat
>>   AIC  BIC logLik deviance REMLdev
>>  2954 3006  -1464     2963    2928
>> Random effects:
>>  Groups   Name        Variance Std.Dev.
>>  family   (Intercept) 89.043   9.4363  Residual             91.377
>> 9.5592 Number of obs: 390, groups: family, 57
>>
>> Fixed effects:
>>             Estimate Std. Error t value
>> (Intercept)  96.2793     3.2169  29.929
>> P1L55        -2.0185     1.5831  -1.275
>> P1L73        -3.0256     1.8900  -1.601
>> P1L74         1.5112     1.6189   0.933
>> P1L77        -2.7898     2.3529  -1.186
>> P1L91         0.6480     3.3118   0.196
>> P1L96        -1.4718     1.4938  -0.985
>> P1L98         2.6431     2.6401   1.001
>> P1L100       -3.4529     2.0842  -1.657
>> P1L114        0.8099     1.9218   0.421
>> P1L118       -2.7119     2.3635  -1.147
>>
>>
>> But perhaps I missed something?
>>
>> Best wishes,
>>
>> A Singh wrote:
>>> Dear All,
>>>
>>> I am trying to run a nested random effects model in lmer (for R 2.9.1,
>>> lme4 version 0.999375-31 ) using data which is structured as follows:
>>>
>>> family offsp. P1L74 P1L77 P1L91 P1L96..(n=426) peg.no ec.length
>>> syll.length
>>> 1        2      1      0      0      0                86
>>> 5.445         2.479
>>> 1        3      1      0      0      0                91
>>> 5.215         2.356
>>> 1        4      0      0      0      0                  79
>>> 5.682         2.896
>>> 1        5      1      0      0      0               83
>>> 5.149         2.641
>>> 1        6      0      0      0      0                77
>>> 5.044         2.288
>>> 1        7      1      0      0      0                  78
>>> 5.450         2.420
>>> 1        8      1      0      1      1                82
>>> 5.377         2.505
>>> 1        9      1      0      0      0                95
>>> 5.389         2.706
>>> 1       10      1      0      0      0                88
>>> 5.354         2.461
>>> 1       11      1      0      0      0                87
>>> 5.262         3.079
>>> 1       12      1      0      0      0                84
>>> 5.191         2.858
>>> 1       13      1      0      0      1                  87
>>> 5.194         2.264
>>> 2       23      1      0      0      1                  116
>>> 5.863         2.876
>>> 2       24      1      0      0      0                122
>>> 5.475         3.114
>>> 2       25      1      0      0      0                110
>>> 5.563         3.059
>>> .       .       .   .     .     .               .         .          .
>>> .       .       .   .     .     .               .         .          .
>>> .
>>> .
>>> (60 families)
>>>
>>> 'Family' is the first Random effect (categorical variable), with 60
>>> levels.
>>>
>>> All columns labeled P1L(x) are a matrix of presence/absence genetic
>>> markers for each individual in each family. There are 426 such
>>> columns(not numbered in sequence) and each one is a random effect.
>>>
>>> The last three columns (peg.no, ec.length and syll.length) are the
>>> three dependent variables.
>>>
>>> Each genetic marker column needs to be nested within each family,
>>> which means that if I take the first phenotype, peg.no, for example,
>>> then I need to run an analysis that partitions variance 60*426 times
>>> (~25,00 runs) for that phenotype.
>>> I also need an output with the Anova table, and summary, at each stage
>>> of the run, so that I can get P values for each marker for each
>>> family, as a way of determining whether it contributes significantly
>>> to explaining within-family variance for that trait.
>>>
>>> To do the above, I tried to write code for lmer() using two nested
>>> 'for' loops, one for each level of random factor nesting (marker
>>> within family) as follows (using a test data set [please find
>>> attached] with only the first 10 marker columns, to see if this works):
>>>
>>>
>>>> vc<-read.table("P:\\R\\Testvcomp10.txt",header=T)
>>>> attach(vc)
>>>
>>>> family<-factor(family)
>>>> colms<-(vc)[,4:13] ## this to assign the 10 columns containing marker
>>> data    to a new variable, as column names are themselves not in any
>>> recognisable sequence
>>>
>>>> vcdf<-data.frame(family,peg.no,ec.length,syll.length,colms)
>>>> library(lme4)
>>>
>>>> for (c in levels(family))
>>> + {    for (i in 1:length(colms))
>>> +        { fit<-lmer(peg.no~1 + (1|c/i), vcdf)
>>> +        }
>>> +    summ<-summary(fit)
>>> +    av<-anova(fit)
>>> +    print(summ)
>>> +    print(av)
>>> + }
>>>
>>> This gives me:
>>>
>>> Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 
>>> +  :
>>>  variable lengths differ (found for 'c')
>>>
>>>
>>>
>>> On suggestion from a colleague I reframed this as:
>>>
>>>
>>>> for(c in levels(family))
>>> + {
>>> + print("----New C:----")
>>> + print(c)
>>> + for (i in 1:length(colms))
>>> + {
>>> + fit<-lmer(peg.no~1+ (1|c/i),vcdf)
>>> + print(i)
>>> + summ<-summary(fit)
>>> + av<-anova(fit)
>>> + print(summ)
>>> + print(av)
>>> + }
>>> + }
>>>
>>> ..and this gave me the output:
>>>
>>> [1] "----New C:----"
>>> [1] "1"
>>> Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 
>>> +  :
>>>  variable lengths differ (found for 'c')
>>>
>>>
>>> Google-ing the error message has led me to plenty of links that
>>> suggest forcing data into a data frame to fix this, but that hasn't
>>> worked.
>>> My markers and phenotypes both have plenty of missing data (NA's in
>>> the data.frame), and na.action=na.omit isn't solving the problem.
>>>
>>> (I have tried this with lme, and tried to do it with the aov() command
>>> as well and the error is pretty much the same).
>>>
>>> I am completely new to R, and despite searching and trying various
>>> things, can't get the code to work.
>>>
>>> I really appreciate any corrections to this code, or alternative
>>> command/function suggestions that I can look into, to try to do this
>>> again.
>>>
>>>
>>> Thanks a bunch for your help,
>>>
>>> Aditi
>>>
>>>
>>> ----------------------
>>> A Singh
>>> Aditi.Singh at bristol.ac.uk
>>> School of Biological Sciences
>>> University of Bristol
>>>
>>>
>>>
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>> -- 
>> Jérôme Goudet
>> Dept. Ecology & Evolution
>> UNIL-Sorge, CH-1015 Lausanne
>>
>> mail:jerome.goudet at unil.ch
>> Tel:+41 (0)21 692 4242
>> Fax:+41 (0)21 692 4265
>>
>
>
>
> ----------------------
> A Singh
> Aditi.Singh at bristol.ac.uk
> School of Biological Sciences
> University of Bristol
>
>
>
>
>

-- 
Jérôme Goudet
Dept. Ecology & Evolution
UNIL-Sorge, CH-1015 Lausanne

mail:jerome.goudet at unil.ch
Tel:+41 (0)21 692 4242
Fax:+41 (0)21 692 4265