[R-sig-ME] Differing variable lengths (missing data) and Model errors in lmer()

A Singh Aditi.Singh at bristol.ac.uk
Fri Jul 10 16:23:50 CEST 2009


Dear Jerome,

Thank you very much indeed for your suggestion!
I have just gotten the paper off Genetics, and it seems to be very spot on. 
I will have a quick read, try to isolate residuals from my lmer() formula 
using 'family' and  will get back to you with what it gives me. This could 
be my key.

Thanks again,

Aditi



--On 10 July 2009 15:31 +0200 Jerome Goudet <jerome.goudet at unil.ch> wrote:

> Dear Aditi,
>
> Would'nt it be simpler then to remove the family effect (by using e.g.
> the residuals of lmer(peg.no~1+(1|Family))) and to look at the effect of
> individual loci on the residuals (I think this is what was suggested by
> Aulchenko etal (Genetics 2007  177:577)? But I may be wrong and missing
> something...
>
> Best wishes,
>
> A Singh wrote:
>> Dear Jerome,
>>
>> Thank you very much for your suggestion, and apologies for the late
>> response..
>>
>> I have tried exactly the same approach when I was still going through
>> this in my preliminary stages, and I've tried to run lmer() using them
>> as fixed effects; however the catch here, as my supervisors explained
>> to me is this-
>> - the alleles are not fixed effects because we don't just want to see
>> if they differ significantly in their presence/absence across all 60
>> families.
>>
>> We want to see if they show associations with phenotype-measurements
>> within each family so that we can see if they account for the
>> unexplained variation within each family, if that makes sense? And
>> this makes them random effects.
>>
>> We are trying to (as a larger goal) look for marker-trait
>> associations, map QTL's and  identify fragments of genes responsible
>> for speciation/genes under selection.
>> Nesting the markers within each family is a sort of mini-within-family
>> ANOVA with random effects to see whether particular families have a
>> phenotypic trait more accounted for by a particular marker/combination
>> of markers, than the trait may be accounted for in other families
>> based on the same markers.
>>
>> Working on this principle, I need to run my lmer model they way I was
>> trying to, using the for loops to nest markers within families so that
>> for each phenotype, I would have (no of families*no of markers) runs..
>>
>> Hope this makes it clearer...
>>
>> Thanks again,
>>
>> Aditi
>>
>>
>>
>>
>> --On 09 July 2009 21:21 +0200 Jerome Goudet <jerome.goudet at unil.ch>
>> wrote:
>>
>>> Dear Aditi,
>>>
>>> I'm puzzled with the nesting of loci within families (the presence of an
>>> allele at a locus means the same whatever the family, I would think). I
>>> would treat loci as fixed effect and hence use something like that:
>>>
>>>  (fm1<-lmer(peg.no~P1L55+P1L73+P1L74+P1L77+P1L91+P1L96+P1L98+P1L100+P1L
>>>  11
>>>
>>> 4+P1L118+(1|family),dat))
>>> Linear mixed model fit by REML
>>> Formula: peg.no ~ P1L55 + P1L73 + P1L74 + P1L77 + P1L91 + P1L96 +
>>> P1L98 +
>>> P1L100 + P1L114 + P1L118 + (1 | family)
>>>    Data: dat
>>>   AIC  BIC logLik deviance REMLdev
>>>  2954 3006  -1464     2963    2928
>>> Random effects:
>>>  Groups   Name        Variance Std.Dev.
>>>  family   (Intercept) 89.043   9.4363  Residual             91.377
>>> 9.5592 Number of obs: 390, groups: family, 57
>>>
>>> Fixed effects:
>>>             Estimate Std. Error t value
>>> (Intercept)  96.2793     3.2169  29.929
>>> P1L55        -2.0185     1.5831  -1.275
>>> P1L73        -3.0256     1.8900  -1.601
>>> P1L74         1.5112     1.6189   0.933
>>> P1L77        -2.7898     2.3529  -1.186
>>> P1L91         0.6480     3.3118   0.196
>>> P1L96        -1.4718     1.4938  -0.985
>>> P1L98         2.6431     2.6401   1.001
>>> P1L100       -3.4529     2.0842  -1.657
>>> P1L114        0.8099     1.9218   0.421
>>> P1L118       -2.7119     2.3635  -1.147
>>>
>>>
>>> But perhaps I missed something?
>>>
>>> Best wishes,
>>>
>>> A Singh wrote:
>>>> Dear All,
>>>>
>>>> I am trying to run a nested random effects model in lmer (for R 2.9.1,
>>>> lme4 version 0.999375-31 ) using data which is structured as follows:
>>>>
>>>> family offsp. P1L74 P1L77 P1L91 P1L96..(n=426) peg.no ec.length
>>>> syll.length
>>>> 1        2      1      0      0      0                86
>>>> 5.445         2.479
>>>> 1        3      1      0      0      0                91
>>>> 5.215         2.356
>>>> 1        4      0      0      0      0                  79
>>>> 5.682         2.896
>>>> 1        5      1      0      0      0               83
>>>> 5.149         2.641
>>>> 1        6      0      0      0      0                77
>>>> 5.044         2.288
>>>> 1        7      1      0      0      0                  78
>>>> 5.450         2.420
>>>> 1        8      1      0      1      1                82
>>>> 5.377         2.505
>>>> 1        9      1      0      0      0                95
>>>> 5.389         2.706
>>>> 1       10      1      0      0      0                88
>>>> 5.354         2.461
>>>> 1       11      1      0      0      0                87
>>>> 5.262         3.079
>>>> 1       12      1      0      0      0                84
>>>> 5.191         2.858
>>>> 1       13      1      0      0      1                  87
>>>> 5.194         2.264
>>>> 2       23      1      0      0      1                  116
>>>> 5.863         2.876
>>>> 2       24      1      0      0      0                122
>>>> 5.475         3.114
>>>> 2       25      1      0      0      0                110
>>>> 5.563         3.059
>>>> .       .       .   .     .     .               .         .          .
>>>> .       .       .   .     .     .               .         .          .
>>>> .
>>>> .
>>>> (60 families)
>>>>
>>>> 'Family' is the first Random effect (categorical variable), with 60
>>>> levels.
>>>>
>>>> All columns labeled P1L(x) are a matrix of presence/absence genetic
>>>> markers for each individual in each family. There are 426 such
>>>> columns(not numbered in sequence) and each one is a random effect.
>>>>
>>>> The last three columns (peg.no, ec.length and syll.length) are the
>>>> three dependent variables.
>>>>
>>>> Each genetic marker column needs to be nested within each family,
>>>> which means that if I take the first phenotype, peg.no, for example,
>>>> then I need to run an analysis that partitions variance 60*426 times
>>>> (~25,00 runs) for that phenotype.
>>>> I also need an output with the Anova table, and summary, at each stage
>>>> of the run, so that I can get P values for each marker for each
>>>> family, as a way of determining whether it contributes significantly
>>>> to explaining within-family variance for that trait.
>>>>
>>>> To do the above, I tried to write code for lmer() using two nested
>>>> 'for' loops, one for each level of random factor nesting (marker
>>>> within family) as follows (using a test data set [please find
>>>> attached] with only the first 10 marker columns, to see if this works):
>>>>
>>>>
>>>>> vc<-read.table("P:\\R\\Testvcomp10.txt",header=T)
>>>>> attach(vc)
>>>>
>>>>> family<-factor(family)
>>>>> colms<-(vc)[,4:13] ## this to assign the 10 columns containing marker
>>>> data    to a new variable, as column names are themselves not in any
>>>> recognisable sequence
>>>>
>>>>> vcdf<-data.frame(family,peg.no,ec.length,syll.length,colms)
>>>>> library(lme4)
>>>>
>>>>> for (c in levels(family))
>>>> + {    for (i in 1:length(colms))
>>>> +        { fit<-lmer(peg.no~1 + (1|c/i), vcdf)
>>>> +        }
>>>> +    summ<-summary(fit)
>>>> +    av<-anova(fit)
>>>> +    print(summ)
>>>> +    print(av)
>>>> + }
>>>>
>>>> This gives me:
>>>>
>>>> Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1
>>>> +  :
>>>>  variable lengths differ (found for 'c')
>>>>
>>>>
>>>>
>>>> On suggestion from a colleague I reframed this as:
>>>>
>>>>
>>>>> for(c in levels(family))
>>>> + {
>>>> + print("----New C:----")
>>>> + print(c)
>>>> + for (i in 1:length(colms))
>>>> + {
>>>> + fit<-lmer(peg.no~1+ (1|c/i),vcdf)
>>>> + print(i)
>>>> + summ<-summary(fit)
>>>> + av<-anova(fit)
>>>> + print(summ)
>>>> + print(av)
>>>> + }
>>>> + }
>>>>
>>>> ..and this gave me the output:
>>>>
>>>> [1] "----New C:----"
>>>> [1] "1"
>>>> Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1
>>>> +  :
>>>>  variable lengths differ (found for 'c')
>>>>
>>>>
>>>> Google-ing the error message has led me to plenty of links that
>>>> suggest forcing data into a data frame to fix this, but that hasn't
>>>> worked.
>>>> My markers and phenotypes both have plenty of missing data (NA's in
>>>> the data.frame), and na.action=na.omit isn't solving the problem.
>>>>
>>>> (I have tried this with lme, and tried to do it with the aov() command
>>>> as well and the error is pretty much the same).
>>>>
>>>> I am completely new to R, and despite searching and trying various
>>>> things, can't get the code to work.
>>>>
>>>> I really appreciate any corrections to this code, or alternative
>>>> command/function suggestions that I can look into, to try to do this
>>>> again.
>>>>
>>>>
>>>> Thanks a bunch for your help,
>>>>
>>>> Aditi
>>>>
>>>>
>>>> ----------------------
>>>> A Singh
>>>> Aditi.Singh at bristol.ac.uk
>>>> School of Biological Sciences
>>>> University of Bristol
>>>>
>>>>
>>>>
>>>> ----------------------------------------------------------------------
>>>> --
>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>> --
>>> Jérôme Goudet
>>> Dept. Ecology & Evolution
>>> UNIL-Sorge, CH-1015 Lausanne
>>>
>>> mail:jerome.goudet at unil.ch
>>> Tel:+41 (0)21 692 4242
>>> Fax:+41 (0)21 692 4265
>>>
>>
>>
>>
>> ----------------------
>> A Singh
>> Aditi.Singh at bristol.ac.uk
>> School of Biological Sciences
>> University of Bristol
>>
>>
>>
>>
>>
>
> --
> Jérôme Goudet
> Dept. Ecology & Evolution
> UNIL-Sorge, CH-1015 Lausanne
>
> mail:jerome.goudet at unil.ch
> Tel:+41 (0)21 692 4242
> Fax:+41 (0)21 692 4265
>



----------------------
A Singh
Aditi.Singh at bristol.ac.uk
School of Biological Sciences
University of Bristol




More information about the R-sig-mixed-models mailing list