[R-sig-ME] Differing variable lengths (missing data) and Model errors in lmer()

Fri Jul 10 12:25:42 CEST 2009

Dear Jerome,

Thank you very much for your suggestion, and apologies for the late 
response..

I have tried exactly the same approach when I was still going through this 
in my preliminary stages, and I've tried to run lmer() using them as fixed 
effects; however the catch here, as my supervisors explained to me is this-
- the alleles are not fixed effects because we don't just want to see if 
they differ significantly in their presence/absence across all 60 families.

We want to see if they show associations with phenotype-measurements within 
each family so that we can see if they account for the unexplained 
variation within each family, if that makes sense? And this makes them 
random effects.

We are trying to (as a larger goal) look for marker-trait associations, map 
QTL's and  identify fragments of genes responsible for speciation/genes 
under selection.
Nesting the markers within each family is a sort of mini-within-family 
ANOVA with random effects to see whether particular families have a 
phenotypic trait more accounted for by a particular marker/combination of 
markers, than the trait may be accounted for in other families based on the 
same markers.

Working on this principle, I need to run my lmer model they way I was 
trying to, using the for loops to nest markers within families so that for 
each phenotype, I would have (no of families*no of markers) runs..

Hope this makes it clearer...

Thanks again,

Aditi

--On 09 July 2009 21:21 +0200 Jerome Goudet <jerome.goudet at unil.ch> wrote:

> Dear Aditi,
>
> I'm puzzled with the nesting of loci within families (the presence of an
> allele at a locus means the same whatever the family, I would think). I
> would treat loci as fixed effect and hence use something like that:
>
>  (fm1<-lmer(peg.no~P1L55+P1L73+P1L74+P1L77+P1L91+P1L96+P1L98+P1L100+P1L11
> 4+P1L118+(1|family),dat))
> Linear mixed model fit by REML
> Formula: peg.no ~ P1L55 + P1L73 + P1L74 + P1L77 + P1L91 + P1L96 + P1L98 +
> P1L100 + P1L114 + P1L118 + (1 | family)
>    Data: dat
>   AIC  BIC logLik deviance REMLdev
>  2954 3006  -1464     2963    2928
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  family   (Intercept) 89.043   9.4363  Residual             91.377
> 9.5592 Number of obs: 390, groups: family, 57
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  96.2793     3.2169  29.929
> P1L55        -2.0185     1.5831  -1.275
> P1L73        -3.0256     1.8900  -1.601
> P1L74         1.5112     1.6189   0.933
> P1L77        -2.7898     2.3529  -1.186
> P1L91         0.6480     3.3118   0.196
> P1L96        -1.4718     1.4938  -0.985
> P1L98         2.6431     2.6401   1.001
> P1L100       -3.4529     2.0842  -1.657
> P1L114        0.8099     1.9218   0.421
> P1L118       -2.7119     2.3635  -1.147
>
>
> But perhaps I missed something?
>
> Best wishes,
>
> A Singh wrote:
>> Dear All,
>>
>> I am trying to run a nested random effects model in lmer (for R 2.9.1,
>> lme4 version 0.999375-31 ) using data which is structured as follows:
>>
>> family offsp. P1L74 P1L77 P1L91 P1L96..(n=426) peg.no ec.length
>> syll.length
>> 1        2      1      0      0      0                86
>> 5.445         2.479
>> 1        3      1      0      0      0                91
>> 5.215         2.356
>> 1        4      0      0      0      0                  79
>> 5.682         2.896
>> 1        5      1      0      0      0               83
>> 5.149         2.641
>> 1        6      0      0      0      0                77
>> 5.044         2.288
>> 1        7      1      0      0      0                  78
>> 5.450         2.420
>> 1        8      1      0      1      1                82
>> 5.377         2.505
>> 1        9      1      0      0      0                95
>> 5.389         2.706
>> 1       10      1      0      0      0                88
>> 5.354         2.461
>> 1       11      1      0      0      0                87
>> 5.262         3.079
>> 1       12      1      0      0      0                84
>> 5.191         2.858
>> 1       13      1      0      0      1                  87
>> 5.194         2.264
>> 2       23      1      0      0      1                  116
>> 5.863         2.876
>> 2       24      1      0      0      0                122
>> 5.475         3.114
>> 2       25      1      0      0      0                110
>> 5.563         3.059
>> .       .       .   .     .     .               .         .          .
>> .       .       .   .     .     .               .         .          .
>> .
>> .
>> (60 families)
>>
>> 'Family' is the first Random effect (categorical variable), with 60
>> levels.
>>
>> All columns labeled P1L(x) are a matrix of presence/absence genetic
>> markers for each individual in each family. There are 426 such
>> columns(not numbered in sequence) and each one is a random effect.
>>
>> The last three columns (peg.no, ec.length and syll.length) are the
>> three dependent variables.
>>
>> Each genetic marker column needs to be nested within each family,
>> which means that if I take the first phenotype, peg.no, for example,
>> then I need to run an analysis that partitions variance 60*426 times
>> (~25,00 runs) for that phenotype.
>> I also need an output with the Anova table, and summary, at each stage
>> of the run, so that I can get P values for each marker for each
>> family, as a way of determining whether it contributes significantly
>> to explaining within-family variance for that trait.
>>
>> To do the above, I tried to write code for lmer() using two nested
>> 'for' loops, one for each level of random factor nesting (marker
>> within family) as follows (using a test data set [please find
>> attached] with only the first 10 marker columns, to see if this works):
>>
>>
>>> vc<-read.table("P:\\R\\Testvcomp10.txt",header=T)
>>> attach(vc)
>>
>>> family<-factor(family)
>>> colms<-(vc)[,4:13] ## this to assign the 10 columns containing marker
>> data    to a new variable, as column names are themselves not in any
>> recognisable sequence
>>
>>> vcdf<-data.frame(family,peg.no,ec.length,syll.length,colms)
>>> library(lme4)
>>
>>> for (c in levels(family))
>> + {    for (i in 1:length(colms))
>> +        { fit<-lmer(peg.no~1 + (1|c/i), vcdf)
>> +        }
>> +    summ<-summary(fit)
>> +    av<-anova(fit)
>> +    print(summ)
>> +    print(av)
>> + }
>>
>> This gives me:
>>
>> Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 +  :
>>  variable lengths differ (found for 'c')
>>
>>
>>
>> On suggestion from a colleague I reframed this as:
>>
>>
>>> for(c in levels(family))
>> + {
>> + print("----New C:----")
>> + print(c)
>> + for (i in 1:length(colms))
>> + {
>> + fit<-lmer(peg.no~1+ (1|c/i),vcdf)
>> + print(i)
>> + summ<-summary(fit)
>> + av<-anova(fit)
>> + print(summ)
>> + print(av)
>> + }
>> + }
>>
>> ..and this gave me the output:
>>
>> [1] "----New C:----"
>> [1] "1"
>> Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 +  :
>>  variable lengths differ (found for 'c')
>>
>>
>> Google-ing the error message has led me to plenty of links that
>> suggest forcing data into a data frame to fix this, but that hasn't
>> worked.
>> My markers and phenotypes both have plenty of missing data (NA's in
>> the data.frame), and na.action=na.omit isn't solving the problem.
>>
>> (I have tried this with lme, and tried to do it with the aov() command
>> as well and the error is pretty much the same).
>>
>> I am completely new to R, and despite searching and trying various
>> things, can't get the code to work.
>>
>> I really appreciate any corrections to this code, or alternative
>> command/function suggestions that I can look into, to try to do this
>> again.
>>
>>
>> Thanks a bunch for your help,
>>
>> Aditi
>>
>>
>> ----------------------
>> A Singh
>> Aditi.Singh at bristol.ac.uk
>> School of Biological Sciences
>> University of Bristol
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> --
> Jérôme Goudet
> Dept. Ecology & Evolution
> UNIL-Sorge, CH-1015 Lausanne
>
> mail:jerome.goudet at unil.ch
> Tel:+41 (0)21 692 4242
> Fax:+41 (0)21 692 4265
>

----------------------
A Singh
Aditi.Singh at bristol.ac.uk
School of Biological Sciences
University of Bristol