[R-sig-ME] Differing variable lengths (missing data) and Model errors in lmer()
Jerome Goudet
jerome.goudet at unil.ch
Fri Jul 10 15:31:24 CEST 2009
Dear Aditi,
Would'nt it be simpler then to remove the family effect (by using e.g.
the residuals of lmer(peg.no~1+(1|Family))) and to look at the effect of
individual loci on the residuals (I think this is what was suggested by
Aulchenko etal (Genetics 2007 177:577)? But I may be wrong and missing
something...
Best wishes,
A Singh wrote:
> Dear Jerome,
>
> Thank you very much for your suggestion, and apologies for the late
> response..
>
> I have tried exactly the same approach when I was still going through
> this in my preliminary stages, and I've tried to run lmer() using them
> as fixed effects; however the catch here, as my supervisors explained
> to me is this-
> - the alleles are not fixed effects because we don't just want to see
> if they differ significantly in their presence/absence across all 60
> families.
>
> We want to see if they show associations with phenotype-measurements
> within each family so that we can see if they account for the
> unexplained variation within each family, if that makes sense? And
> this makes them random effects.
>
> We are trying to (as a larger goal) look for marker-trait
> associations, map QTL's and identify fragments of genes responsible
> for speciation/genes under selection.
> Nesting the markers within each family is a sort of mini-within-family
> ANOVA with random effects to see whether particular families have a
> phenotypic trait more accounted for by a particular marker/combination
> of markers, than the trait may be accounted for in other families
> based on the same markers.
>
> Working on this principle, I need to run my lmer model they way I was
> trying to, using the for loops to nest markers within families so that
> for each phenotype, I would have (no of families*no of markers) runs..
>
> Hope this makes it clearer...
>
> Thanks again,
>
> Aditi
>
>
>
>
> --On 09 July 2009 21:21 +0200 Jerome Goudet <jerome.goudet at unil.ch>
> wrote:
>
>> Dear Aditi,
>>
>> I'm puzzled with the nesting of loci within families (the presence of an
>> allele at a locus means the same whatever the family, I would think). I
>> would treat loci as fixed effect and hence use something like that:
>>
>> (fm1<-lmer(peg.no~P1L55+P1L73+P1L74+P1L77+P1L91+P1L96+P1L98+P1L100+P1L11
>>
>> 4+P1L118+(1|family),dat))
>> Linear mixed model fit by REML
>> Formula: peg.no ~ P1L55 + P1L73 + P1L74 + P1L77 + P1L91 + P1L96 +
>> P1L98 +
>> P1L100 + P1L114 + P1L118 + (1 | family)
>> Data: dat
>> AIC BIC logLik deviance REMLdev
>> 2954 3006 -1464 2963 2928
>> Random effects:
>> Groups Name Variance Std.Dev.
>> family (Intercept) 89.043 9.4363 Residual 91.377
>> 9.5592 Number of obs: 390, groups: family, 57
>>
>> Fixed effects:
>> Estimate Std. Error t value
>> (Intercept) 96.2793 3.2169 29.929
>> P1L55 -2.0185 1.5831 -1.275
>> P1L73 -3.0256 1.8900 -1.601
>> P1L74 1.5112 1.6189 0.933
>> P1L77 -2.7898 2.3529 -1.186
>> P1L91 0.6480 3.3118 0.196
>> P1L96 -1.4718 1.4938 -0.985
>> P1L98 2.6431 2.6401 1.001
>> P1L100 -3.4529 2.0842 -1.657
>> P1L114 0.8099 1.9218 0.421
>> P1L118 -2.7119 2.3635 -1.147
>>
>>
>> But perhaps I missed something?
>>
>> Best wishes,
>>
>> A Singh wrote:
>>> Dear All,
>>>
>>> I am trying to run a nested random effects model in lmer (for R 2.9.1,
>>> lme4 version 0.999375-31 ) using data which is structured as follows:
>>>
>>> family offsp. P1L74 P1L77 P1L91 P1L96..(n=426) peg.no ec.length
>>> syll.length
>>> 1 2 1 0 0 0 86
>>> 5.445 2.479
>>> 1 3 1 0 0 0 91
>>> 5.215 2.356
>>> 1 4 0 0 0 0 79
>>> 5.682 2.896
>>> 1 5 1 0 0 0 83
>>> 5.149 2.641
>>> 1 6 0 0 0 0 77
>>> 5.044 2.288
>>> 1 7 1 0 0 0 78
>>> 5.450 2.420
>>> 1 8 1 0 1 1 82
>>> 5.377 2.505
>>> 1 9 1 0 0 0 95
>>> 5.389 2.706
>>> 1 10 1 0 0 0 88
>>> 5.354 2.461
>>> 1 11 1 0 0 0 87
>>> 5.262 3.079
>>> 1 12 1 0 0 0 84
>>> 5.191 2.858
>>> 1 13 1 0 0 1 87
>>> 5.194 2.264
>>> 2 23 1 0 0 1 116
>>> 5.863 2.876
>>> 2 24 1 0 0 0 122
>>> 5.475 3.114
>>> 2 25 1 0 0 0 110
>>> 5.563 3.059
>>> . . . . . . . . .
>>> . . . . . . . . .
>>> .
>>> .
>>> (60 families)
>>>
>>> 'Family' is the first Random effect (categorical variable), with 60
>>> levels.
>>>
>>> All columns labeled P1L(x) are a matrix of presence/absence genetic
>>> markers for each individual in each family. There are 426 such
>>> columns(not numbered in sequence) and each one is a random effect.
>>>
>>> The last three columns (peg.no, ec.length and syll.length) are the
>>> three dependent variables.
>>>
>>> Each genetic marker column needs to be nested within each family,
>>> which means that if I take the first phenotype, peg.no, for example,
>>> then I need to run an analysis that partitions variance 60*426 times
>>> (~25,00 runs) for that phenotype.
>>> I also need an output with the Anova table, and summary, at each stage
>>> of the run, so that I can get P values for each marker for each
>>> family, as a way of determining whether it contributes significantly
>>> to explaining within-family variance for that trait.
>>>
>>> To do the above, I tried to write code for lmer() using two nested
>>> 'for' loops, one for each level of random factor nesting (marker
>>> within family) as follows (using a test data set [please find
>>> attached] with only the first 10 marker columns, to see if this works):
>>>
>>>
>>>> vc<-read.table("P:\\R\\Testvcomp10.txt",header=T)
>>>> attach(vc)
>>>
>>>> family<-factor(family)
>>>> colms<-(vc)[,4:13] ## this to assign the 10 columns containing marker
>>> data to a new variable, as column names are themselves not in any
>>> recognisable sequence
>>>
>>>> vcdf<-data.frame(family,peg.no,ec.length,syll.length,colms)
>>>> library(lme4)
>>>
>>>> for (c in levels(family))
>>> + { for (i in 1:length(colms))
>>> + { fit<-lmer(peg.no~1 + (1|c/i), vcdf)
>>> + }
>>> + summ<-summary(fit)
>>> + av<-anova(fit)
>>> + print(summ)
>>> + print(av)
>>> + }
>>>
>>> This gives me:
>>>
>>> Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1
>>> + :
>>> variable lengths differ (found for 'c')
>>>
>>>
>>>
>>> On suggestion from a colleague I reframed this as:
>>>
>>>
>>>> for(c in levels(family))
>>> + {
>>> + print("----New C:----")
>>> + print(c)
>>> + for (i in 1:length(colms))
>>> + {
>>> + fit<-lmer(peg.no~1+ (1|c/i),vcdf)
>>> + print(i)
>>> + summ<-summary(fit)
>>> + av<-anova(fit)
>>> + print(summ)
>>> + print(av)
>>> + }
>>> + }
>>>
>>> ..and this gave me the output:
>>>
>>> [1] "----New C:----"
>>> [1] "1"
>>> Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1
>>> + :
>>> variable lengths differ (found for 'c')
>>>
>>>
>>> Google-ing the error message has led me to plenty of links that
>>> suggest forcing data into a data frame to fix this, but that hasn't
>>> worked.
>>> My markers and phenotypes both have plenty of missing data (NA's in
>>> the data.frame), and na.action=na.omit isn't solving the problem.
>>>
>>> (I have tried this with lme, and tried to do it with the aov() command
>>> as well and the error is pretty much the same).
>>>
>>> I am completely new to R, and despite searching and trying various
>>> things, can't get the code to work.
>>>
>>> I really appreciate any corrections to this code, or alternative
>>> command/function suggestions that I can look into, to try to do this
>>> again.
>>>
>>>
>>> Thanks a bunch for your help,
>>>
>>> Aditi
>>>
>>>
>>> ----------------------
>>> A Singh
>>> Aditi.Singh at bristol.ac.uk
>>> School of Biological Sciences
>>> University of Bristol
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>> --
>> Jérôme Goudet
>> Dept. Ecology & Evolution
>> UNIL-Sorge, CH-1015 Lausanne
>>
>> mail:jerome.goudet at unil.ch
>> Tel:+41 (0)21 692 4242
>> Fax:+41 (0)21 692 4265
>>
>
>
>
> ----------------------
> A Singh
> Aditi.Singh at bristol.ac.uk
> School of Biological Sciences
> University of Bristol
>
>
>
>
>
--
Jérôme Goudet
Dept. Ecology & Evolution
UNIL-Sorge, CH-1015 Lausanne
mail:jerome.goudet at unil.ch
Tel:+41 (0)21 692 4242
Fax:+41 (0)21 692 4265
More information about the R-sig-mixed-models
mailing list