[R-sig-ME] Specifying models nested crossed random effects
Ben Bolker
bbolker at gmail.com
Tue Apr 25 16:40:47 CEST 2017
http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#nested-or-crossed
tries to discuss this (last bullet point), but suggested edits for
clarity are welcome (or pull requests!)
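
For what it's worth, the equivalence described in the quoted message below
can be checked without fitting anything. What follows is only a sketch on
made-up data: the data frame d, its sizes, and the column participant_uid are
assumptions for illustration, not the study's data. It builds a unique
participant label from program and within-program participant number, then
checks that this label defines the same groups as the program_ID:participant_ID
term that (1|program_ID/participant_ID) expands to.

```r
# Toy layout (hypothetical): 3 programs, participants numbered 1-4 in each.
d <- expand.grid(participant_ID = 1:4, program_ID = c("A", "B", "C"))
d$participant_ID <- factor(d$participant_ID)
d$program_ID <- factor(d$program_ID)

# Implicit nesting: the label "1" is reused in every program, so the raw
# participant labels look crossed with program.
crossed_looking <- all(table(d$participant_ID, d$program_ID) > 0)

# Explicit nesting: one unique label per real participant.
d$participant_uid <- interaction(d$program_ID, d$participant_ID, drop = TRUE)
n_unique <- nlevels(d$participant_uid)

# Each unique label occurs in exactly one program, i.e. it is nested.
tab <- table(d$participant_uid, d$program_ID)
is_nested <- all(rowSums(tab > 0) == 1)

# (1|program_ID/participant_ID) is shorthand for
# (1|program_ID) + (1|program_ID:participant_ID).
# The grouping factor of the second term matches participant_uid one-to-one.
nested_term <- droplevels(d$program_ID:d$participant_ID)
tab2 <- table(nested_term, d$participant_uid)
same_groups <- all(rowSums(tab2 > 0) == 1) && all(colSums(tab2 > 0) == 1)
```

Under these assumptions, (1|program_ID) + (1|participant_uid) and
(1|program_ID/participant_ID) describe the same grouping structure, so fitting
both with lmer on real data should give matching variance estimates.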
On Tue, Apr 25, 2017 at 10:33 AM, Joshua Rosenberg
<jmichaelrosenberg at gmail.com> wrote:
> Diana - thank you very much. I think you're right, and I've found
> (interestingly, though maybe not surprisingly) that there are two equivalent
> ways to specify this nesting. One is to identify the participants and
> signals explicitly, i.e., not "Participant 1" nested in "Program A" when
> there is *another* "Participant 1" in "Program B," but rather a unique
> participant ID and sample ID for every participant and sample, as in
> (1|program_ID) + (1|sample_ID). The other is to nest non-unique participant
> IDs and sample IDs within program, as in (1|program_ID/participant_ID).
> Chapter 2 <http://lme4.r-forge.r-project.org/book/Ch2.pdf> of the (not
> published) lme4 book by Bates helped me understand why: basically, lme4
> figures out the nesting (or crossing) on its own as long as the random
> effects are not nested implicitly.
>
> Thanks again. I'm also thinking hard about the ordinal issue: while some of
> our outcomes are ordinal, others are composites of multiple items and so aren't.
>
> Josh
>
> On Sun, Apr 9, 2017 at 5:00 PM, Diana Michl <dmichl at uni-potsdam.de> wrote:
>
>> Not planning to confuse anyone and I agree with Evan mostly. But it seems
>> to me that even with the fixed effects, it still makes sense to include
>> participants and programs as nested random effects because they really are
>> nested (one factor (grouping variable) appears only within a particular
>> level of another factor (grouping variable)).
>> sample_ID seems fine, but I think it should still be
>> (1|program_ID/participant_ID).
>>
>> Diana
>>
>> Am 09.04.2017 um 20:56 schrieb Evan Palmer-Young:
>>
>> Thanks for those details, Josh. Interesting design!
>>
>> I'm not experienced in interpreting random effects on their own, so others
>> will have better advice on that.
>>
>> For your model structure, it sounds like there are three random effects:
>>
>> "program_ID"
>> "participant_ID"
>> "sample_ID"
>>
>> From my reading of lme4 documentation, I think that you have coded
>> sample_ID correctly and do not need to explicitly nest it within program_ID.
>>
>> In general, I think it may be better form to include both fixed and random
>> predictors in your model, rather than having separate models to assess only
>> the random effects.
>>
>> So your model might be something like,
>>
>> library(lme4)
>> interest_model <- lmer(interest ~ Instruction_type + time_of_day +
>>     Working_alone + (1|program_ID) + (1|participant_ID) + (1|sample_ID),
>>     data = df)
>>
>> Where Instruction_type, time_of_day, and Working_alone are fabricated
>> variables that might resemble variables you recorded.
>>
>> As a disclaimer, this is my second time answering to the list-- welcome!
>>
>> Best wishes, Evan
>>
>>
>>
>>
>>
>> On Sat, Apr 8, 2017 at 4:26 PM, Joshua Rosenberg <jmichaelrosenberg at gmail.com> wrote:
>>
>>
>> Thank you Evan for your response and thank you for clarifying.
>>
>> Responses are in-line below.
>>
>>
>> Thank you for considering this!
>>
>> Josh
>>
>>
>> On Sat, Apr 8, 2017 at 3:28 PM, Evan Palmer-Young <ecp52 at cornell.edu>
>> wrote:
>>
>>
>> Josh,
>> Thanks for the questions.
>> Can you provide a little bit more description about the variables?
>>
>>
>> First, sorry, I had changed some of the variable names in the data and
>> realize I used different names (and a different outcome) in the examples at
>> the bottom.
>>
>> "interest" (one outcome we're measuring) is a variable of participants'
>> self-reported interest using a 1-4 scale.
>>
>> "overall_engagement" is another (different) outcome: a composite of
>> students' interest, how hard they were concentrating, and how challenging
>> they found what they were learning.
>>
>> We asked participants (youth) about how interested they were in what they
>> were learning at random intervals using what is called an experience
>> sampling method. In our method, youth had phones on which they were asked
>> about what they were thinking / feeling - every youth in the same program
>> (more on the programs in just a moment) was notified to answer our
>> questions at the same time, although both the instance in time and the
>> interval between these questions differed between programs.
>>
>> "site" = "program" (ID) and program is an indicator for membership in one
>> of the 10 programs.
>>
>> Because youth were repeatedly sampled, "participant_ID" is an indicator
>> for one of about 200 participants.
>>
>> "sample_ID" is an indicator unique for each program (it was made from the
>> program_ID, the date, and which one of four samples it was for that
>> date). There are about 20 unique values for it for each program, for
>> around 200 values total.
>>
>>
>>
>> Does "site" = "program"?
>> Are participants queried at multiple timepoints? If pre- and
>> post-program, could this be included as a factor with levels "before" and
>> "afte
>>
>>
>> Yes, the sampling consisted of repeated measures within participant
>> (around 15-20 responses per participant). It's a bit tricky for me to
>> describe, but as I mentioned above every youth in the same program was
>> notified to answer questions at the same time, though both the instance in
>> time and the interval between these questions differed between the 10
>> programs.
>>
>>
>>
>> Do you have any particular hypotheses or questions you want to answer
>> with your model?
>>
>>
>> We're interested in, for lack of a better word, time-point or
>> situation-specific ("sample_ID") variables' relationships with engagement.
>> We coded video of the programs, including the moments before and when
>> youth were notified to respond, for features such as the type of activity
>> youth were participating in (e.g., working in groups or individually; doing
>> hands-on activities or listening to the activity leaders). We imagine
>> treating these as categorical variables.
>>
>> Similarly, we're interested in relationships between youth's
>> characteristics (such as pre-program interest and demographic
>> characteristics, such as gender) and our outcomes and to a bit of a lesser
>> extent relationships between some program factors and outcomes (though with
>> only 10 programs, we do not imagine we will have statistical power to
>> detect any / many effects at that level).
>>
>> We're interested in sources of variance as a substantive question (how
>> much of students' engagement is explained by time-point ("sample_ID"),
>> youth ("participant_ID"), and program ("program_ID") effects?). Though this
>> is a bit secondary to our questions about the specific variables at
>> time-point, youth, and program levels.
>>
>>
>>
>> Best wishes, Evan
>>
>>
>>
>>
>>
>> --
>> Joshua Rosenberg
>> jmichaelrosenberg at gmail.com
>> http://joshuamrosenberg.com
>>
>>
>>
>> --
>> Diana Michl, M.A.
>> PhD candidate
>> International Experimental
>> and Clinical Linguistics
>> Universität Potsdam
>> www.ling.uni-potsdam.de/staff/dmichl
>> www.duoinfernale.eu
>>
>>
>
>
> --
> Joshua Rosenberg
> jmichaelrosenberg at gmail.com
> http://joshuamrosenberg.com
>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models