[R-sig-ME] Specifying models nested crossed random effects
Joshua Rosenberg
jmichaelrosenberg at gmail.com
Tue Apr 25 16:33:08 CEST 2017
Diana - thank you very much. I think you're right, and I've found
(interestingly, though maybe not surprisingly) that there are two equivalent
ways to specify this nesting. One is to identify the participants and
samples explicitly, i.e., not "Participant 1" nested in "Program A" when
there is *another* "Participant 1" in "Program B," but rather a unique
participant ID and sample ID for every participant and sample:
(1|program_ID) + (1|participant_ID) + (1|sample_ID). The other is to nest
non-unique participant IDs and sample IDs within program, e.g.
(1|program_ID/participant_ID). Chapter 2
<http://lme4.r-forge.r-project.org/book/Ch2.pdf> of the (unpublished)
lme4 book by Bates helped me understand why: basically, lme4 figures out
the nesting (or crossing) on its own as long as the factor levels are
uniquely labeled, i.e., the random effects are not nested implicitly.
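
A minimal sketch of that equivalence, on simulated data rather than the data
discussed in this thread. The column participant_in_program, the group sizes,
and the effect sizes are all hypothetical and chosen only for illustration:

```r
library(lme4)

set.seed(1)
# Hypothetical toy design: 4 programs, 5 participants each, 6 responses each
d <- expand.grid(rep = 1:6, participant_in_program = 1:5, program_ID = 1:4)
d$program_ID <- factor(d$program_ID)
# Implicitly nested labels: "participant 1" is reused in every program
d$participant_in_program <- factor(d$participant_in_program)
# Explicit labels: one unique ID per participant across all programs
d$participant_ID <- interaction(d$program_ID, d$participant_in_program,
                                drop = TRUE)

# Simulate an outcome with program and participant effects (made-up values)
prog_eff <- rnorm(nlevels(d$program_ID))[as.integer(d$program_ID)]
part_eff <- rnorm(nlevels(d$participant_ID))[as.integer(d$participant_ID)]
d$interest <- 2.5 + prog_eff + part_eff + rnorm(nrow(d))

# Unique IDs are recognised as nested; reused labels are not
isNested(d$participant_ID, d$program_ID)
isNested(d$participant_in_program, d$program_ID)

# Specification 1: unique IDs, plain additive random-effects terms
m_unique <- lmer(interest ~ 1 + (1 | program_ID) + (1 | participant_ID),
                 data = d)
# Specification 2: reused labels, nesting stated with the "/" syntax
m_nested <- lmer(interest ~ 1 + (1 | program_ID / participant_in_program),
                 data = d)

# The two fits should agree up to numerical tolerance
all.equal(as.numeric(logLik(m_unique)), as.numeric(logLik(m_nested)),
          tolerance = 1e-6)
```

Note that writing (1 | program_ID) + (1 | participant_in_program) with the
reused labels would instead treat participants as crossed with programs,
which is not the design described here.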
Thanks again. I'm also thinking hard about the ordinal issue: while some of
our outcomes are ordinal, others are composites of multiple items and so
aren't.
Josh
On Sun, Apr 9, 2017 at 5:00 PM, Diana Michl <dmichl at uni-potsdam.de> wrote:
> Not planning to confuse anyone and I agree with Evan mostly. But it seems
> to me that even with the fixed effects, it still makes sense to include
> participants and programs as nested random effects because they really are
> nested (one factor (grouping variable) appears only within a particular
> level of another factor (grouping variable)).
> sample_ID seems fine, but I think it should still be
> (1|program_ID/participant_ID).
>
> Diana
>
> On 09.04.2017 at 20:56, Evan Palmer-Young wrote:
>
> Thanks for those details, Josh. Interesting design!
>
> I'm not experienced in interpreting random effects on their own, so others
> will have better advice on that.
>
> For your model structure, it sounds like there are three random effects:
>
> "program_ID"
> "participant_ID"
> "sample_ID"
>
> From my reading of lme4 documentation, I think that you have coded
> sample_ID correctly and do not need to explicitly nest it within program_ID.
>
> In general, I think it may be better form to include both fixed and random
> predictors in your model, rather than having separate models to assess only
> the random effects.
>
> So your model might be something like,
>
> interest_model <- lmer(interest ~ Instruction_type + time_of_day +
> Working_alone + (1|program_ID) + (1|participant_ID) + (1|sample_ID),
> data = df)
>
> where Instruction_type, time_of_day, and Working_alone are fabricated
> variables that might resemble variables you recorded.
>
> As a disclaimer, this is my second time answering to the list-- welcome!
>
> Best wishes, Evan
>
>
>
>
>
> On Sat, Apr 8, 2017 at 4:26 PM, Joshua Rosenberg <jmichaelrosenberg at gmail.com> wrote:
>
>
> Thank you Evan for your response and thank you for clarifying.
>
> Responses are in-line below.
>
>
> Thank you for considering this!
>
> Josh
>
>
> On Sat, Apr 8, 2017 at 3:28 PM, Evan Palmer-Young <ecp52 at cornell.edu> wrote:
>
>
> Josh,
> Thanks for the questions.
> Can you provide a little bit more description about the variables?
>
>
> First, sorry, I had changed some of the variable names in the data and
> realize I used different names (and a different outcome) in the examples at
> the bottom.
>
> "interest" (one outcome we're measuring) is a variable of participants'
> self-reported interest using a 1-4 scale.
>
> "overall_engagement" is one other (different) outcome: a composite of
> variables for students' interest, how hard they were concentrating, and
> how challenging they found what they were learning.
>
> We asked participants (youth) about how interested they were in what they
> were learning at random intervals using what is called an experience
> sampling method. In our method, youth had phones on which they were asked
> about what they were thinking / feeling - every youth in the same program
> (more on the programs in just a moment) was notified to answer our
> questions at the same time, although both the instance in time and the
> interval between these questions were different between programs.
>
> "site" = "program" (ID) and program is an indicator for membership in one
> of the 10 programs.
>
> Because youth were repeatedly sampled, "participant_ID" is an indicator
> for one of about 200 participants.
>
> "sample_ID" is an indicator for each sampling occasion that is unique
> across programs (it was made from the program_ID, the date, and which one
> of the four samples it was for that date). There are about 20 unique values
> of it per program, for around 200 values in total.
>
>
>
> Does "site" = "program"?
> Are participants queried at multiple timepoints? If pre- and
> post-program, could this be included as a factor with levels "before" and
> "afte
>
>
> Yes, the sampling consisted of repeated measures within participant
> (around 15-20 responses per participant). It's a bit tricky for me to
> describe, but as I mentioned above every youth in the same program was
> notified to answer questions at the same time, though both the instance in
> time and the interval between these questions differed between the 10
> programs.
>
>
>
> Do you have any particular hypotheses or questions you want to answer
> with your model?
>
>
> We're interested in, for lack of a better word, time point or
> situation-specific ("sample_ID") variables' relationships with engagement.
> We coded video of the programs, covering the period just before and at the
> moment youth were notified to respond, for features such as the type of
> activity youth were participating in (e.g., working in groups or
> individually; doing hands-on activities or listening to the activity
> leaders). We imagine treating these as categorical variables.
>
> Similarly, we're interested in relationships between youth's
> characteristics (such as pre-program interest and demographic
> characteristics, such as gender) and our outcomes and to a bit of a lesser
> extent relationships between some program factors and outcomes (though with
> only 10 programs, we do not imagine we will have statistical power to
> detect any / many effects at that level).
>
> We're interested in sources of variance as a substantive question (how
> much of students' engagement is explained by time-point ("sample_ID"),
> youth ("participant_ID"), and program ("program_ID") effects?). Though this
> is a bit secondary to our questions about the specific variables at
> time-point, youth, and program levels.
>
>
>
> Best wishes, Evan
>
>
>
>
>
> --
> Joshua Rosenberg
> jmichaelrosenberg at gmail.com
> http://joshuamrosenberg.com
>
>
>
> --
> Diana Michl, M.A.
> PhD candidate
> International Experimental
> and Clinical Linguistics
> Universität Potsdam
> www.ling.uni-potsdam.de/staff/dmichl
> www.duoinfernale.eu
>
>
--
Joshua Rosenberg
jmichaelrosenberg at gmail.com
http://joshuamrosenberg.com