[R-sig-ME] LMER-CorpusData

Taha Omidian t@h@@omidi@n @ending from vuw@@c@nz
Wed Oct 10 12:52:52 CEST 2018

Hi Phillip,

Thanks so much for your reply.

I think the best way to describe the data is to start with the aim of our study. The purpose of our study is to investigate the effect of discipline, genre, and level of study on the use of certain word combinations in learner writing. To represent learner writing, we compiled a corpus of texts collected from students in 30 different disciplines and at four levels of study. Texts in the corpus were then categorised based on their genres (13 genres).

Following this, we classified the disciplines into four major disciplinary groupings. Genres were also grouped under 5 broad categories based on their social purposes. We then searched the corpus for occurrences of 278 word combinations (e.g., on the other hand) and recorded their normalised frequency of occurrence for each text (labeled as ref.norm in our data).

To me, our data is structured in a hierarchical fashion (for each predictor). So here is what we have in our data:

-Students (student_id col) contributed multiple texts (id col)

-Each text is nested within different disciplines (discipline col) which are clustered within four disciplinary groupings (disciplinaryGroup col)

-Each text is nested within genres (genreFamily col) which are grouped into five genre groups (genreGroup col)

-Each text is nested within four levels of study (level col)

Predictors (based on the labels in our data) are: disciplinaryGroup, genreGroup, level
Dependent variable (based on its label in our data) is: ref.norm

So I need to know how this nested structure can be reflected in an LME model.
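
A minimal sketch of how the structure described above might be written in lme4, offered as one possibility rather than a recommended final model. It assumes that every discipline label belongs to exactly one disciplinary grouping and every genre family to exactly one genre group, so the finer-grained factors can serve as grouping variables for random intercepts while the coarser groupings and level stay in the fixed effects. The data are simulated purely so that the example runs; only the column names come from the description above.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for the corpus (all values invented for illustration)
n_students <- 60
n_texts    <- 300
disc_levels   <- paste0("disc", 1:12)
disc_to_group <- rep(paste0("dg", 1:4), each = 3)  # each discipline sits in one grouping
fam_levels    <- paste0("fam", 1:10)
fam_to_group  <- rep(paste0("gg", 1:5), each = 2)  # each genre family sits in one group

d_idx <- sample(12, n_texts, replace = TRUE)
f_idx <- sample(10, n_texts, replace = TRUE)

thedata <- data.frame(
  id                = seq_len(n_texts),
  student_id        = factor(sample(n_students, n_texts, replace = TRUE)),
  discipline        = factor(disc_levels[d_idx]),
  disciplinaryGroup = factor(disc_to_group[d_idx]),
  genreFamily       = factor(fam_levels[f_idx]),
  genreGroup        = factor(fam_to_group[f_idx]),
  level             = factor(sample(1:4, n_texts, replace = TRUE))
)

# Outcome = noise + student, discipline and genre-family effects
thedata$ref.norm <- rnorm(n_texts) +
  rnorm(n_students)[as.integer(thedata$student_id)] +
  rnorm(12)[d_idx] +
  rnorm(10)[f_idx]

# Coarse groupings and level as fixed effects;
# students, disciplines and genre families as random intercepts
m <- lmer(ref.norm ~ disciplinaryGroup + genreGroup + level +
            (1 | student_id) + (1 | discipline) + (1 | genreFamily),
          data = thedata)

print(summary(m))
```

In this sketch no variable appears both as a fixed effect and as a grouping variable, and everything is taken from a single data frame passed via data=. Whether random intercepts for 30 disciplines and 13 genres are well supported is something the real data would have to show.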

As always thanks for your help.


On Oct 9, 2018, at 11:10 PM, Phillip Alday <phillip.alday using mpi.nl> wrote:

I don't think this is the model you're looking for...

1. It's really weird to have your predictors in one dataframe and your
dependent variable in a different one. Are you really sure that the rows
line up like you think they do? If so, why not join the dataframes
earlier (with merge(), plyr::join() or dplyr::left_join())?

I'm overall quite nervous about namespaces / scope / etc. in your code
-- using attach() isn't recommended practice, especially when you mix
and match things (e.g. your levelX variables aren't in your dataframe,
but the other predictors are). You have to be really careful to make
sure you're using the data you think you're using.

You can do it like you have it, but it makes me very nervous in terms of
computing what you think you're computing.
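
To make the join suggestion concrete, here is a small sketch in base R. The two data frames and their values are hypothetical stand-ins; the assumption is that both share a text identifier column (called id here), which may not match the real files.

```r
# Hypothetical stand-ins for two separate data frames that share a text id
predictors <- data.frame(
  id    = c(3, 1, 2, 4),
  level = c("L2", "L1", "L1", "L3"),
  stringsAsFactors = FALSE
)
outcome <- data.frame(
  id       = c(1, 2, 3, 5),
  ref.norm = c(0.12, 0.40, 0.33, 0.27)
)

# Join on the shared key instead of relying on row order
combined <- merge(predictors, outcome, by = "id")

# merge() keeps only ids present in both frames by default;
# list the ids that were dropped so nothing disappears silently
unmatched <- setdiff(union(predictors$id, outcome$id), combined$id)

print(combined)
print(unmatched)
```

With everything in one data frame, the model call can refer to columns by name through the data= argument, and attach() is not needed.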

2. Your levels include the same predictor in both the fixed effects and
as a grouping variable (the part of the random effect after the |).
This generally doesn't make sense -- there are a number of posts on this
mailing list to that effect (see also
https://rpubs.com/INBOstats/both_fixed_random and
https://www.muscardinus.be/2017/08/fixed-and-random/) -- but it depends
on your data.

In other words, seeing your model specification isn't quite enough -- we
also need to know something about your data, more than your variable
names alone reveal. Even though I work a lot with language data, I still
can't tell enough from your variable names and code what your data
actually represent.
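
As an illustration of the kind of summary that helps here, the sketch below checks whether one factor is nested within another and counts texts per student. The toy data and the helper name nested_in are invented for this example; only the column names are taken from the thread.

```r
# Toy data using column names from this thread; values are invented
thedata <- data.frame(
  id                = 1:8,
  student_id        = c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"),
  discipline        = c("bio", "bio", "chem", "chem", "hist", "hist", "law", "bio"),
  disciplinaryGroup = c("sci", "sci", "sci", "sci", "hum", "hum", "soc", "sci")
)

# Hypothetical helper: 'inner' is nested in 'outer' if every level of
# 'inner' occurs under exactly one level of 'outer'
nested_in <- function(inner, outer) {
  tab <- table(inner, outer)
  all(rowSums(tab > 0) == 1)
}

# Are disciplines nested within disciplinary groupings?
disc_in_group <- nested_in(thedata$discipline, thedata$disciplinaryGroup)

# Are disciplines nested within students, or crossed with them?
disc_in_student <- nested_in(thedata$discipline, thedata$student_id)

# How many texts did each student contribute?
texts_per_student <- table(thedata$student_id)

print(disc_in_group)
print(disc_in_student)
print(texts_per_student)
```

Cross-tabulations like these, together with str() output for the data frame, say more about the design than the variable names do.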


On 10/08/2018 12:46 AM, Taha Omidian wrote:

I’m trying to fit a mixed effects model to my corpus data. The data has a hierarchical structure. I need to make sure that the final model reflects this nested structure.

My final model looks like this:

theMdl<-lmer(dis.norm.j$transformed~disciplinaryGroup+genreGroup+level+(1|student_id)+(1|levelA)+(1|levelB)+(1|levelC),data=thedata, control=lmerControl("bobyqa"))


levelA is genreGroup:genreFamily:student_id
levelB is disciplinaryGroup:discipline:student_id
levelC is level:student_id

Here is a link to my data and R script: https://www.dropbox.com/sh/46r6lv6n89bromk/AABMc8MQmAYhRC3ubJ0Ii7Wma?dl=0


R-sig-mixed-models using r-project.org mailing list
