[R-sig-ME] Identify large residuals
Ben Bolker
bbolker at gmail.com
Fri Jan 27 22:49:48 CET 2017
1. You can calculate residuals with different levels of random effects
included via (observed value)-predict(...,re.form=<something>). In
your case, though, it seems you just want the raw residuals()
(lowest-level) -- but see point #2.
2. In this sample data set, there is a single response per question for
all but one examinee. This will make the qid-within-examinee random
effect variance almost impossible to estimate (strongly confounded with
the observation-level residual variance); was that on purpose or is that
an artifact of the example you gave us to look at? (Now that I look
closer, I think this is what you meant by "I added one line at the
bottom with dummy data to get it to run"; otherwise you would get an
error from lmer() that you'd have to override.) What do your real data
look like? If they really have only one observation per examinee:qid
combo, then you should leave out the nested random effect -- it will be
captured entirely by the residual variance term.
3. For what it's worth, it doesn't seem as though log-transforming these
data is worthwhile, but that may be because you made up data that were
already reasonably well distributed?
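
A minimal sketch of points 1 and 2, assuming one response per
examinee:qid combination. The data frame sim.DF is simulated here as a
stand-in for test.DF (same column names, made-up values), and the 2.5
cutoff is an arbitrary illustration rather than a recommended threshold:

```r
library(lme4)

## Simulated stand-in for test.DF (hypothetical values):
## 100 examinees x 6 items, one response per combination.
set.seed(101)
sim.DF <- expand.grid(qid = factor(paste0("Item", 1:6)),
                      examinee = factor(1:100))
exam_eff <- rnorm(100, sd = 0.4)   # examinee-level intercepts
sim.DF$answer_time <- exp(3.5 + exam_eff[as.integer(sim.DF$examinee)] +
                          rnorm(nrow(sim.DF), sd = 0.8))

## Nested term dropped: with one observation per examinee:qid it is
## confounded with the residual variance.
fit <- lmer(log(answer_time) ~ qid + (1 | examinee), data = sim.DF,
            REML = FALSE)

obs <- log(sim.DF$answer_time)

## Lowest-level residuals (all random effects included in the prediction)
res_cond <- obs - predict(fit)     # should match residuals(fit)
## Population-level residuals (random effects excluded)
res_pop  <- obs - predict(fit, re.form = NA)

## Flag examinee/qid combinations with large scaled residuals
sim.DF$res_scaled <- residuals(fit, scaled = TRUE)
flagged <- sim.DF[abs(sim.DF$res_scaled) > 2.5, ]
flagged[order(-abs(flagged$res_scaled)), ]
```

Replacing abs(sim.DF$res_scaled) with sim.DF$res_scaled would flag only
unusually long response times, if those are the ones of interest.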
On 17-01-27 04:27 PM, Stuart Luppescu wrote:
> Hello, I have a dataset of test item response times. The examinees took
> the test unsupervised online. We want to identify person-items with
> unusually large time residuals indicating that the examinee might have
> looked up the answer on Google before responding.
>
> I am trying to do this in a model with items nested within examinees
> like this:
>
> lmer.test1a <- lmer(log(answer_time) ~ qid + (1|examinee/qid), data=test.DF, REML=FALSE)
>
> The results look like this:
>
> Linear mixed model fit by maximum likelihood ['lmerMod']
> Formula: log(answer_time) ~ qid + (1 | examinee/qid)
> Data: test.DF
>
>      AIC      BIC   logLik deviance df.resid
>   1670.2   1709.8   -826.1   1652.2      597
>
> Scaled residuals:
>     Min      1Q  Median      3Q     Max
> -3.9656 -0.3263  0.1407  0.5539  2.7752
>
> Random effects:
>  Groups       Name        Variance  Std.Dev.
>  qid:examinee (Intercept) 1.275e-15 3.571e-08
>  examinee     (Intercept) 1.920e-01 4.382e-01
>  Residual                 7.684e-01 8.766e-01
> Number of obs: 606, groups: qid:examinee, 600; examinee, 100
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  3.48381    0.09764   35.68
> qidItem2    -0.11060    0.12335   -0.90
> qidItem3    -0.09798    0.12335   -0.79
> qidItem4    -0.02294    0.12335   -0.19
> qidItem5     0.13196    0.12335    1.07
> qidItem6    -0.01915    0.12335   -0.16
>
> Does this look like a reasonable approach? If so, how would I get the
> residuals out of this to identify examinee/qid combinations that seem
> unusually large?
>
> Thanks in advance for any help.
>
> The dataset I'm using is pasted below. (I made the qid variable effects
> coded by doing contrasts(test.DF$qid) <- contr.sum but I'm not sure if
> the contrast attribute is included in the dput below. Also, I added one
> line at the bottom with dummy data to get it to run.)
>
> dput(test.DF)
> structure(list(examinee = structure(c(3L, 3L, 3L, 3L, 3L, 3L,
> 6L, 6L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L, 9L, 9L, 7L, 7L, 7L, 7L,
> 7L, 7L, 96L, 96L, 96L, 96L, 96L, 96L, 8L, 8L, 8L, 8L, 8L, 8L,
> 4L, 4L, 4L, 4L, 4L, 4L, 12L, 12L, 12L, 12L, 12L, 12L, 16L, 16L,
> 16L, 16L, 16L, 16L, 10L, 10L, 10L, 10L, 10L, 10L, 19L, 19L, 19L,
> 19L, 19L, 19L, 5L, 5L, 5L, 5L, 5L, 5L, 21L, 21L, 21L, 21L, 21L,
> 21L, 18L, 18L, 18L, 18L, 18L, 18L, 99L, 99L, 99L, 99L, 99L, 99L,
> 98L, 98L, 98L, 98L, 98L, 98L, 13L, 13L, 13L, 13L, 13L, 13L, 26L,
> 26L, 26L, 26L, 26L, 26L, 1L, 1L, 1L, 1L, 1L, 1L, 29L, 29L, 29L,
> 29L, 29L, 29L, 30L, 30L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L,
> 31L, 31L, 32L, 32L, 32L, 32L, 32L, 32L, 23L, 23L, 23L, 23L, 23L,
> 23L, 35L, 35L, 35L, 35L, 35L, 35L, 36L, 36L, 36L, 36L, 36L, 36L,
> 37L, 37L, 37L, 37L, 37L, 37L, 38L, 38L, 38L, 38L, 38L, 38L, 100L,
> 100L, 100L, 100L, 100L, 100L, 40L, 40L, 40L, 40L, 40L, 40L, 42L,
> 42L, 42L, 42L, 42L, 42L, 34L, 34L, 34L, 34L, 34L, 34L, 46L, 46L,
> 46L, 46L, 46L, 46L, 47L, 47L, 47L, 47L, 47L, 47L, 44L, 44L, 44L,
> 44L, 44L, 44L, 15L, 15L, 15L, 15L, 15L, 15L, 52L, 52L, 52L, 52L,
> 52L, 52L, 55L, 55L, 55L, 55L, 55L, 55L, 53L, 53L, 53L, 53L, 53L,
> 53L, 39L, 39L, 39L, 39L, 39L, 39L, 51L, 51L, 51L, 51L, 51L, 51L,
> 48L, 48L, 48L, 48L, 48L, 48L, 58L, 58L, 58L, 58L, 58L, 58L, 22L,
> 22L, 22L, 22L, 22L, 22L, 33L, 33L, 33L, 33L, 33L, 33L, 60L, 60L,
> 60L, 60L, 60L, 60L, 95L, 95L, 95L, 95L, 95L, 95L, 59L, 59L, 59L,
> 59L, 59L, 59L, 56L, 56L, 56L, 56L, 56L, 56L, 63L, 63L, 63L, 63L,
> 63L, 63L, 57L, 57L, 57L, 57L, 57L, 57L, 50L, 50L, 50L, 50L, 50L,
> 50L, 62L, 62L, 62L, 62L, 62L, 62L, 25L, 25L, 25L, 25L, 25L, 25L,
> 64L, 64L, 64L, 64L, 64L, 64L, 14L, 14L, 14L, 14L, 14L, 14L, 66L,
> 66L, 66L, 66L, 66L, 66L, 61L, 61L, 61L, 61L, 61L, 61L, 68L, 68L,
> 68L, 68L, 68L, 68L, 49L, 49L, 49L, 49L, 49L, 49L, 69L, 69L, 69L,
> 69L, 69L, 69L, 41L, 41L, 41L, 41L, 41L, 41L, 54L, 54L, 54L, 54L,
> 54L, 54L, 67L, 67L, 67L, 67L, 67L, 67L, 65L, 65L, 65L, 65L, 65L,
> 65L, 70L, 70L, 70L, 70L, 70L, 70L, 43L, 43L, 43L, 43L, 43L, 43L,
> 20L, 20L, 20L, 20L, 20L, 20L, 72L, 72L, 72L, 72L, 72L, 72L, 11L,
> 11L, 11L, 11L, 11L, 11L, 97L, 97L, 97L, 97L, 97L, 97L, 74L, 74L,
> 74L, 74L, 74L, 74L, 75L, 75L, 75L, 75L, 75L, 75L, 77L, 77L, 77L,
> 77L, 77L, 77L, 76L, 76L, 76L, 76L, 76L, 76L, 79L, 79L, 79L, 79L,
> 79L, 79L, 78L, 78L, 78L, 78L, 78L, 78L, 80L, 80L, 80L, 80L, 80L,
> 80L, 81L, 81L, 81L, 81L, 81L, 81L, 71L, 71L, 71L, 71L, 71L, 71L,
> 82L, 82L, 82L, 82L, 82L, 82L, 83L, 83L, 83L, 83L, 83L, 83L, 28L,
> 28L, 28L, 28L, 28L, 28L, 84L, 84L, 84L, 84L, 84L, 84L, 89L, 89L,
> 89L, 89L, 89L, 89L, 87L, 87L, 87L, 87L, 87L, 87L, 86L, 86L, 86L,
> 86L, 86L, 86L, 90L, 90L, 90L, 90L, 90L, 90L, 91L, 91L, 91L, 91L,
> 91L, 91L, 85L, 85L, 85L, 85L, 85L, 85L, 2L, 2L, 2L, 2L, 2L, 2L,
> 92L, 92L, 92L, 92L, 92L, 92L, 17L, 17L, 17L, 17L, 17L, 17L, 24L,
> 24L, 24L, 24L, 24L, 24L, 93L, 93L, 93L, 93L, 93L, 93L, 88L, 88L,
> 88L, 88L, 88L, 88L, 73L, 73L, 73L, 73L, 73L, 73L, 27L, 27L, 27L,
> 27L, 27L, 27L, 45L, 45L, 45L, 45L, 45L, 45L, 94L, 94L, 94L, 94L,
> 94L, 94L, 100L, 100L, 100L, 100L, 100L, 100L), .Label = c("1",
> "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
> "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
> "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
> "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
> "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57",
> "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68",
> "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
> "80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
> "91", "92", "93", "94", "95", "96", "97", "98", "99", "100"), class =
> "factor"),
> qid = structure(c(3L, 5L, 1L, 2L, 4L, 6L, 5L, 2L, 4L, 6L,
> 1L, 3L, 6L, 3L, 4L, 2L, 5L, 1L, 6L, 3L, 2L, 5L, 1L, 4L, 4L,
> 3L, 1L, 6L, 5L, 2L, 3L, 2L, 5L, 6L, 4L, 1L, 2L, 1L, 3L, 5L,
> 4L, 6L, 3L, 2L, 6L, 4L, 5L, 1L, 3L, 2L, 5L, 4L, 6L, 1L, 2L,
> 6L, 5L, 1L, 4L, 3L, 3L, 6L, 5L, 4L, 2L, 1L, 4L, 2L, 3L, 5L,
> 6L, 1L, 3L, 5L, 2L, 6L, 4L, 1L, 5L, 6L, 3L, 2L, 1L, 4L, 4L,
> 2L, 3L, 5L, 1L, 6L, 3L, 6L, 2L, 1L, 5L, 4L, 1L, 6L, 5L, 3L,
> 2L, 4L, 6L, 1L, 2L, 3L, 5L, 4L, 2L, 6L, 4L, 3L, 5L, 1L, 1L,
> 4L, 3L, 5L, 6L, 2L, 4L, 2L, 6L, 5L, 3L, 1L, 5L, 2L, 6L, 3L,
> 1L, 4L, 1L, 6L, 4L, 5L, 2L, 3L, 1L, 4L, 3L, 2L, 5L, 6L, 6L,
> 2L, 1L, 3L, 4L, 5L, 1L, 6L, 3L, 4L, 5L, 2L, 6L, 3L, 5L, 1L,
> 4L, 2L, 2L, 4L, 6L, 5L, 3L, 1L, 6L, 1L, 5L, 3L, 4L, 2L, 2L,
> 1L, 3L, 4L, 6L, 5L, 3L, 6L, 1L, 5L, 4L, 2L, 2L, 4L, 5L, 6L,
> 3L, 1L, 2L, 3L, 4L, 1L, 5L, 6L, 6L, 5L, 4L, 1L, 2L, 3L, 5L,
> 4L, 1L, 3L, 2L, 6L, 6L, 1L, 4L, 5L, 3L, 2L, 2L, 6L, 5L, 3L,
> 4L, 1L, 6L, 5L, 3L, 4L, 2L, 1L, 5L, 1L, 3L, 4L, 2L, 6L, 4L,
> 3L, 2L, 6L, 1L, 5L, 2L, 3L, 5L, 1L, 4L, 6L, 6L, 5L, 2L, 3L,
> 4L, 1L, 5L, 4L, 3L, 2L, 1L, 6L, 4L, 3L, 2L, 6L, 1L, 5L, 2L,
> 3L, 5L, 6L, 4L, 1L, 4L, 3L, 1L, 5L, 6L, 2L, 3L, 6L, 2L, 4L,
> 5L, 1L, 6L, 1L, 3L, 4L, 2L, 5L, 5L, 1L, 3L, 2L, 4L, 6L, 1L,
> 6L, 3L, 4L, 2L, 5L, 1L, 6L, 3L, 5L, 2L, 4L, 5L, 4L, 6L, 2L,
> 1L, 3L, 4L, 3L, 5L, 2L, 6L, 1L, 5L, 6L, 4L, 3L, 1L, 2L, 5L,
> 4L, 3L, 2L, 1L, 6L, 3L, 6L, 2L, 5L, 4L, 1L, 3L, 6L, 4L, 5L,
> 2L, 1L, 6L, 1L, 2L, 3L, 5L, 4L, 2L, 1L, 4L, 3L, 5L, 6L, 5L,
> 2L, 4L, 3L, 6L, 1L, 4L, 5L, 3L, 1L, 2L, 6L, 2L, 1L, 4L, 6L,
> 3L, 5L, 6L, 2L, 5L, 4L, 1L, 3L, 4L, 6L, 2L, 1L, 5L, 3L, 4L,
> 6L, 3L, 5L, 1L, 2L, 6L, 3L, 1L, 5L, 2L, 4L, 5L, 6L, 1L, 4L,
> 2L, 3L, 6L, 4L, 5L, 2L, 3L, 1L, 4L, 5L, 1L, 6L, 3L, 2L, 6L,
> 1L, 4L, 2L, 5L, 3L, 3L, 1L, 5L, 4L, 2L, 6L, 4L, 6L, 1L, 2L,
> 3L, 5L, 1L, 4L, 5L, 3L, 2L, 6L, 5L, 2L, 1L, 6L, 3L, 4L, 6L,
> 1L, 2L, 3L, 4L, 5L, 3L, 1L, 5L, 2L, 6L, 4L, 3L, 2L, 1L, 5L,
> 6L, 4L, 5L, 4L, 6L, 1L, 2L, 3L, 4L, 5L, 3L, 2L, 1L, 6L, 4L,
> 3L, 6L, 2L, 1L, 5L, 5L, 4L, 3L, 2L, 6L, 1L, 2L, 6L, 4L, 1L,
> 5L, 3L, 3L, 4L, 6L, 2L, 5L, 1L, 5L, 3L, 4L, 2L, 1L, 6L, 2L,
> 4L, 5L, 6L, 1L, 3L, 6L, 3L, 4L, 5L, 1L, 2L, 1L, 5L, 2L, 4L,
> 6L, 3L, 6L, 5L, 1L, 3L, 4L, 2L, 6L, 3L, 4L, 1L, 2L, 5L, 5L,
> 6L, 4L, 2L, 3L, 1L, 1L, 3L, 4L, 2L, 5L, 6L, 5L, 3L, 2L, 6L,
> 4L, 1L, 1L, 5L, 4L, 2L, 3L, 6L, 2L, 5L, 1L, 4L, 6L, 3L, 4L,
> 1L, 2L, 5L, 6L, 3L, 6L, 5L, 4L, 3L, 2L, 1L, 1L, 4L, 3L, 5L,
> 6L, 2L, 5L, 3L, 1L, 4L, 2L, 6L, 2L, 6L, 4L, 3L, 5L, 1L, 2L,
> 6L, 4L, 5L, 1L, 3L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Item1",
> "Item2", "Item3", "Item4", "Item5", "Item6"), class = "factor"),
> answer_time = c(16, 11, 29, 19, 51, 23, 17, 28, 36, 57, 23,
> 20, 26, 29, 90, 13, 43, 41, 40, 90, 63, 56, 54, 54, 1, 27,
> 35, 90, 90, 32, 13, 12, 57, 24, 56, 18, 33, 61, 34, 36, 47,
> 38, 90, 67, 21, 74, 81, 71, 28, 40, 22, 22, 26, 69, 77, 69,
> 35, 76, 55, 24, 90, 42, 44, 16, 22, 39, 1, 32, 72, 90, 28,
> 54, 1, 56, 51, 40, 11, 29, 64, 32, 62, 50, 19, 19, 90, 26,
> 36, 16, 22, 14, 1, 49, 53, 88, 48, 54, 60, 28, 33, 58, 15,
> 22, 44, 47, 10, 71, 75, 60, 39, 28, 31, 17, 61, 42, 1, 56,
> 76, 39, 28, 26, 32, 90, 19, 90, 63, 41, 90, 57, 21, 45, 52,
> 36, 1, 55, 62, 60, 83, 58, 90, 90, 83, 30, 60, 77, 54, 18,
> 42, 66, 26, 69, 15, 41, 27, 12, 34, 18, 61, 56, 49, 56, 43,
> 34, 85, 90, 31, 73, 65, 83, 1, 90, 59, 22, 90, 90, 28, 46,
> 90, 17, 47, 42, 53, 25, 35, 47, 19, 31, 49, 72, 73, 34, 75,
> 63, 43, 30, 10, 14, 41, 32, 90, 90, 56, 68, 32, 10, 90, 69,
> 43, 11, 45, 49, 90, 61, 72, 57, 70, 77, 6, 1, 2, 2, 1, 2,
> 90, 64, 75, 18, 22, 24, 66, 23, 45, 67, 49, 55, 14, 20, 9,
> 11, 9, 17, 1, 25, 21, 34, 90, 32, 90, 71, 38, 34, 18, 36,
> 35, 37, 34, 22, 30, 21, 44, 34, 58, 15, 32, 23, 45, 90, 56,
> 43, 41, 42, 17, 40, 90, 90, 20, 40, 75, 25, 35, 42, 31, 48,
> 28, 51, 29, 31, 12, 90, 21, 43, 16, 63, 35, 23, 23, 25, 16,
> 23, 18, 14, 58, 19, 22, 54, 37, 52, 90, 71, 21, 72, 85, 76,
> 71, 13, 53, 14, 43, 68, 76, 28, 38, 33, 13, 13, 50, 27, 48,
> 21, 36, 28, 1, 32, 10, 68, 12, 21, 90, 22, 77, 34, 35, 39,
> 64, 55, 42, 82, 88, 90, 33, 18, 85, 49, 23, 33, 1, 55, 42,
> 19, 36, 90, 39, 32, 6, 29, 36, 25, 1, 24, 20, 24, 15, 28,
> 90, 24, 13, 35, 19, 13, 82, 56, 43, 30, 74, 74, 90, 77, 12,
> 34, 41, 77, 90, 90, 53, 64, 38, 90, 25, 40, 55, 69, 18, 16,
> 53, 49, 82, 28, 73, 46, 72, 76, 53, 66, 73, 53, 37, 28, 39,
> 90, 48, 21, 90, 75, 77, 65, 61, 18, 90, 26, 29, 22, 51, 76,
> 1, 31, 28, 74, 29, 21, 90, 62, 43, 42, 28, 58, 44, 36, 29,
> 50, 21, 90, 28, 19, 18, 21, 12, 19, 1, 48, 59, 62, 49, 1,
> 26, 32, 27, 18, 16, 15, 37, 48, 24, 27, 30, 42, 68, 38, 35,
> 90, 66, 73, 1, 19, 90, 56, 21, 17, 65, 35, 41, 64, 38, 25,
> 90, 25, 57, 41, 63, 71, 1, 41, 25, 17, 47, 48, 28, 69, 31,
> 31, 22, 59, 86, 25, 21, 52, 19, 32, 51, 43, 22, 33, 90, 31,
> 88, 63, 70, 71, 76, 74, 13, 27, 9, 21, 12, 15, 76, 17, 36,
> 19, 6, 51, 71, 77, 67, 32, 74, 14, 1, 90, 18, 26, 50, 41,
> 69, 58, 22, 62, 10, 40, 15, 8, 14, 7, 16, 5, 1, 54, 90, 25,
> 29, 41, 33, 40, 36, 30, 24, 63, 1, 44, 16, 13, 40, 20, 90,
> 21, 34, 10, 32, 14, 90, 33, 90, 11, 34, 76, 1, 77, 77, 82,
> 32, 90, 1, 1, 1, 1, 1, 1)), row.names = c(NA, 606L), .Names =
> c("examinee",
> "qid", "answer_time"), class = "data.frame")