[R-sig-ME] Identify large residuals

Fri Jan 27 22:49:48 CET 2017

1. You can calculate residuals with different levels of random effects
included via (observed value) - predict(..., re.form=<something>).  In
your case, though, it seems you just want the raw residuals()
(lowest-level) -- but see point #2.

2. In this sample data set, there is a single response per question for
all but one examinee.  This will make the qid-within-examinee random
effect variance almost impossible to estimate (strongly confounded with
the observation-level residual variance); was that on purpose or is that
an artifact of the example you gave us to look at? (Now that I look
closer, I think this is what you meant by "I added one line at the
bottom with dummy data to get it to run"; otherwise you would get an
error from lmer() that you'd have to override.) What do your real data
look like? If they really have only one observation per examinee:qid
combo, then you should leave out the nested random effect -- it will be
captured entirely by the residual variance term.

3. For what it's worth, it doesn't seem as though log-transforming these
data is worthwhile, but that may be because you made up data that were
already reasonably well distributed?
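
To make points 1 and 2 concrete, here is a minimal sketch on simulated
data. The data frame sim.DF, the column log_time, and the cutoff of 2.5
are assumptions made up for illustration, not taken from your data set;
it also assumes exactly one observation per examinee:qid combination, so
the nested term is dropped.

```r
library(lme4)

## Simulated stand-in for test.DF (hypothetical, not the poster's data):
## 100 examinees x 6 items, one response per examinee:qid combination.
set.seed(101)
sim.DF <- expand.grid(qid = factor(paste0("Item", 1:6)),
                      examinee = factor(1:100))
exam_eff <- rnorm(100, sd = 0.4)
sim.DF$log_time <- 3.5 + exam_eff[as.integer(sim.DF$examinee)] +
    rnorm(nrow(sim.DF), sd = 0.8)

## Only the examinee intercept is kept; with one observation per
## examinee:qid, that level is absorbed by the residual variance.
fit <- lmer(log_time ~ qid + (1 | examinee), data = sim.DF, REML = FALSE)

## Lowest-level residuals: observed minus fitted, random effects included.
r_cond  <- residuals(fit)
r_cond2 <- sim.DF$log_time - predict(fit, re.form = NULL)

## Population-level residuals: random effects left out of the prediction.
r_pop <- sim.DF$log_time - predict(fit, re.form = NA)

## Flag unusually long responses via scaled residuals.
## The cutoff is arbitrary and only for illustration.
cutoff  <- 2.5
scaled  <- residuals(fit, scaled = TRUE)
flagged <- sim.DF[scaled > cutoff, c("examinee", "qid")]
```

With the real data you would replace sim.DF and log_time with test.DF
and log(answer_time); whether a one-sided cutoff on the scaled residuals
is the right screening rule is a judgment call.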

On 17-01-27 04:27 PM, Stuart Luppescu wrote:
> Hello, I have a dataset of test item response times. The examinees took
> the test unsupervised online. We want to identify person-items with
> unusually large time residuals indicating that the examinee might have
> looked up the answer on Google before responding. 
> 
> I am trying to do this in a model with items nested within examinees
> like this:
> 
> lmer.test1a <- lmer(log(answer_time) ~ qid + (1|examinee/qid), data=test.DF, REML=FALSE)
> 
> The results look like this:
> 
> Linear mixed model fit by maximum likelihood  ['lmerMod']
> Formula: log(answer_time) ~ qid + (1 | examinee/qid)
>    Data: test.DF
> 
>      AIC      BIC   logLik deviance df.resid 
>   1670.2   1709.8   -826.1   1652.2      597 
> 
> Scaled residuals: 
>     Min      1Q  Median      3Q     Max 
> -3.9656 -0.3263  0.1407  0.5539  2.7752 
> 
> Random effects:
>  Groups       Name        Variance  Std.Dev. 
>  qid:examinee (Intercept) 1.275e-15 3.571e-08
>  examinee     (Intercept) 1.920e-01 4.382e-01
>  Residual                 7.684e-01 8.766e-01
> Number of obs: 606, groups:  qid:examinee, 600; examinee, 100
> 
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  3.48381    0.09764   35.68
> qidItem2    -0.11060    0.12335   -0.90
> qidItem3    -0.09798    0.12335   -0.79
> qidItem4    -0.02294    0.12335   -0.19
> qidItem5     0.13196    0.12335    1.07
> qidItem6    -0.01915    0.12335   -0.16
> 
> Does this look like a reasonable approach? If so, how would I get the
> residuals out of this to identify examinee/qid combinations that seem
> unusually large?
> 
> Thanks in advance for any help.
> 
> The dataset I'm using is pasted below. (I made the qid variable effects
> coded by doing contrasts(test.DF$qid) <- contr.sum but I'm not sure if
> the contrast attribute is included in the dput below. Also, I added one
> line at the bottom with dummy data to get it to run.)
> 
>  dput(test.DF)
> structure(list(examinee = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 
> 6L, 6L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L, 9L, 9L, 7L, 7L, 7L, 7L, 
> 7L, 7L, 96L, 96L, 96L, 96L, 96L, 96L, 8L, 8L, 8L, 8L, 8L, 8L, 
> 4L, 4L, 4L, 4L, 4L, 4L, 12L, 12L, 12L, 12L, 12L, 12L, 16L, 16L, 
> 16L, 16L, 16L, 16L, 10L, 10L, 10L, 10L, 10L, 10L, 19L, 19L, 19L, 
> 19L, 19L, 19L, 5L, 5L, 5L, 5L, 5L, 5L, 21L, 21L, 21L, 21L, 21L, 
> 21L, 18L, 18L, 18L, 18L, 18L, 18L, 99L, 99L, 99L, 99L, 99L, 99L, 
> 98L, 98L, 98L, 98L, 98L, 98L, 13L, 13L, 13L, 13L, 13L, 13L, 26L, 
> 26L, 26L, 26L, 26L, 26L, 1L, 1L, 1L, 1L, 1L, 1L, 29L, 29L, 29L, 
> 29L, 29L, 29L, 30L, 30L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 
> 31L, 31L, 32L, 32L, 32L, 32L, 32L, 32L, 23L, 23L, 23L, 23L, 23L, 
> 23L, 35L, 35L, 35L, 35L, 35L, 35L, 36L, 36L, 36L, 36L, 36L, 36L, 
> 37L, 37L, 37L, 37L, 37L, 37L, 38L, 38L, 38L, 38L, 38L, 38L, 100L, 
> 100L, 100L, 100L, 100L, 100L, 40L, 40L, 40L, 40L, 40L, 40L, 42L, 
> 42L, 42L, 42L, 42L, 42L, 34L, 34L, 34L, 34L, 34L, 34L, 46L, 46L, 
> 46L, 46L, 46L, 46L, 47L, 47L, 47L, 47L, 47L, 47L, 44L, 44L, 44L, 
> 44L, 44L, 44L, 15L, 15L, 15L, 15L, 15L, 15L, 52L, 52L, 52L, 52L, 
> 52L, 52L, 55L, 55L, 55L, 55L, 55L, 55L, 53L, 53L, 53L, 53L, 53L, 
> 53L, 39L, 39L, 39L, 39L, 39L, 39L, 51L, 51L, 51L, 51L, 51L, 51L, 
> 48L, 48L, 48L, 48L, 48L, 48L, 58L, 58L, 58L, 58L, 58L, 58L, 22L, 
> 22L, 22L, 22L, 22L, 22L, 33L, 33L, 33L, 33L, 33L, 33L, 60L, 60L, 
> 60L, 60L, 60L, 60L, 95L, 95L, 95L, 95L, 95L, 95L, 59L, 59L, 59L, 
> 59L, 59L, 59L, 56L, 56L, 56L, 56L, 56L, 56L, 63L, 63L, 63L, 63L, 
> 63L, 63L, 57L, 57L, 57L, 57L, 57L, 57L, 50L, 50L, 50L, 50L, 50L, 
> 50L, 62L, 62L, 62L, 62L, 62L, 62L, 25L, 25L, 25L, 25L, 25L, 25L, 
> 64L, 64L, 64L, 64L, 64L, 64L, 14L, 14L, 14L, 14L, 14L, 14L, 66L, 
> 66L, 66L, 66L, 66L, 66L, 61L, 61L, 61L, 61L, 61L, 61L, 68L, 68L, 
> 68L, 68L, 68L, 68L, 49L, 49L, 49L, 49L, 49L, 49L, 69L, 69L, 69L, 
> 69L, 69L, 69L, 41L, 41L, 41L, 41L, 41L, 41L, 54L, 54L, 54L, 54L, 
> 54L, 54L, 67L, 67L, 67L, 67L, 67L, 67L, 65L, 65L, 65L, 65L, 65L, 
> 65L, 70L, 70L, 70L, 70L, 70L, 70L, 43L, 43L, 43L, 43L, 43L, 43L, 
> 20L, 20L, 20L, 20L, 20L, 20L, 72L, 72L, 72L, 72L, 72L, 72L, 11L, 
> 11L, 11L, 11L, 11L, 11L, 97L, 97L, 97L, 97L, 97L, 97L, 74L, 74L, 
> 74L, 74L, 74L, 74L, 75L, 75L, 75L, 75L, 75L, 75L, 77L, 77L, 77L, 
> 77L, 77L, 77L, 76L, 76L, 76L, 76L, 76L, 76L, 79L, 79L, 79L, 79L, 
> 79L, 79L, 78L, 78L, 78L, 78L, 78L, 78L, 80L, 80L, 80L, 80L, 80L, 
> 80L, 81L, 81L, 81L, 81L, 81L, 81L, 71L, 71L, 71L, 71L, 71L, 71L, 
> 82L, 82L, 82L, 82L, 82L, 82L, 83L, 83L, 83L, 83L, 83L, 83L, 28L, 
> 28L, 28L, 28L, 28L, 28L, 84L, 84L, 84L, 84L, 84L, 84L, 89L, 89L, 
> 89L, 89L, 89L, 89L, 87L, 87L, 87L, 87L, 87L, 87L, 86L, 86L, 86L, 
> 86L, 86L, 86L, 90L, 90L, 90L, 90L, 90L, 90L, 91L, 91L, 91L, 91L, 
> 91L, 91L, 85L, 85L, 85L, 85L, 85L, 85L, 2L, 2L, 2L, 2L, 2L, 2L, 
> 92L, 92L, 92L, 92L, 92L, 92L, 17L, 17L, 17L, 17L, 17L, 17L, 24L, 
> 24L, 24L, 24L, 24L, 24L, 93L, 93L, 93L, 93L, 93L, 93L, 88L, 88L, 
> 88L, 88L, 88L, 88L, 73L, 73L, 73L, 73L, 73L, 73L, 27L, 27L, 27L, 
> 27L, 27L, 27L, 45L, 45L, 45L, 45L, 45L, 45L, 94L, 94L, 94L, 94L, 
> 94L, 94L, 100L, 100L, 100L, 100L, 100L, 100L), .Label = c("1", 
> "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
> "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
> "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
> "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
> "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", 
> "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", 
> "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79", 
> "80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90", 
> "91", "92", "93", "94", "95", "96", "97", "98", "99", "100"), class =
> "factor"), 
>     qid = structure(c(3L, 5L, 1L, 2L, 4L, 6L, 5L, 2L, 4L, 6L, 
>     1L, 3L, 6L, 3L, 4L, 2L, 5L, 1L, 6L, 3L, 2L, 5L, 1L, 4L, 4L, 
>     3L, 1L, 6L, 5L, 2L, 3L, 2L, 5L, 6L, 4L, 1L, 2L, 1L, 3L, 5L, 
>     4L, 6L, 3L, 2L, 6L, 4L, 5L, 1L, 3L, 2L, 5L, 4L, 6L, 1L, 2L, 
>     6L, 5L, 1L, 4L, 3L, 3L, 6L, 5L, 4L, 2L, 1L, 4L, 2L, 3L, 5L, 
>     6L, 1L, 3L, 5L, 2L, 6L, 4L, 1L, 5L, 6L, 3L, 2L, 1L, 4L, 4L, 
>     2L, 3L, 5L, 1L, 6L, 3L, 6L, 2L, 1L, 5L, 4L, 1L, 6L, 5L, 3L, 
>     2L, 4L, 6L, 1L, 2L, 3L, 5L, 4L, 2L, 6L, 4L, 3L, 5L, 1L, 1L, 
>     4L, 3L, 5L, 6L, 2L, 4L, 2L, 6L, 5L, 3L, 1L, 5L, 2L, 6L, 3L, 
>     1L, 4L, 1L, 6L, 4L, 5L, 2L, 3L, 1L, 4L, 3L, 2L, 5L, 6L, 6L, 
>     2L, 1L, 3L, 4L, 5L, 1L, 6L, 3L, 4L, 5L, 2L, 6L, 3L, 5L, 1L, 
>     4L, 2L, 2L, 4L, 6L, 5L, 3L, 1L, 6L, 1L, 5L, 3L, 4L, 2L, 2L, 
>     1L, 3L, 4L, 6L, 5L, 3L, 6L, 1L, 5L, 4L, 2L, 2L, 4L, 5L, 6L, 
>     3L, 1L, 2L, 3L, 4L, 1L, 5L, 6L, 6L, 5L, 4L, 1L, 2L, 3L, 5L, 
>     4L, 1L, 3L, 2L, 6L, 6L, 1L, 4L, 5L, 3L, 2L, 2L, 6L, 5L, 3L, 
>     4L, 1L, 6L, 5L, 3L, 4L, 2L, 1L, 5L, 1L, 3L, 4L, 2L, 6L, 4L, 
>     3L, 2L, 6L, 1L, 5L, 2L, 3L, 5L, 1L, 4L, 6L, 6L, 5L, 2L, 3L, 
>     4L, 1L, 5L, 4L, 3L, 2L, 1L, 6L, 4L, 3L, 2L, 6L, 1L, 5L, 2L, 
>     3L, 5L, 6L, 4L, 1L, 4L, 3L, 1L, 5L, 6L, 2L, 3L, 6L, 2L, 4L, 
>     5L, 1L, 6L, 1L, 3L, 4L, 2L, 5L, 5L, 1L, 3L, 2L, 4L, 6L, 1L, 
>     6L, 3L, 4L, 2L, 5L, 1L, 6L, 3L, 5L, 2L, 4L, 5L, 4L, 6L, 2L, 
>     1L, 3L, 4L, 3L, 5L, 2L, 6L, 1L, 5L, 6L, 4L, 3L, 1L, 2L, 5L, 
>     4L, 3L, 2L, 1L, 6L, 3L, 6L, 2L, 5L, 4L, 1L, 3L, 6L, 4L, 5L, 
>     2L, 1L, 6L, 1L, 2L, 3L, 5L, 4L, 2L, 1L, 4L, 3L, 5L, 6L, 5L, 
>     2L, 4L, 3L, 6L, 1L, 4L, 5L, 3L, 1L, 2L, 6L, 2L, 1L, 4L, 6L, 
>     3L, 5L, 6L, 2L, 5L, 4L, 1L, 3L, 4L, 6L, 2L, 1L, 5L, 3L, 4L, 
>     6L, 3L, 5L, 1L, 2L, 6L, 3L, 1L, 5L, 2L, 4L, 5L, 6L, 1L, 4L, 
>     2L, 3L, 6L, 4L, 5L, 2L, 3L, 1L, 4L, 5L, 1L, 6L, 3L, 2L, 6L, 
>     1L, 4L, 2L, 5L, 3L, 3L, 1L, 5L, 4L, 2L, 6L, 4L, 6L, 1L, 2L, 
>     3L, 5L, 1L, 4L, 5L, 3L, 2L, 6L, 5L, 2L, 1L, 6L, 3L, 4L, 6L, 
>     1L, 2L, 3L, 4L, 5L, 3L, 1L, 5L, 2L, 6L, 4L, 3L, 2L, 1L, 5L, 
>     6L, 4L, 5L, 4L, 6L, 1L, 2L, 3L, 4L, 5L, 3L, 2L, 1L, 6L, 4L, 
>     3L, 6L, 2L, 1L, 5L, 5L, 4L, 3L, 2L, 6L, 1L, 2L, 6L, 4L, 1L, 
>     5L, 3L, 3L, 4L, 6L, 2L, 5L, 1L, 5L, 3L, 4L, 2L, 1L, 6L, 2L, 
>     4L, 5L, 6L, 1L, 3L, 6L, 3L, 4L, 5L, 1L, 2L, 1L, 5L, 2L, 4L, 
>     6L, 3L, 6L, 5L, 1L, 3L, 4L, 2L, 6L, 3L, 4L, 1L, 2L, 5L, 5L, 
>     6L, 4L, 2L, 3L, 1L, 1L, 3L, 4L, 2L, 5L, 6L, 5L, 3L, 2L, 6L, 
>     4L, 1L, 1L, 5L, 4L, 2L, 3L, 6L, 2L, 5L, 1L, 4L, 6L, 3L, 4L, 
>     1L, 2L, 5L, 6L, 3L, 6L, 5L, 4L, 3L, 2L, 1L, 1L, 4L, 3L, 5L, 
>     6L, 2L, 5L, 3L, 1L, 4L, 2L, 6L, 2L, 6L, 4L, 3L, 5L, 1L, 2L, 
>     6L, 4L, 5L, 1L, 3L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("Item1", 
>     "Item2", "Item3", "Item4", "Item5", "Item6"), class = "factor"), 
>     answer_time = c(16, 11, 29, 19, 51, 23, 17, 28, 36, 57, 23, 
>     20, 26, 29, 90, 13, 43, 41, 40, 90, 63, 56, 54, 54, 1, 27, 
>     35, 90, 90, 32, 13, 12, 57, 24, 56, 18, 33, 61, 34, 36, 47, 
>     38, 90, 67, 21, 74, 81, 71, 28, 40, 22, 22, 26, 69, 77, 69, 
>     35, 76, 55, 24, 90, 42, 44, 16, 22, 39, 1, 32, 72, 90, 28, 
>     54, 1, 56, 51, 40, 11, 29, 64, 32, 62, 50, 19, 19, 90, 26, 
>     36, 16, 22, 14, 1, 49, 53, 88, 48, 54, 60, 28, 33, 58, 15, 
>     22, 44, 47, 10, 71, 75, 60, 39, 28, 31, 17, 61, 42, 1, 56, 
>     76, 39, 28, 26, 32, 90, 19, 90, 63, 41, 90, 57, 21, 45, 52, 
>     36, 1, 55, 62, 60, 83, 58, 90, 90, 83, 30, 60, 77, 54, 18, 
>     42, 66, 26, 69, 15, 41, 27, 12, 34, 18, 61, 56, 49, 56, 43, 
>     34, 85, 90, 31, 73, 65, 83, 1, 90, 59, 22, 90, 90, 28, 46, 
>     90, 17, 47, 42, 53, 25, 35, 47, 19, 31, 49, 72, 73, 34, 75, 
>     63, 43, 30, 10, 14, 41, 32, 90, 90, 56, 68, 32, 10, 90, 69, 
>     43, 11, 45, 49, 90, 61, 72, 57, 70, 77, 6, 1, 2, 2, 1, 2, 
>     90, 64, 75, 18, 22, 24, 66, 23, 45, 67, 49, 55, 14, 20, 9, 
>     11, 9, 17, 1, 25, 21, 34, 90, 32, 90, 71, 38, 34, 18, 36, 
>     35, 37, 34, 22, 30, 21, 44, 34, 58, 15, 32, 23, 45, 90, 56, 
>     43, 41, 42, 17, 40, 90, 90, 20, 40, 75, 25, 35, 42, 31, 48, 
>     28, 51, 29, 31, 12, 90, 21, 43, 16, 63, 35, 23, 23, 25, 16, 
>     23, 18, 14, 58, 19, 22, 54, 37, 52, 90, 71, 21, 72, 85, 76, 
>     71, 13, 53, 14, 43, 68, 76, 28, 38, 33, 13, 13, 50, 27, 48, 
>     21, 36, 28, 1, 32, 10, 68, 12, 21, 90, 22, 77, 34, 35, 39, 
>     64, 55, 42, 82, 88, 90, 33, 18, 85, 49, 23, 33, 1, 55, 42, 
>     19, 36, 90, 39, 32, 6, 29, 36, 25, 1, 24, 20, 24, 15, 28, 
>     90, 24, 13, 35, 19, 13, 82, 56, 43, 30, 74, 74, 90, 77, 12, 
>     34, 41, 77, 90, 90, 53, 64, 38, 90, 25, 40, 55, 69, 18, 16, 
>     53, 49, 82, 28, 73, 46, 72, 76, 53, 66, 73, 53, 37, 28, 39, 
>     90, 48, 21, 90, 75, 77, 65, 61, 18, 90, 26, 29, 22, 51, 76, 
>     1, 31, 28, 74, 29, 21, 90, 62, 43, 42, 28, 58, 44, 36, 29, 
>     50, 21, 90, 28, 19, 18, 21, 12, 19, 1, 48, 59, 62, 49, 1, 
>     26, 32, 27, 18, 16, 15, 37, 48, 24, 27, 30, 42, 68, 38, 35, 
>     90, 66, 73, 1, 19, 90, 56, 21, 17, 65, 35, 41, 64, 38, 25, 
>     90, 25, 57, 41, 63, 71, 1, 41, 25, 17, 47, 48, 28, 69, 31, 
>     31, 22, 59, 86, 25, 21, 52, 19, 32, 51, 43, 22, 33, 90, 31, 
>     88, 63, 70, 71, 76, 74, 13, 27, 9, 21, 12, 15, 76, 17, 36, 
>     19, 6, 51, 71, 77, 67, 32, 74, 14, 1, 90, 18, 26, 50, 41, 
>     69, 58, 22, 62, 10, 40, 15, 8, 14, 7, 16, 5, 1, 54, 90, 25, 
>     29, 41, 33, 40, 36, 30, 24, 63, 1, 44, 16, 13, 40, 20, 90, 
>     21, 34, 10, 32, 14, 90, 33, 90, 11, 34, 76, 1, 77, 77, 82, 
>     32, 90, 1, 1, 1, 1, 1, 1)), row.names = c(NA, 606L), .Names =
> c("examinee", 
> "qid", "answer_time"), class = "data.frame")