[R] Oddity using multcompView package
Patrick Connolly
p_connolly at slingshot.co.nz
Thu Oct 30 09:50:00 CET 2014
The multcompView has some useful features but I'm sure this isn't
intentional. Excuse the size. This is about the smallest
reproducible example I can do:
require("multcompView")
> mm <- structure(list(TempNom = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L), .Label = c("15", "18", "21", "22", "24", "27"), class = "factor"),
Days = c(14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L,
16L, 16L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L,
19L, 19L, 19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 21L,
22L, 23L, 23L, 25L, 25L, 26L, 26L, 27L, 27L, 27L, 28L, 10L,
10L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
16L, 16L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L,
19L, 20L, 20L, 20L, 20L, 21L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L,
12L, 12L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L,
16L, 16L, 17L, 17L, 17L, 18L, 20L, 8L, 8L, 8L, 8L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L,
12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 17L, 17L, 18L,
18L, 18L, 19L, 19L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L,
11L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 14L, 14L, 14L, 15L,
16L, 16L, 17L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 12L, 12L, 12L, 12L,
12L, 13L, 13L, 16L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -343L), .Names = c("TempNom", "Days"))
> mm.aov <- aov(Days ~ TempNom, data = mm)
> model.tables(mm.aov, "means")
Tables of means
Grand mean
12.74052
TempNom
15 18 21 22 24 27
18.4 14.87 11.22 11.92 10.27 9.095
rep 57.0 55.00 65.00 72.00 52.00 42.000
> multcompLetters(TukeyHSD(mm.aov)$TempNom[,4])
18 21 22 24 27 15
"a" "bc" "b" "cd" "d" "e"
>
Something is clearly wrong with letter "e" for TempNom = 15.
Theory1:
First, I tried reordering the output by name, but that results in "e"
for the longest (i.e. 15) and "a" for the runner-up.
Theory2:
Then I tried assuming the letters were correct and in the correct
order but were labelled incorrectly.
If I put the output into a dataframe, it's easier to see what we have.
> TT <- data.frame(T = c(model.tables(mm.aov, "means")[[1]]$TempNom))
> TT$Group <- multcompLetters(TukeyHSD(mm.aov)$TempNom[,4])$
monospacedLetters[rownames(TT)]# (sorted a la Theory1)
> TT$GroupA <- multcompLetters(TukeyHSD(mm.aov)$TempNom[,4])$
monospacedLetters # (unsorted a la Theory2)
> TT
T Group GroupA
15 18.403509 e a
18 14.872727 a bc
21 11.215385 bc b
22 11.916667 b cd
24 10.269231 cd d
27 9.095238 d e
>
In this example, it appears GroupA (Theory2) makes more sense.
However, in larger examples, that approach can result in two values in
separate groups even though there's less than 0.1% difference between
them and sustantial standard error.
In those cases (too large to show here) it makes more sense to modify
the Group column by changing the "e" to "a" and moving the other
letters along by 1.
The code for the multcompLetters function is nicely commented but
before I launch into checking it for bugs, I thought it prudent to ask
if anyone else had encountered anything similar. Or am I simply
asking too much of an unbalanced data set?
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid grDevices utils stats graphics methods base
other attached packages:
[1] dplyr_0.3.0.2 multcompView_0.1-5 lattice_0.20-29
loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1 lazyeval_0.1.9 magrittr_1.0.1 parallel_3.1.1
[6] Rcpp_0.11.3 tools_3.1.1
>
TIA
--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___ Patrick Connolly
{~._.~} Great minds discuss ideas
_( Y )_ Average minds discuss events
(:_~*~_:) Small minds discuss people
(_)-(_) ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
PS: the problem existed without dplyr and DBI
More information about the R-help
mailing list