[R] Oddity using multcompView package

Thu Oct 30 09:50:00 CET 2014

The multcompView package has some useful features, but I'm sure the
behaviour shown here isn't intentional.  Excuse the size; this is about
the smallest reproducible example I can manage:

> library("multcompView")

> mm <- structure(list(TempNom = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L), .Label = c("15", "18", "21", "22", "24", "27"), class = "factor"), 
    Days = c(14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 
    14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 
    16L, 16L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 
    19L, 19L, 19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 21L, 
    22L, 23L, 23L, 25L, 25L, 26L, 26L, 27L, 27L, 27L, 28L, 10L, 
    10L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 
    13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 
    14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 
    16L, 16L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 
    19L, 20L, 20L, 20L, 20L, 21L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
    8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
    9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 
    11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 
    12L, 12L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 
    16L, 16L, 17L, 17L, 17L, 18L, 20L, 8L, 8L, 8L, 8L, 9L, 9L, 
    9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 
    10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 
    11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 
    12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 
    15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 
    18L, 18L, 19L, 19L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
    8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
    9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 
    11L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 14L, 14L, 14L, 15L, 
    16L, 16L, 17L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
    7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 
    9L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 12L, 12L, 12L, 12L, 
    12L, 13L, 13L, 16L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -343L), .Names = c("TempNom", "Days"))
> mm.aov <- aov(Days ~ TempNom, data = mm)
> model.tables(mm.aov, "means")
Tables of means
Grand mean

12.74052 

 TempNom 
      15    18    21    22    24     27
    18.4 14.87 11.22 11.92 10.27  9.095
rep 57.0 55.00 65.00 72.00 52.00 42.000
> multcompLetters(TukeyHSD(mm.aov)$TempNom[,4])
  18   21   22   24   27   15 
 "a" "bc"  "b" "cd"  "d"  "e" 
> 

Something is clearly wrong with the letter "e" for TempNom = 15: that
level has the largest mean (18.4), yet it is listed and lettered last.
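
A possible reason for the order of the names (18, 21, 22, 24, 27, 15),
offered as an assumption rather than a reading of the multcompView
source: the vector passed in carries only comparison names such as
"18-15", so the level names have to be recovered by splitting them. The
base-R sketch below uses hypothetical toy data (`toy`, not `mm`) and a
hypothetical variable `lvl.order` to show that taking the left-hand
names first and the right-hand names second puts the first factor level
last.

```r
# Toy data (hypothetical, not the poster's mm): three temperature levels
toy <- data.frame(
  TempNom = factor(rep(c("15", "18", "21"), each = 4)),
  Days    = c(18, 19, 17, 20, 15, 14, 16, 15, 11, 12, 10, 11)
)
toy.aov <- aov(Days ~ TempNom, data = toy)

# Adjusted p-values, named by comparison
pvals <- TukeyHSD(toy.aov)$TempNom[, "p adj"]
names(pvals)
# "18-15" "21-15" "21-18"

# Split each comparison name into its two level names
parts <- do.call(rbind, strsplit(names(pvals), "-", fixed = TRUE))

# Left-hand names first, then right-hand names: "15" only ever appears
# on the right, so it comes out last
lvl.order <- unique(c(parts[, 1], parts[, 2]))
lvl.order
# "18" "21" "15"
```

If that is what happens, the position of 15 at the end of the output
would be a naming side effect and would say nothing about its mean.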

Theory1:
First, I tried reordering the output by name, but that results in "e"
for the level with the longest mean Days (i.e. 15) and "a" for the
runner-up (18).

Theory2:
Then I tried assuming the letters were correct and in the correct
order but were labelled incorrectly.

If I put the output into a data frame, it's easier to see what we have.

> TT <- data.frame(T = c(model.tables(mm.aov, "means")[[1]]$TempNom))
> TT$Group <- multcompLetters(TukeyHSD(mm.aov)$TempNom[,4])$
monospacedLetters[rownames(TT)]# (sorted a la Theory1)
> TT$GroupA <- multcompLetters(TukeyHSD(mm.aov)$TempNom[,4])$
monospacedLetters # (unsorted a la Theory2)
> TT
           T Group GroupA
15 18.403509     e  a    
18 14.872727 a       bc  
21 11.215385  bc     b   
22 11.916667  b       cd 
24 10.269231   cd      d 
27  9.095238    d       e
> 
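
The two columns differ only in how the named letter vector is lined up
with the rows. A minimal base-R sketch, with the letters typed in by
hand from the output above (`lets`, `lvls`, `by.name` and `by.pos` are
hypothetical names used only for illustration):

```r
# Letters exactly as printed by multcompLetters() above, names included
lets <- c("18" = "a", "21" = "bc", "22" = "b",
          "24" = "cd", "27" = "d", "15" = "e")

# Factor levels in the order used for the rows of TT
lvls <- c("15", "18", "21", "22", "24", "27")

# Theory1: look each level up by name
by.name <- unname(lets[lvls])

# Theory2: ignore the names and take the letters in printed order
by.pos <- unname(lets)

data.frame(TempNom = lvls, by.name, by.pos)
```

Here `by.name` reproduces the Group column and `by.pos` reproduces the
GroupA column.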

In this example, it appears GroupA (Theory2) makes more sense.
However, in larger examples, that approach can put two means in
separate groups even though they differ by less than 0.1% and have
substantial standard errors.

In those cases (too large to show here) it makes more sense to modify
the Group column by changing the "e" to "a" and shifting each of the
other letters along by one ("a" becomes "b", "b" becomes "c", and so on).
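
That relabelling can be sketched in base R. This is only a renaming of
the letters so that they run in order of decreasing mean; it keeps
exactly the same groupings and does not check whether they are right.
The function `relabel_by_mean` and its arguments are hypothetical, and
the means are the rounded values from the model.tables output above.

```r
# Rename grouping letters so that "a" goes to the largest mean
relabel_by_mean <- function(lets, means) {
  means <- means[names(lets)]               # align by name, not position
  ord   <- order(means, decreasing = TRUE)  # largest mean first
  # Old letters in order of first appearance along the sorted means
  seen  <- unique(unname(unlist(strsplit(lets[ord], ""))))
  map   <- setNames(letters[seq_along(seen)], seen)  # old -> new letter
  out   <- vapply(strsplit(lets, ""),
                  function(ch) paste(sort(unname(map[ch])), collapse = ""),
                  character(1))
  names(out) <- names(lets)
  out[ord]                                  # return sorted by mean
}

lets  <- c("18" = "a", "21" = "bc", "22" = "b",
           "24" = "cd", "27" = "d", "15" = "e")
means <- c("15" = 18.4, "18" = 14.87, "21" = 11.22,
           "22" = 11.92, "24" = 10.27, "27" = 9.095)

res <- relabel_by_mean(lets, means)
res
#   15   18   22   21   24   27
#  "a"  "b"  "c" "cd" "de"  "e"
```

Looking the means up by name avoids the positional mismatch between the
letter vector (which starts at 18) and the table of means (which starts
at 15).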

The code for the multcompLetters function is nicely commented but
before I launch into checking it for bugs, I thought it prudent to ask
if anyone else had encountered anything similar.  Or am I simply
asking too much of an unbalanced data set?

> sessionInfo()

R version 3.1.1 (2014-07-10)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      grDevices utils     stats     graphics  methods   base     

other attached packages:
[1] dplyr_0.3.0.2      multcompView_0.1-5 lattice_0.20-29   

loaded via a namespace (and not attached):
[1] assertthat_0.1 DBI_0.3.1      lazyeval_0.1.9 magrittr_1.0.1 parallel_3.1.1
[6] Rcpp_0.11.3    tools_3.1.1   
> 

TIA

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_  	         Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)  	                      ..... Eleanor Roosevelt

~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

PS: The problem also occurs without dplyr and DBI loaded.