[R-meta] Pairwise comparisons between the factor levels and compact letter display (cld)
Jan Kardela (PGR)
J@H@K@rdel@2 @ending from newc@@tle@@c@uk
Wed Sep 12 14:07:16 CEST 2018
Dear Wolfgang,
I apologize for the late reply but I was away for few weeks.
Please find below the working code. The function cld.summary.glht.modified is the modified original cld function that takes as an argument list (ret) with: vector of effect sizes, sample names, contrast matrix and p-values.
This list is then passed to the main cld algorithm: insert_absorb.
Obviously, the original cld package offers additional functionality such as plotting those letters on the box plot (plot.cld) but this requires some additional fiddling with the code.
Thank you again for the help and for providing this great metafor package!
Regards,
Jan
#### ==== BEGINNING OF CODE ==== ####
library(metafor)
library(multcomp)
df<-data.frame(Study=c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
Sample=c("A (c)", "A (c)", "A (c)","A (c)", "A (c)", "B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
Experiment=c(1:20),
mean1=c(19.87,59.91,75.73,16.78,83.1,4.92,5.37,23.68,23.68,9.68,0.01,0.01,0.01,0.01,8,12.11,86.67,63.45,37.87,67),
stddev1=c(15.03,13.8,2.89,6.05,11.66,5,5.56,11.16,3.72,8.75,0.01,0.01,0.01,0.01,9.23,13.28,7.64,2.19,1.42,23.86),
n1=c(3,3,2,2,8,3,3,2,2,8,3,3,2,2,8,3,3,2,2,8),
mean2=c(19.87,59.91,75.73,16.78,83.1,19.87,59.91,75.73,16.78,83.1,19.87,59.91,75.73,16.78,83.1,19.87,59.91,75.73,16.78,83.1),
stddev2=c(15.03,13.8,2.89,6.05,11.66,15.03,13.8,2.89,6.05,11.66,15.03,13.8,2.89,6.05,11.66,15.03,13.8,2.89,6.05,11.66),
n2=c(3,3,2,2,8,3,3,2,2,8,3,3,2,2,8,3,3,2,2,8),
Type=c("NI","NI","NI","NI","NI","I","I","I","I","I","I","I","I","I","I","NI","NI","NI","NI","NI"))
df
escalc.dat<-escalc(m1i=mean1,m2i=mean2,n1i=n1,n2i=n2,sd1i=stddev1,sd2i=stddev2,measure="MD",method="REML",data=df)
escalc.dat
rma.meta<-rma.mv(yi,vi,mods=~factor(Sample)-1,random=~Sample|Experiment,struct="DIAG",data=escalc.dat,method="REML",digits=3)
rma.meta
CMatrix<-cbind(contrMat(c("A (c)"=1,"B"=1,"C"=1,"D"=1), type = "Tukey"))
tukey_glht<-glht(rma.meta, linfct=CMatrix, test=adjusted("Bonferroni"))
summary_tukey<-summary(tukey_glht)
comps <- cbind(apply(CMatrix, 1, function(k) levels(escalc.dat$Sample)[k == 1]),
apply(CMatrix, 1, function(k) levels(escalc.dat$Sample)[k == -1]))
ret1<-list(y=as.vector(escalc.dat$yi), x = escalc.dat$Sample, comps = comps, signif_ret = as.vector(summary_tukey$test$pvalues))
ret1
cld.summary.glht.modified <- function(ret, level = 0.05, decreasing = FALSE, ...) {
signif <- ret$signif_ret < 0.05
# Order the levels according to its mean
# Tidy up: ret$y[1:length(ret$x)]], cox models concatenates a vector of live/dead
# I think this way is easier than to deal with gsub later and it's more general
lvl_order <- levels(ret$x)[order(tapply(as.numeric(ret$y)[1:length(ret$x)], ret$x, mean))]
# names(signif) <- gsub("\\s", "", rownames(object$linfct))
ret$signif <- signif
ret$mcletters <- insert_absorb(signif, decreasing = decreasing,
comps = ret$comps, lvl_order = lvl_order)
# start edit
ret$mcletters$Letters <- ret$mcletters$Letters[levels(ret$x)]
ret$mcletters$monospacedLetters <- ret$mcletters$monospacedLetters[levels(ret$x)]
ret$mcletters$LetterMatrix <- ret$mcletters$LetterMatrix[levels(ret$x),]
# end edit
class(ret) <- "cld"
ret
}
# Function implements the insert-absorb (sweep) heuristic of Piepho 2004:
# "An Algorithm for a Letter-Based Representation of All-Pairwise Comparisons"
#
# x ... vector of logicals indicating significant comparisons with hyphenated
# names e.g. A-B, treatmentA-treatmentB, ...
# Letters ... a set of user defined letters { default is Letters=c(letters, LETTERS) }
# separator ... a separating character used to produce a sufficiently large set of
# characters for a compact letter display (default is separator=".") in case
# the number of letters required exceeds the number of letters available
# Decreasing ... Inverse the order of the letters
insert_absorb <- function( x, Letters=c(letters, LETTERS), separator=".", decreasing = FALSE,
comps = NULL, lvl_order){
obj_x <- deparse(substitute(x))
if (is.null(comps)) {
namx <- names(x)
namx <- gsub(" ", "", names(x))
if(length(namx) != length(x))
stop("Names required for ", obj_x)
split_names <- strsplit(namx, "-")
stopifnot( sapply(split_names, length) == 2 )
comps <- t(as.matrix(as.data.frame(split_names)))
}
rownames(comps) <- names(x)
lvls <- lvl_order
n <- length(lvls)
lmat <- array(TRUE, dim=c(n,1), dimnames=list(lvls, NULL) )
if( sum(x) == 0 ){ # no differences
ltrs <- rep(get_letters(1, Letters=Letters, separator=separator), length(lvls) )
names(ltrs) <- lvls
colnames(lmat) <- ltrs[1]
msl <- ltrs
ret <- list(Letters=ltrs, monospacedLetters=msl, LetterMatrix=lmat)
class(ret) <- "multcompLetters"
return(ret)
}
else{
signifs <- comps[x,,drop=FALSE]
absorb <- function(m){
for(j in 1:(ncol(m)-1)){
for(k in (j+1):ncol(m)){
if( all(m[which(m[,k]),k] & m[which(m[,k]),j]) ){ # column k fully contained in column j
m <- m[,-k, drop=FALSE]
return(absorb(m))
}
else if( all(m[which(m[,j]),k] & m[which(m[,j]),j]) ){ # column j fully contained in column k
m <- m[,-j, drop=FALSE]
return(absorb(m))
}
}
}
return(m)
}
for( i in 1:nrow(signifs) ){ # insert
tmpcomp <- signifs[i,]
wassert <- which(lmat[tmpcomp[1],] & lmat[tmpcomp[2],]) # which columns wrongly assert nonsignificance
if(any(wassert)){
tmpcols <- lmat[,wassert,drop=FALSE]
tmpcols[tmpcomp[2],] <- FALSE
lmat[tmpcomp[1],wassert] <- FALSE
lmat <- cbind(lmat, tmpcols)
colnames(lmat) <- get_letters( ncol(lmat), Letters=Letters,
separator=separator)
if(ncol(lmat) > 1){ # absorb columns if possible
lmat <- absorb(lmat)
colnames(lmat) <- get_letters( ncol(lmat), Letters=Letters,
separator=separator )
}
}
}
}
lmat <- lmat[,order(apply(lmat, 2, sum))]
lmat <- sweepLetters(lmat) # 1st
lmat <- lmat[,names(sort(apply(lmat,2, function(x) return(min(which(x))))))] # reorder columns
colnames(lmat) <- get_letters( ncol(lmat), Letters=Letters,
separator=separator)
lmat <- lmat[,order(apply(lmat, 2, sum))] # 2nd sweep
lmat <- sweepLetters(lmat)
lmat <- lmat[,names(sort(apply(lmat,2, function(x) return(min(which(x)))),
decreasing = decreasing))] # reorder columns
colnames(lmat) <- get_letters( ncol(lmat), Letters=Letters,
separator=separator)
ltrs <- apply(lmat,1,function(x) return(paste(names(x)[which(x)], sep="", collapse="") ) )
msl <- matrix(ncol=ncol(lmat), nrow=nrow(lmat)) # prepare monospaced letters
for( i in 1:nrow(lmat) ){
msl[i,which(lmat[i,])] <- colnames(lmat)[which(lmat[i,])]
absent <- which(!lmat[i,])
if( length(absent) < 2 ){
if( length(absent) == 0 )
next
else{
msl[i,absent] <- paste( rep(" ", nchar(colnames(lmat)[absent])), collapse="" )
}
}
else{
msl[i,absent] <- unlist( lapply( sapply( nchar(colnames(lmat)[absent]),
function(x) return(rep( " ",x)) ),
paste, collapse="") )
}
}
msl <- apply(msl, 1, paste, collapse="")
names(msl) <- rownames(lmat)
ret <- list( Letters=ltrs, monospacedLetters=msl, LetterMatrix=lmat,
aLetters = Letters, aseparator = separator )
class(ret) <- "multcompLetters"
return(ret)
}
# All redundant letters are swept out without altering the information within a LetterMatrix.
#
# mat ... a LetterMatrix as produced by function insert_absorb()
# start.col ... either a single integer specifying the column to start with or a vector
# of max. length equal to ncol(mat) specifying the column order to be used.
# Letters ... a set of user defined letters { default is Letters=c(letters, LETTERS) }
# separator ... a separating character used to produce a sufficiently large set of
# characters for a compact letter display (default is separator=".") in case
# the number of letters required exceeds the number of letters available
sweepLetters <- function(mat, start.col=1, Letters=c(letters, LETTERS), separator="."){
stopifnot( all(start.col %in% 1:ncol(mat)) )
locked <- matrix(rep(0,ncol(mat)*nrow(mat)), ncol=ncol(mat)) # 1 indicates that another letter dependes on this entry
cols <- 1:ncol(mat)
cols <- cols[c( start.col, cols[-start.col] )]
if( any(is.na(cols) ) )
cols <- cols[-which(is.na(cols))]
for( i in cols){
tmp <- matrix(rep(0,ncol(mat)*nrow(mat)), ncol=ncol(mat))
tmp[which(mat[,i]),] <- mat[which(mat[,i]),] # get items of those rows which are TRUE in col "i"
one <- which(tmp[,i]==1)
if( all(apply(tmp[,-i,drop=FALSE], 1, function(x) return( any(x==1) ))) ){ # there is at least one row "l" where mat[l,i] is the only item which is TRUE i.e. no item can be removed in this column
next
}
for( j in one ){ # over all 1's
if( locked[j,i] == 1 ){ # item is locked
next
}
chck <- 0
lck <- list()
for( k in one ){
if( j==k ){
next
}
else{ # pair j-k
rows <- tmp[c(j,k),]
dbl <- rows[1,] & rows[2,]
hit <- which(dbl)
hit <- hit[-which(hit==i)]
dbl <- rows[1,-i,drop=FALSE] & rows[2,-i,drop=FALSE]
if( any(dbl) ){
chck <- chck + 1
lck[[chck]] <- list(c(j,hit[length(hit)]), c(k,hit[length(hit)])) # record items which have to be locked, use last column if multiple hits
}
}
}
if( (chck == (length(one)-1)) && chck != 0 ){ # item is redundant
for( k in 1:length(lck) ){ # lock items
locked[ lck[[k]][[1]][1], lck[[k]][[1]][2] ] <- 1
locked[ lck[[k]][[2]][1], lck[[k]][[2]][2] ] <- 1
}
mat[j,i] <- FALSE # delete redundant entry
}
}
if(all(mat[,i]==FALSE)){ # delete column where each entry is FALSE and restart
mat <- mat[,-i,drop=FALSE]
colnames(mat) <- get_letters( ncol(mat), Letters=Letters, separator=separator)
return(sweepLetters(mat, Letters=Letters, separator=separator))
}
}
onlyF <- apply(mat, 2, function(x) return(all(!x)))
if( any(onlyF) ){ # There are columns with just FALSE entries
mat <- mat[,-which(onlyF),drop=FALSE]
colnames(mat) <- get_letters( ncol(mat), Letters=Letters, separator=separator)
}
return( mat )
}
# Create a set of letters for a letter display. If "n" exceeds the number of letters
# specified in "Letters", they are recycled with one or more separating character(s)
# preceding each recycled letter.
# e.g. get_letters(10, Letters=letters[1:4]) produces: "a" "b" "c" "d" ".a" ".b" ".c" ".d" "..a" "..b"
#
# n ... number of letters
# Letters ... the set of characters to be used
# separator ... a character to be used as separator e.g.
# n=5, Letters=c("a","b") => "a", "b", ".a", ".b", "..a"
get_letters <- function( n, Letters=c(letters, LETTERS), separator="." ){
n.complete <- floor(n / length(Letters)) # number of complete sets of Letters
n.partial <- n %% length(Letters) # number of additional Letters
lett <- character()
separ=""
if( n.complete > 0 ){
for( i in 1:n.complete ){
lett <- c(lett, paste(separ, Letters, sep="") )
separ <- paste( separ, separator, sep="" )
}
}
if(n.partial > 0 )
lett <- c(lett, paste(separ, Letters[1:n.partial], sep="") )
return(lett)
}
cld.summary.glht.modified(ret1)
plot(tukey_glht)
#### ==== END OF CODE ==== ####
-----Original Message-----
From: Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
Sent: Wednesday, August 22, 2018 10:50 AM
To: Jan Kardela (PGR) <J.H.Kardela2 using newcastle.ac.uk>; r-sig-meta-analysis using r-project.org
Subject: RE: [R-meta] Pairwise comparisons between the factor levels and compact letter display (cld)
Thanks for the feedback!
I certainly would be interested in seeing what you did to make this work. I assume you mean that you had to modify the cld() function.
Best,
Wolfgang
-----Original Message-----
From: Jan Kardela (PGR) [mailto:J.H.Kardela2 using newcastle.ac.uk]
Sent: Friday, 17 August, 2018 16:52
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis using r-project.org
Subject: RE: [R-meta] Pairwise comparisons between the factor levels and compact letter display (cld)
Thanks for the message Wolfgang and for the explanation of the error - it actually helped me to solve the problem.
I managed to get the code running after a a bit of 'reverse-engineering'. Basically, as the cld function requires a model object I had to change original code so it could accept a list with data necessary to run the main algorithm. I am happy to present the running code but it's a bit lengthy, so maybe it's better if I email it directly to interested users (?).
Kind regards,
Jan
-----Original Message-----
From: Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
Sent: Thursday, August 16, 2018 9:07 AM
To: Jan Kardela (PGR) <J.H.Kardela2 using newcastle.ac.uk>; r-sig-meta-analysis using r-project.org
Subject: RE: [R-meta] Pairwise comparisons between the factor levels and compact letter display (cld)
Thanks. I actually get a different error:
Error in terms.default(formula, data = data) :
no terms component nor attribute
Make sure you have the 'devel' version of metafor installed:
https://github.com/wviechtb/metafor#installation
Not that this fixes this problem, but it is the more up-to-date version.
Indeed, 'rma' objects do not contain a 'terms' component (for reasons that are a bit difficult to explain here). Since cld() apparently needs this, it looks like cld() cannot currently be used in combination with results from metafor.
Best,
Wolfgang
-----Original Message-----
From: Jan Kardela (PGR) [mailto:J.H.Kardela2 using newcastle.ac.uk]
Sent: Wednesday, 15 August, 2018 18:37
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis using r-project.org
Subject: RE: [R-meta] Pairwise comparisons between the factor levels and compact letter display (cld)
Dear Wolfgang,
Thank you for your message.
Below I provide the snippet of my data together with the code I had been using to calculate all the statistics.
Maybe just a little explanation of my data set as it might seem strange at first. I have run 5 independent settlement studies each using different populations of larvae. In each study there are 4 different samples (A (c), B, C, D), each consisting of few replicates (n1). Every single settlement experiment was conducted using different set of "participants" i.e. larvae. As a control group for each experiment I selected sample A (c) from appropriate Study. In case of sample C (experiments 11,12,13,14) there was no settlement so I decided to put the 0.01 values as for mean and stddev.
In order to merge the results from my studies I conducted meta-regression analysis with the Sample as a categorical moderator and used random switch to allow the amount of residual heterogeneity to be different in each Sample subgroup.
As I mentioned earlier I wanted to run some sort of Tukey post-hoc test on the meta-regression model to determine differences between the Sample levels. In order to do that I ran glht() which gave me all pairwise comparisons, but with so many pairs (my full data set consists of 11 different samples/levels not 4 as in snippet) I wanted to use compact letter display notification to more clearly present the differences between the samples. Obviously, I could use the plot function to graphically display all pairs of groups but in my case some of the groups overlap with others making the story not very clear.
Thank you again for your time and expertise!
Best regards,
Jan
#### ==== BEGINNING OF CODE ==== ####
library(metafor)
library(multcomp)
df<-data.frame(Study=c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
Sample=c("A (c)", "A (c)", "A (c)","A (c)", "A (c)", "B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
Experiment=c(1:20),
mean1=c(19.87,59.91,75.73,16.78,83.1,4.92,5.37,23.68,23.68,9.68,0.01,0.01,0.01,0.01,8,12.11,86.67,63.45,37.87,67),
stddev1=c(15.03,13.8,2.89,6.05,11.66,5,5.56,11.16,3.72,8.75,0.01,0.01,0.01,0.01,9.23,13.28,7.64,2.19,1.42,23.86),
n1=c(3,3,2,2,8,3,3,2,2,8,3,3,2,2,8,3,3,2,2,8),
mean2=c(19.87,59.91,75.73,16.78,83.1,19.87,59.91,75.73,16.78,83.1,19.87,59.91,75.73,16.78,83.1,19.87,59.91,75.73,16.78,83.1),
stddev2=c(15.03,13.8,2.89,6.05,11.66,15.03,13.8,2.89,6.05,11.66,15.03,13.8,2.89,6.05,11.66,15.03,13.8,2.89,6.05,11.66),
n2=c(3,3,2,2,8,3,3,2,2,8,3,3,2,2,8,3,3,2,2,8),
Type=c("NI","NI","NI","NI","NI","I","I","I","I","I","I","I","I","I","I","NI","NI","NI","NI","NI"))
df
escalc.dat<-escalc(m1i=mean1,m2i=mean2,n1i=n1,n2i=n2,sd1i=stddev1,sd2i=stddev2,measure="MD",method="REML",data=df)
escalc.dat
rma.meta<-rma.mv(yi,vi,mods=~factor(Sample)-1,random=~Sample|Experiment,struct="DIAG",data=escalc.dat,method="REML",digits=3)
rma.meta
tukey_glht<-glht(rma.meta, linfct=cbind(contrMat(c("A (c)"=1,"B"=1,"C"=1,"D"=1),type="Tukey")),test=adjusted("Bonferroni"))
summary(tukey_glht)
plot(tukey_glht)
cld(tukey_glht) ## That's where I get the error: "Error in formula.default(object, env = baseenv()) : invalid formula"
#### ====END OF CODE==== ####
-----Original Message-----
From: Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
Sent: Tuesday, August 14, 2018 9:09 PM
To: Jan Kardela (PGR) <J.H.Kardela2 using newcastle.ac.uk>; r-sig-meta-analysis using r-project.org
Subject: RE: [R-meta] Pairwise comparisons between the factor levels and compact letter display (cld)
Hi Jan,
Can you provide a small reproducible example that yields this error? I have never tried to see how well metafor objects play together with cld().
Best,
Wolfgang
-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Jan Kardela (PGR)
Sent: Monday, 13 August, 2018 12:01
To: r-sig-meta-analysis using r-project.org
Subject: [R-meta] Pairwise comparisons between the factor levels and compact letter display (cld)
Hello.
I am doing a multilevel meta-analysis in metafor and I am running a glht function from multcomp package to determine all-pairwise comparisons. I use contrMat function with Tukey switch to obtain the matrix contrast. As my categorical factor has 11 levels I would like to have some sort of compact letter display notification (cld) to more efficiently report the differences between all pairs. When I try to feed the glht model to the cld function I get an error "Error in formula.default(object, env = baseenv()) : invalid formula". Could someone please point me in the right direction and provide some advice?
Thank you in advance for all the help!
Best regards,
Jan
More information about the R-sig-meta-analysis
mailing list