[R] Tukey HSD

Thu Feb 9 22:33:36 CET 2012

On Feb 9, 2012, at 20:28 , jlpugh wrote:

> Hey Folks: 
> 
> New to R and learning as I go, but really, I know just enough to get myself
> into trouble. I've waded through everything up till now, and don't see
> anything in the search that is directly helpful for the POS that I've
> created.
> 
> The GOAL: All I want in the world is a program that performs 1-way ANOVA's
> on every column in a data set (taking the first column as the definition of
> the groups) and then spits out ONLY those results that were significant (p
> <= whatever I want), with their respective plots and TukeyHSD results.
> Sounds simple, right?

Quite likely, this could all be done more elegantly (lapply springs to mind), and there could be something in there that won't work at all, but to point out the glaring obvious: If the grouping factor is in column 1, then shouldn't you have var2~var1 in the aov() call?

> 	var1 <- mydf[,1]
> 	var2 <- mydf[,i]
> 	fm1 <- aov(var1 ~ var2)
> 	tky <- TukeyHSD(fm1)

and, by the way, you _really_ do not want to go via capture.output to get at the p value. It is accessible from the summary(fm1) directly. For aov objects, it's like:

summary(fm1)[[1]]$"Pr(>F)"[1]

(for "lm" objects, try anova(fm)$"Pr(>F)"[1])

> 
> Data:
> 
> http://r.789695.n4.nabble.com/file/n4374072/test_text.txt test_text.txt 
> 
> (see upload)
> 
> CODE:  ( which occurs after TEST <- read.table("test_text.txt") )
> 
> i <- 2;
> sink (file = "test_output.txt", append = FALSE)
> mydf <- data.frame(TEST)
> for (j in 1:ncol(mydf)-1) {
> 	var1 <- mydf[,1]
> 	var2 <- mydf[,i]
> 	fm1 <- aov(var1 ~ var2)
> 	tky <- TukeyHSD(fm1)
> 	otpt <- capture.output(summary(fm1))
> 	i <- i+1;
> 	lines <- as.vector(unlist(strsplit(otpt[2]," ")),mode="list") # gets the
> p-value
> 	if (grepl("[1234567890]",lines[14],perl = TRUE)) { #make sure that slot for
> p-value has a number
> 		number <- as.numeric(lines[14]) # make it numeric for logic test
> 		if (number <= 0.1) {
> 			cat(otpt, sep = "\n\n")
> 			cat(tky, sep = "\n\n")
> 			quartz(boxplot(mydf[,i] ~ mydf[,1]))
> 		}
> 	}	
> }				
> sink()
> 
> The ERROR: 
> 
> Error in TukeyHSD.aov(fm1) : no factors in the fitted model
> In addition: Warning message:
> In replications(paste("~", xx), data = mf) : non-factors ignored: var2
> 
> It works if I leave out " tky <- TukeyHSD(fm1) " and the subsequent cat.
> I've tried doing the Tukey on the mydf[,1].. itself, changing their classes,
> etc. Perhaps the whole approach is flawed. 
> 
> THANKS!
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Tukey-HSD-tp4374072p4374072.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com