[R] outlier labels incorrectly assigned with ggplot2 box plot
Andreas Nord
andreas.nord at biol.lu.se
Thu Oct 20 12:36:27 CEST 2016
Dear list,
I want to label outliers in a ggplot box plot with the name of the subject for which outlying data were observed.
I have proceeded by creating a simple function to identify outliers:
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}
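As a quick sanity check, here is a minimal sketch of what the function returns on a small made-up vector (the values in 'toy' are hypothetical and not from my data). With the default quantile type, only the last value should fall outside the 1.5 * IQR fences:

```r
# Same definition as above, repeated so the snippet runs on its own.
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

# Hypothetical values, not taken from the linked data set.
toy <- c(10, 11, 12, 13, 14, 40)

# Q1 = 11.25, Q3 = 13.75, IQR = 2.5, so the fences are 7.5 and 17.5.
flags <- is_outlier(toy)
print(flags)
```

Note that quantile() stops with an error if x contains NA values, so missing values have to be removed before the function is called.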
I then use the 'safe.ifelse' workaround to get 'ifelse' to work with factors:
safe.ifelse <- function(cond, yes, no) {
  class.y <- class(yes)
  if (class.y == "factor") {
    levels.y <- levels(yes)
  }
  X <- ifelse(cond, yes, no)
  if (class.y == "factor") {
    X <- as.factor(X)
    levels(X) <- levels.y
  } else {
    class(X) <- class.y
  }
  return(X)
}
From here, I have run the data through a dplyr pipeline to produce the plot.
(Data at https://www.dropbox.com/s/2pcuuclxiqw1va1/data.csv?dl=0)
library(dplyr)
library(ggplot2)
# drop rows where variable1 is missing
data <- subset(data, !is.na(variable1))
p1 <-
  data %>%
  group_by(season, location) %>%
  # label is the subject for outlying rows, NA otherwise
  mutate(outlier = safe.ifelse(is_outlier(variable1), subject, as.numeric(NA))) %>%
  ggplot(aes(x = factor(season), y = variable1)) +
  geom_boxplot() +
  facet_wrap(~location, nrow = 2) +
  guides(fill = FALSE) +
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = 1.5, size = 2.5)
While outliers are correctly identified, the labelling does not work as it should. Rather than subject-specific outlier labels, the same three levels of the 'subject' factor are printed repeatedly, erroneously, and seemingly at random. Labelling outliers by their numerical values (i.e. by changing 'subject' to 'variable1' in the 'safe.ifelse' call) does not cause problems.
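A possible explanation, not confirmed against the full data set, is that 'ifelse' returns the integer codes of a factor rather than its labels. 'as.factor' inside the helper then sees only the codes that survive, and reassigning the original levels renames them by position. The sketch below uses made-up subject names ('subject' and 'flagged' are hypothetical toy vectors) to show the effect, together with a character-based alternative that might sidestep it:

```r
# Hypothetical subjects; the names are made up for illustration.
subject <- factor(c("anna", "bert", "carl", "dora", "emil"))
flagged <- c(FALSE, FALSE, FALSE, TRUE, TRUE)

# ifelse() drops the factor attributes and returns the integer codes.
codes <- ifelse(flagged, subject, as.numeric(NA))
print(codes)                      # NA NA NA 4 5

# as.factor() builds new levels "4" and "5"; assigning the original
# levels then renames them positionally to the first two levels.
relabelled <- as.factor(codes)
levels(relabelled) <- levels(subject)
print(as.character(relabelled))   # NA NA NA "anna" "bert"

# Possible workaround: run the test on character values instead.
labels <- ifelse(flagged, as.character(subject), NA_character_)
print(labels)                     # NA NA NA "dora" "emil"
```

If this is indeed the cause, replacing the 'safe.ifelse' call in the pipeline with the character-based 'ifelse' above would be one option to try. It would also be consistent with numeric labels working, since those never pass through the factor branch.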
I assume I am missing something obvious here - perhaps someone could kindly indicate where I am going wrong?
Thanks,
Andreas