[R] boxplot - code for labeling outliers - any suggestions for improvements?

Greg Snow Greg.Snow at imail.org
Thu Jan 27 00:09:31 CET 2011


For the last point (cluttered text), look at spread.labels in the plotrix package and spread.labs in the TeachingDemos package (I favor the latter, but could be slightly biased as well).  Doing more than what those two functions do becomes really complicated really fast.
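
As a rough illustration of the spread.labs approach (a sketch only, not a drop-in replacement for the function quoted below): it assumes the TeachingDemos package is installed, uses a linear y axis so that strheight() is in data units, and labels each outlier with its own value rather than with a column such as colpos. The 1.2 line-height gap and the x offsets are arbitrary choices for the example.

```r
library(TeachingDemos)  # provides spread.labs()

# Draw the boxplot and keep its statistics (outlier values and group indices)
bp <- boxplot(decrease ~ treatment, data = OrchardSprays, col = "bisque")

# Minimum vertical gap between labels: a bit more than one line of text
gap <- 1.2 * strheight("A")

# For each group that has outliers, nudge the label heights apart
for (g in unique(bp$group)) {
  y     <- bp$out[bp$group == g]          # true outlier positions
  y_lab <- spread.labs(y, mindiff = gap)  # adjusted label positions
  text(g + 0.15, y_lab, labels = y, adj = 0)
  segments(g + 0.02, y, g + 0.13, y_lab)  # connect each point to its label
}
```

The segments() call matters once labels have been moved: without the connecting lines it is no longer obvious which label belongs to which point.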

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tal Galili
> Sent: Wednesday, January 26, 2011 4:05 PM
> To: r-help at r-project.org
> Subject: [R] boxplot - code for labeling outliers - any suggestions for improvements?
> 
> Hello all,
> I wrote a small function to add labels for outliers in a boxplot.
> This function will only work on a simple boxplot/formula command (e.g.,
> something like boxplot(y~x)).
> 
> Code + example follows in this e-mail.
> 
> I'd be happy for any suggestions on how to improve this code, for
> example:
> 
>    - Handle boxplot.matrix (which shouldn't be too hard to do)
>    - Handle more complex formulas (e.g., boxplot(y~a*b))
>    - Handle cases where many outliers lead to a clutter of text
>    (I have no idea how to solve this systematically)
> 
> 
> Best,
> Tal
> ------------------------------
> 
> 
> # the function
> boxplot.add.outlier.text <- function(DATA, x_name, y_name, label_name)
> {
>   library(plyr)  # provides ddply()
> 
>   # Rows of one group whose y value lies beyond the boxplot whiskers
>   boxplot.outlier.data <- function(xx, y_name)
>   {
>     y <- xx[, y_name]
>     boxplot_range <- range(boxplot.stats(y)$stats)
>     ss <- (y < boxplot_range[1]) | (y > boxplot_range[2])
>     return(xx[ss, ])
>   }
> 
>   # Outlier rows (all columns, including the labels) per level of x_name
>   outlier_df <- ddply(DATA, x_name, boxplot.outlier.data, y_name = y_name)
> 
>   # Recompute the boxplot statistics without drawing, to get each
>   # outlier's value and its group position on the x axis
>   formu <- as.formula(paste(y_name, "~", x_name))
>   boxdata <- boxplot(formu, data = DATA, plot = FALSE)
>   boxdata_group_name <- boxdata$names[boxdata$group]
>   boxdata_outlier_df <- data.frame(group = boxdata_group_name,
>                                    y = boxdata$out, x = boxdata$group)
> 
>   # Match each plotted outlier back to its data row and draw its label
>   for (i in seq_len(nrow(boxdata_outlier_df)))
>   {
>     ss <- (outlier_df[, x_name] %in% boxdata_outlier_df[i, "group"]) &
>           (outlier_df[, y_name] %in% boxdata_outlier_df[i, "y"])
>     current_label <- outlier_df[ss, label_name]
>     temp_x <- boxdata_outlier_df[i, "x"]
>     temp_y <- boxdata_outlier_df[i, "y"]
>     text(temp_x, temp_y, current_label, pos = 4)
>   }
> 
>   list(boxdata_outlier_df = boxdata_outlier_df, outlier_df = outlier_df)
> }
> 
> # example:
> boxplot(decrease ~ treatment, data = OrchardSprays, log = "y", col = "bisque")
> boxplot.add.outlier.text(OrchardSprays, "treatment", "decrease", "colpos")
> 
> 
> 
> 
> ----------------Contact Details:----------------
> Contact me: Tal.Galili at gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
> ------------------------------------------------


