[R] boxplot - code for labeling outliers - any suggestions for improvements?

Kevin Wright kw.stat at gmail.com
Thu Jan 27 18:27:06 CET 2011


My colleagues who use one of the .NET languages/libraries can make
scatter plots that look better than R's because the labels are spread
out better.

If someone could spread the labels on the following graph, I would be
impressed.

plot(Sepal.Length~Sepal.Width, data=iris)
with(iris,text(Sepal.Width, Sepal.Length, 1:nrow(iris), cex=.5))
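
Part of the clutter in that plot comes from exact ties: several iris rows
share the same (Sepal.Width, Sepal.Length) pair, so their labels are drawn
on top of each other. A minimal base-R sketch that merges the row numbers
of coincident points into one label per location is below. It assumes
exact ties are worth handling first, it does nothing for labels that are
merely close, and the names key, lab and first are made up for this
illustration.

```r
# Build one key per point; rows with the same key sit at the same spot.
key <- with(iris, paste(Sepal.Width, Sepal.Length))

# For each distinct location, paste together the row numbers found there.
lab <- tapply(seq_len(nrow(iris)), key, paste, collapse = ",")

# Keep only the first row of each location so each label is drawn once.
first <- !duplicated(key)

plot(Sepal.Length ~ Sepal.Width, data = iris)
with(iris[first, ],
     text(Sepal.Width, Sepal.Length, lab[key[first]], cex = 0.5, pos = 4))
```

Labels of neighbouring (but not identical) points can still collide, so
this only removes the exact overprinting.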

Kevin


On Thu, Jan 27, 2011 at 9:52 AM, Tal Galili <tal.galili at gmail.com> wrote:
> Thanks again for the pointer to spread.labs, Greg.
>
> I incorporated it into the function and also extended the function to
> accept formulas, so it can be called just like boxplot.
> Code and examples are available here:
> http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/
>
> I'd be happy for any suggestions on how to improve it.
>
> Best,
> Tal
>
>
>
> ----------------Contact
> Details:-------------------------------------------------------
> Contact me: Tal.Galili at gmail.com |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> ----------------------------------------------------------------------------------------------
>
>
>
>
> On Thu, Jan 27, 2011 at 1:09 AM, Greg Snow <Greg.Snow at imail.org> wrote:
>
>> For the last point (cluttered text), look at spread.labels in the plotrix
>> package and spread.labs in the TeachingDemos package (I favor the latter, but
>> could be slightly biased as well).  Doing more than what those two functions
>> do becomes really complicated really fast.
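
To make the idea of spreading labels along one axis concrete, here is a
minimal base-R sketch. It is not the algorithm used by spread.labs or
spread.labels; spread_1d is a hypothetical helper written only for this
illustration, and it pushes labels upward only, so crowded labels drift
toward larger values.

```r
# Hypothetical helper (NOT the TeachingDemos or plotrix implementation):
# walk through the positions in increasing order and move each one up
# until it is at least `mindiff` above its lower neighbour, then return
# the adjusted positions in the original order.
spread_1d <- function(y, mindiff) {
  ord <- order(y)
  out <- y[ord]
  for (i in seq_along(out)[-1]) {
    if (out[i] - out[i - 1] < mindiff) {
      out[i] <- out[i - 1] + mindiff
    }
  }
  out[order(ord)]  # order(ord) is the inverse permutation of ord
}

spread_1d(c(1, 1.1, 1.15, 3), mindiff = 0.5)
# 1.0 1.5 2.0 3.0
```

The adjusted values would be used as the y positions in text(), with the
points themselves left where they are.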
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> greg.snow at imail.org
>> 801.408.8111
>>
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> > On Behalf Of Tal Galili
>> > Sent: Wednesday, January 26, 2011 4:05 PM
>> > To: r-help at r-project.org
>> > Subject: [R] boxplot - code for labeling outliers - any suggestions for
>> > improvements?
>> >
>> > Hello all,
>> > I wrote a small function to add labels for outliers in a boxplot.
>> > This function only works with a simple boxplot/formula call (e.g.,
>> > something like boxplot(y~x)).
>> >
>> > Code + example follows in this e-mail.
>> >
>> > I'd be happy for any suggestions on how to improve this code, for
>> > example:
>> >
>> >    - Handle boxplot.matrix (which shouldn't be too hard to do)
>> >    - Handle more complex formulas (e.g., boxplot(y~a*b))
>> >    - Handle cases where many outliers lead to a clutter of text
>> >      (I have no idea how to solve this systematically)
>> >
>> >
>> > Best,
>> > Tal
>> > ------------------------------
>> >
>> >
>> > # the function
>> > boxplot.add.outlier.text <- function(DATA, x_name, y_name, label_name)
>> > {
>> >   # return the rows of one group whose y value lies beyond the whiskers
>> >   boxplot.outlier.data <- function(xx, y_name)
>> >   {
>> >     y <- xx[, y_name]
>> >     boxplot_range <- range(boxplot.stats(y)$stats)
>> >     ss <- (y < boxplot_range[1]) | (y > boxplot_range[2])
>> >     return(xx[ss, ])
>> >   }
>> >
>> >   require(plyr)
>> >   # outlier rows for each level of x_name, keeping all original columns
>> >   outlier_df <- ddply(DATA, x_name, boxplot.outlier.data, y_name = y_name)
>> >   # head(outlier_df)
>> >   # build the formula y ~ x and get the boxplot statistics without plotting
>> >   formu <- as.formula(paste(y_name, "~", x_name))
>> >   boxdata <- boxplot(formu, data = DATA, plot = FALSE)
>> >   boxdata_group_name <- boxdata$names[boxdata$group]
>> >   boxdata_outlier_df <- data.frame(group = boxdata_group_name,
>> >                                    y = boxdata$out, x = boxdata$group)
>> >   # write each outlier's label to the right of its point
>> >   for (i in seq_len(nrow(boxdata_outlier_df)))
>> >   {
>> >     ss <- (outlier_df[, x_name] %in% boxdata_outlier_df[i, ]$group) &
>> >       (outlier_df[, y_name] %in% boxdata_outlier_df[i, ]$y)
>> >     current_label <- outlier_df[ss, label_name]
>> >     temp_x <- boxdata_outlier_df[i, "x"]
>> >     temp_y <- boxdata_outlier_df[i, "y"]
>> >     text(temp_x, temp_y, current_label, pos = 4)
>> >   }
>> >
>> >   list(boxdata_outlier_df = boxdata_outlier_df, outlier_df = outlier_df)
>> > }
>> >
>> > # example:
>> > boxplot(decrease ~ treatment, data = OrchardSprays, log = "y", col =
>> > "bisque")
>> > boxplot.add.outlier.text(OrchardSprays, "treatment", "decrease",
>> > "colpos")
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-
>> > guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>



-- 
Kevin Wright


