[R] mild and extreme outliers in boxplot

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Aug 19 23:39:10 CEST 2009


On Wed, 2009-08-19 at 13:49 -0700, Bert Gunter wrote:
> Rolf:
> 
> Not sure what "reasonably thorough" means but:
> 
>  ? boxplot says:

Exactly Bert, the info is there if you want to look and do so hard
enough. However, it is perhaps expecting quite a lot of a new useR to
put this together from ?boxplot, ?bxp, and ?boxplot.stats.

Criticising correct, if cryptic or high-level, responses to a list where
people give their time for free, *and* not providing a more complete
solution, is unfair, Rolf. The OP is free to respond and ask for
additional help once they've given it a go if they are still having
trouble.

One solution, if you are prepared to bastardise the standard
interpretation of the boxplot, is to compute the relevant boxplot
statistics twice using boxplot.stats: once with the default 'coef' (1.5)
and once with 'coef' set to some larger multiple of the box height to
represent "extreme" outliers, whatever those might be. So here's the
rope, try not to hang yourself 'Rnewbie'!

set.seed(1234)
dat <- rt(100, df = 2)
bxp1 <- boxplot.stats(dat)
bxp2 <- boxplot.stats(dat, coef = 2)

##Then you'd need to plot the boxplot without outliers

boxplot(dat, outpch = NA)

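## Then plot the points lying between 1.5 and 2 x box height beyond the
## box. 'extreme' flags the coef = 1.5 outliers that are still outliers
## at coef = 2; set those to NA (NA is not drawn) so they can be plotted
## separately below

extreme <- bxp1$out %in% bxp2$out
out <- bxp1$out
out[extreme] <- NA

points(rep(1, length(out)), out, pch = 1, col = "blue")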

##Then the further outliers

outout <- bxp2$out
points(rep(1, length(outout)), outout, pch = 2, col = "red")
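
For what it's worth, the steps above could be rolled into one small
helper. This is only a sketch: the name split_outliers and its arguments
'mild' and 'extreme' are made up here and are not part of R, and the
default of 3 for 'extreme' is an assumption borrowed from the common
"outer fence" convention rather than the coef = 2 used above. pch = 8
draws an asterisk, which is what the OP asked for.

```r
## Hypothetical helper (not part of base R): split the points beyond the
## whiskers into "mild" and "extreme" using two values of 'coef'
split_outliers <- function(x, mild = 1.5, extreme = 3) {
    stopifnot(is.numeric(x), mild > 0, extreme >= mild)
    out.mild <- boxplot.stats(x, coef = mild)$out
    out.extr <- boxplot.stats(x, coef = extreme)$out
    ## anything flagged at the larger coef is "extreme"; the rest is "mild"
    list(mild = out.mild[!out.mild %in% out.extr], extreme = out.extr)
}

set.seed(1234)
dat <- rt(100, df = 2)
res <- split_outliers(dat)

## boxplot without its own outlier symbols, then add the two groups
boxplot(dat, outpch = NA)
points(rep(1, length(res$mild)), res$mild, pch = 1)       # circles
points(rep(1, length(res$extreme)), res$extreme, pch = 8) # asterisks
```

Passing extreme = 2 should give the same two groups as the code above.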

How one decides what is an outlier or an extreme outlier is another
matter. By chance the dummy data here show one problem: there isn't
much difference between 'outliers' and 'extreme outliers' towards the
bottom of the resulting plot, so why should we distinguish them?
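
To see where the two cut-offs actually fall, something like the
following might help. It is a rough sketch that recomputes the fences by
hand from the hinges returned by boxplot.stats; the object names (st,
box, fences, n.all, n.extr) are just ones picked for this example.

```r
set.seed(1234)
dat <- rt(100, df = 2)

## stats: lower whisker, lower hinge, median, upper hinge, upper whisker
st <- boxplot.stats(dat)$stats
box <- st[4] - st[2]    # "box height": distance between the hinges

## cut-offs beyond which points are flagged for coef = 1.5 and coef = 2
fences <- rbind(lower = st[2] - c(1.5, 2) * box,
                upper = st[4] + c(1.5, 2) * box)
colnames(fences) <- c("coef1.5", "coef2")
fences

## how many points fall outside each pair of cut-offs
n.all  <- sum(dat < fences["lower", "coef1.5"] | dat > fences["upper", "coef1.5"])
n.extr <- sum(dat < fences["lower", "coef2"]   | dat > fences["upper", "coef2"])
c(mild = n.all - n.extr, extreme = n.extr)
```

The two columns of 'fences' are only half a box height apart, which is
why the two groups can end up sitting so close together on the plot.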

(By the way 'Rnewbie', this isn't something I recommend you do, but you
might know more about your real world use case than I.)

HTH

G

PS: is there a reason why you post anonymously, 'Rnewbie'? Do you not
want us to know who you are, but want our help?

> 
> ...
> pars    a list of (potentially many) more graphical parameters, e.g., boxwex
> or outpch; these are passed to bxp (if plot is true); for details, see
> there.
> 
> 
> Well, that seems pretty clear to me, so I went to ?bxp to find in the pars
> listing:
> 
> outlty, outlwd, outpch, outcex, outcol, outbg:
> outlier line type, line width, point character, point size expansion, color,
> and background color. The default outlty= "blank" suppresses the lines and
> outpch=NA suppresses points.
> 
> 
> It seems to me that this (and other omitted excerpts + examples) is at least
> a reasonable answer to the query (allowing the reader to at least infer that
> bxp does not distinguish degrees of outlyingness), so I don't understand
> your criticism. Feel free to respond privately if you prefer.
> 
> -- Bert
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Rolf Turner
> Sent: Wednesday, August 19, 2009 1:27 PM
> To: ottorino-luca.pantani at unifi.it
> Cc: Rnewbie; ERRE
> Subject: Re: [R] mild and extreme outliers in boxplot
> 
> 
> On 20/08/2009, at 3:13 AM, Ottorino-Luca Pantani wrote:
> 
> > Rnewbie wrote:
> >> dear all,
> >>
> >> could somebody tell me how I can plot mild outliers as a circle(°) and
> >> extreme outliers as an asterisk(*) in a box-whisker plot?
> >>
> >> Thanks very much in advance
> >>
> > ?boxplot
> >
> > or
> >
> > help(bxp)
> 
> This is the sort of response that gives R-help a bad name.
> 
> I had a reasonably thorough look at these help files and saw ***nothing***
> that would answer the OP's question.  The information may be there --- I'm
> not sure about this --- but it is far from obvious.  Explicit reference
> to the appropriate lines of the help file(s) would be useful.
> 
> 	cheers,
> 
> 		Rolf Turner
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



