[R] mild and extreme outliers in boxplot

Gavin Simpson gavin.simpson at ucl.ac.uk
Thu Aug 20 00:53:22 CEST 2009


On Thu, 2009-08-20 at 09:58 +1200, Rolf Turner wrote:
> On 20/08/2009, at 9:39 AM, Gavin Simpson wrote:
> 
> 	<snip>
> 
> > Criticising correct, if cryptic or high-level, responses to a list where
> > people give their time for free, *and* not providing a more complete
> > solution is unfair, Rolf. The OP is free to respond and ask for
> > additional help once they've given it a go if they are still having
> > trouble.
> 
> 	When the ``correct response'' is seriously misleading, as
> 	this one was --- the implication of the response was that
> 	the specified task *could* be done (if one looked hard
> 	enough at the help files), when in fact the specified task
> 	can't be done (at least not without substantial hacking)

Hardly "substantial hacking", Rolf, and somewhat educational with regard
to the underlying functions used by boxplot. My suggestion is 9 lines of
code, and it only stretches to 9 because I did each step in turn to make
it easier to understand and explain.

> 	--- then I think criticism is merited.
> 
> 	Also when a clear answer (``It can't be done.'') is as easy to
> 	give as an obscurantist misleading one (``RTFM'') then criticism
> 	is merited.

Sorry Rolf, but "it can't be done" is somewhat subjective. All one is
doing is plotting a character on a graphics device at a certain
location, with the actual character determined on the basis of some a
priori determined indicator of "extreme" outlyingness. I showed how it
*could* be done, by manipulating the 'coef' argument of boxplot.stats(),
which works if you can couch your definition of "extreme" in terms of
the box height.

## install.packages("fortunes")
require("fortunes")
fortune("this is R")

:-)

This is not to say that I necessarily think one should do this, but the
author of boxplot.stats must have envisaged a situation where you might
want to alter the definition of "outlier" (not that that is the right
word in this case as these observations are potentially just extreme,
not necessarily outliers). After all, all we are doing is determining
how far from the box centre we would like to start showing individual
observations.
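
For illustration, here is a minimal sketch of that idea. It is not the
nine lines from my earlier message; the data are made up, the variable
names are hypothetical, and the choice of 3 box heights as the cut-off
for "extreme" is only an assumed convention.

```r
## Made-up data: a well-behaved bulk plus two far-out observations
x <- c(1:21, 40, 70)

## Default definition: points more than 1.5 box heights beyond the hinges
flagged <- boxplot.stats(x, coef = 1.5)$out

## Assumed definition of "extreme": more than 3 box heights beyond the hinges
extreme <- boxplot.stats(x, coef = 3)$out

## Points flagged at 1.5 but not at 3 are the "mild" ones
mild.only <- flagged[!flagged %in% extreme]

## Draw box and whiskers without any points, then add the two groups
## with different plotting characters
boxplot(x, outline = FALSE, ylim = range(x))
points(rep(1, length(mild.only)), mild.only, pch = 1)  # open circle
points(rep(1, length(extreme)), extreme, pch = 8)      # asterisk
```

This only works if "extreme" can be expressed as a multiple of the box
height; any other definition would need its own rule for selecting the
points to draw.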

I admit that RTFM wasn't that helpful for a newbie (I said as much), but
replying with "it can't be done" is just as useless, if not more so. In
this case one can do it if one has some definition of "extreme" that
allows one to determine which points, if any, to draw. Showing how that
can be done, while making it clear that this might not be a "Good Idea"
(TM), is better than your suggested response.

G

> 
> 	There is a difference between saying RTFM to a poster who has
> 	clearly been too lazy to do his or her homework and saying RTFM
> 	to a poster when TFM is not at all clear with respect to the
> 	question posed.  There are so many arguments to bxp() that anyone
> 	might be forgiven for thinking ``There must be a way to do what
> 	I want; I just haven't twigged to the correct way of putting
> 	these arguments together.''  Deliberately steering a new user
> 	into such a misapprehension is unforgivable.
> 
> 		cheers,
> 
> 			Rolf Turner
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



