[R] mild and extreme outliers in boxplot

Rolf Turner r.turner at auckland.ac.nz
Thu Aug 20 02:39:46 CEST 2009


On 20/08/2009, at 11:29 AM, Gabor Grothendieck wrote:

> I agree that the ?boxplot answer was reasonable and appropriate

	This must be some use of the words ``reasonable'' and
	``appropriate'' with which I am not familiar since the
	answer was seriously misleading and would have had the
	OP tearing his hair out reading and re-reading the help
	files and wondering what he was missing.

		cheers,

			Rolf Turner

One should not aim at being possible to understand, but at
being impossible to misunderstand.

         --- Quintilian

> and
> there is nothing to prevent someone else from spending more time
> on an even more detailed answer or different answer if they wish.
> Many questions to r-help are answered by multiple people and the
> different approaches can be interesting to compare.
>
> On Wed, Aug 19, 2009 at 5:39 PM, Gavin Simpson
> <gavin.simpson at ucl.ac.uk> wrote:
>> On Wed, 2009-08-19 at 13:49 -0700, Bert Gunter wrote:
>>> Rolf:
>>>
>>> Not sure what "reasonably thorough" means but:
>>>
>>>  ? boxplot says:
>>
>> Exactly Bert, the info is there if you want to look and do so hard
>> enough. However, it is perhaps expecting quite a lot of a new useR to
>> put this together from ?boxplot or ?bxp, and ?boxplot.stats.
>>
>> Criticising correct, if cryptic or high-level, responses to a list where
>> people give their time for free, *and* not providing a more complete
>> solution, is unfair, Rolf. The OP is free to respond and ask for
>> additional help once they've given it a go if they are still having
>> trouble.
>>
>> One solution, if you are prepared to bastardise the standard
>> interpretation of the boxplot, is to compute the relevant boxplot
>> statistics using boxplot.stats and alter argument 'coef' to some larger
>> multiple of the box height to represent "extreme" outliers, whatever
>> those might be. So here's the rope, try not to hang yourself, 'Rnewbie'!
>>
>> set.seed(1234)
>> dat <- rt(100, df = 2)
>> bxp1 <- boxplot.stats(dat)
>> bxp2 <- boxplot.stats(dat, coef = 2)
>>
>> ##Then you'd need to plot the boxplot without outliers
>>
>> boxplot(dat, outpch = NA)
>>
>> ##Then plot the points 1.5-2 x box height
>>
>> want <- bxp1$out %in% bxp2$out
>> out <- bxp1$out
>> out[want] <- NA
>>
>> points(rep(1, length(out)), out, pch = 1, col = "blue")
>>
>> ##Then the further outliers
>>
>> outout <- bxp2$out
>> points(rep(1, length(outout)), outout, pch = 2, col = "red")
>>
>> How one decides what is an outlier or an extreme outlier is another
>> matter... By chance the dummy data here shows one problem: there isn't
>> much difference between 'outliers' and 'extreme outliers' towards the
>> bottom of the resulting plot, so why should we distinguish them?
>>
>> (By the way 'Rnewbie', this isn't something I recommend you do, but you
>> might know more about your real world use case than I.)
>>
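A minimal sketch of what the OP asked for (circles for mild outliers,
asterisks for extreme ones), following the boxplot.stats idea quoted above.
The helper name classify_outliers and the cut-offs 1.5 and 3 are assumptions
made for illustration; the quoted code used coef = 2 for the wider fence, and
nothing in ?boxplot prescribes either value.

```r
# Hypothetical helper (not part of base R): split the outliers of x into
# "mild" and "extreme" using two multiples of the box length.
classify_outliers <- function(x, mild_coef = 1.5, extreme_coef = 3) {
  all_out <- boxplot.stats(x, coef = mild_coef)$out     # beyond the usual whiskers
  extreme <- boxplot.stats(x, coef = extreme_coef)$out  # beyond the wider fence
  list(mild = all_out[!all_out %in% extreme], extreme = extreme)
}

set.seed(1234)
dat <- rt(100, df = 2)
cls <- classify_outliers(dat)

# Draw the box and whiskers only, then add the two groups of points by hand.
boxplot(dat, outpch = NA)
points(rep(1, length(cls$mild)), cls$mild, pch = 1)        # circles
points(rep(1, length(cls$extreme)), cls$extreme, pch = 8)  # asterisks
```

Calling classify_outliers(dat, extreme_coef = 2) should give the same split
as the quoted code, where bxp2 was computed with coef = 2.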
>> HTH
>>
>> G
>>
>> PS: is there a reason why you post anonymously, 'Rnewbie'? Do you not
>> want us to know who you are, but want our help?
>>
>>>
>>> ...
>>> pars    a list of (potentially many) more graphical parameters, e.g.,
>>> boxwex or outpch; these are passed to bxp (if plot is true); for
>>> details, see there.
>>>
>>>
>>> Well, that seems pretty clear to me, so I went to ?bxp to find in the
>>> pars listing:
>>>
>>> outlty, outlwd, outpch, outcex, outcol, outbg:
>>> outlier line type, line width, point character, point size expansion,
>>> color, and background color. The default outlty= "blank" suppresses the
>>> lines and outpch=NA suppresses points.
>>>
>>>
>>> It seems to me that this (and other omitted excerpts + examples) is at
>>> least a reasonable answer to the query (allowing the reader to at least
>>> infer that bxp does not distinguish degrees of outlyingness), so I don't
>>> understand your criticism. Feel free to respond privately if you prefer.
>>>
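A small illustration of the point in the help excerpts quoted above: the
out* parameters are applied to every outlier alike, so a single call cannot
draw two kinds of symbol. The data and the choices pch = 8 and col = "red"
are arbitrary assumptions for the sketch.

```r
set.seed(1234)
dat <- rt(100, df = 2)

# outpch and outcol are passed on to bxp() and style all outliers the same way.
b <- boxplot(dat, outpch = 8, outcol = "red")

# The (invisibly) returned list holds one 'out' vector, with no split into
# mild and extreme outliers.
str(b$out)
```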
>>> -- Bert
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Rolf Turner
>>> Sent: Wednesday, August 19, 2009 1:27 PM
>>> To: ottorino-luca.pantani at unifi.it
>>> Cc: Rnewbie; ERRE
>>> Subject: Re: [R] mild and extreme outliers in boxplot
>>>
>>>
>>> On 20/08/2009, at 3:13 AM, Ottorino-Luca Pantani wrote:
>>>
>>>> Rnewbie wrote:
>>>>> dear all,
>>>>>
>>>>> could somebody tell me how I can plot mild outliers as a circle(°) and
>>>>> extreme outliers as an asterisk(*) in a box-whisker plot?
>>>>>
>>>>> Thanks very much in advance
>>>>>
>>>> ?boxplot
>>>>
>>>> or
>>>>
>>>> help(bxp)
>>>
>>> This is the sort of response that gives R-help a bad name.
>>>
>>> I had a reasonably thorough look at these help files and saw
>>> ***nothing*** that would answer the OP's question.  The information may
>>> be there --- I'm not sure about this --- but it is far from obvious.
>>> Explicit reference to the appropriate lines of the help file(s) would
>>> be useful.
>>>
>>>       cheers,
>>>
>>>               Rolf Turner
>>> ######################################################################
>>> Attention:
>>> This e-mail message is privileged and confidential. If you are not the
>>> intended recipient please delete the message and notify the sender.
>>> Any views or opinions presented are solely those of the author.
>>>
>>> This e-mail has been scanned and cleared by MailMarshal
>>> www.marshalsoftware.com
>>> ######################################################################
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> --
>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>
>




