[R] mild and extreme outliers in boxplot
S Ellison
S.Ellison at lgc.co.uk
Thu Aug 20 02:41:48 CEST 2009
>The OP asked how to plot mild and extreme outliers
>with distinct plotting symbols. ...
>In fact it can't be done!
I hate to contradict (yeah, right! ;-) ) but actually the basic info
needed for distinguishing different extents of outlier is available in
the help files if you read through sufficiently to understand how
boxplot (boxplot.default) and bxp operate.
Certainly bxp and boxplot do not do that by themselves, and it is, er,
optimistic to expect a beginner to work out how to put it together with
nothing but the help pages. But boxplot does return the group stats and
the identified outliers per group and with plot=FALSE no plot is
generated, so it can be done with only modest ingenuity.
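
To make that concrete, here is a minimal sketch of what boxplot() hands back when plot=FALSE. The data are made up purely for illustration (two planted outliers in the second group); nothing here comes from the original question.

```r
# Illustrative data only (made up for this sketch)
set.seed(1)
y <- c(rnorm(40), 10, -10)   # 40 ordinary values plus two planted outliers
x <- gl(2, 21)               # two groups of 21 observations each
bp <- boxplot(y ~ x, plot = FALSE)

names(bp)   # components include stats, n, conf, out, group and names
bp$stats    # one column per group: whisker end, hinge, median, hinge, whisker end
bp$out      # the values flagged as outliers, all groups pooled
bp$group    # for each element of bp$out, the index of its group
```

Rows 2 and 4 of bp$stats are the box ends, so bp$stats[4, bp$group] lines up element by element with bp$out; that is all the bookkeeping needed to classify each outlier.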
The trick would be to proceed in stages:
# get some example data
set.seed(971)
y<-rt(500, df=2)
x<-gl(10,50)
#i) run boxplot on the data with plot=FALSE; for example
bp<-boxplot(y~x, plot=FALSE)
#ii) Plot the boxplot, suppressing points with outpch=NA.
bxp(bp, outpch=NA)
#iii) plot the outliers manually using points() with whatever colours
# and pch settings you like. To be neat, write a bit of code that
# uses the bxp object info to set the colours etc. For example;
bxp.extreme<-function(z, extreme=2.5) {
   #function to identify 'extreme' outliers in a boxplot
   #z is a boxplot object returned by boxplot(... , plot=FALSE)
   #Returns a logical vector of length equal to length(z$out), with TRUE
   # where outliers are outside box ends by more than
   # extreme * (interquartile range)
   #z$stats rows 2 and 4 are the box ends; z$group picks each outlier's box
   boxrange <- z$stats[4,z$group] - z$stats[2,z$group]
   big.outlier<- (z$out > z$stats[4,z$group] + extreme*boxrange) |
                 (z$out < z$stats[2,z$group] - extreme*boxrange)
   return(big.outlier)
}
ext<-bxp.extreme(bp)
points(bp$group, bp$out, pch=ifelse(ext, 2,1), col=ifelse(ext, 2,1))
legend("bottomright", bty="o", legend=c("1.5 to 2.5 IQR", "> 2.5 IQR"),
       pch=c(1,2), col=c(1,2))
#iv) answer all the emails from R help asking why you would want to do
#    that... ;-)
So it is there... kind of... and the hacking is only about three lines
of function code which could be done on the command line for a one-off.
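
As a variation on the same idea, here is a sketch that returns labels rather than TRUE/FALSE and uses the common Tukey-style cut-off of 3 box lengths instead of 2.5. The helper name classify_outliers is hypothetical (it is not part of R), and the data are hand-made so the result can be checked by eye: the box runs from 3.5 to 8.5, so anything above 16 is flagged as an outlier and anything above 23.5 counts as extreme.

```r
# Hypothetical helper (not part of R): label each outlier in a boxplot object
# as "mild" or "extreme", using a cut-off of `extreme` box lengths beyond the box
classify_outliers <- function(z, extreme = 3) {
  spread <- z$stats[4, z$group] - z$stats[2, z$group]   # box length, per outlier
  upper  <- z$stats[4, z$group] + extreme * spread
  lower  <- z$stats[2, z$group] - extreme * spread
  ifelse(z$out > upper | z$out < lower, "extreme", "mild")
}

# Small hand-made example (assumed data, single group)
y    <- c(1:9, 20, 40)
bp   <- boxplot(y, plot = FALSE)
kind <- classify_outliers(bp)
data.frame(value = bp$out, kind = kind)   # 20 is mild, 40 is extreme
```

The labels can then drive pch and col in points() in the same way as the logical vector above, for example pch=ifelse(kind == "extreme", 2, 1).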
Steve E