[R] accessing the indices of outliers in a data frame boxplot
Chuck Cleland
ccleland at optonline.net
Fri Jan 25 18:01:14 CET 2008
On 1/25/2008 11:39 AM, Karin Lagesen wrote:
> I have a data frame containing columns which are factors. I use this
> to make boxplots for the data, with one box per factor. I would now
> like to get at the data in the data frame which corresponds to the
> outliers. I have so far found the $out, which gives "the values of any
> data points which lie beyond the extremes of the whiskers", but I
> haven't found anything which will let me get at the indices in the
> original data frame for these outliers.
>
> I think there might be a chance that I could simply compare the values
> I am plotting from my data frame with the values for the whiskers and
> use that as a criterion, but I am uncertain of how to do this without
> doing it manually. The factor I am plotting against contains 17
> levels, and I'd thus like to see if there is a somewhat more general
> solution available.
>
> Thanks for your help!
>
> Karin
You can use the %in% operator (see is.element) to test which data values
in your data frame match an outlier value. Then use which() to return
the indices of the TRUE elements. For example:
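set.seed(245)
# Four groups (A-D) of 25 observations each
df <- data.frame(GRP = rep(LETTERS[1:4], each = 25), Y = rchisq(100, 2))
# boxplot() returns its statistics; $out holds the values beyond the whiskers
mybp <- boxplot(Y ~ GRP, data = df)
# Row indices in df whose Y value matches an outlier value
which(df$Y %in% mybp$out)
[1] 8 12 47 66 88 93
mybp$out
[1] 5.919915 9.135578 5.723714 8.758584 8.502147 4.920513
# Check: indexing df$Y with those rows recovers the outlier values
df$Y[which(df$Y %in% mybp$out)]
[1] 5.919915 9.135578 5.723714 8.758584 8.502147 4.920513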
See ?is.element and ?which.
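One caveat: matching on values alone ignores the groups, so a value that is an outlier in one group would also be flagged wherever the same value occurs in another group, even if it lies inside the whiskers there. With continuous data this is unlikely, but with rounded or discrete data it can happen. The following is a minimal sketch of a group-aware variant, assuming the same df as above. It relies on the $group and $names components that boxplot() returns alongside $out; the variable name idx and the use of plot = FALSE are only choices made for this illustration.

```r
set.seed(245)
df <- data.frame(GRP = rep(LETTERS[1:4], each = 25), Y = rchisq(100, 2))

# plot = FALSE returns the statistics without drawing the plot
mybp <- boxplot(Y ~ GRP, data = df, plot = FALSE)

# mybp$group[i] is the box number of outlier mybp$out[i];
# mybp$names[g] is the label of box number g.
idx <- unlist(lapply(seq_along(mybp$names), function(g) {
  # Match an outlier value only within the group it was an outlier in
  which(df$GRP == mybp$names[g] & df$Y %in% mybp$out[mybp$group == g])
}))
idx <- sort(idx)

# Rows of the original data frame that correspond to the outliers
df[idx, ]
```

Because the loop runs over all boxes, this should scale to a factor with 17 levels without any manual comparison against the whisker values.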
--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894