Fantastic!
It would be great if the description could be modified to include the mysterious bit about the upper and lower bound whisker positions:
upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)
Maybe that is clearly written in the description of boxplot.stats {grDevices}, but evidently I missed it numerous times and also did not pick up on this intent from the original description of boxplot {graphics}.
Your type of descriptive answer and helpfulness is much appreciated and one of the reasons I continue to endorse the R tool over numerous others.
With more people like you, the tool may be headed for domination in the market.
Thanks again!
________________________________
From: Dennis Murphy
Cc: R Project Help
Sent: Wed, May 12, 2010 2:50:19 AM
Subject: Re: [R] Whiskers on the default boxplot {graphics}
Hi:
Let's do some math :)
e:
Okay...Let me see if I've got it...
>
>>I'm just trying to use the default boxplot {graphics} capability in R...
>
>>So I call something like the following:
>>> boxplot(mpg ~ cyl, data = mtcars, main = "Car Mileage Data", xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
>
>>That produces something as shown in the following:
>http://www.statmethods.net/graphs/images/boxplot1.jpg
>
>>When that default boxplot is called, i.e. boxplot {graphics}, as shown in the line of code above, it is actually calling into boxplot.stats {grDevices}. When boxplot.stats {grDevices} is called it has a default value for "coef" of 1.5, i.e. coef = 1.5.
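The relationship described above can be checked directly. This is only a sketch, assuming the `range` argument of boxplot() (default 1.5) is what gets handed to boxplot.stats() as `coef`; the names `bp` and `by_group` are made up for the illustration.

```r
# Statistics computed by boxplot(); plot = FALSE skips the drawing step.
bp <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One boxplot.stats() call per cylinder group, with the default coef = 1.5.
by_group <- sapply(split(mtcars$mpg, mtcars$cyl),
                   function(v) boxplot.stats(v, coef = 1.5)$stats)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(bp$stats)
print(unname(by_group))
```

If the two printed matrices agree, the whisker positions in the plot are the ones boxplot.stats() reports.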
>
>>If I understand the purpose of "coef" correctly, it means that the ‘whiskers’ should extend out 1.5 times the length of the box away from the box. Is that correct?
>
If by 'length of the box' you mean the interquartile range (IQR = Q_3 - Q_1 where Q refers to quartile), then assuming that
x is the numeric vector of interest for a boxplot,
upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 - 1.5 * IQR)
So the upper whisker is located at the *smaller* of the maximum x value and Q_3 + 1.5 IQR,
whereas the lower whisker is located at the *larger* of the smallest x value and Q_1 - 1.5 IQR.
Strictly speaking, when one of those limits is the binding one, R draws the whisker at the most
extreme data point still inside the limit, not at the limit itself (see the 'coef' entry quoted
further down).
In your terms, the whiskers should extend out a *maximum* of "1.5 times the length of the box
away from the box".
Visually, this means that individual points more extreme in value than Q_3 + 1.5 IQR are plotted
separately at the high end, and those below Q_1 - 1.5 IQR are plotted separately at the low
end. Depending on the source, the separately plotted points are called 'outside values' or
outliers. On the other hand, if the maximum or minimum value of x is closer than 1.5 IQR to
its nearest quartile, then that is where the whisker is positioned.
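Here is a small sketch of that rule on the 8-cylinder group from the mtcars example. It assumes the box is measured between the hinges returned by fivenum(), which is what boxplot.stats() appears to use; the variable names are only illustrative.

```r
x <- mtcars$mpg[mtcars$cyl == 8]   # the 8-cylinder group from the example

h   <- fivenum(x)                  # min, lower hinge, median, upper hinge, max
iqr <- h[4] - h[2]                 # box length, measured between the hinges
upper_fence <- h[4] + 1.5 * iqr
lower_fence <- h[2] - 1.5 * iqr

# Whiskers: the most extreme observations that are still inside the fences.
upper_whisker <- max(x[x <= upper_fence])
lower_whisker <- min(x[x >= lower_fence])

# Points beyond the fences are the separately plotted 'outside values'.
outside <- x[x < lower_fence | x > upper_fence]

s <- boxplot.stats(x)
print(c(lower_whisker, upper_whisker))
print(s$stats[c(1, 5)])            # whisker ends as reported by R
print(outside)
print(s$out)
```

For this group the lower fence falls between two observations, so the lower whisker stops at an actual data value inside the fence rather than at the fence itself.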
Does that make sense?
HTH,
Dennis
>>Now I look back at the plot, and I'm not sure how 1.5 times the length of the box corresponds with the whisker lengths shown in the image:
>http://www.statmethods.net/graphs/images/boxplot1.jpg
>
>>Is it that the whisker length is a total of 1.5 the length of the box and centered about the median (2nd Quartile)?
>
>>Just trying to get a handle on this, so thanks again for all the help in deciphering this.
>
>>________________________________
>>From: RJ Cunningham
>>Cc: R Project Help
>>Sent: Tue, May 11, 2010 9:57:48 PM
>
>Subject: Re: [R] Whiskers on the default boxplot {graphics}
>
>
>I think not. Isn't the "secret" here?
>
>
>>Arguments:
>
>>x: a numeric vector for which the boxplot will be constructed
>>('NA's and 'NaN's are allowed and omitted).
>
>>coef: this determines how far the plot 'whiskers' extend out
>>from the box. If 'coef' is positive, the whiskers extend
>>to the most extreme data point which is no more than
>>'coef' times the length of the box away from the box. A
>>value of zero causes the whiskers to extend to the data
>>extremes (and no outliers be returned).
>
>>do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out'
>>component respectively will be empty in the result.
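To see what the 'coef' entry means in practice, here is a minimal sketch on made-up data with one extreme value. The vector `x` and the result names are assumptions for the illustration only.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)        # made-up data, one extreme value

default_stats <- boxplot.stats(x)            # coef = 1.5
full_range    <- boxplot.stats(x, coef = 0)  # whiskers reach the data extremes

# $stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(default_stats$stats)
print(default_stats$out)     # values plotted separately
print(full_range$stats)
print(full_range$out)        # nothing is reported as outside when coef = 0
```

With the default, the upper whisker stops at the largest value within 1.5 box lengths of the box; with coef = 0 it runs all the way to the maximum.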
>
>>Details:
>
>>The two 'hinges' are versions of the first and third quartile,...
>
>
>>On Wed May 12 10:35 , Jason Rupert sent:
>
>
>>Hmm... Maybe I need to look somewhere other than boxplot.stats {grDevices} for a definition of how the upper/lower whiskers are produced.
>>>
>>>>
>>>By any chance are they "the lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile"?
>>>
>>>>
>>>None of the links from boxplot.stats {grDevices} seemed to reveal the secret definition of the R whiskers.
>>>
>>>>
>>>Thanks again.
>>>
>>>
>>>
>>>
>>>
>>>>
>>>----- Original Message ----
>>>>
>
>>>>
>
>>To: David Winsemius
>>>>
>>>Cc: R Project Help
>>>>
>>>Sent: Tue, May 11, 2010 9:26:25 PM
>>>>
>>>Subject: Re: [R] Whiskers on the default boxplot {graphics}
>>>
>>>>
>>>Wowzers...
>>>
>>>>
>>>From ?boxplot.stats:
>>>
>>>>
>>>Details
>>>
>>>>
>>The two ‘hinges’ are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise.
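The hinge/quartile distinction in that paragraph can be sketched with two small made-up vectors. The helper names `hinges` and `quartiles` are hypothetical, and quantile() is used with its default type.

```r
odd  <- c(2, 4, 7, 9, 12, 15, 20)        # n = 7, made-up values
even <- c(2, 4, 7, 9, 12, 15, 20, 26)    # n = 8

# Hinges are the 2nd and 4th elements of Tukey's five-number summary.
hinges    <- function(v) fivenum(v)[c(2, 4)]
quartiles <- function(v) unname(quantile(v, c(1, 3) / 4))

print(rbind(hinges = hinges(odd),  quartiles = quartiles(odd)))   # odd n
print(rbind(hinges = hinges(even), quartiles = quartiles(even)))  # even n
```

For the odd-length vector the two rows should match, while for the even-length one they differ slightly, which is why the box edges can be a little off from quantile(x, c(1,3)/4).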
>
>>
>>>>
>>>The notches (if requested) extend to +/-1.58 IQR/sqrt(n). This seems to be based on the same calculations as the formula with 1.57 in Chambers et al. (1983, p. 62), given in McGill et al. (1978, p. 16). They are based on asymptotic normality of the median and roughly equal sample sizes for the two medians being compared, and are said to be rather insensitive to the underlying distributions of the samples. The idea appears to be to give roughly a 95% confidence interval for the difference in two medians.
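A rough sketch of the notch formula from that paragraph, using the 4-cylinder group of mtcars. It assumes the IQR in the formula is the distance between the hinges and that the notch is centred on the median; `notch` is just an illustrative name.

```r
x <- mtcars$mpg[mtcars$cyl == 4]
s <- boxplot.stats(x)
h <- s$stats                       # whisker, hinge, median, hinge, whisker

# median +/- 1.58 * IQR / sqrt(n), as described in ?boxplot.stats
notch <- h[3] + c(-1.58, 1.58) * (h[4] - h[2]) / sqrt(s$n)

print(notch)
print(s$conf)                      # notch limits as reported by R
print(h[c(1, 5)])                  # whisker ends, for comparison
```

The notch limits are a separate quantity from the whisker ends: the first describes uncertainty about the median, the second the spread of the data.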
>>
>>
>>>
>>>
>>>>
>>>Is a notch equal to the upper/lower whisker? Is this just a difference of terminology or something?
>>>
>>>>
>>>Thanks again for all the insights.
>>>
>>>
>>>
>>>
>>>>
>>>----- Original Message ----
>>>>
>
>>From: David Winsemius
>>>>
>
>>>>
>>>Cc: R Project Help
>>>>
>>>Sent: Tue, May 11, 2010 9:00:15 PM
>>>>
>>>Subject: Re: [R] Whiskers on the default boxplot {graphics}
>>>
>>>
>>>>
>>>On May 11, 2010, at 9:45 PM, Jason Rupert wrote:
>>>
>>>>
>>>> How are the lower/upper whiskers defined in the default version of boxplot {graphics}?
>>>>
>>>>
>>>>
>>> I tried help(boxplot) and searching www.rseek.org, but I was unable to determine an absolute answer.
>
>>
>>>>
>>>You need to follow the links from the help pages, and in this case it appears that you did not follow the one to
>>>
>>>>
>>>?boxplot.stats
>>>
>>>>
>>>>
>>>>
>>>> I checked out the definition of boxplot according to Wikipedia (http://en.wikipedia.org/wiki/Box_plot), but it also had several approaches listed for how the whiskers could be determined, so I'm just curious how the default boxplot {graphics} does it.
>>>>
>>>>
>>>>
>>>> Thanks for any feedback
>>>
>>>>
>>>Follow links with the R help system.
>>>
>>>>
>>>> and insights.
>>>
>>>
>>>
>>>>
>>>David Winsemius, MD
>>>>
>>>West Hartford, CT
>>>
>>>
>>>
>>>
>>>>
>>>______________________________________________
>
>>R-help@r-project.org mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>
>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>
>>>and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]