[Rd] quantile(), IQR() and median() for factors
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Mar 6 18:36:40 CET 2009
On Fri, 6 Mar 2009, Greg Snow wrote:
> I like the idea of median and friends working on ordered factors.
> Just a couple of thoughts on possible implementations.
>
> Adding extra checks and functionality will slow down the function.
> For a single evaluation on a given dataset this slowdown will not be
> noticeable, but inside of a simulation, bootstrap, or other high
> iteration technique, it could matter. I would suggest creating a
> core function that does just the calculations (median, quantile,
> iqr) assuming that the data passed in is correct without doing any
> checks or anything fancy. Then the user-callable function (median
> et al.) would do the checks, dispatch to other functions for
> anything fancy, etc., then call the core function with the clean
> data. The common user would not really notice a difference, but
> someone programming a high iteration technique could clean the data
> themselves, then call the core function directly bypassing the
> checks/branches.
Since median and quantile are already generic, adding an 'ordered'
method would be zero cost to other uses. And the factor check at the
head of median.default could be replaced by median.factor if someone
could show a convincing performance difference.
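
A minimal sketch of what such a method might look like. It is
illustrative only and not code from the R sources: the argument 'low'
is a hypothetical name borrowed from mad(), and returning the lower
middle order statistic by default is an assumption.

```r
# Hypothetical S3 method for ordered factors (not part of base R).
# 'low' is an assumed argument name, modelled on mad().
median.ordered <- function(x, na.rm = FALSE, low = TRUE, ...) {
    if (na.rm) {
        x <- x[!is.na(x)]
    } else if (any(is.na(x))) {
        return(x[NA_integer_])          # NA, but keeping the levels
    }
    n <- length(x)
    if (n == 0L) return(x[NA_integer_])
    # Index of the lower or upper middle order statistic;
    # the two coincide when n is odd.
    k <- if (low) (n + 1L) %/% 2L else n %/% 2L + 1L
    sort(x)[k]
}

# The values printed in the original report, entered by hand.
x <- factor(c(1, 0, 1, 0, 0, 2, 0, 1, 2, 0, 0), ordered = TRUE)
m <- median(x)   # dispatches on class "ordered"; numeric input is unaffected
```

Because dispatch happens on the class of x, calls with numeric input
never reach this code.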
> Just out of curiosity (from someone who only learned from English
> (Americanized at that) and not Italian texts), what would the median
> of [Low, Low, Medium, High] be?
I don't think it is 'the' median but 'a' median. (Even English
Wikipedia says the median is not unique for even numbers of inputs.)
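
To make that concrete, a small sketch that computes the two middle
order statistics for the example in question. The variable names are
made up for illustration, and this is only one way to write it.

```r
# Greg's example as an ordered factor with explicit level order.
x <- factor(c("Low", "Low", "Medium", "High"),
            levels = c("Low", "Medium", "High"), ordered = TRUE)

s <- sort(x)                 # Low, Low, Medium, High
n <- length(s)
lo <- s[(n + 1L) %/% 2L]     # lower middle order statistic
hi <- s[n %/% 2L + 1L]       # upper middle order statistic

as.character(lo)             # "Low"
as.character(hi)             # "Medium"
```

Both 'Low' and 'Medium' qualify as a median here, and there is no
meaningful way to average the two.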
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org]
>> On Behalf Of Simone Giannerini
>> Sent: Thursday, March 05, 2009 4:49 PM
>> To: R-devel
>> Subject: [Rd] quantile(), IQR() and median() for factors
>>
>> Dear all,
>>
>> from the help page of quantile:
>>
>> "x numeric vectors whose sample quantiles are wanted. Missing
>> values are ignored."
>>
>> from the help page of IQR:
>>
>> "x a numeric vector."
>>
>> As a matter of fact, it seems that neither quantile() nor IQR()
>> checks that its input is numeric.
>> See the following:
>>
>> set.seed(11)
>> x <- rbinom(n=11,size=2,prob=.5)
>> x <- factor(x,ordered=TRUE)
>> x
>> [1] 1 0 1 0 0 2 0 1 2 0 0
>> Levels: 0 < 1 < 2
>>
>> > quantile(x)
>> 0% 25% 50% 75% 100%
>> 0 <NA> 0 <NA> 2
>> Levels: 0 < 1 < 2
>> Warning messages:
>> 1: In Ops.ordered((1 - h), qs[i]) :
>> '*' is not meaningful for ordered factors
>> 2: In Ops.ordered(h, x[hi[i]]) : '*' is not meaningful for ordered
>> factors
>>
>> > IQR(x)
>> [1] 1
>>
>> whereas median has the check:
>>
>> > median(x)
>> Error in median.default(x) : need numeric data
>>
>> I also take the opportunity to ask your comments on the following
>> related subject:
>>
>> In my opinion it would be convenient for median() and the like
>> (quantile(), IQR()) to be implemented for ordered factors, for which
>> they can in fact be well defined. For instance, functions like
>> apply(x,FUN=median,...) could then be used without further
>> processing on data frames that contain both numeric variables and
>> ordered factors. To my limited knowledge, English introductory
>> statistics textbooks mention only marginally that the median is well
>> defined for ordered categorical variables, whereas the Italian
>> statistics literature often discusses this in detail, so students
>> and practitioners might be misled into expecting median() to work
>> for ordered factors.
>>
>> In this message
>>
>> https://stat.ethz.ch/pipermail/r-help/2003-November/042684.html
>>
>> Martin Maechler considers the possibility of doing this by
>> allowing extra arguments "low" and "high", as is done for mad().
>> I am willing to contribute if requested, and comments are
>> welcome.
>>
>> Thank you for your attention,
>>
>> kind regards,
>>
>> Simone
>>
>> > R.version
>> _
>> platform i386-pc-mingw32
>> arch i386
>> os mingw32
>> system i386, mingw32
>> status
>> major 2
>> minor 8.1
>> year 2008
>> month 12
>> day 22
>> svn rev 47281
>> language R
>> version.string R version 2.8.1 (2008-12-22)
>>
>> LC_COLLATE=Italian_Italy.1252;LC_CTYPE=Italian_Italy.1252;LC_MONETARY=
>> Italian_Italy.1252;LC_NUMERIC=C;LC_TIME=Italian_Italy.1252
>>
>> --
>> ______________________________________________________
>>
>> Simone Giannerini
>> Dipartimento di Scienze Statistiche "Paolo Fortunati"
>> Universita' di Bologna
>> Via delle belle arti 41 - 40126 Bologna, ITALY
>> Tel: +39 051 2098262 Fax: +39 051 232153
>> http://www2.stat.unibo.it/giannerini/
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595