[Rd] ppoints (PR#7538)

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jan 24 18:40:31 CET 2005


On Mon, 24 Jan 2005, Tobias Verbeke wrote:

> On Mon, 24 Jan 2005 09:37:44 +0000 (GMT)
> Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>
>> On Wed, 19 Jan 2005 tobias.verbeke at telenet.be wrote:
>>
>>> Dear r-bugs,
>>>
>>> Whilst playing with ppoints I discovered
>>> that when one uses it directly, occasional
>>> NA's in a vector also become data fractions:
>>>
>>> ppoints(c(1,2,NA,4))
>>>
>>> Would it be a good idea to add a warning message
>>> as in:
>>>
>>> ppoints <- function (n, a = ifelse(n <= 10, 3/8, 1/2))
>>> {
>>>    if(any(is.na(n))) warning("'n' contains NA's")
>>>    if(length(n) > 1) n <- length(n)
>>>    if(n > 0)
>>>        (1:n - a)/(n + 1-2*a)
>>>    else numeric(0)
>>> }
>>
>> Why?  There are 4 points in your vector, and the result is perfectly
>> valid as documented, even if they were all NAs.
>
> When using ppoints in order to draw a quantile plot to have a first look
> at a distribution, I almost forgot (read: I did) to remove the NAs.

Wait a minute: you removed the NAs when you called sort (by default it
drops them, na.last = NA). So

yco <- sort(Stamford)
xco <- ppoints(yco)

was probably what you intended.
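
As a minimal sketch of why that works, here is the same pattern on a short
made-up vector. The name `x` and its values are illustrative only and are
not taken from the Stamford data; the sketch assumes the documented
behaviour that ppoints uses just the length of a vector argument.

```r
# Hypothetical toy data standing in for Stamford; the values are made up.
x <- c(66, 52, NA, 49, NA, 64)

# sort() drops NAs by default (na.last = NA), so yco has 4 elements.
yco <- sort(x)

# ppoints() only looks at the length of a vector argument, so the sorted,
# NA-free vector yields one plotting position per actual observation.
xco <- ppoints(yco)

# The two vectors now have matching lengths and can be plotted together.
stopifnot(length(xco) == length(yco))

# Passing the raw vector instead counts the NAs as points.
n_raw <- length(ppoints(x))   # 6 positions for only 4 observations
```

Computing the positions from the already-sorted vector keeps the NA
handling in one place, so the x and y coordinates cannot get out of step.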


> For example, Chambers, Cleveland et al. (1983), Graphical Methods
> for Data Analysis, p. 15 Fig. 2.4:
>
> "Stamford" <-
> c(66, 52, NA, NA, NA, NA, 49, 64, 68, 26, 86, 52, 43, 75, 87,
> 188, 118, 103, 82, 71, 103, 240, 31, 40, 47, 51, 31, 47, 14,
> NA, 71, 61, 47, NA, 196, 131, 173, 37, 47, 215, 230, NA, 69,
> 98, 125, 94, 72, 72, 125, 143, 192, NA, 122, 32, 114, 32, 23,
> 71, 38, 136, 169, 152, 201, 134, 206, 92, 101, 119, 124, 133,
> 83, NA, 60, 124, 142, 124, 64, 75, 103, NA, 46, 68, NA, 87, 27,
> NA, 73, 59, 119, 64, NA, 111, 80, 68, 24, 24, 82, 100, 55, 91,
> 87, 64, NA, NA, 170, NA, 86, 202, 71, 85, 122, 155, 80, 71, 28,
> 212, 80, 24, 80, 169, 174, 141, 202, 113, 38, 38, 28, 52, 14,
> 38, 94, 89, 99, 150, 146, 113, 38, 66, 38, 80, 80, 99, 71, 42,
> 52, 33, 38, 24, 61, 108, 38, 28, NA)
>
> xco <- ppoints(na.omit(Stamford))
> yco <- sort(Stamford)
> plot(xco, yco,
>     pch = 20,
>     xlab = "FRACTION OF DATA",
>     ylab = "QUANTILES OF OZONE DATA",
>     cex = 0.6)
>
>
>>> Another minor remark concerning ?ppoints. It says:
>>>
>>> n: either the number of points generate or a vector of
>>>          observations.     ^^^^^
>>
>> As you see, that does not line up, but the typo has been fixed.
>
> Thank you for your answer (and fix).
> Tobias

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


