[Rd] quantile() names

Avi Gross avigross using verizon.net
Tue Dec 15 22:19:09 CET 2020


Thank you for explaining, Ed. It makes looking at the issue raised much
easier.

 

As I understand it, you are not really asking about something fully in your
control. You are asking how any function like quantile() should behave when
a user has altered something global or at least global within a package,
such as this:

> quantile(x, c(.95, .975, .99000))
    95%   97.5%     99%
950.050 975.025 990.010
> dig.it <- options(digits=2)
> dig.it
$digits
[1] 7

 

I did it that way so I could re-set it!
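
A minimal sketch of that save-and-restore round trip, relying on the fact that options() returns the previous values of whatever it changes. The variable names are only for illustration.

```r
# options() hands back the *previous* values of the options it changes,
# so the result can be passed straight back to options() to undo the change.
old <- options(digits = 2)
after_change <- getOption("digits")    # now 2

options(old)                           # restore whatever was set before
after_restore <- getOption("digits")   # back to the earlier value
```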

 

I looked to see if quantile() is written in base R and it seems to be a
generic that I would have to hunt down so I stopped for now.

 

Here is what I get BEFORE changing the option for digits:

> x <- 1:1000
> quantile(x, probs=c(.95, .975, .99000))
    95%   97.5%     99%
950.050 975.025 990.010

 

Note I used the fuller version asking for multiple thresholds so I could see
what happened if I used more zeroes. Note that trailing zeroes are not shown
in the name of the third element of the vector. So I can suggest the program
is not getting the unevaluated text to use but is using the value of the
vector. Now I set the number of digits to 2, globally, and repeat:

> quantile(x, probs=c(.95, .975, .99000))
95% 98% 99%
950 975 990
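
The point about the names being built from the value rather than from the typed text can be checked directly. This is only a sketch: the formatC() call illustrates the idea of formatting 100 * probs and is not claimed to be the exact call quantile() makes internally.

```r
x <- 1:1000

# Trailing zeroes in the source text are gone by the time quantile() runs:
# 0.99000 and 0.99 are the same double.
same_value <- identical(0.99000, 0.99)

names_a <- names(quantile(x, probs = c(0.95, 0.975, 0.99000)))
names_b <- names(quantile(x, probs = c(0.95, 0.975, 0.99)))

# Formatting the numeric value gives a label with no trailing zeroes.
label <- paste0(formatC(100 * 0.99000, format = "fg", width = 1, digits = 7), "%")
```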

 

I notice several things as others have pointed out. There seems to be a
truncation in the values shown so nothing is now shown past the decimal
point. But maybe not, as adding an argument of 1/3 gives 334 rather than 333,
although for 1:1000 that quantile works out to 334 anyway, so it proves little.

> quantile(x, probs=c(.95, .975, .99000, 1/3))
95% 98% 99% 33%
950 975 990 334
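
A small check of the truncation question, assuming format() picks up the same digits option that printing uses (its default). It suggests the display is rounded to significant digits while the stored values keep their fractional parts.

```r
x <- 1:1000
q <- quantile(x, probs = c(0.95, 0.975, 0.99), names = FALSE)

old <- options(digits = 2)
shown <- format(q)            # what would be displayed under digits = 2
rounded_up <- format(975.6)   # rounds to "976" rather than truncating to "975"
options(old)

# The stored values are untouched by the display setting.
still_exact <- isTRUE(all.equal(q, c(950.05, 975.025, 990.01)))

# With the default quantile type the 1/3 quantile of 1:1000 is 1 + 999/3 = 334.
third <- quantile(x, probs = 1/3, names = FALSE)
```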

 

Now the names are apparently rounded as discussed, with the percent symbol
appended.

 

So what would you propose? Within the function there seem to be two parts
dealing with displaying the result and it looks like the original number
loses precision as handing the above to round(., 7) shows no change. So are
you asking it to format the name differently than the value even though there
is a global option set specifying the digits they want?

 

If it really mattered, I suggest one solution may be to allow one or two
additional arguments to a function like quantile like:

 

quantile(x, ., digits=5, names=c("95%", "97.5%", .) )
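
A rough sketch of that idea written as a wrapper rather than as a change to quantile() itself. The function name quantile_labeled() and its arguments label_digits and labels are made up for illustration; quantile() already has a logical names argument, so a different argument name is used here for custom labels.

```r
# Hypothetical wrapper, not part of base R: labels are built with a fixed
# number of significant digits, independent of options(digits = ...).
quantile_labeled <- function(x, probs, label_digits = 7, labels = NULL, ...) {
  q <- stats::quantile(x, probs = probs, names = FALSE, ...)
  if (is.null(labels)) {
    labels <- paste0(
      formatC(100 * probs, format = "fg", width = 1, digits = label_digits),
      "%"
    )
  }
  stopifnot(length(labels) == length(q))
  names(q) <- labels
  q
}

old <- options(digits = 2)   # the troublesome global setting
q_default <- quantile_labeled(1:1000, c(0.95, 0.975, 0.99))
q_custom  <- quantile_labeled(1:1000, c(0.95, 0.99), labels = c("high", "HIGHEST"))
options(old)
```

With this sketch the labels of q_default stay at "95%", "97.5%" and "99%" even while digits is 2, because formatC() is given its own digits value.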

 

So if a user really wanted to live in their own world of fewer digits they
could specify what labels they wanted and could ask for "high", "Higher" and
"HIGHEST" or whatever makes them happy. But, as noted, any user wanting that
level of control can change the labels afterward. But you are correct that
some package using quantile() and calling out the results individually by
name will not be able to use that technique consistently and reliably. But
can they use it now? I tried variations on $ such as
quantile(x, c(.95, .975, .99000))$`95%` and they fail, because $ does not
work on an atomic vector. Using [] with a backquoted name fails too; [] needs
the name as a quoted string such as "95%". These identifiers were not chosen
to be used this way. You can get them positionally:

> quantile(x, c(.95, .975, .99000))[1]
95%
950
> quantile(x, c(.95, .975, .99000))[2]
98%
975

 

If you convert the darn output from a vector to a list, though, it works,
using grave accents:

> as.list(quantile(x, c(.95, .975, .99000)))$`98%`
[1] 975
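
For comparison, a short sketch of name-based access on the named vector itself, without converting to a list. It uses the "95%" label, which does not depend on the digits setting; the variable names are illustrative.

```r
x <- 1:1000
q <- quantile(x, probs = c(0.95, 0.99))

by_single <- q["95%"]     # length-one numeric vector that keeps its name
by_double <- q[["95%"]]   # the bare number, name dropped

# `$` is only defined for recursive objects such as lists, so it errors here.
dollar_result <- tryCatch(q$`95%`, error = function(e) "error")
```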

 

So, I doubt many would play games like me to find some way to select by
name. Odds are they might use position or get one at a time. The name is
more for humans to read, I would think.

 

 

Just my two cents. When an instruction impacts multiple places, it can be
ambiguous and changing global variables is, well, global.
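
One way to keep such a change from leaking is to confine it to a function and restore it on exit. A minimal sketch, where print_compact() is a hypothetical helper and not an existing function:

```r
# Hypothetical helper: print with fewer digits without leaving the
# global option changed afterwards.
print_compact <- function(obj, digits = 2) {
  old <- options(digits = digits)
  on.exit(options(old), add = TRUE)   # runs even if print() fails
  print(obj)
  invisible(obj)
}

before <- getOption("digits")
print_compact(c(950.05, 975.025, 990.01))
after <- getOption("digits")          # same as before the call
```

For a one-off display, print(obj, digits = 2) avoids touching the option at all.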

 

Which raises another question: why did the people making choices choose
silly names that are all numeric, with maybe a decimal point, and ending in
a character like % that has other uses? A cousin of quantile() is fivenum(),
which returns Tukey's five-number summary as used in making boxplots:

> fivenum(x)
[1]    1  250  500  750 1000

 

This returned a vector with no names. You can only index it by number,
albeit the elements are always in a fixed order and you know what to expect
in each. Another cousin returns a more complex structure:

> boxplot.stats(x)
$stats
[1]    1  250  500  750 1000

$n
[1] 1000

$conf
[1] 476 525

$out
integer(0)

> boxplot.stats(x)$stats
[1]    1  250  500  750 1000

 

That is a list of items but the first item is a vector with no names that is
the same as for fivenum().
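
Since the order of the five numbers is fixed, names can be attached afterward. A small sketch; the label strings are my own choice, not something fivenum() supplies. Note that the hinges and median for 1:1000 are 250.5, 500.5 and 750.5, which only display as 250, 500 and 750 above because digits was still set to 2.

```r
x <- 1:1000

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum,
# always in that order, so names can be attached after the fact.
five <- setNames(
  fivenum(x),
  c("min", "lower_hinge", "median", "upper_hinge", "max")
)

med <- five[["median"]]
```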

 

Would it make more sense if the names of the output looked more like:

> temp <- quantile(x, c(.95, .975, .99000))
> names(temp) <- c("perc95", "perc98", "perc99")
> temp
perc95 perc98 perc99
   950    975    990

 

So you could do this to a vector:

> temp["perc98"]
perc98
   975

Or do even more to a list:

> as.list(temp)$perc98
[1] 975
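
The renaming could also be generated from the probabilities instead of typed by hand. A sketch under my own naming scheme (a "p" prefix followed by the percentage), which yields syntactic names that work with $ on a list without grave accents:

```r
x <- 1:1000
probs <- c(0.95, 0.975, 0.99)

q <- quantile(x, probs = probs, names = FALSE)

# Build names such as "p95" and "p97.5" with a fixed number of digits,
# so they do not change with options(digits = ...).
names(q) <- paste0("p", formatC(100 * probs, format = "fg", width = 1, digits = 7))

middle <- q[["p97.5"]]
from_list <- as.list(q)$p97.5
```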

 

My feeling is some things are not really bugs but more like FEATURES you
normally live with and, if it matters, work around. I had trouble a while
ago with a lavaan() case I ran where very rarely the program simply broke.
In a big loop running hundreds of thousands of times, that meant the program
as a whole just stopped. So I wrapped it and other parts in variations of
try() to bulletproof it and lived with it. Sure, it slowed down a bit, but it
ran for hours or days, so why fight to find a subtle bug in something I could
not change? Your question is valid but my guess is few use it in a way that
will get much notice.
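
The kind of wrapping I mean looks roughly like this. fragile_fit() is a made-up stand-in for the call that occasionally fails, and tryCatch() is used here as one of the try() variations:

```r
# Hypothetical stand-in for a model fit that occasionally fails.
fragile_fit <- function(i) {
  if (i %% 4 == 0) stop("did not converge")
  i^2
}

results <- rep(NA_real_, 8)
for (i in seq_along(results)) {
  results[i] <- tryCatch(
    fragile_fit(i),
    error = function(e) NA_real_   # record the failure and keep looping
  )
}

n_failed <- sum(is.na(results))
```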

 

 

 

From: Ed Merkle <ecmerkle using gmail.com> 
Sent: Tuesday, December 15, 2020 11:33 AM
To: Avi Gross <avigross using verizon.net>; r-devel using r-project.org
Subject: Re: [Rd] quantile() names

 

Avi,

 

On Mon, 2020-12-14 at 18:00 -0500, Avi Gross wrote:

Question: is the part that Ed Merkle is asking about the change in the
expected NAME associated with the output?

 

The question is indeed about the name changing to "98%", when the returned
object is the 97.5th percentile.

 

It is indeed easy to set names=FALSE here. But there can still be a problem
when the user sets options(digits=2), then a package calls quantile(x, .975)
and expects an object that has a name of "97.5%".
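
A sketch of how package code could avoid depending on the generated name at all, assuming the package only needs the value. The variable names are illustrative.

```r
x <- 1:1000

# Ask for no names, so there is no generated label to depend on.
upper <- quantile(x, probs = 0.975, names = FALSE)

# If a label is wanted, attach a fixed one rather than relying on
# whatever name quantile() would have produced.
upper_named <- c("97.5%" = upper)
```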

 

I think the easiest solution is to tell the user not to set
options(digits=2), but it also seems like the "98%" name is not the best
result. But Gabriel is correct that we would still need to consider how to
handle something like quantile(x, 1/3). Maybe it is not a big enough issue
to warrant changing anything.

 

Ed

 

 
