[R] Help documentation of "The Studentized range Distribution"

Mon Jul 10 19:12:45 CEST 2017

Dear Peter and Jeff,

I admit I was not precise about what I wanted. It was more a report that the documentation is confusing than a real request for help. Still, I was only 97% sure that the mistake was not mine, so I stayed cautious.

So my suggestion would be to keep the Usage section exactly as it is now:

ptukey(q, nmeans, df, nranges = 1, lower.tail = TRUE, log.p = FALSE)
qtukey(p, nmeans, df, nranges = 1, lower.tail = TRUE, log.p = FALSE)

but to change the argument descriptions as follows:

q
    now and future: vector of quantiles
p
    now and future: vector of probabilities
nmeans
    now: sample size for range (same for each group)
    future: number of means (i.e. groups) being compared
df
    now: degrees of freedom for s (see below)
    future: degrees of freedom for the estimate of the pooled variance
    (Ursula's comment: "see below" is not true; I looked for it...)
nranges
    now: number of groups whose maximum range is considered
    future: ????? (nmeans*(nmeans-1)/2) ??????

The description of nranges actually put me on the wrong foot, because to me it reads like a description of what nmeans really is. I am still not sure what this parameter does, since in my context it does not seem to be needed.

The example I worked through is post-hoc testing of all pairwise comparisons in a balanced one-way ANOVA.

With 5 groups, n = 10 subjects per group and a significance level of alpha = 0.05 (confidence level 0.95), one would use a critical value of

C = qtukey(0.95, nmeans=5, df = 5*(10-1)) /sqrt(2) = 4.01842 /sqrt(2)

(Example from SAS documentation on multiple comparison procedures)
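For anyone who wants to reproduce the SAS example in R, a minimal sketch (the variable names are my own) that recomputes the Tukey critical value and checks that ptukey() gives back the 0.95 level:

```r
# Balanced one-way ANOVA from the SAS example: 5 groups, 10 subjects per group
k <- 5                 # number of groups, i.e. nmeans
n <- 10                # subjects per group
df_err <- k * (n - 1)  # error degrees of freedom = 45

q95 <- qtukey(0.95, nmeans = k, df = df_err)  # upper 5% point of the studentized range
C <- q95 / sqrt(2)                             # critical value on the scale of a two-sample t statistic

print(c(q = q95, C = C))
# ptukey() inverts qtukey(), up to qtukey's iterative tolerance
print(ptukey(q95, nmeans = k, df = df_err))
```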

I also checked further examples against the tables in my old statistics books, and so was fairly sure that nmeans is the number of means and that nranges is something else, which you do not really need to specify for post-hoc testing.
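Another way to confirm that nmeans is the number of groups is to compare with base R's TukeyHSD(). As far as I can tell from its source, it builds its intervals from qtukey(conf.level, <number of groups>, <residual df>) with the default nranges = 1. A sketch on simulated balanced data (the data are random and serve only this check):

```r
set.seed(42)
k <- 5; n <- 10
dat <- data.frame(
  grp = factor(rep(seq_len(k), each = n)),
  y   = rnorm(k * n)
)
fit <- aov(y ~ grp, data = dat)
hsd <- TukeyHSD(fit, "grp", conf.level = 0.95)$grp

mse <- deviance(fit) / df.residual(fit)   # pooled within-group variance
C <- qtukey(0.95, nmeans = k, df = df.residual(fit)) / sqrt(2)
half_width <- C * sqrt(2 * mse / n)       # C times the SE of a difference of two means

# In the balanced case every pairwise interval has this same half-width
print(range(hsd[, "upr"] - hsd[, "diff"]))
print(half_width)
```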

And my actual code so far is:
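library(tidyverse)

g <- 2:10                # numbers of groups
alpha <- c(0.025, 0.05)  # familywise significance levels
n <- 10:50               # subjects per group
delta <- 1               # not used below

# One row per combination of alpha, number of groups and group size
dt <- expand.grid(alpha = alpha, ngroups = g, nIngroups = n) %>% as_tibble()

# Critical values for pairwise t statistics; error df = ngroups*(nIngroups - 1)
dt <- dt %>% mutate(ctuk = qtukey(1 - alpha, ngroups, ngroups*(nIngroups - 1)) / sqrt(2))  # Tukey HSD
dt <- dt %>% mutate(cbon = qt(1 - alpha/(ngroups*(ngroups - 1)), ngroups*(nIngroups - 1)))  # Bonferroni, choose(ngroups, 2) two-sided tests
dtwide <- dt %>% mutate(cnon = qt(1 - alpha/2, ngroups*(nIngroups - 1)))                   # unadjusted two-sided t
dtlong <- dtwide %>% gather(method, cvalue, cnon, cbon, ctuk)                              # long format, one row per method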
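A dependency-free sketch of the same grid (base R instead of the tidyverse, same column names) that also checks the ordering one would expect: the unadjusted value is smallest, the conservative Bonferroni value is largest, and Tukey and Bonferroni coincide for 2 groups, where there is only one comparison. The tolerance of 1e-3 is my own choice to absorb qtukey's iterative accuracy.

```r
grid <- expand.grid(alpha = c(0.025, 0.05), ngroups = 2:10, nIngroups = 10:50)
dferr <- with(grid, ngroups * (nIngroups - 1))  # error df of the one-way ANOVA

grid$ctuk <- with(grid, qtukey(1 - alpha, ngroups, dferr) / sqrt(2))       # Tukey HSD
grid$cbon <- with(grid, qt(1 - alpha / (ngroups * (ngroups - 1)), dferr))  # Bonferroni, choose(g, 2) two-sided tests
grid$cnon <- with(grid, qt(1 - alpha / 2, dferr))                          # unadjusted two-sided t

tol <- 1e-3  # qtukey() is computed iteratively, so allow a small numerical slack
stopifnot(all(grid$cnon <= grid$ctuk + tol), all(grid$ctuk <= grid$cbon + tol))

two <- grid$ngroups == 2
print(max(abs(grid$ctuk[two] - grid$cbon[two])))  # one comparison only: Tukey equals Bonferroni
```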

Hope this is better!

Kind regards,

Ursula

-----Original Message-----
From: peter dalgaard [mailto:pdalgd at gmail.com]
Sent: 10 July 2017 11:46
To: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Cc: R-help Mailing List <r-help at r-project.org>; Ursula Garczarek <Ursula.Garczarek at cytel.com>
Subject: Re: [R] Help documentation of "The Studentized range Distribution"

Well, it is clear enough that the problem is in interpreting the documentation. However, when you claim you tested something, and found it inconsistent with tables, it would be advisable to back it up with examples!

The description in the help files and in the sources is admittedly confusing. The original paper has this rather clearer description in its abstract:

"We consider the probability distribution of the maximum of r statistics each distributed as the Studentized range of means calculated from c random samples of size n from normal populations. The rc samples are assumed to be mutually independent and a common pooled within-sample variance is used throughout."

So the connection is nranges == r and nmeans == c. (n never actually enters, because sqrt(n) is part of the standardization.)
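To make the role of nranges concrete, here is a small Monte Carlo sketch of the abstract's description (the helper name sim_max_range is hypothetical, for illustration only): draw r independent sets of c standardized means, divide all of them by one common pooled s, and take the largest of the r studentized ranges. Its empirical 95% point should agree with qtukey(0.95, nmeans = c, df, nranges = r) up to simulation error.

```r
set.seed(1)

# Hypothetical helper: maximum of `nranges` studentized ranges, each of `nmeans`
# standard normal means, all divided by one common pooled s with `df` degrees of freedom
sim_max_range <- function(nsim, nmeans, nranges, df) {
  replicate(nsim, {
    z <- matrix(rnorm(nranges * nmeans), nrow = nranges)  # one row per range
    s <- sqrt(rchisq(1, df) / df)                        # common pooled SD estimate
    max(apply(z, 1, function(x) diff(range(x)))) / s
  })
}

x <- sim_max_range(20000, nmeans = 4, nranges = 3, df = 20)
q_theory <- qtukey(0.95, nmeans = 4, df = 20, nranges = 3)

print(c(simulated = unname(quantile(x, 0.95)), qtukey = q_theory))
print(mean(x <= q_theory))  # should be close to 0.95
```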

For the typical application, r is 1 for the usual studentized range distribution. E.g. for two large groups:

> qtukey(.95,2,df=Inf)
[1] 2.771808

As there is only one difference to consider, this should be distributed like the absolute value of the difference between two standard normals, and yes: We get our old friend 1.96 from

> qtukey(.95,2,df=Inf)/sqrt(2)
[1] 1.959964
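The nmeans = 2 check extends to the whole distribution: with one range of two means and df = Inf, the studentized range is |Z1 - Z2| for independent standard normals, and Z1 - Z2 ~ N(0, 2), so P(range <= q) = 2*pnorm(q/sqrt(2)) - 1. A short sketch comparing this closed form with ptukey() at a few points of my choosing:

```r
q <- c(0.5, 1, 2, 2.771808, 4)

exact <- 2 * pnorm(q / sqrt(2)) - 1       # P(|Z1 - Z2| <= q)
tukey <- ptukey(q, nmeans = 2, df = Inf)  # studentized range, nmeans = 2, nranges = 1

print(cbind(q, exact, tukey))
print(qtukey(0.95, 2, df = Inf) / sqrt(2) - qnorm(0.975))  # near 0, up to qtukey's tolerance
```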

It is less than fortunate that the help file speaks of "sample size for range". It is marginally defensible, because it is about the standardized range of a sample _of means_, but it is likely to confuse the actual reader into believing that it has to do with the sample size for each mean.
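The same point, that nmeans counts the means while the per-group sample size n only enters through the standardization and the degrees of freedom, can be checked by simulating a balanced one-way layout directly. A sketch, using g = 5 and n = 10 from Ursula's example; the number of replicates and the seed are illustrative assumptions:

```r
set.seed(2)
g <- 5   # number of groups: this is nmeans
n <- 10  # subjects per group: enters only via sqrt(n) and df
df_err <- g * (n - 1)

stat <- replicate(20000, {
  y <- matrix(rnorm(n * g), nrow = n)    # one column per group
  m <- colMeans(y)
  mse <- sum(sweep(y, 2, m)^2) / df_err  # pooled within-group variance
  diff(range(m)) / sqrt(mse / n)         # studentized range of the g group means
})

q95 <- qtukey(0.95, nmeans = g, df = df_err)
print(mean(stat <= q95))  # close to 0.95
print(mean(stat <= qtukey(0.95, nmeans = n, df = df_err)))  # wrongly using nmeans = n: far from 0.95
```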

-pd

> On 10 Jul 2017, at 05:04 , Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>
> We cannot help you understand what you are doing if you do not show us what you are doing.  Here are some discussions about how to communicate questions about R [1][2][3].
>
> [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>
> [2] http://adv-r.had.co.nz/Reproducibility.html
>
> [3] https://cran.r-project.org/web/packages/reprex/index.html
> --
> Sent from my phone. Please excuse my brevity.
>
> On July 6, 2017 11:36:47 AM PDT, Ursula Garczarek <Ursula.Garczarek at cytel.com> wrote:
>> Dear all,
>> I wanted to compare Bonferroni vs TukeyHSD correction over a range of
>> groups and group sizes, and wanted to use the function qtukey.
>>
>> In the help documentation it says
>>
>> qtukey(p, nmeans, df, nranges = 1, lower.tail = TRUE, log.p = FALSE)
>>
>> Arguments
>>
>> q           vector of quantiles.
>> p           vector of probabilities.
>> nmeans      sample size for range (same for each group).
>> df          degrees of freedom for s (see below).
>> nranges     number of groups whose maximum range is considered.
>> log.p       logical; if TRUE, probabilities p are given as log(p).
>> lower.tail  logical; if TRUE (default), probabilities are P[X <= x],
>>             otherwise, P[X > x].
>>
>>
>> But when I test it, "nmeans" actually has to be the number of groups,
>> and not "nranges", to agree with tables of the studentized range
>> distribution.
>>
>> Can that be? It must be a rather old procedure, so I wonder whether
>> I am getting something completely wrong...
>>
>> Regards,
>> Ursula
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

This email and any attachments are confidential and may be legally privileged. If you received this e-mail in error, please notify the sender immediately by return e-mail and delete this message and any attachments. You may contact Cytel Inc by visiting www.cytel.com/about-us/.