[R] prop.trend.test

Tue Sep 12 14:21:24 CEST 2023

Argh, yes, drats, thanks. 

There will be a matter of an estimated residual error.

So

> coef(summary(ht$lmfit))["score","t value"]*sigma(ht$lmfit)
[1] -2.867913

matches the signes square root of the Chi-square.

Or, likely better (avoid 0 df cases), switch to a Gaussian glm fit and use the z stat

> coef(summary(ht$glmfit, dispersion = 1))
               Estimate Std. Error   z value      Pr(>|z|)
(Intercept)  1.02187141 0.03199737 31.936102 8.425506e-224
score       -0.03341563 0.01165155 -2.867913  4.131897e-03

The Estimate (-0.0334) should still make sense as the LPM estimate of the regression slope.

- Peter D.

> On 8 Sep 2023, at 12:53 , Rui Barradas <ruipbarradas using sapo.pt> wrote:
> 
> Às 10:06 de 08/09/2023, peter dalgaard escreveu:
>> Yes, this was written a bit bone-headed (as I am allowed to say...)
>> If you look at the code, you will see inside:
>>     a <- anova(lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)),
>>         weights = w))
>> and the lm() inside should give you the direction via the sign of the regression coefficient on "score".
>>  So, at least for now, you could just doctor a copy of the code for your own purposes, as in
>>  fit <- lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)),
>>         weights = w)
>>  a <- anova(fit)
>>  and arrange to return coef(fit)["score"] at the end. Something like structure(... estimate=c(lpm.slope=coef(fit)["score"]) ....)
>> (I expect that you might also extract the t-statistic from coef(summary(fit)) and find that it is the signed square root of the Chi-square, but I won't have time to test that just now.)
>> -pd
>>> On 8 Sep 2023, at 07:22 , Thomas Subia via R-help <r-help using r-project.org> wrote:
>>> 
>>> Colleagues,
>>> 
>>> Thanks all for the responses.
>>> 
>>> I am monitoring the daily total number of defects per sample unit.
>>> I need to know whether this daily defect proportion is trending upward (a bad thing for a manufacturing process).
>>> 
>>> My first thought was to use either a u or a u' control chart for this.
>>> As far as I know, u or u' charts are poor to detect drifts.
>>> 
>>> This is why I chose to use prop.trend.test to detect trends in proportions.
>>> 
>>> While prop.trend.test can confirm the existence of a trend, as far as I know, it is left to the user
>>> to determine what direction that trend is.
>>> 
>>> One way to illustrate trending is of course to plot the data and use geom_smooth and method lm
>>> For the non-statisticians in my group, I've found that using this method along with the p-value of prop.trend.test, makes it easier for the users to determine the existence of trending and its direction.
>>> 
>>> If there are any other ways to do this, please let me know.
>>> 
>>> Thomas Subia
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Thursday, September 7, 2023 at 10:31:27 AM PDT, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Às 14:23 de 07/09/2023, Thomas Subia via R-help escreveu:
>>>> 
>>>> Colleagues
>>>> 
>>>>    Consider
>>>> smokers  <- c( 83, 90, 129, 70 )
>>>> patients <- c( 86, 93, 136, 82 )
>>>> 
>>>>    prop.trend.test(smokers, patients)
>>>> 
>>>>    Output:
>>>> 
>>>>        Chi-squared Test for Trend inProportions
>>>> 
>>>>    data:  smokers out of patients ,
>>>> 
>>>> using scores: 1 2 3 4
>>>> 
>>>> X-squared = 8.2249, df = 1, p-value = 0.004132
>>>> 
>>>>    # trend test for proportions indicates proportions aretrending.
>>>> 
>>>>    How does one identify the direction of trending?
>>>>    # prop.test indicates that the proportions are unequal but doeslittle to indicate trend direction.
>>>> All the best,
>>>> Thomas Subia
>>>> 
>>>> 
>>>>     [[alternative HTML version deleted]]
>>>> 
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> Hello,
>>> 
>>> By visual inspection it seems that there is a decreasing trend.
>>> Note that the sample estimates of prop.test and smokers/patients are equal.
>>> 
>>> 
>>> smokers  <- c( 83, 90, 129, 70 )
>>> patients <- c( 86, 93, 136, 82 )
>>> 
>>> prop.test(smokers, patients)$estimate
>>> #>    prop 1    prop 2    prop 3    prop 4
>>> #> 0.9651163 0.9677419 0.9485294 0.8536585
>>> 
>>> smokers/patients
>>> 
>>> #> [1] 0.9651163 0.9677419 0.9485294 0.8536585
>>> 
>>> plot(smokers/patients, type = "b")
>>> 
>>> 
>>> 
>>> Hope this helps,
>>> 
>>> Rui Barradas
>>> 
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> Hello,
> 
> Actually, the t-statistic is not the signed square root of the X-squared test statistic. I have edited the function, assigned the lm fit and returned it as is. (print.htest won't print this new list member so the output is not cluttered with irrelevant noise.)
> 
> 
> smokers  <- c( 83, 90, 129, 70 )
> patients <- c( 86, 93, 136, 82 )
> 
> edit(prop.trend.test, file = "ptt.R")
> source("ptt.R")
> 
> # stats::prop.trend.test edited to include the results
> # of the lm fit and saved under a new name
> ptt <- function (x, n, score = seq_along(x))
> {
>  method <- "Chi-squared Test for Trend in Proportions"
>  dname <- paste(deparse1(substitute(x)), "out of", deparse1(substitute(n)),
>                 ",\n using scores:", paste(score, collapse = " "))
>  x <- as.vector(x)
>  n <- as.vector(n)
>  p <- sum(x)/sum(n)
>  w <- n/p/(1 - p)
>  a <- anova(fit <- lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)),
>                       weights = w))
>  chisq <- c(`X-squared` = a["score", "Sum Sq"])
>  structure(list(statistic = chisq, parameter = c(df = 1),
>                 p.value = pchisq(as.numeric(chisq), 1, lower.tail = FALSE),
>                 method = method, data.name = dname
>                 , lmfit = fit
>                 ), class = "htest")
> }
> 
> 
> ht <- ptt(smokers, patients)
> ht$statistic |> sqrt()
> #> X-squared
> #>  2.867913
> ht$lmfit |> summary() |> coef()
> #>                Estimate Std. Error   t value   Pr(>|t|)
> #> (Intercept)  1.02187141 0.04732740 21.591539 0.00213815
> #> score       -0.03341563 0.01723384 -1.938954 0.19207036
> 
> 
> Hope this helps,
> 
> Rui Barradas

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com