Some helper functions for statistical analysis

Introduction

Many widely used and powerful statistical analysis commands, such as lm, glm, and lme4::lmer, share a simple and consistent calling syntax, often involving a “formula” (e.g., y ~ x), which makes them easy to remember and apply. Some other functions, even simple ones, don’t use the formula syntax, can be awkward to use in some contexts, or require the default values of arguments to be explicitly overridden. The psyntur package provides some tools that aim to make these functions easier to apply.

These functions and the accompanying data sets can be loaded with the usual library command.

library(psyntur)

Independent samples t-test with t_test

R’s stats::t.test makes it easy to perform independent, paired, or one-sample t-tests. For the independent samples t-test, the default is the Welch two sample t-test. While this is arguably a good choice in practice, when t-tests are taught as a simple example of a normal linear model, homogeneity of variance is assumed. Obtaining this test from t.test requires setting var.equal = TRUE. The t_test function in psyntur is for use when the standard independent samples t-test with homogeneity of variance is the desired default. For example, in the following, we use it with the faithfulfaces data set.

t_test(trustworthy ~ face_sex, data = faithfulfaces)
#> 
#>  Two Sample t-test
#> 
#> data:  trustworthy by face_sex
#> t = 1.9389, df = 168, p-value = 0.05419
#> alternative hypothesis: true difference in means between group female and group male is not equal to 0
#> 95 percent confidence interval:
#>  -0.004253649  0.471193782
#> sample estimates:
#> mean in group female   mean in group male 
#>             4.444061             4.210591
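The effect of var.equal can be seen with stats::t.test directly. The following is a minimal sketch that uses the built-in mtcars data rather than faithfulfaces. It assumes, as the description above suggests but without reference to the psyntur source, that t_test behaves like t.test with var.equal = TRUE. It also illustrates the link to the normal linear model.

```r
# Welch test: the default in stats::t.test for two independent samples
welch <- t.test(mpg ~ am, data = mtcars)

# Student's t-test: pooled variance, i.e. homogeneity of variance assumed
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# The pooled test has n - 2 degrees of freedom (32 - 2 = 30 here);
# the Welch degrees of freedom are estimated from the data
pooled$parameter
welch$parameter

# The same comparison as a normal linear model: the t statistic of the
# slope equals the pooled t statistic in absolute value
fit <- lm(mpg ~ am, data = mtcars)
lm_t <- abs(summary(fit)$coefficients["am", "t value"])
pooled_t <- abs(unname(pooled$statistic))
all.equal(lm_t, pooled_t)
```

The Welch statistic generally differs from the linear model's, which is why the pooled version is the convenient default in a teaching context.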

Paired samples t-test with paired_t_test

For paired t-tests, the paired_t_test function can be used. In this function, a formula is not used. Instead, two variables in the same data frame, which are assumed to be paired in some manner, are used. For example, the pairedsleep data set (included in psyntur) is as follows.

pairedsleep
#> # A tibble: 10 × 3
#>    ID       y1    y2
#>    <fct> <dbl> <dbl>
#>  1 1       0.7   1.9
#>  2 2      -1.6   0.8
#>  3 3      -0.2   1.1
#>  4 4      -1.2   0.1
#>  5 5      -0.1  -0.1
#>  6 6       3.4   4.4
#>  7 7       3.7   5.5
#>  8 8       0.8   1.6
#>  9 9       0     4.6
#> 10 10      2     3.4

This gives, for each of 10 patients, the difference from control in the number of hours slept under each of two different drugs. These differences under the two drugs are y1 and y2. A paired samples t-test can be performed on these data as follows.

paired_t_test(y1, y2, data = pairedsleep)
#> 
#>  Paired t-test
#> 
#> data:  vec_1 and vec_2
#> t = -4.0621, df = 9, p-value = 0.002833
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -2.4598858 -0.7001142
#> sample estimates:
#> mean of the differences 
#>                   -1.58
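A paired samples t-test is the same as a one-sample t-test on the within-pair differences. The sketch below checks this with stats::t.test, using the values copied from the pairedsleep table above. It uses only base R and makes no claim about how paired_t_test is implemented internally.

```r
# Values copied from the pairedsleep table shown above
y1 <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
y2 <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)

# Paired test on the two vectors
paired <- t.test(y1, y2, paired = TRUE)

# One-sample test that the mean of the differences is zero
one_sample <- t.test(y1 - y2, mu = 0)

# Both give the same t statistic, degrees of freedom, and p-value
paired$statistic
one_sample$statistic
mean(y1 - y2)
```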

Pairwise t-tests with pairwise_t_test

For independent samples t-tests applied to all pairs of a set of groups, with p-value adjustments applied, we can use pairwise_t_test. For example, the following creates a categorical variable with four values, which are the interaction of two binary variables.

data_df <- dplyr::mutate(vizverb, IV = interaction(task, response))
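To see what interaction produces, here is a small sketch with toy vectors. The vectors task and response below are hypothetical stand-ins for the corresponding vizverb columns, whose values are assumed here to be verbal and visual.

```r
# Toy stand-ins for the two binary variables (hypothetical values)
task <- factor(c("verbal", "visual", "verbal", "visual"))
response <- factor(c("verbal", "verbal", "visual", "visual"))

# Each level of the result is one combination of the two variables,
# written as <task>.<response>; the first factor varies fastest
IV <- interaction(task, response)
levels(IV)
nlevels(IV)
```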

Independent samples t-tests with Bonferroni corrections on the time variable applied to all pairs of the four levels of the IV variable can be done as follows.

pairwise_t_test(time ~ IV, data = data_df)
#> 
#>  Pairwise comparisons using t tests with pooled SD 
#> 
#> data:  y and x 
#> 
#>               verbal.verbal visual.verbal verbal.visual
#> visual.verbal 0.0790        -             -            
#> verbal.visual 1.0000        0.0166        -            
#> visual.visual 0.0044        2.9e-07       0.0241       
#> 
#> P value adjustment method: bonferroni
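The Bonferroni adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1. The sketch below checks this with stats::pairwise.t.test on the built-in PlantGrowth data, which is used here in place of vizverb. It is an illustration in base R only, not a description of how pairwise_t_test is implemented.

```r
x <- PlantGrowth$weight
g <- PlantGrowth$group

# Unadjusted and Bonferroni-adjusted pairwise tests (pooled SD by default)
raw <- pairwise.t.test(x, g, p.adjust.method = "none")
adj <- pairwise.t.test(x, g, p.adjust.method = "bonferroni")

# Number of pairwise comparisons: 3 groups give choose(3, 2) = 3 pairs
m <- sum(!is.na(raw$p.value))

# Manual Bonferroni: multiply by m and cap at 1
manual <- pmin(m * raw$p.value, 1)
all.equal(adj$p.value, manual)
```

With the four levels of IV above there are choose(4, 2) = 6 comparisons, so each raw p-value would be multiplied by 6.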

Shapiro-Wilk test with shapiro_test

The Shapiro-Wilk test of normality can be applied to a single numeric vector in a data frame as in the following example.

shapiro_test(time, data = data_df)
#> # A tibble: 1 × 2
#>   statistic   p_value
#>       <dbl>     <dbl>
#> 1     0.911 0.0000378

To test the normality of each subset of a variable, such as time, corresponding to the values of a categorical variable, we can use a by variable as in the following example.

shapiro_test(time, by = IV, data = data_df)
#> # A tibble: 4 × 3
#>   IV            statistic  p_value
#>   <fct>             <dbl>    <dbl>
#> 1 verbal.verbal     0.755 0.000198
#> 2 visual.verbal     0.861 0.00809 
#> 3 verbal.visual     0.938 0.221   
#> 4 visual.visual     0.914 0.0763
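For comparison, a grouped Shapiro-Wilk test can be sketched in base R by splitting the variable by group and applying stats::shapiro.test to each piece. The example uses the built-in PlantGrowth data, and its column names (group, statistic, p_value) are chosen to mirror the output above. It is not taken from the psyntur source.

```r
# Split the numeric variable into one vector per group
groups <- split(PlantGrowth$weight, PlantGrowth$group)

# Run the Shapiro-Wilk test on each group and collect the results
results <- do.call(rbind, lapply(names(groups), function(g) {
  test <- shapiro.test(groups[[g]])
  data.frame(
    group = g,
    statistic = unname(test$statistic),
    p_value = test$p.value
  )
}))
results
```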