[R-sig-ME] Testing a hypothesis that there was a change after a specific time point

Mon Feb 19 17:41:01 CET 2024

Hello Members, I am writing to request your advice on how best to do hypothesis testing for our study.

Our data looks as follows:

> head(x)
# A tibble: 6 × 7
  user_id  user_male  days log_days bool_program   dv1     dv2
  <chr>        <int> <int>    <dbl>        <dbl> <dbl>   <dbl>
1 IDX_195          1  1581     7.37            1 0.150 0.00590
2 IDX_949          1  1338     7.20            1 0.130 0.0348
3 IDX_2428         1   577     6.36            0 0.160 0.0438
4 IDX_2312         1   424     6.05            0 0.179 0.0364
5 IDX_277          1   790     6.67            0 0.419 0.0515
6 IDX_1029         1  1489     7.31            1 0.155 0.0219
>

Besides each user's gender, we have measurements of dv1 and dv2 for users over 6 years, with the days variable ranging from 0 to 2190 (log_days is its log transformation).

We would like to test the hypothesis that a program announcement made on day 850 caused a significant (potentially gradual) change in dv1 and/or dv2 scores (regardless of the direction of change) for male and/or female users. The bool_program variable is set to 0 for days < 1118 and 1 otherwise.
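To make the coding of these two derived variables concrete, here is a minimal sketch that reproduces them from the days values shown in the head(x) output above. It assumes that log_days is the natural log and that the cutoff is applied as days >= 1118; the standalone vectors are for illustration only and are not how the actual dataset was built.

```r
# days values copied from the six rows of head(x) shown above
days <- c(1581, 1338, 577, 424, 790, 1489)

# bool_program: 0 for days < 1118, 1 otherwise (assumed cutoff rule)
bool_program <- as.numeric(days >= 1118)

# log_days: natural log of days (assumption: base e).
# Note that log(0) is -Inf, so any rows with days == 0 need separate handling.
log_days <- log(days)

print(data.frame(days, log_days = round(log_days, 2), bool_program))
```

Under these assumptions the rounded log_days and bool_program values agree with the six rows printed above.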

We are wondering what the best way to conduct this test is, given the hierarchical/nested nature of the data.

We have thus far taken the approach of using lmer:

> m = lmer(
+   dv1 ~ user_male + bool_program * log_days + (1|user_id),
+   data = x
+ )

> summary(m)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: dv1 ~ user_male + bool_program * log_days + (1 | user_id)
   Data: x

REML criterion at convergence: -103411.7

Scaled residuals:
    Min      1Q  Median      3Q     Max
-4.0430 -0.3797 -0.0443  0.2992 19.5088

Random effects:
 Groups   Name        Variance  Std.Dev.
 user_id  (Intercept) 0.0004456 0.02111
 Residual             0.0034587 0.05881
Number of obs: 37137, groups:  user_id, 1012

Fixed effects:
                        Estimate Std. Error         df t value Pr(>|t|)
(Intercept)            1.878e-01  5.318e-03  9.979e+03  35.310  < 2e-16 ***
user_male              2.018e-02  2.674e-03  7.157e+02   7.546 1.36e-13 ***
bool_program           1.036e-01  1.588e-02  3.713e+04   6.524 6.93e-11 ***
log_days              -6.641e-03  7.185e-04  3.713e+04  -9.243  < 2e-16 ***
bool_program:log_days -1.522e-02  2.177e-03  3.713e+04  -6.991 2.78e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) usr_ml bl_prg lg_dys
user_male   -0.471
bool_progrm -0.238 -0.017
log_days    -0.872  0.012  0.289
bl_prgrm:l_  0.268  0.018 -0.998 -0.329

> interactions::sim_slopes(m, pred = log_days, modx = bool_program, digits = 3)
JOHNSON-NEYMAN INTERVAL

When bool_program is OUTSIDE the interval [-0.672, -0.294], the slope of log_days is p < .05.

Note: The range of observed values of bool_program is [0.000, 1.000]

SIMPLE SLOPES ANALYSIS
Slope of log_days when bool_program = 0.000 (0):
    Est.    S.E.   t val.       p
-------- ------- -------- -------
  -0.007   0.001   -9.243   0.000

Slope of log_days when bool_program = 1.000 (1):
    Est.    S.E.    t val.       p
-------- ------- --------- -------
  -0.022   0.002   -10.631   0.000

We are not sure whether this approach is right, or whether we are specifying the days variable appropriately in lmer. We are also not sure whether we should be using a more sophisticated change point approach. We came across some R packages such as changepoint, segmented, and strucchange. Are they more appropriate than the lmer approach we have used?
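For comparison, one lmer-based alternative to these packages would be a piecewise-linear ("hinge") term with the breakpoint fixed at the announcement day. The following is only a rough sketch on simulated toy data, not our study data: the variables years and years_post, the simulated effect sizes, and the choice of day 850 as a known (rather than estimated) breakpoint are all assumptions made for illustration.

```r
# Simulated toy data (NOT the study data): 50 users, 20 observations each
set.seed(1)
n_users <- 50
n_obs   <- 20
sim <- data.frame(
  user_id   = rep(sprintf("IDX_%d", seq_len(n_users)), each = n_obs),
  user_male = rep(rbinom(n_users, 1, 0.5), each = n_obs),
  days      = sample(0:2190, n_users * n_obs, replace = TRUE)
)

# Hinge term: 0 up to the breakpoint, then time elapsed since it.
# Time is rescaled to years to keep predictors on comparable scales.
breakpoint     <- 850  # assumed known, not estimated from the data
sim$years      <- sim$days / 365
sim$years_post <- pmax(sim$days - breakpoint, 0) / 365

# Toy outcome: user-level intercept, mild trend, extra slope after the breakpoint
user_eff <- rep(rnorm(n_users, 0, 0.02), each = n_obs)
sim$dv1 <- 0.18 + 0.02 * sim$user_male - 0.004 * sim$years -
  0.010 * sim$years_post + user_eff + rnorm(nrow(sim), 0, 0.06)

# The years_post coefficient is the change in slope after the breakpoint
if (requireNamespace("lme4", quietly = TRUE)) {
  m_pw <- lme4::lmer(dv1 ~ user_male + years + years_post + (1 | user_id),
                     data = sim)
  print(lme4::fixef(m_pw))
}
```

In this parameterization the fitted line is continuous at the breakpoint, so a gradual change shows up as a non-zero years_post slope rather than as a jump in level.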

We would appreciate your advice.

Thanks and kind regards
Srinivas
