[R-sig-ME] Simulating data with nested random effects

Thu Oct 25 18:15:19 CEST 2018

Dear Thierry,

Thanks very much for your code. I have expanded it to estimate the power of average treatment effect, for various combinations of site, subject, number of time points, and treatment effect size (copied at the end of this mail). It works nicely, predicting increasing power for increasing n_site or n_subject_site.

However, I see only small random differences in power as a function of the relative values of n_site and n_subject_site. For example, n_site = 10 and n_subject_site = 20 produces the same power (+- 0.01) as n_site = 4 and n_subject_site = 50. You can test this yourself using the code below, by setting "site.values" and "subject_site.values" to the appropriate values.

Naively I would expect there to be a more distinct dependency of power on the relative values of n_site and n_subject_site, since sigma_site and sigma_subject are not equal. For example, if I set sigma_site > sigma_subject, I would naively expect that increasing the number of sites should capture more of the between-site variance than increasing the number of subjects per site, and hence increase the power. However, I don't see any effect beyond random fluctuations, even when choosing extreme values.

Do you know why this is the case? Any insight would be greatly appreciated.

Best,

Pete

# example power analysis of treatment effect

set.seed(42)
library(lme4)
library(dplyr)

create_fake <- function(n_site, n_subject_site, n_time, effect){
  # change these hyper-parameters for testing
  intercept = 5
  trend = -0.2
  sigma_site = 0.4
  sigma_subject = 0.6
  sigma_noise = 0.1

  effect = -1.0*effect*trend
  re.site <- rnorm(n_site, mean = 0, sd = sigma_site)
  re.subject <- rnorm(n_site * n_subject_site, mean = 0, sd = sigma_subject)

  expand.grid(
    time = seq(0, (n_time-1), length = n_time),
    site = factor(seq_len(n_site)),
    subject = factor(seq_len(n_subject_site))
  ) %>%
    mutate(
      subject = factor(interaction(site, subject)),
      treatment = sample(rep(0:1, n_site*n_subject_site/2))[subject],
      fixed = intercept + (trend + effect*treatment)*time,
      random = re.site[site] + re.subject[subject],
      mu = fixed + random,
      y = rnorm(n(), mean = mu, sd = sigma_noise)
    )
}

trial.power <- function(n_site, n_subject_site, n_time, effect, n.sims=1000){
  signif <- rep (NA, n.sims)
  for (s in 1:n.sims){
    fake <- create_fake(n_site, n_subject_site, n_time, effect)
    lme.power <- lmer(
                  y ~ time + time:treatment + (1 | site/subject),
              data=fake
             )
    theta.hat <- coef(summary(lme.power))['time:treatment', 'Estimate']
    theta.se <- coef(summary(lme.power))['time:treatment', 'Std. Error']
    signif[s] <- ifelse (theta.hat - 2*theta.se > 0, 1, 0)
  }
  power <- mean(signif)
  return(power)
}

# change these values for testing
effect.values <- c(0.2,0.4,0.6,0.8,1.0)
site.values <- c(6, 8, 10, 15, 20, 40, 60, 100)
subject_site.values <- c(20)
time.values <- c(3)

for(i1 in 1:length(effect.values)){
 for(i2 in 1:length(site.values)){
  for(i3 in 1:length(subject_site.values)){
   for(i4 in 1:length(time.values)){
    power <- trial.power(site.values[i2], subject_site.values[i3], time.values[i4], effect.values[i1])
    cat("Power =", power ,", for effect =", effect.values[i1] ,", n_sites =", site.values[i2], ", n_subject_site =", subject_site.values[i3], ", n_time =", time.values[i4], "\n")
   }
  }
 }
}

________________________________
From: Thierry Onkelinx <thierry.onkelinx using inbo.be>
Sent: 21 October 2018 09:41:49
To: Wijeratne, Peter
Cc: r-sig-mixed-models
Subject: Re: [R-sig-ME] Simulating data with nested random effects

Dear Pete,

I rewrote your code to make it shorter and IMHO more readable. Meaningful variables names require less comments on what they are. The model yields sensible estimates on data simulated with the code below.

library(lme4)
library(dplyr)
create_fake <- function(
  n_site = 100, n_subject_site = 10, n_time = 10,
  intercept = 10, trend = 0.1, effect = 0.5,
  sigma_site = 5, sigma_subject = 2, sigma_noise = 1
){
  re.site <- rnorm(n_site, mean = 0, sd = sigma_site)
  re.subject <- rnorm(n_site * n_subject_site, mean = 0, sd = sigma_subject)

  expand.grid(
    time = seq(0, 2, length = n_time),
    site = seq_len(n_site),
    subject = seq_len(n_subject_site)
  ) %>%
    mutate(
      subject = interaction(site, subject),
      treatment = sample(0:1, size = n_site * n_subject_site,, replace = TRUE)[subject],
      fixed = intercept + effect * treatment + trend * time,
      random = re.site[site] + re.subject[subject],
      mu = fixed + random,
      y = rnorm(n(), mean = mu, sd = sigma_noise)
    )
}
dataset <- create_fake()
m <- lmer(y ~ treatment + time + (1|site/subject), data = dataset)
summary(m)

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be<mailto:thierry.onkelinx using inbo.be>
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be<http://www.inbo.be>

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

[https://inbo-website-prd-532750756126.s3-eu-west-1.amazonaws.com/inbologoleeuw_nl.png]<https://www.inbo.be>

Op do 18 okt. 2018 om 10:28 schreef Wijeratne, Peter <p.wijeratne using ucl.ac.uk<mailto:p.wijeratne using ucl.ac.uk>>:
Dear r-sig-mixed-models,

I would like to simulate nested data, where my mixed effects model fitted to real data has the form:

y ~ time + (1 | site/subject)

I then take the hyper-parameters from this model to simulate fake data, using this function:

create_fake <- function(J,K,L,HP,t){
# J : number of sites
# K : number of subjects / site
# L : number of years
# HP: hyperparameters from fit, y ~ time + (1 | site/subject)
# t: fractional effectiveness of treatment
time <- rep(seq(0,2,length=L), J*K)
subject <- rep(1:(J*K), each=L)
site <- sample(rep (1:J, K))
site1 <- factor(site[subject])
treatment <- sample(rep (0:1, J*K/2))
treatment1 <- treatment[subject]

# time coefficient
g.0.true <- as.numeric( HP['g.0.true'] )
# treatment coefficient
g.1.true <- -as.numeric(t)*g.0.true
# intercept
mu.a.true <- as.numeric( HP['mu.a.true'] )
# fixed effects
b.true <- (g.0.true + g.1.true*treatment)
# random effects
sigma.y.true <- as.numeric( HP['sigma.y.true'] ) # residual std dev
sigma.a.true <- as.numeric( HP['sigma.a.true'] ) # site std dev
sigma.a0.true <- as.numeric( HP['sigma.a0.true'] ) # site:person std dev
a0.true <- rnorm(J*K, 0, sigma.a0.true)
a.true <- rnorm(J*K, mu.a.true + a0.true, sigma.a.true)
y <- rnorm(J*K*L, a.true[subject] + b.true[subject]*time, sigma.y.true)

return(data.frame( y, time, subject, treatment1, site1 ))

I then fit models of the form:

y ~ time + time:treatment1 + (1 | site1/subject)

To the fake data. However, this approach can (but not always) produce a 'site' standard deviation approximately a factor of 10 less than in the real data.

My question is - is my simulation function correct?

Note - I can generate data and provide the full code if required.

Thanks in advance for any help.

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models using r-project.org<mailto:R-sig-mixed-models using r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

	[[alternative HTML version deleted]]