[R-sig-ME] A model for paired data (treatment vs. control on the same subject) nested within time points

Fri Aug 6 01:31:09 CEST 2021

0
I have an experimental design where I took the left and right brain hemispheres from mice across several time points (hence its cross-sectional) following treatment at time 0 to the left brain hemisphere only. Therefore the right hemisphere serves as a paired control for the left hemisphere at each of the time points. Here's the samples data.frame:
library(dplyr)
df <- data.frame(animal_id = c("id1","id1","id2","id2","id3","id3","id4","id4","id5","id5","id6","id6","id7","id7","id8","id8","id9","id9","id10","id10","id11","id11","id12","id12","id13","id13","id14","id14","id15","id15","id16","id16","id17","id17","id18","id18","id19","id19","id20","id20","id21","id21","id22","id22","id23","id23","id24","id24","id25","id25","id26","id26","id27","id27","id28","id28","id29","id29","id30","id30"),
                  time_point = c(0,0,0,0,2,2,2,2,2,2,3,3,3,3,3,3,6,6,6,6,6,6,9,9,9,9,14,14,14,14,14,14,14,14,26,26,26,26,26,26,26,26,50,50,50,50,50,50,50,50,74,74,170,170,170,170,170,170,170,170),
                  hemisphere = c("L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R","L","R"),
                  cov1 = c(99,99,98,98,42.2,42.2,47.6,47.6,38.7,38.7,73.5,73.5,40.6,40.6,37.4,37.4,29.9,29.9,35.1,35.1,38.9,38.9,33.5,33.5,37.9,37.9,38,38,37.3,37.3,45.2,45.2,40.4,40.4,40.3,40.3,39.6,39.6,38.3,38.3,38.9,38.9,37.7,37.7,41.1,41.1,42.8,42.8,37.5,37.5,41.1,41.1,40.8,40.8,42.8,42.8,39,39,40.9,40.9),
                  cov2 = c(28,28,27.3,27.3,28.2,28.2,28.1,28.1,25.6,25.6,30,30,30.1,30.1,30.3,30.3,30.2,30.2,28.3,28.3,31.6,31.6,28.9,28.9,31.5,31.5,26.4,26.4,26.1,26.1,27.5,27.5,26.4,26.4,23.6,23.6,26.5,26.5,25,25,22.1,22.1,27.8,27.8,23.2,23.2,23.2,23.2,21.1,21.1,25.9,25.9,25.7,25.7,25.4,25.4,26.7,26.7,19,19),
                  stringsAsFactors = F) %>%
  dplyr::mutate(sample_id = paste0(animal_id,"_",hemisphere,"_",time_point))

As you can see I have 30 animals where for each I have both hemispheres at each time point, however the number of animals at each time point varies between 1 to 4.

The values that I measured are gene expression levels (which are positive integers).

For example, this simulated matrix has such counts for 10000 genes for each of the 60 samples.

What I'm interested in is how the difference in gene expression levels between the left and right hemispheres changes at each time point relative to the first time point, while controlling for the covariates (cov1 and cov2). For this reason I convert time_point to a factor, as well as animal_id and hemisphere (setting hemisphere R as baseline):
df$time_point <- factor(df$time_point)
df$animal_id <- factor(df$animal_id)
df$hemisphere <- factor(df$hemisphere, levels = c("R","L"))

What would be the right lm or glm model for my question and data?

Seems like animal_id is nested within time_point so perhaps:

gene_expression ~ cov1 + cov2 + hemisphere + (time_point/animal_id) ?

	[[alternative HTML version deleted]]