[R-sig-ME] Generalized Linear Mixed-Effects Model for single-cell RNA sequencing data

Mon Dec 7 21:08:29 CET 2020

I am currently working on a single-cell RNA sequencing project, under some
time pressure, that requires fitting multiple generalized linear
mixed-effects models.

My experiment consists of unique patients who either did or did not receive
a drug (let's call it drug B). Blood from all patients was sequenced at day
3, and blood from some patients was also sequenced at day 7.

I want to compare the cell composition of these patients (% neutrophils,
T cells, B cells, monocytes, platelets, etc.) between patients treated with
drug B and untreated patients, at day 3 and at day 7.

I run the day 3 and day 7 comparisons separately (each with more than 5
patients in the patient column).

My data frame for day 3 (T cells specifically) looks similar to this:

freqs:
drug_status  proportion_Tcells  patient  ncells
Received     0.5                A10      2765
no           0.3                A2       1456
Received     0.6                A11      3102
no           0.4                A3       2013
Received     0.3                A13      4105
Here, ncells is the total number of cells recovered from that patient.
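To make the example reproducible, here is a minimal sketch that rebuilds the day 3 data frame above. The n_Tcells and n_other columns are hypothetical additions: because the example proportions are illustrative, the counts are recovered by rounding proportion_Tcells * ncells. For a binomial GLM(M) in R, a proportion response with weights = ncells gives the same likelihood as the count form cbind(n_Tcells, n_other) on the left-hand side of the formula.

```r
# Minimal reconstruction of the day 3 example (T cells only).
# n_Tcells / n_other are hypothetical columns: the example proportions are
# illustrative, so counts are recovered by rounding proportion * ncells.
freqs <- data.frame(
  drug_status       = c("Received", "no", "Received", "no", "Received"),
  proportion_Tcells = c(0.5, 0.3, 0.6, 0.4, 0.3),
  patient           = c("A10", "A2", "A11", "A3", "A13"),
  ncells            = c(2765, 1456, 3102, 2013, 4105)
)
freqs$n_Tcells <- round(freqs$proportion_Tcells * freqs$ncells)
freqs$n_other  <- freqs$ncells - freqs$n_Tcells

# Each patient contributes exactly one row to the day 3 data
print(table(freqs$patient))
```

Because every patient appears in a single row, the (1 | patient) term in the day 3 model amounts to one random effect per observation.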

The first model I fitted was:

library(lme4)

glmer(proportion_Tcells ~ drug_status + (1 | patient),
      weights = ncells,
      family = binomial,
      data = freqs)

This model seems to work well for comparing cell proportions between
treatment groups at day 3 and at day 7, but I was wondering whether I need
an additional level of nesting, such as (1 | drug_status:patient).
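One way to check the nesting question is to compare the grouping factors directly. The sketch below assumes, as stated in the post, that every patient ID occurs in only one drug group. Under that assumption, drug_status:patient partitions the rows exactly as patient does, so (1 | drug_status:patient) and (1 | patient) define the same random-effect grouping. The code uses base R's interaction() only.

```r
# Day 3 layout from the example: each patient ID occurs in only one drug group.
drug_status <- factor(c("Received", "no", "Received", "no", "Received"))
patient     <- factor(c("A10", "A2", "A11", "A3", "A13"))

# Grouping factor that (1 | drug_status:patient) would use;
# drop = TRUE keeps only the combinations that actually occur.
nested <- interaction(drug_status, patient, drop = TRUE)

# The two factors partition the rows identically if every (patient, nested)
# pair is unique and both factors have the same number of levels.
n_pairs <- nrow(unique(data.frame(patient, nested)))
same_partition <- n_pairs == nlevels(patient) && n_pairs == nlevels(nested)
print(same_partition)
```

If patient IDs were reused across drug groups (e.g. the same label "A2" in both arms for different people), the partitions would differ and the explicit interaction term would matter.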

The main issue I'm running into, however, arises when I compare cell
proportions longitudinally in paired samples: nearly all of my cell type
proportions come out as significant.

I run the day 3 to day 7 comparisons separately for patients treated with
drug B and untreated patients (but now with only 3 or 4 patients in the
patient column, as not all patients have samples at both time points).

For patients treated with drug B (T cells specifically), my data frame looks
like this:

freqs:
time  proportion_Tcells  patient   ncells
day3  0.3                patient1  1456
day7  0.4                patient1  1644
day3  0.4                patient2  2341
day7  0.3                patient2  4312
day3  0.5                patient3  3012
day7  0.7                patient3  1829
I think that this would be a fully crossed design.
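A quick way to confirm the crossed structure is to cross-tabulate patient against time. The sketch below rebuilds the drug B longitudinal example from the table above. A table of all 1s shows that every patient has both a day 3 and a day 7 sample (patient and time are crossed), with exactly one row per patient-time combination.

```r
# Longitudinal example for patients treated with drug B (T cells only)
freqs <- data.frame(
  time              = factor(c("day3", "day7", "day3", "day7", "day3", "day7")),
  proportion_Tcells = c(0.3, 0.4, 0.4, 0.3, 0.5, 0.7),
  patient           = factor(rep(c("patient1", "patient2", "patient3"), each = 2)),
  ncells            = c(1456, 1644, 2341, 4312, 3012, 1829)
)

# Rows = patients, columns = time points; each cell counts the rows observed
xtab <- table(freqs$patient, freqs$time)
print(xtab)
```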

Is there an issue with running the same model structure again here? I am
not sure whether my sample size is large enough, and whether that is what is
driving my very low p-values, or whether my model is not appropriately
designed.

glmer(proportion_Tcells ~ time + (1 | patient),
      weights = ncells,
      family = binomial,
      data = freqs)
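Some rough arithmetic on the posted numbers may help explain the small p-values. The binomial family treats every cell as an independent draw, so with thousands of cells per sample the standard error of each patient's day 7 minus day 3 change is only about 0.01 to 0.02. The sketch below (plain base R, using only the example values) divides each observed change by that binomial standard error. It is an illustration under the independent-cells assumption, not a reproduction of what glmer reports.

```r
# Proportions and cell counts from the drug B longitudinal example
p3 <- c(0.3, 0.4, 0.5); n3 <- c(1456, 2341, 3012)   # day 3
p7 <- c(0.4, 0.3, 0.7); n7 <- c(1644, 4312, 1829)   # day 7

# Binomial standard error of each patient's day7 - day3 change,
# treating every cell as an independent Bernoulli draw
se_change <- sqrt(p3 * (1 - p3) / n3 + p7 * (1 - p7) / n7)

# Observed per-patient changes in the T cell proportion
change <- p7 - p3

# Ratio of each change to its binomial SE (approximate z-scores)
z <- change / se_change
print(round(z, 1))
```

The three changes differ in sign (+0.1, -0.1, +0.2), yet each exceeds 5 binomial standard errors in absolute value. A random intercept (1 | patient) absorbs differences in each patient's baseline level, but not differences in how much each patient changes between day 3 and day 7.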

Best,

Rohit
