Dear Users,
I am new to linear mixed models and would really appreciate your thoughts on my analysis.
Here is the experimental design: we collected longitudinal data at four time points under two conditions, with two replicates (subjects) in each condition.
A scheme:

subject  Time (hours)  Condition  Value
A        2             case       xx
A        8             case       xx
A        20            case       xx
A        44            case       xx
B        2             case       xx
B        8             case       xx
B        20            case       xx
B        44            case       xx
C        2             contr      xx
C        8             contr      xx
C        20            contr      xx
C        44            contr      xx
D        2             contr      xx
D        8             contr      xx
D        20            contr      xx
D        44            contr      xx
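In R, I believe this layout corresponds to a long-format data frame along the following lines. This is only a sketch: the values are random placeholders standing in for the "xx" entries, and the extra column Time.hours is a name I made up to keep the numeric hours next to the factor version of Time.

```r
# Long-format layout matching the scheme: 4 subjects x 4 time points.
# The values are random placeholders, not real measurements.
set.seed(1)
test.da <- expand.grid(Time = c(2, 8, 20, 44),
                       subject = c("A", "B", "C", "D"),
                       stringsAsFactors = FALSE)
test.da$Condition <- ifelse(test.da$subject %in% c("A", "B"), "case", "contr")
test.da$value <- rnorm(nrow(test.da), mean = 8, sd = 1)  # placeholder log2 values
test.da <- test.da[, c("subject", "Time", "Condition", "value")]

# Keep the numeric hours (assumed column name) and also store Time as a factor.
test.da$Time.hours <- test.da$Time
test.da$Time <- factor(test.da$Time)
test.da$Condition <- factor(test.da$Condition, levels = c("contr", "case"))
test.da$subject <- factor(test.da$subject)

str(test.da)
```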
The values are RNA-seq read counts per gene, log2-transformed.
My interests are:
1. Find genes whose expression changes significantly over the time course in the "case" condition, compared with the expression in "control".
2. For the genes found in 1., find the significant changes among time points in "case", compared with "control".
I used Time (as a factor) and Condition as fixed effects, and subject as a random effect.
I set up the model for my hypothesis as:
test.hypo <- lmer(value ~ Time * Condition + (1 | subject), data = test.da)
The null model, indicating no difference:
test.null <- lmer(value ~ 1 + (1 | subject), data = test.da)
and I take the p-value from
anova(test.hypo, test.null)
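To make the comparison concrete, here is a minimal sketch on simulated data for a single gene, assuming the lme4 package. The simulated values and the intermediate model test.time are my own additions for illustration. As far as I understand, comparing against the intercept-only null tests all Time and Condition terms together, whereas a null that keeps Time would test only the Condition-related terms; I am not sure which of the two matches my first interest.

```r
library(lme4)

# Simulated data for one gene: 4 subjects x 4 time points (placeholder values).
set.seed(1)
test.da <- expand.grid(Time = factor(c(2, 8, 20, 44)),
                       subject = factor(c("A", "B", "C", "D")))
test.da$Condition <- factor(ifelse(test.da$subject %in% c("A", "B"),
                                   "case", "contr"))
subj.eff <- rnorm(4, sd = 0.5)[as.integer(test.da$subject)]  # per-subject offset
test.da$value <- 8 + subj.eff + rnorm(nrow(test.da), sd = 0.3)

# Fit with maximum likelihood (REML = FALSE) so that models with different
# fixed effects can be compared by a likelihood ratio test.
test.hypo <- lmer(value ~ Time * Condition + (1 | subject),
                  data = test.da, REML = FALSE)
test.null <- lmer(value ~ 1 + (1 | subject),
                  data = test.da, REML = FALSE)
# Assumed alternative null: keeps Time, drops Condition and the interaction.
test.time <- lmer(value ~ Time + (1 | subject),
                  data = test.da, REML = FALSE)

anova(test.hypo, test.null)  # all Time and Condition terms at once
anova(test.hypo, test.time)  # only Condition and Time:Condition terms
```

With only four subjects the random-intercept variance may be estimated at or near zero, so lme4 may report a singular fit on data like these.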
My questions:
1. Does this p-value address my first interest, i.e., does it test the null of no difference over the time course for a gene in "case" compared with "control"? If not, what is the proper way to set up the hypothesis and null models?
2. How can I do a post-hoc analysis on the results of the linear mixed model to find the differences among time points?
3. I get warnings like "In mer_finalize(ans) : false convergence (8)". How should I handle this properly? And how can I identify which fits produce the warning, since I have thousands of genes?
4. I also tried treating Time as a continuous variable rather than a factor, in which case I expect a linear change in gene expression. I used the same structure as before, i.e., lmer(value ~ Time * Condition + (1| subject),data=test.da), with Time now continuous. Should I use lmer(value ~ Time + Condition + (1| subject),data=test.da) instead, as in ANCOVA? And which is the better way to treat Time, as a factor or as a continuous variable?
5. The data are RNA-seq counts. I simply normalized them by dividing by the "size factor" calculated by DESeq, and the log-transformed data show a left-skewed distribution. Is this a serious problem for a mixed model? How do you deal with RNA-seq data?
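Regarding question 3, one idea I had for tracking down which genes trigger the warning is to wrap each fit so that warnings are recorded per gene instead of being printed at the end. The sketch below uses only base R. The helper fit_with_warnings, the stand-in toy_fit, and the gene names are all hypothetical; in the real analysis the per-gene lmer() call would take the place of toy_fit. I do not know whether this is the recommended approach.

```r
# Run a fitting function and return its result together with any warnings
# (and the error message, if the fit fails outright).
fit_with_warnings <- function(fit_fun, ...) {
  warns <- character(0)
  fit <- withCallingHandlers(
    tryCatch(fit_fun(...), error = function(e) {
      warns <<- c(warns, paste("ERROR:", conditionMessage(e)))
      NULL
    }),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")  # continue the fit after recording
    }
  )
  list(fit = fit, warnings = warns)
}

# Hypothetical stand-in for a per-gene model fit: it warns when the values
# have no variance, mimicking a fit that fails to converge.
toy_fit <- function(y) {
  if (var(y) < 1e-8) warning("false convergence (8)")
  mean(y)
}

# Hypothetical per-gene values.
genes <- list(geneA = c(5.1, 5.3, 4.9, 5.2),
              geneB = rep(3, 4))

results <- lapply(genes, function(y) fit_with_warnings(toy_fit, y))

# Names of the genes whose fit produced at least one warning or error.
flagged <- names(results)[sapply(results, function(r) length(r$warnings) > 0)]
print(flagged)
```

The flagged genes could then be refitted and inspected one at a time.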
I would like to hear your suggestions on the data analysis and the experimental design (important for future designs), as well as any tutorials and textbooks you would recommend on this topic.
Thanks very much for your patience and time!
Best regards,
--Shao Chunuxan