Hi.
I have found some R/Bioconductor/Genominator code on the web (below) and it
measures differential expression of RNA-seq short read data using a general
linear model.

Can someone explain some basic questions I have?

1) What is the reason for using 2 glm's for measuring differential
expression?

2) In the function(y) there are two linear models ran; one with argument y ~
groups and the other with argument y ~ 1. Why do this?

3) What does the offset do?

4) Why use ANOVA; is to compare the linear models?

5) What can we say about results, if adjusted for multiple testing; how
would you explain a significant result?

6) Would an adjusted p-value of <= 0.05 be significant?

Basically, I don't know much about the statistics done below and any advice
or pointers to good literature for this would be a great help. Thank you.

> head(geneCountsUI)
        mut_1_f mut_2_f wt_1_f wt_2_f
YAL069W       0       0      0      0
YBL049W      19      18     10      4

# Normalisation of RNA-seq lanes
notZero <- which(rowSums(geneCountsUI) != 0)
upper.quartiles <- apply(geneCountsUI[notZero, ], 2, function(x) quantile(x,
0.75))
uq.scaled <- upper.quartiles/sum(upper.quartiles) * sum(laneCounts)

# Calculating differential expression
groups <- factor(rep(c("mut", "wt"), times = c(2, 2)))

pvalues <- apply(geneCountsUI[notZero, ], 1,
  function(y) {
    fit <- glm(y ~ groups, family = poisson(), offset = log(uq.scaled))
    fit0 <- glm(y ~ 1, family = poisson(), offset = log(uq.scaled))
    anova(fit0, fit, test = "Chisq")[2, 5]
})

	[[alternative HTML version deleted]]

