[BioC] edgeR, logFC calculation in factor combination

Gordon K Smyth smyth at wehi.EDU.AU
Sun May 18 05:55:00 CEST 2014


Dear Mike,

edgeR does compute logFC correctly, and a positive logFC does obviously 
mean that the counts are higher, other things being equal, in the second 
group than the first group.

If you are making the comparison B-A (B minus A), then a positive logFC 
means expression is higher in B than in A.  The topTags() function always 
outputs the contrast you are making at the top of the table, so that the 
table is unambiguous.
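
To make the sign convention concrete, here is a minimal base-R sketch. It 
only illustrates how the direction of the contrast sets the sign; it is not 
how edgeR computes logFC, which comes from a fitted negative binomial model 
rather than from a ratio of raw means. The value 4.2 is the ratio reported 
in the question, and the variable names are made up for illustration.

```r
# Illustration of the sign convention only (assumption: treating the
# reported ratio of raw means as if it were the fold change).
ratio_a_over_b <- 4.2   # mean(A) / mean(B), as reported in the question

logfc_a_minus_b <- log2(ratio_a_over_b)      # contrast A - B
logfc_b_minus_a <- log2(1 / ratio_a_over_b)  # contrast B - A

# Reversing the contrast flips the sign but not the magnitude.
print(c(A_minus_B = logfc_a_minus_b, B_minus_A = logfc_b_minus_a))
```

So a gene that is higher in A gives a positive value for A-B and a negative 
value of the same size for B-A.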

I can't tell from your email what comparison you actually made.  What is 
the non-ASCII character in your email between "control_1.localization_A" 
and "control_1.localization_B"?  Was it supposed to be a minus sign or 
something else?

I don't know whether you are making a simple comparison between two 
groups, or whether you have fitted a more complex linear model.  This can 
affect the interpretation of logFCs.
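
For reference, one common way to set up such a comparison is to fit a 
model with one coefficient per factor combination and then state the 
contrast explicitly.  The sketch below is only an assumption about how the 
analysis might be arranged, not a reconstruction of what was actually run: 
the counts are simulated, and the group labels (c0.A, c1.A, c0.B, c1.B) 
are shortened stand-ins for the four combinations in the question.

```r
library(edgeR)  # edgeR depends on limma, which provides makeContrasts()

set.seed(1)
# Hypothetical data: 200 genes, 3 samples in each of the 4 combinations.
group <- factor(rep(c("c0.A", "c1.A", "c0.B", "c1.B"), each = 3))
counts <- matrix(rnbinom(200 * 12, mu = 50, size = 10), nrow = 200)
rownames(counts) <- paste0("gene", 1:200)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# One coefficient per factor combination (no intercept).
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# Contrast "c1.A - c1.B": a positive logFC means higher expression in c1.A.
con <- makeContrasts(c1.A - c1.B, levels = design)
lrt <- glmLRT(fit, contrast = con)
topTags(lrt)
```

Writing the contrast as c1.B - c1.A instead would reverse the sign of every 
logFC, which is why the direction needs to be known before genes are split 
into up- and down-regulated sets.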

Best wishes
Gordon


> Date: Fri, 16 May 2014 12:06:14 +0200
> From: Mike Miller <mike.bioc32 at gmail.com>
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] edgeR, logFC calculation in factor combination
> Message-ID:
> 	<CANkSkzosYxfQKn3y=iTKQ4Qj7DPP+Da+umgPJkU0DE98RUU5bg at mail.gmail.com>
> Content-Type: text/plain
>
> Dear All,
>
>
> I am using edgeR for an RNASeq experiment (~30 samples), where I need to
> explore the influence of 2 factors with 2 levels each (there are actually
> more factors, but for simplicity I put only 2 here):
>
> 1. disease_state (levels: control_0 and control_1)
>
> 2. localization (levels: A and B)
>
>
> I combined the factors and got 4 combinations:
>
> control_0.localization_A, control_1.localization_A,
> control_0.localization_B, control_1.localization_B.
>
> I am specifically interested in a list of diff. expressed genes between
> these 2 combinations: control_1.localization_A – control_1.localization_B.
>
> When separating up- and down-regulated genes, I did it simply according to
> the logFC column (after filtering for genes with padj < 0.05). If logFC < 0,
> can I infer that the gene is DOWN-regulated in control_1.localization_A?
> I assume that this is true.
>
> However, when I take the raw counts for one of the diff. expressed genes
> (for which logFC<0), in control_1.localization_A and in
> control_1.localization_B conditions, it turned out to be the following:
>
> mean(control_1.localization_A) / mean(control_1.localization_B) = 4.2
>
> (Formula explained in words: mean of raw counts for the diff. expressed
> gene X in samples which are control_1 and localization_A, divided by mean
> of raw counts for the diff. expressed gene X in samples which are control_1
> and localization_B)
>
> This is contrary to the conclusion I got from logFC, since according to
> raw counts that gene is UP-regulated in control_1.localization_A.
>
> I know that logFC is calculated differently, but shouldn't the ratio
> (whether the gene is up- or down-regulated) stay nevertheless the same (if
> library sizes are very similar in all samples, which is the case here)?
>
> Maybe a more precise question would be: how is logFC calculated in this
> case, when we have a combination of different factors?
>
> Thank you very much in advance for any piece of clarification!
>
> Mike
