[BioC] Interpretation of fold.changes for covariates in a linear model using voom
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Apr 18 02:01:31 CEST 2014
Dear Francesco,
I think that you may be mis-judging the nature of a linear model with
multiple factors and covariates.
Limma computes log-fold-changes by fitting a linear model and extracting
the estimated coefficients. The interpretation of each coefficient is
that it represents the change in the reponse (log-expression in this case)
for each unit change in the covariate, assuming that all other covariates
in the model can be held constant.
For example, the coefficient for alpha in your hypothetical model is the
change in the log-expression of the gene that would occur if alpha was
changed (from 0 to 1) but all other factors in the model were held
constant.
This is not the same as fitting a simple model with just alpha and no
other factors. The coefficient that you would get from the simple model
with just alpha will not be the same as the coefficient you would get for
alpha from the full model. In the simple model, the other factors are not
held constant. It would easily be that the full model could give a
negative coefficient but the simple model gives a positive coefficient.
This occurs when there are other factors in the model that are positively
correlated with alpha but have the opposite effect on expression.
When you make a simple summary of log-expression values for wild-type KRAS
vs. mutated, you are looking at the marginal effect of KRAS when all other
factors are allowed to vary as well. In other words, you are examining
the simple model with just KRAS as a factor. There is no reason that this
should be the same as the logFC for KRAS when all other factors have been
adjusted for. The simple effect could easily be driven by factors other
than KRAS.
You might find it helpful to consult a textbook on multiple regression
that discusses the subtleties of unbalanced anova or correlated
covariates.
Best wishes
Gordon
> Date: Mon, 14 Apr 2014 06:22:45 -0700 (PDT)
> From: "Francesco Gatto [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, gatto at chalmers.se
> Subject: [BioC] Interpretation of fold.changes for covariates in a
> linear model using voom
>
> I have implemented a linear model to fit RNAseq read counts for ~1000
> samples using ~200 covariates plus the intercept. The pipeline follows
> the example in the user manual, i.e. v=voom(y,design) ->
> fit=lmFit(v,design) -> fit2=eBayes(fit). Then I am using decideTests()
> to assess the effect of each covariate on gene expression. However, the
> output is less complete than topTable(), so I used this function to
> obtain values on fold-changes and individual p-values. At this stage,
> values become difficult to interpret. If the LM is:
>
> gene_i,j = intercept_i + alfa_i*Factor1_j + beta_i*Factor2_j + (~200 other factors)
>
> and, to simplify, let's say covariates are binary. Then what do the
> fold-changes for topTable(fit2,coef="alfa") mean?
> My interpretation was the up/down-regulation due to the presence of
> covariate alfa, but if I boxplot or summarize the expression values for
> the two groups (alfa = 1 vs alfa = 0) I get quite contradictory results.
> For example this is the most DE gene when the sample has a mutation in
> KRAS (the covariate, 0 if wild-type, 1 if mutated):
>
>> topTable(fit2,coef="metadataKRAS",number=1)
>
> hgnc_symbol logFC AveExpr t P.Value adj.P.Val B
> GOLGA6A -1.314213 -5.9000494 -7.375175 3.800569e-13 7.204358e-09 18.80961
>
> The summary of gene expression values for wild-type KRAS vs. mutated is:
>
>> by(E["GOLGA6A",],design[,"metadataKRAS"],summary)
>
> design[, "metadataKRAS"]: 0
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -7.882 -6.865 -6.461 -6.021 -5.770 1.548
> ----------------------------------------------------------------
> design[, "metadataKRAS"]: 1
> Min. 1st Qu. Median Mean 3rd Qu. Max.
> -7.657 -6.247 -5.487 -5.239 -4.582 -1.311
>
> This is an up-regulation, and of magnitude way lower than the one
> indicated by topTable. Is this because I am misusing the function
> topTable? I cannot wrap my head around this. Thanks for your help!
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.4.2 limma_3.18.9
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list