[BioC] edgeR GLM to adjust for batch effect

Ryan C. Thompson rct at thompsonclan.org
Fri Mar 28 01:05:35 CET 2014


If you had only two conditions (instead of three) and only a single
batch had samples from both, then you would be completely unable to
separate batch effects from treatment effects, and your treatment fold
change would be determined entirely by the one batch containing both
conditions. (The other batches would still contribute to dispersion
estimation.) In your case, however, you have a third treatment, so
treatment and batch are not completely confounded, except for batch 4,
which contains only a single treatment. As I understand it, in this
model batch 3 alone determines the estimated fold change between nc and
the mean of pos and neg, while batches 1 and 2 also contribute to the
fold change between pos and neg. Batch 4 does not contribute directly to
any estimated fold change between treatments. Overall, I would be quite
uncomfortable including a batch effect in the model for these data, and
I would look for evidence that the batch effect is non-significant.
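One way to check the confounding argument above is to build the design matrix and look at its rank. The sketch below is plain Python, not edgeR. It expands the sample counts from the table in the original question into a treatment-coded ~batch + group design. The baseline levels (batch 1, pos) and the helper names (`design_row`, `design_matrix`, `rank`) are illustrative assumptions. R's `model.matrix()` chooses its baseline from the factor level order, but that choice does not change the rank. The full design should be full rank, so every coefficient is estimable. If batch 3's nc samples are removed, the batch4 and groupnc columns become identical and the rank drops by one. This is consistent with batch 3 being the only batch that informs nc versus pos/neg.

```python
from fractions import Fraction

# Samples per (group, batch), copied from the table in the original question.
COUNTS = {
    "pos": {1: 3, 2: 5, 3: 9, 4: 0},
    "neg": {1: 5, 2: 4, 3: 7, 4: 0},
    "nc": {1: 0, 2: 0, 3: 5, 4: 8},
}

# Treatment-coded ~batch + group design; baseline levels (batch 1, pos)
# are an illustrative choice, not necessarily what R would pick.
COLUMNS = ["(Intercept)", "batch2", "batch3", "batch4", "groupneg", "groupnc"]


def design_row(batch, group):
    """Hypothetical helper: one design-matrix row for a single sample."""
    return [1, int(batch == 2), int(batch == 3), int(batch == 4),
            int(group == "neg"), int(group == "nc")]


def design_matrix(counts):
    """Expand the count table into one design row per sample."""
    rows = []
    for group, per_batch in counts.items():
        for batch, n in per_batch.items():
            rows.extend(design_row(batch, group) for _ in range(n))
    return rows


def rank(matrix):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column adds no rank
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


full = design_matrix(COUNTS)
print(len(full), "samples,", len(COLUMNS), "coefficients, rank", rank(full))

# Drop the nc samples from batch 3: batch4 and groupnc become identical columns.
no_b3_nc = {g: dict(b) for g, b in COUNTS.items()}
no_b3_nc["nc"][3] = 0
print("without batch-3 nc samples: rank", rank(design_matrix(no_b3_nc)))
```

Running it reports 46 samples, 6 coefficients and rank 6 for the full design, and rank 5 once batch 3's nc samples are removed. Exact rational arithmetic is used so that the rank is not affected by floating-point tolerance choices.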

It might be appropriate to estimate dispersions with the batch effect
included and then drop it for the model-fitting step, but I'm not
confident about the statistical validity of such an approach. Because
dispersions estimated with batch in the model are typically smaller,
this would inflate your significance measures relative to leaving out
the batch effect entirely, so it may end up being anti-conservative.
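Looking for evidence that the batch effect is non-significant amounts, gene by gene, to a likelihood ratio test of ~batch + group against ~group. That test has 3 degrees of freedom, one for each of the three batch coefficients. In edgeR this would presumably be `glmLRT()` applied to the batch coefficients of the full fit. The Python sketch below reproduces only the last step: it turns a difference in residual deviance (dispersion held fixed) into a chi-square p-value. The helper names are hypothetical and the deviance values are made up for illustration. The chi-square upper tail for 3 df uses the standard closed form.

```python
import math


def chisq3_sf(x):
    """Upper-tail probability P(X > x) for a chi-square with 3 df (closed form)."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def batch_lrt(dev_group_only, dev_batch_plus_group):
    """Hypothetical helper: LR test for dropping the 3 batch coefficients.

    Takes the residual deviances of the reduced (~group) and full
    (~batch + group) fits for one gene, with dispersion held fixed.
    """
    stat = dev_group_only - dev_batch_plus_group
    return stat, chisq3_sf(stat)


# Hypothetical deviances for a single gene, for illustration only.
stat, p = batch_lrt(120.0, 112.0)
print(f"LR = {stat:.1f}, p = {p:.4f}")
```

With these made-up inputs, the statistic is 8.0 on 3 df, which gives p of about 0.046. Across thousands of genes, these per-gene p-values would still need multiple-testing adjustment before anything is concluded about the batch effect.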

-Ryan


On 03/27/2014 02:51 PM, Ryan Basom wrote:
> Thanks for this advice. I have a follow-up question, though. The
> edgeR User's Guide says, regarding adjustment for batch effects: "In
> this type of analysis, the treatments are compared only within each
> batch. The analysis is corrected for baseline differences between the
> batches." If some of the batches don't have samples for, say, both
> treatments, how is this compensated for? Though this isn't ideal, I'd
> like to get a better sense of what's going on in this scenario.
>
> Thanks,
> Ryan
>
>
> On 03/26/2014 04:36 PM, Ryan C. Thompson wrote:
>> You don't necessarily need every condition in every batch for the
>> comparison to be effective, but having only one batch in common is
>> not good. If I understand correctly, batch 3 would be the dominant
>> contributor to the estimated fold changes in the comparisons you
>> care about, since any other change would be mostly absorbed into the
>> batch effects. I think your first step should be to fit the full
>> model with both condition and batch effects, find out whether the
>> batch effects appear significant enough to warrant inclusion in the
>> model, and drop them if not.
>>
>> -Ryan
>>
>> On Wed 26 Mar 2014 03:47:42 PM PDT, Ryan Basom [guest] wrote:
>>>
>>>
>>> Hi,
>>>
>>> I'd like to use a GLM in edgeR to adjust for a batch effect, though 
>>> only one of my four batches has samples from both groups in the 
>>> comparisons that I'd like to conduct (pos-nc & neg-nc):
>>>
>>>           batch
>>>         1   2   3   4
>>> pos     3   5   9   0
>>> neg     5   4   7   0
>>> nc      0   0   5   8
>>>
>>> I suspect that using a GLM in edgeR to adjust for batch will only
>>> work properly if both groups in a given comparison are represented
>>> in every batch, but I would like to know if that is not the case. I
>>> see a batch effect using PVCA on just the pos and neg samples, and
>>> would like to try to adjust for it somehow. Please advise.
>>>
>>> Thanks,
>>> Ryan
>>>
>>> -- output of sessionInfo():
>>>
>>> R version 3.0.3 (2014-03-06)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] splines parallel stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] pvca_1.2.0 beadChipCoreTools_0.49 beadAnno_1.0 lumi_2.14.1
>>> [5] Biobase_2.22.0 BiocGenerics_0.8.0 genefilter_1.44.0 arrayQualityMetrics_3.18.0
>>> [9] edgeR_3.4.2 limma_3.18.12
>>>
>>> -- 
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


