[BioC] Technical Replicates in EdgeR

Gordon K Smyth smyth at wehi.EDU.AU
Thu Jul 24 08:29:46 CEST 2014


> Date: Wed, 23 Jul 2014 15:41:02 -0500
> From: Neha Mehta <nsmehta at u.northwestern.edu>
> To: "Ryan C. Thompson" <rct at thompsonclan.org>
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Technical Replicates in EdgeR
>
> Thank you for your answer! Moving forward I removed the lane that I
> verified by plotMDS to be different from the other two. I have 2 further
> questions.
>
> 1) I have a few highly expressed genes - the 2 most highly expressing 
> genes make up 23 and 10 percent of all mappable reads, respectively. Do 
> I need to do something to make sure that these genes will not have a 
> negative effect on my DE assessment?

This is what TMM (or compositional normalization) is intended to ensure.

> I plan to use edgeR for DE analysis, and I know I can use TMM to 
> normalize. Will this be enough?

Probably.  There's nothing better available anyway.
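
As a rough illustration of why a few dominant genes matter, here is a toy Python sketch of compositional normalization. It is not edgeR's TMM implementation (in edgeR that is handled by calcNormFactors): it uses an unweighted trimmed mean of log-ratios, and the counts, function names and trim fraction are all made up for the example.

```python
import math

def proportions(counts):
    """Scale raw counts by the total library size."""
    total = sum(counts)
    return [c / total for c in counts]

def trimmed_log_ratio_factor(sample, reference, trim=0.25):
    """Unweighted trimmed mean of per-gene log2 ratios, returned on the ratio scale.

    A simplified stand-in for the idea behind TMM, not the edgeR algorithm.
    """
    ratios = sorted(
        math.log2(s / r)
        for s, r in zip(proportions(sample), proportions(reference))
        if s > 0 and r > 0
    )
    k = int(len(ratios) * trim)        # number of genes dropped from each tail
    kept = ratios[k:len(ratios) - k]
    return 2 ** (sum(kept) / len(kept))

# Hypothetical counts: both libraries have 10,000 reads spread over 10 genes.
reference = [1000] * 10
sample = [5500] + [500] * 9            # gene 0 soaks up 55% of the reads

factor = trimmed_log_ratio_factor(sample, reference)
effective_size = factor * sum(sample)  # library size rescaled by the factor

# Gene 1 is assumed to be truly unchanged between the two libraries.
naive_ratio = proportions(sample)[1] / proportions(reference)[1]
adjusted_ratio = (sample[1] / effective_size) / proportions(reference)[1]

print("normalization factor:", factor)
print("gene 1 ratio, library-size scaling only:", naive_ratio)
print("gene 1 ratio, with the factor applied:", adjusted_ratio)
```

With library-size scaling alone, the nine ordinary genes look down-regulated only because gene 0 takes so many reads; trimming the extreme log-ratios before averaging removes most of that effect in this toy case.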

> 2) When I ran a MAplot to compare my bio reps I saw that there are some 
> outliers, I have attached examples of 4 pairs of bioreps. Is this 
> something I should be concerned about?

I don't particularly see outliers from your plots, but I do see a lot of 
variation between your reps.  Why are they so inconsistent?

Obviously genes that are inconsistent between reps will get large 
p-values, so the high level of variability will restrict the amount of DE 
that edgeR will detect.
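
The effect of replicate variability can be sketched with a toy calculation. edgeR fits negative binomial models to counts, so the Welch t-statistic below is only an analogy on hypothetical log-expression values, chosen so that both genes have the same difference in group means.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch t-statistic for the difference in means (group_b minus group_a)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se

# Hypothetical log2 expression values, 4 replicates per group.
# Both genes differ by exactly 1 unit between the group means.
consistent_ctrl = [5.0, 5.1, 4.9, 5.0]
consistent_trt = [6.0, 6.1, 5.9, 6.0]
noisy_ctrl = [4.0, 6.0, 5.5, 4.5]
noisy_trt = [5.0, 7.0, 6.5, 5.5]

t_consistent = welch_t(consistent_ctrl, consistent_trt)
t_noisy = welch_t(noisy_ctrl, noisy_trt)

print("t-statistic, consistent replicates:", round(t_consistent, 2))
print("t-statistic, noisy replicates:", round(t_noisy, 2))
```

The same mean difference gives a much smaller test statistic, and hence a larger p-value, when the replicates disagree with each other.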

The usual way to examine this is to look at your MDS plot to see if the 
samples in the four treatment groups cluster together.  That is simpler 
and more direct than doing MA plots of every pair of replicate samples.
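
The question an MDS plot answers can be approximated without drawing anything: are samples closer to their own group than to the other groups? The Python sketch below is loosely modelled on the leading log-fold-change distance that plotMDS is documented to use (root-mean-square of the largest log2 fold changes between a pair of samples). The counts, sample names, prior count and `top` value are hypothetical, and no MDS projection is computed.

```python
import math

def log_cpm(counts, prior=0.5):
    """Simplified log2 counts-per-million with a small prior count."""
    total = sum(counts)
    return [math.log2((c + prior) / total * 1e6) for c in counts]

def leading_distance(x, y, top=3):
    """Root-mean-square of the `top` largest log2 fold changes between two samples."""
    squared = sorted(((a - b) ** 2 for a, b in zip(x, y)), reverse=True)[:top]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical counts for 5 genes in 2 groups of 2 samples.
samples = {
    "ctrl_1": [100, 200, 50, 400, 250],
    "ctrl_2": [110, 190, 55, 390, 255],
    "trt_1": [300, 200, 20, 400, 80],
    "trt_2": [290, 210, 22, 410, 68],
}
groups = {name: name.split("_")[0] for name in samples}
logs = {name: log_cpm(counts) for name, counts in samples.items()}

def nearest_neighbour(name):
    """Return the other sample with the smallest leading distance to `name`."""
    others = [other for other in logs if other != name]
    return min(others, key=lambda other: leading_distance(logs[name], logs[other]))

for name in samples:
    neighbour = nearest_neighbour(name)
    same = groups[neighbour] == groups[name]
    print(name, "->", neighbour, "(same group)" if same else "(different group)")
```

If many samples have a nearest neighbour from a different treatment group, that points to the kind of replicate inconsistency discussed above.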

Gordon

> Thank you again.
>
> Neha Mehta
>
>
> On Fri, Mar 21, 2014 at 5:03 PM, Ryan C. Thompson <rct at thompsonclan.org>
> wrote:
>
>> Hi there,
>>
>> No, there is absolutely no problem with dropping a lane of data if you
>> believe that lane to have technical issues. I would recommend that you use
>> the plotMDS function to verify that the counts in the bad lane are indeed
>> different from the other two lanes.
>>
>> -Ryan
>>
>>
>> On Fri 21 Mar 2014 01:45:51 PM PDT, Neha Mehta wrote:
>>
>>> Hello,
>>>
>>> I am a graduate student and fairly new to RNA-Seq. In my study I have a
>>> 2x2 design with each group containing 4 bio reps and each bio rep has 3
>>> technical replicates (same library prep, but each sample is processed in 3
>>> different lanes). After studying my frequency count data I have found that
>>> the count data in lane 1 is very different from lane 2 and 3 for most to
>>> all bio samples. I plan to use EdgeR for DE analysis and I am wondering if
>>> it is OK to sum count totals from just lanes 2 and 3. All documentation I
>>> have read states that technical variation is small so you should sum all
>>> tech reps, but in this case it seems I have noticed greater variation in
>>> one lane and therefore it does not make sense to include it. I have not
>>> found any reason why you can not just sum 2 out of 3 technical replicates
>>> if you know that one technical replicate has much higher variation than
>>> the other two. Please let me know if you have any evidence for this.
>>>
>>> Thank you very much in advance.
>>>
>>>
>
>
> -- 
>
> Neha Mehta
> ---------------------------------------
> PhD Candidate, Neuroscience
> Northwestern University
> ph. (412) 874-6342
> e. nsmehta at u.northwestern.edu
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: MAbioreps2v3.pdf
> Type: application/pdf
> Size: 2630038 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140723/b9b62e03/attachment.pdf>
>
> ------------------------------
>
