[BioC] Experimental design for RNA-Seq
    Tan, Yifang 
    Yifang.Tan at nrc-cnrc.gc.ca
       
    Fri May 28 16:08:07 CEST 2010
    
    
  
Hello list:
I am posting my questions here again to get help with my experimental design, as I am new to this and have been struggling with my analysis. I could not find a matching example design in the LIMMA User's Guide.
The first part of my experiment is a loop design comparing gene expression at three developmental stages (10DAP, 22DAP and 35DAP; DAP = days after pollination) of the same Brassica line. The purpose is to find genes that are differentially expressed between developmental stages within this single line. Pooled samples of the same line were used for each stage and treated as biological replicates. I have dye swaps plus two technical replicates. The target file consists of three columns: file name, Cy3 and Cy5. Here is the target file:
               FileName        Cy3     Cy5
 AT Oligo 02.11.02.176.gpr.fixed 10DAP 22DAP
 AT Oligo 02.11.02.177.gpr.fixed 22DAP 10DAP
 AT Oligo 02.11.02.178.gpr.fixed 22DAP 10DAP
 AT Oligo 02.11.02.179.gpr.fixed 10DAP 22DAP
 AT Oligo 02.11.02.180.gpr.fixed 22DAP 35DAP
 AT Oligo 02.11.02.181.gpr.fixed 22DAP 35DAP
 AT Oligo 02.11.02.182.gpr.fixed 35DAP 22DAP
 AT Oligo 02.11.02.183.gpr.fixed 35DAP 22DAP
 AT Oligo 02.11.02.184.gpr.fixed 10DAP 35DAP
 AT Oligo 02.11.02.185.gpr.fixed 10DAP 35DAP
 AT Oligo 02.11.02.186.gpr.fixed 35DAP 10DAP
 AT Oligo 02.11.02.187.gpr.fixed 35DAP 10DAP
 This experiment is very similar to the design in section 7.4 of the LIMMA User's Guide, except that I have technical replicates. Following the Guide, do I have to use one particular sample such as "10DAP" as the reference, or can any sample serve as the reference? My goal is to see which genes are differentially expressed among 10DAP, 22DAP and 35DAP. How do I get the following results: 1) which genes are consistently up- or down-regulated across the three stages? 2) which genes are up- or down-regulated at each developmental stage?
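 To make the reference question concrete, here is a minimal sketch (in plain Python rather than R, and not taken from the Guide) of how a design matrix for the loop design above could be built once a reference stage is chosen. It assumes the log-ratio for each array is log2(Cy5/Cy3), so the Cy5 sample is coded +1 and the Cy3 sample -1, with the reference column dropped. The function name design_matrix is hypothetical.

```python
# (Cy3, Cy5) for the 12 arrays of the first experiment, in file order.
targets = [
    ("10DAP", "22DAP"), ("22DAP", "10DAP"), ("22DAP", "10DAP"), ("10DAP", "22DAP"),
    ("22DAP", "35DAP"), ("22DAP", "35DAP"), ("35DAP", "22DAP"), ("35DAP", "22DAP"),
    ("10DAP", "35DAP"), ("10DAP", "35DAP"), ("35DAP", "10DAP"), ("35DAP", "10DAP"),
]

def design_matrix(targets, ref):
    """Return (column names, rows): one row per array, one column per non-reference stage.

    Assumption: log-ratio = log2(Cy5/Cy3), so Cy5 contributes +1 and Cy3 contributes -1.
    The reference stage has no column of its own.
    """
    stages = sorted({s for pair in targets for s in pair if s != ref})
    rows = []
    for cy3, cy5 in targets:
        row = [0] * len(stages)
        if cy5 != ref:
            row[stages.index(cy5)] += 1
        if cy3 != ref:
            row[stages.index(cy3)] -= 1
        rows.append(row)
    return stages, rows

stages, design = design_matrix(targets, ref="10DAP")
print(stages)
for row in design:
    print(row)
```

 With 10DAP as the reference, the two columns stand for 22DAP vs 10DAP and 35DAP vs 10DAP, and the remaining comparison (35DAP vs 22DAP) is the difference of the two. Choosing another stage as the reference changes only how the comparisons are parameterised, not the fitted model.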
 The second part of my experiment is:
               FileName        DPA     Cy3     Cy5
 2009-07-10-atq3.7.3.145-15.gpr        15DPA   WT      MUTANT
 2009-07-15-atq3.7.3.146-15.gpr        15DPA   MUTANT  WT
 2009-07-15-atq3.7.3.147-15.gpr        15DPA   MUTANT  WT
 2009-07-15-atq3.7.3.148-15.gpr        15DPA   WT      MUTANT
 2009-07-15-atq3.7.3.149-15.gpr        15DPA   WT      MUTANT
 2009-07-17-atq3.7.3.151-20.gpr        20DPA   MUTANT  WT
 2009-07-17-atq3.7.3.152-20.gpr        20DPA   WT      MUTANT
 2009-07-17-atq3.7.3.153-25.gpr        25DPA   MUTANT  WT
 2009-07-17-atq3.7.3.154-25.gpr        25DPA   WT      MUTANT
 2009-07-17-atq3.7.3.155-10.gpr        10DPA   MUTANT  WT
 2009-07-17-atq3.7.3.156-10.gpr        10DPA   WT      MUTANT
 2009-07-17-atq3.7.3.157-30.gpr        30DPA   MUTANT  WT
 2009-07-17-atq3.7.3.158-30.gpr        30DPA   WT      MUTANT
 2009-07-21-atq3.7.3-159-10.gpr        10DPA   MUTANT  WT
 2009-07-21-atq3.7.3-160-20.gpr        20DPA   MUTANT  WT
 2009-07-21-atq3.7.3-164-25.gpr        25DPA   MUTANT  WT
 2009-07-21-atq3.7.3-256-30.gpr        30DPA   MUTANT  WT
 2009-07-22-atq3.7.3-115-10.gpr        10DPA   MUTANT  WT
 2009-07-22-atq3.7.3-116-20.gpr        20DPA   MUTANT  WT
 2009-07-22-atq3.7.3-117-25.gpr        25DPA   MUTANT  WT
 2009-07-22-atq3.7.3-118-30.gpr        30DPA   MUTANT  WT
 2009-07-22-atq3.7.3-119-10.gpr        10DPA   WT      MUTANT
 2009-07-22-atq3.7.3-120-20.gpr        20DPA   WT      MUTANT
 2009-07-22-atq3.7.3-124-25.gpr        25DPA   WT      MUTANT
 2009-07-22-atq3.7.3-125-30.gpr        30DPA   WT      MUTANT
 The target file consists of four columns: file name, time, Cy3 and Cy5. This is a time-course experiment. Again, I want to see the differential expression across the stages (10, 15, 20, 25 and 30DPA).
1) Can I split the analysis into five sub-groups by time point (10, 15, 20, 25 and 30DPA separately) instead of analysing the experiment as a whole?
2) If I split the 25 slides into 5 sub-experiments, my feeling is that the variance estimates and normalization would differ between them. Is this correct?
3) How do I set up biolrep, given that I treated the pooled samples as biological replicates?
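For reference, the alternative to splitting can be sketched as a single design over all 25 arrays with one MUTANT-vs-WT column per time point. This is an illustrative Python sketch, not limma code. It assumes the log-ratio is log2(Cy5/Cy3), so an array is coded +1 when MUTANT is on Cy5 and -1 when WT is on Cy5. The variable names are hypothetical.

```python
# (time, Cy3, Cy5) for the 25 arrays of the second experiment, in file order.
targets = [
    ("15DPA", "WT", "MUTANT"), ("15DPA", "MUTANT", "WT"), ("15DPA", "MUTANT", "WT"),
    ("15DPA", "WT", "MUTANT"), ("15DPA", "WT", "MUTANT"),
    ("20DPA", "MUTANT", "WT"), ("20DPA", "WT", "MUTANT"),
    ("25DPA", "MUTANT", "WT"), ("25DPA", "WT", "MUTANT"),
    ("10DPA", "MUTANT", "WT"), ("10DPA", "WT", "MUTANT"),
    ("30DPA", "MUTANT", "WT"), ("30DPA", "WT", "MUTANT"),
    ("10DPA", "MUTANT", "WT"), ("20DPA", "MUTANT", "WT"),
    ("25DPA", "MUTANT", "WT"), ("30DPA", "MUTANT", "WT"),
    ("10DPA", "MUTANT", "WT"), ("20DPA", "MUTANT", "WT"),
    ("25DPA", "MUTANT", "WT"), ("30DPA", "MUTANT", "WT"),
    ("10DPA", "WT", "MUTANT"), ("20DPA", "WT", "MUTANT"),
    ("25DPA", "WT", "MUTANT"), ("30DPA", "WT", "MUTANT"),
]

# Time points in numeric order: strip the trailing "DPA" before sorting.
times = sorted({t for t, _, _ in targets}, key=lambda t: int(t[:-3]))

# One column per time point; each array contributes only to its own time point.
design = []
for time, cy3, cy5 in targets:
    sign = 1 if cy5 == "MUTANT" else -1  # assumption: log-ratio is log2(Cy5/Cy3)
    design.append([sign if t == time else 0 for t in times])

# Summarise the dye orientation at each time point.
for j, t in enumerate(times):
    column = [row[j] for row in design]
    print(t, "MUTANT on Cy5:", column.count(1), "WT on Cy5:", column.count(-1))
```

In this layout each column estimates MUTANT vs WT at one time point, while all 25 arrays remain available for estimating the variance.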
I would very much appreciate any suggestions you could give me on these questions. Thanks a lot!
 
 Yifang
________________________________________
From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Naomi Altman [naomi at stat.psu.edu]
Sent: Friday, May 28, 2010 9:17 AM
To: michael watson (IAH-C); bioconductor
Subject: Re: [BioC] Experimental design for RNA-Seq
At least from the stat theory point of view, the best design is equal
numbers of biological samples (the more the better) for each
condition and no technical reps.
So far, there is little indication that there are flowcell
effects.  However, to be on the safe side, you should use the
blocking principle - as much as possible distribute the reps from the
different conditions across different flow cells (unless the whole
experiment fits on a single flow cell).
--Naomi
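The blocking principle described above can be illustrated with a small sketch that spreads each condition's biological replicates evenly over the available flow cells. The sample names, the number of replicates and the number of flow cells are all hypothetical, and the helper assign_flowcells is made up for illustration.

```python
from itertools import cycle

def assign_flowcells(samples_by_condition, n_flowcells):
    """Deal each condition's replicates round-robin across flow cells.

    Returns a dict mapping flow cell number to a list of (condition, sample).
    """
    layout = {fc: [] for fc in range(1, n_flowcells + 1)}
    for condition, samples in samples_by_condition.items():
        for sample, fc in zip(samples, cycle(range(1, n_flowcells + 1))):
            layout[fc].append((condition, sample))
    return layout

# Hypothetical experiment: 4 biological replicates per condition, 2 flow cells.
samples = {
    "knockout": ["KO1", "KO2", "KO3", "KO4"],
    "wildtype": ["WT1", "WT2", "WT3", "WT4"],
}
layout = assign_flowcells(samples, n_flowcells=2)
for fc, lanes in layout.items():
    print("flow cell", fc, lanes)
```

Each flow cell then carries the same number of replicates from every condition, so a flow cell effect, if present, is not confounded with the condition effect.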
At 04:02 AM 5/28/2010, michael watson (IAH-C) wrote:
>Dear List
>
>I'm about to design a simple experiment (knockout vs wild-type) and
>we plan to use RNA-Seq.  We're interested in gene expression, for
>mRNA and microRNAs in particular, and calculating stats for
>differential expression.
>
>I'm aware of DESeq, DEGseq and edgeR.  I wanted to ask those who
>have a lot of experience of this type of analysis if they have any
>advice for experimental design, in particular, the number of
>replicates they have used and why (I was planning on going for all
>biological replicates, no technical).
>
>Thanks
>Mick
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111