[BioC] Analysing RNA-Seq data using DESeq package

Thu Sep 15 18:58:08 CEST 2011

For the 6 sequenced samples, we ran alignments to get expression estimates. The protocol is to align the reads, count the number of reads falling within the boundaries of the annotated genes, and then normalize with respect to the number of reads aligning in each sample (not the sample length).
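As a rough illustration of the normalization step just described, here is a minimal Python sketch. It is not the code that produced the attached values; the function name `per_million`, the gene names, and the counts are all made up for the example.

```python
def per_million(counts, total_aligned):
    """Scale raw per-gene read counts to reads per million aligned reads.

    counts        -- dict mapping gene name -> raw read count (hypothetical data)
    total_aligned -- number of reads that aligned in this sample
    """
    if total_aligned <= 0:
        raise ValueError("total_aligned must be positive")
    # Multiply before dividing so small integer inputs stay exact.
    return {gene: n * 1_000_000 / total_aligned for gene, n in counts.items()}


# Hypothetical sample: 2,000,000 aligned reads in total.
raw = {"geneA": 150, "geneB": 50}
normalized = per_million(raw, total_aligned=2_000_000)
print(normalized)
```

Because each count is divided by the sample's aligned-read total, the results are in general not whole numbers.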
  The analysis also attempts to capture the non-uniquely aligning reads by first estimating the unique read counts for each gene, then apportioning the ambiguously aligning reads among the potential sources based on the ratios of read counts among those sources established by the less ambiguous reads (i.e., the first round of apportioning assigns 2-mapped reads based on the unique alignments, then 3-mapped reads are apportioned based on the adjusted read counts, and so on).
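The apportioning rounds described above could be sketched roughly as follows. This is only my reading of the description, not the actual pipeline. The sketch assumes that each ambiguous read is represented by the tuple of genes it maps to, that each round uses the counts as they stood at the start of that round, and that a read is split equally when none of its candidate genes has any count yet. The function name `apportion` and the example data are hypothetical.

```python
def apportion(unique_counts, multi_reads, max_hits=5):
    """Distribute multi-mapped reads among genes, round by round.

    unique_counts -- dict mapping gene -> number of uniquely aligned reads
    multi_reads   -- list of tuples; each tuple lists the genes one read maps to
    max_hits      -- reads mapping to more than this many genes are ignored
    """
    adjusted = {gene: float(n) for gene, n in unique_counts.items()}
    for k in range(2, max_hits + 1):
        # Assumption: weights for round k come from the counts before round k.
        basis = dict(adjusted)
        for targets in multi_reads:
            if len(targets) != k:
                continue
            weights = [basis.get(gene, 0.0) for gene in targets]
            total = sum(weights)
            for gene, weight in zip(targets, weights):
                # Assumption: equal split if no candidate has any count yet.
                share = weight / total if total > 0 else 1.0 / k
                adjusted[gene] = adjusted.get(gene, 0.0) + share
    return adjusted


# Hypothetical data: two 2-mapped reads and one 3-mapped read.
unique = {"A": 30, "B": 10, "C": 0}
ambiguous = [("A", "B"), ("A", "B"), ("A", "B", "C")]
result = apportion(unique, ambiguous)
print(result)
```

Each ambiguous read contributes a fraction to several genes, so the adjusted counts are fractional even before any per-million scaling.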
So, in the attachment you'll see three sets of columns for the samples: those headed "unique" give the per-million-reads-aligned normalized values for each sample's uniquely aligned reads, "apportioned" uses the adjusted values as described above, and "total" gives the number of reads aligned to the gene models without regard to their uniqueness. Note that in all cases, we consider only reads mapping to no more than 5 locations.
Hence the values are in non-integer form. Kindly help me through this.

Suryavadhan
________________________________________
From: Steve Lianoglou [mailinglist.honeypot at gmail.com]
Sent: Thursday, September 15, 2011 9:42 AM
To: Kayilai, Suryavadhan (MU-Student)
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Analysing RNA-Seq data using DESeq package

Hi Suryavadhan,

On Tue, Sep 13, 2011 at 12:41 PM, Kayilai, Suryavadhan (MU-Student)
<skhx5 at mail.missouri.edu> wrote:
> I downloaded the DESeq package for the RNA-seq analysis of the soybean genes. The package is really helpful and easy to use. Thanks! I have a small doubt, and it would be kind of you if you could help me figure it out.
>             The package works fine for gene data with whole-number (integer) values. How can I run the analysis for decimal data, as newCountDataSet does not allow me to input decimal data? It would be great if you could help me through this.

It doesn't let you put in non-integer data because the models DESeq
uses to test for significance assume count data -- as in, the number
of reads that align to a given region, which can only ever be
integers.

What type of data are you trying to put in that has decimal values,
anyway? What does it represent?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact