[BioC] Limma Voom R package

Tue Jan 8 17:26:05 CET 2013

Hi Pedro,

On 1/7/2013 3:47 PM, Pedro Blecua wrote:
> Dear Sir/Madam,
>
> I am a postdoctoral researcher at Chris Mason's lab at the ICB Cornell
> Medical College in NYC.
> I would be very interested in using your R package for RNA-seq
> analysis of some raw data we have.
> We went through the manual quickly, and it is not clear to me how to
> start the analysis, i.e., what the input file should be.
>
> To be more specific: given the fastq file (or binary fastq.tbz), could
> we use it as input for Voom, and then
> use the result for Limma? Or should we first align our raw fastq data
> and then use the sam or bam files as
> input for the Voom or Limma packages? How should I proceed to start an
> analysis from raw fastq files?

You need to align the reads first, using a gapped aligner (bowtie2, gsnap,
etc.), and then use the resulting bam files to get counts per gene or
transcript. That matrix of counts is the input to voom.

Once you have the aligned data, you can use GenomicFeatures and the
appropriate TxDb annotation package to define the features, and then get
the counts using summarizeOverlaps(). Given aligned bam files, I usually
do something like

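library(Rsamtools)
library(GenomicFeatures)
## summarizeOverlaps() is in GenomicAlignments in current Bioconductor
## releases (older releases had it in GenomicRanges)
library(GenomicAlignments)
## character vector of bam files, including path if not in working dir
## (the file names here are placeholders)
bflst <- BamFileList(c("sample1.bam", "sample2.bam"))
library(TxDb.Hsapiens.UCSC.hg19.knownGene) ## substitute applicable species here
## exons grouped by gene, so the counts below are per gene
feat <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")
olaps <- summarizeOverlaps(feat, bflst)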

then you can do

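library(limma)
counts <- assays(olaps)$counts  ## genes x samples matrix of read counts
v <- voom(counts)  ## a design matrix can be passed as the second argument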
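
Since the question was also about how the voom result feeds into limma, here is a minimal sketch of the steps that usually follow. It does not use real data: the simulated count matrix, the sample names, and the two-group layout (three controls, three treated) are all assumptions made up for illustration, so substitute your own counts and design.

```r
library(limma)

## Simulated stand-in for the count matrix from summarizeOverlaps():
## 200 genes x 6 samples (hypothetical data, for illustration only)
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

## Hypothetical two-group design: three controls, three treated
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

## voom() converts counts to log2 counts-per-million and estimates
## observation-level precision weights from the mean-variance trend
v <- voom(counts, design)

## The result goes straight into the usual limma pipeline
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef = 2)  ## top-ranked genes for treated vs control
```

With real data you would build the design matrix from your own sample annotation, and coef = 2 would be replaced by whichever coefficient or contrast you care about.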

Best,

Jim

>
> I would highly appreciate an answer at your earliest convenience.
>
> Thank you very much in advance for your attention,
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099