[BioC] EdgeR: sRNA analysis check
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Aug 21 02:24:16 CEST 2013
Dear Kenlee,
There does look to be something seriously wrong, but I suspect the problem
is with the counts in the first place rather than the analysis. Hard to
diagnose by email.
You seem to be relying third party web sites rather than with the
documentation that comes with edgeR. Have you looked at the edgeR User's
Guide? It recommends against normalized counts, see Section 2.5.
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
> Date: Tue, 20 Aug 2013 17:49:02 +1000
> From: Kenlee Nakasugi <kenlee.nakasugi at sydney.edu.au>
> To: bioconductor at r-project.org
> Subject: [BioC] EdgeR: sRNA analysis check
>
> Hi,
>
> I am hoping someone can comment on whether what I have done is correct
> as I do not have anyone here to confide in and am relatively new to
> this.
>
> Experiment:
> This is a sRNA deep sequencing experiment.
> I have two samples and two controls. The samples are a cellular sub
> compartment of the controls, and the controls are whole cell extract
> which also contain sequences from the sub compartment.
> The goal is to see whether there have been some transfer of sRNAs from
> the total extract into the sub-compartment.
> Due to the nature of the sample prep, and there will inevitably be some
> contamination of total extract sequences in the sub-compartment
> sequences. There are also some sRNA sequences that are natively present
> in the samples and are highly concentrated relative to the total extract
> (which also contain them). The focus is *not* on these sequences. The
> intention is to normalise first against the abundance of these sequences
> (create a 'base-level abundance'), and then input the new abundances
> into edgeR to identify sRNA sequences that have been enriched in the
> samples (i.e. have been transferred rather than contamination).
>
> Analysis design:
> The workflow is basically following: http://cgrlucb.wikispaces.com/edgeR
> +spring2013
>
> 1. Get the total counts of sample specific sRNA sequences in all samples
> and controls.
> 2. Normalise raw counts against the counts from step 1 for each
> sample/control
> 3. Run edgeR as described in http://cgrlucb.wikispaces.com/edgeR
> +spring2013 (The R code is at the end of this email).
>
> I am attaching some plots I generated. The BCV plot looks weird to me,
> but I'm not sure how to interpret.
>
> I am worried that I have done something wrong by first normalising
> against those sample specific sRNA sequences (getting the 'normCounts'
> variable below)
>
> Any advice greatly appreciated!!
>
> Cheers,
> Ken
>
>
>
> Here is the R code:
>
> #################
>
>> sampleInfo=read.delim("sampleInfo", header =T, sep="\t")
>> group = sampleInfo[,2]
>> group
> [1] sample sample total total
> Levels: sample total
>
>> rawcounts <- read.delim("abun.tab", row.names="Sequence")
>> head(rawcounts)
> S1 S2 C1 C2
> AAAAAAAAAATAGCCGTTGGACCC 1 2 1 3
> AAAAAAAAAGAACCTAATGGATCGAACT 15 40 8 9
> AAAAAAAAATGGCCGTTGGGAACC 1 1 2 2
> AAAAAAAACCTGAATAACCCGAAA 5 8 129 34
> AAAAAAAAGAACCTAATGGATCGAACT 44 86 15 23
> AAAAAAAATCTGAATAACCCGAAA 1 1 38 4
>
>> housekeep <- c(7966975,7644095,715816,724099) ## calculated these
> abundances outside of R
>> normCounts <- round(t(t(rawcounts)/housekeep)*mean(housekeep)) ## Not
> sure if this is right thing to do
>
>> filter <- means >=10
>> normCountsfilt <- normCounts[filter,]
>
>> cds2 <- DGEList(normCountsfilt, group=group)
>> cds2 = calcNormFactors(cds2)
>> cds2 = estimateCommonDisp(cds2, verbose=T)
> Disp = 0.52689 , BCV = 0.7259
>> cds2 = estimateTagwiseDisp(cds2)
>
> #BCVplot
>> plotBCV(cds2)
> #MeanVariance Plot
>> meanVarPlot <- plotMeanVar(cds2, show.raw.vars=TRUE,
> show.tagwise.vars=TRUE, show.binned.common.disp.vars=FALSE,
> show.ave.raw.vars=FALSE, NBline = TRUE , nbins = 100, pch = 16 , xlab
> ="Mean Expression (Log10 Scale)" , ylab = "VaPlot" )
>
>> et <- exactTest(cds2, pair=levels(group))
>> top <- topTags(et, n=nrow(cds2$counts))$table
>> de <- rownames(top[top$FDR<0.05,])
>
> #SmearPlot
>> plotSmear(cds2 , de.tags=de)
> #VolcanoPlot
>> plot(top$logFC, -log10(top$PValue), pch=20, cex=.5,
> ylab="-log10(p-value)", xlab="logFC", col=as.numeric(rownames(top) %in%
> de)+1)
>
> #############################
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: MeanVariancePlotfilterlowreads
> Type: image/png
> Size: 33391 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130820/4e2e0d70/attachment-0004.png>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: plotBCVfilterlowreads
> Type: image/png
> Size: 18635 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130820/4e2e0d70/attachment-0005.png>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: SmearPlotfilterlowreads
> Type: image/png
> Size: 56590 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130820/4e2e0d70/attachment-0006.png>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: VolcanoPlotfilterlowreads
> Type: image/png
> Size: 22130 bytes
> Desc: not available
> URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130820/4e2e0d70/attachment-0007.png>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list