[BioC] DESeq with no replicates
Wolfgang Huber
wk.huber at gmail.com
Sun Dec 19 12:38:52 CET 2010
Dear Xiwei
can you try with a more recent version of DESeq, e.g. version 1.2.1 from
Bioconductor 2.7 ? Just to keep the discussion more relevant to all of us.
1.+4.: The size factors you report do indeed look peculiar. I don't
think that this has anything to do with the DESeq version though - one
possible explanation could be that combined_data matrix contains lots of
'trivial' rows (e.g. all 0 or all 1). The default size factor estimation
is not based on the sums, but on the median of the ratios between the
columns. In most cases this makes sense, but you can of course override
it and supply your own size factors if you have a good reason.
What do you get from
plot(jitter(combined_data))
or
alpha=0.5; plot(jitter(combined_data^alpha)) ?
One thing I would try is to inspect the matrix combined_data more
closely, and if there are many trivial rows, then to filter out those
rows first.
3. According to the manual page of nbinomTest, "resVarA is the ratio of
the row-wise estimate of the base variance of the counts for condition
A, divided by ..."
Since you have no replicates in condition A (and similarly for B), that
number is NA.
2. I don't know what is going on... Note that 'padj' is close to one.
Still, in order to move forward, can you please retry with the above
suggestions, and if doubts remain, send us the combined_data matrix to
inspect the problem more directly?
Best wishes
Wolfgang
Il Dec/16/10 8:55 PM, Wu, Xiwei ha scritto:
> Dear list,
>
> I have a smallRNA sequencing data from two samples, without replicates.
> I tried using DESeq to identify differentially expressed gene following
> the manual, and the code is as below:
>
>> library(DESeq)
>> conds<- c("NSC", "TSC")
>> head(combined_data)
> NSC TSC
> hsa-let-7a 906684 413336
> hsa-let-7a* 2 1281
> hsa-let-7a-2* 51 132
> hsa-let-7b 179751 63079
> hsa-let-7b* 46 684
> hsa-let-7c 380745 50729
>
>> apply(combined_data, 2, sum)
> NSC TSC
> 5018095 8658429
>
>> cds<- newCountDataSet(combined_data, conds)
>> cds<- estimateSizeFactors( cds )
>> sizeFactors(cds)
> NSC TSC
> 1 1
>
>> cds<- estimateVarianceFunctions(cds, pool=T)
>> res<- nbinomTest(cds, "NSC", "TSC")
>
>> head(res[order(res$pval),])
>
> id baseMean baseMeanA baseMeanB foldChange
> log2FoldChange pval padj resVarA resVarB
> 397 hsa-miR-299-5p 329.5 329 330 1.0030395
> 0.004378441 0.002408825 0.9236618 NA NA
> 131 hsa-miR-1285 407.0 406 408 1.0049261
> 0.007089425 0.002922980 0.9236618 NA NA
> 196 hsa-miR-146a 170.5 170 171 1.0058824
> 0.008461579 0.004641615 0.9696484 NA NA
> 525 hsa-miR-425* 180.5 178 183 1.0280899
> 0.039966407 0.008776619 0.9696484 NA NA
> 826 hsa-miR-675* 714.0 722 706 0.9778393
> -0.032330654 0.008892177 0.9696484 NA NA
> 505 hsa-miR-379* 260.5 256 265 1.0351562
> 0.049848549 0.012192131 0.9696484 NA NA
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
>
> other attached packages:
> [1] preprocessCore_1.10.0 limma_3.4.5 DESeq_1.0.6
> locfit_1.5-6 lattice_0.18-8
> [6] akima_0.5-4 Biobase_2.8.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.26.1 AnnotationDbi_1.10.2 DBI_0.2-5
> genefilter_1.30.0 geneplotter_1.26.0
> [6] grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-2
> splines_2.11.1 survival_2.35-8
> [11] tools_2.11.1 xtable_1.5-6
>
>
>
> I have a few questions here:
>
> 1. The sizeFactors were estimated as 1, but the actual total number of
> reads in the two samples differ 60%, could it be correct?
>
> 2. It seems that the identified top candidates with the smallest p value
> are all with very small fold changes. I have seen huge difference
> between the two samples for some miRNA, but the P value was close to 1;
>
> 3. The resVarA and resVarB are NA for all genes, did I do something
> wrong when calling the estimateVarianceFunctions?
>
> 4. For a RNA or small RNA sequencing experiment with only two samples,
> what is the best way to normalize data?
>
> Any help will be greatly appreciated.
>
> Xiwei Wu, Ph.D.
> Bioinformatics Core
> Assistant Research Professor
> Department of Molecular Medicine
> Beckman Research Institute
> City of Hope
> x.65071
>
>
> ---------------------------------------------------------------------
> SECURITY/CONFIDENTIALITY WARNING: \ This message and ...{{dropped:30}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
More information about the Bioconductor
mailing list