[BioC] DESeq with no replicates

Wolfgang Huber wk.huber at gmail.com
Sun Dec 19 12:38:52 CET 2010


Dear Xiwei

can you try with a more recent version of DESeq, e.g. version 1.2.1 from 
Bioconductor 2.7 ? Just to keep the discussion more relevant to all of us.

1.+4.: The size factors you report do indeed look peculiar. I don't 
think that this has anything to do with the DESeq version though - one 
possible explanation could be that combined_data matrix contains lots of 
'trivial' rows (e.g. all 0 or all 1). The default size factor estimation 
is not based on the sums, but on the median of the ratios between the 
columns. In most cases this makes sense, but you can of course override 
it and supply your own size factors if you have a good reason.

What do you get from
    plot(jitter(combined_data))
or
    alpha=0.5; plot(jitter(combined_data^alpha))  ?

One thing I would try is to inspect the matrix combined_data more 
closely, and if there are many trivial rows, then to filter out those 
rows first.

3. According to the manual page of nbinomTest, "resVarA is the ratio of 
the row-wise estimate of the base variance of the counts for condition 
A, divided by ..."
Since you have no replicates in condition A (and similarly for B), that 
number is NA.

2. I don't know what is going on... Note that 'padj' is close to one. 
Still, in order to move forward, can you please retry with the above 
suggestions, and if doubts remain, send us the combined_data matrix to 
inspect the problem more directly?

	Best wishes
	Wolfgang


Il Dec/16/10 8:55 PM, Wu, Xiwei ha scritto:
> Dear list,
>
> I have a smallRNA sequencing data from two samples, without replicates.
> I tried using DESeq to identify differentially expressed gene following
> the manual, and the code is as below:
>
>> library(DESeq)
>> conds<- c("NSC", "TSC")
>> head(combined_data)
>                   NSC    TSC
> hsa-let-7a    906684 413336
> hsa-let-7a*        2   1281
> hsa-let-7a-2*     51    132
> hsa-let-7b    179751  63079
> hsa-let-7b*       46    684
> hsa-let-7c    380745  50729
>
>> apply(combined_data, 2, sum)
>      NSC     TSC
> 5018095 8658429
>
>> cds<- newCountDataSet(combined_data, conds)
>> cds<- estimateSizeFactors( cds )
>> sizeFactors(cds)
> NSC TSC
>    1   1
>
>> cds<- estimateVarianceFunctions(cds, pool=T)
>> res<- nbinomTest(cds, "NSC", "TSC")
>
>> head(res[order(res$pval),])
>
>                  id baseMean baseMeanA baseMeanB foldChange
> log2FoldChange   pval      padj resVarA resVarB
> 397 hsa-miR-299-5p    329.5       329       330  1.0030395
> 0.004378441 0.002408825 0.9236618      NA      NA
> 131   hsa-miR-1285    407.0       406       408  1.0049261
> 0.007089425 0.002922980 0.9236618      NA      NA
> 196   hsa-miR-146a    170.5       170       171  1.0058824
> 0.008461579 0.004641615 0.9696484      NA      NA
> 525   hsa-miR-425*    180.5       178       183  1.0280899
> 0.039966407 0.008776619 0.9696484      NA      NA
> 826   hsa-miR-675*    714.0       722       706  0.9778393
> -0.032330654 0.008892177 0.9696484      NA      NA
> 505   hsa-miR-379*    260.5       256       265  1.0351562
> 0.049848549 0.012192131 0.9696484      NA      NA
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
> other attached packages:
> [1] preprocessCore_1.10.0 limma_3.4.5           DESeq_1.0.6
> locfit_1.5-6          lattice_0.18-8
> [6] akima_0.5-4           Biobase_2.8.0
>
> loaded via a namespace (and not attached):
>   [1] annotate_1.26.1      AnnotationDbi_1.10.2 DBI_0.2-5
> genefilter_1.30.0    geneplotter_1.26.0
>   [6] grid_2.11.1          RColorBrewer_1.0-2   RSQLite_0.9-2
> splines_2.11.1       survival_2.35-8
> [11] tools_2.11.1         xtable_1.5-6
>
>
>
> I have a few questions here:
>
> 1. The sizeFactors were estimated as 1, but the actual total number of
> reads in the two samples differ 60%, could it be correct?
>
> 2. It seems that the identified top candidates with the smallest p value
> are all with very small fold changes. I have seen huge difference
> between the two samples for some miRNA, but the P value was close to 1;
>
> 3. The resVarA and resVarB are NA for all genes, did I do something
> wrong when calling the estimateVarianceFunctions?
>
> 4. For a RNA or small RNA sequencing experiment with only two samples,
> what is the best way to normalize data?
>
> Any help will be greatly appreciated.
>
> Xiwei Wu, Ph.D.
> Bioinformatics Core
> Assistant Research Professor
> Department of Molecular Medicine
> Beckman Research Institute
> City of Hope
> x.65071
>
>
> ---------------------------------------------------------------------
> SECURITY/CONFIDENTIALITY WARNING:  \ This message and ...{{dropped:30}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber



More information about the Bioconductor mailing list