[BioC] edgeR: handling missing values with Quantile normalisation
Paul Leo
p.leo at uq.edu.au
Wed Aug 31 03:07:47 CEST 2011
Hi Sonika,
It is probably not the zeros that are causing the problem but NAs.
Check through the counts array
to see if it contains NAs, with something like
apply(d$counts,2,function(x) sum(is.na(x)))
which should return all zeros if there are no NAs.
If there are NAs, setting them to 0 is probably appropriate.
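A minimal sketch of that check and the replacement, written against a small made-up matrix standing in for d$counts (the values, the NA placement and the object names counts, na_per_sample, tags_with_na, empty_samples and counts_fixed are assumptions for illustration only). It also flags samples whose counts are all zero, since sample 16 is zero for every tag in the rows shown; a library size of 0 could give NaN if counts are divided by library size, so that may be worth ruling out too.

```r
# Toy stand-in for d$counts: 3 tags x 4 samples (hypothetical values).
counts <- matrix(c(15923, 700,   0,
                   20323,  NA,   0,
                   14867, 200, 100,
                       0,   0,   0),
                 nrow = 3,
                 dimnames = list(c("Tag1", "Tag2", "Tag3"),
                                 c("1", "2", "3", "4")))

# NAs per sample (column); same result as the apply() call above.
na_per_sample <- colSums(is.na(counts))

# Tags (rows) that contain at least one NA.
tags_with_na <- rownames(counts)[rowSums(is.na(counts)) > 0]

# Samples whose counts sum to zero (ignoring NAs).
empty_samples <- colnames(counts)[colSums(counts, na.rm = TRUE) == 0]

# Replace NAs with 0 in a copy, leaving the original untouched.
counts_fixed <- counts
counts_fixed[is.na(counts_fixed)] <- 0
```

With a real DGEList the same lines would be applied to d$counts. Whether 0 is the right replacement depends on why the NAs are there in the first place, so it is worth looking at tags_with_na before overwriting anything.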
Cheers
Paul
-----Original Message-----
From: Sonika Tyagi <Sonika.Tyagi at agrf.org.au>
To: 'bioconductor at r-project.org' <bioconductor at r-project.org>
Subject: [BioC] edgeR: handling missing values with Quantile normalisation
Date: Wed, 31 Aug 2011 10:02:26 +1000
Hi there,
I am analysing RNA-seq counts using the edgeR package, but I am running into problems because of 'zero' counts for certain tags in my data.
The code I am using is below:
> targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
> targets
files group description
1 Sample_xx_count.txt.raw control something
2 Sample_xx_count.txt.raw control something
3 Sample_xx_count.txt.raw Hi_Pos something
4 Sample_xx_count.txt.raw Hi_Pos something
5 Sample_xx_count.txt.raw control something
6 Sample_xx_count.txt.raw control something
7 ................
> d <- readDGE(targets, skip = 0, comment.char = "#")
> d
An object of class "DGEList"
$samples
files group description lib.size norm.factors
1 Sample_xx_count.txt.raw control something 498180513 1
2 Sample_xx_count.txt.raw control something 483775405 1
3 Sample_xx_count.txt.raw Hi_Pos something 368609647 1
4 Sample_xx_count.txt.raw Hi_Pos something 617334315 1
5 Sample_xx_count.txt.raw control something 678060765 1
13 more rows ...
$counts
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Tag1 15923 20323 14867 23098 32484 17223 51579 29578 17408 24097 34470 31964 17583 17583 39460 0 30359 25416
Tag2 700 600 200 695 500 1300 1425 1775 700 1974 1300 2371 900 900 1689 0 898 1690
Tag3 0 0 100 0 0 0 0 0 0 0 0 0 0 0 100 0 100 0
Tag4 74008 58753 51648 65233 93828 71047 117340 90551 55000 70124 121393 86106 46197 46197 127290 0 98369 79673
Tag5 19868 19385 25500 31215 56684 24096 51265 37492 27420 24496 32729 24722 24913 24913 50448 0 39755 55829
21887 more rows ...
> d <- calcNormFactors(d)
Error in quantile.default(x, p = q) :
missing values and NaN's not allowed if 'na.rm' is FALSE
Could someone please suggest how to handle missing values with the edgeR normalisation methods?
Thank you
Sonika
-------------------
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_2.0.5 svIDE_0.9-50
loaded via a namespace (and not attached):
[1] limma_3.6.9 svMisc_0.9-61 tcltk_2.12.2 tools_2.12.2 XML_3.2-0.2