[BioC] Bioconductor Digest, Vol 102, Issue 29
Davis McCarthy
davismcc.lists at gmail.com
Thu Sep 1 01:11:20 CEST 2011
Sonika and Alok
Just to confirm: zeros will not cause the problem that you have
reported (tested on dozens of datasets with zero counts). Like Paul, I
suspect that you have some NAs in your count matrix. This is unusual.
I haven't seen RNA-Seq results with NAs before.
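A quick way to see the difference, as a minimal sketch in base R: the two vectors below are made-up values (not taken from your data), and the call to quantile() only stands in for what happens inside calcNormFactors(). Zeros are treated as ordinary values, whereas a single NA produces the error you reported.

```r
# Toy count vectors (hypothetical values, not from the reported dataset)
with_zeros <- c(0, 0, 100, 700, 15923)
with_na    <- c(0, NA, 100, 700, 15923)

# Zeros are ordinary values, so this returns a number
q_zeros <- quantile(with_zeros, probs = 0.75)

# An NA makes quantile() stop with an error; capture the message instead
q_na <- tryCatch(
  quantile(with_na, probs = 0.75),
  error = function(e) conditionMessage(e)
)

print(q_zeros)
print(q_na)
```

The second result is the error message text rather than a quantile, which is the same complaint about 'na.rm' that appears in the original report.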
I suggest you follow Paul's advice and check the count matrix for NAs.
If you find any, you can decide whether to remove the affected tags or
set the NAs to zero. If you don't find NAs then we can dig deeper.
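The check and the two options could be sketched as follows. This is base R only, and the small matrix `counts` is a hypothetical stand-in for d$counts with one NA planted in it; the object names are my own, not part of edgeR.

```r
# Toy count matrix standing in for d$counts (hypothetical values)
counts <- matrix(
  c(15923, 20323,  NA,
      700,   600, 200,
        0,     0, 100),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Tag1", "Tag2", "Tag3"), c("1", "2", "3"))
)

# Number of NAs per sample (column); all zeros would mean no NAs
na_per_sample <- colSums(is.na(counts))

# Names of the tags (rows) that contain at least one NA
tags_with_na <- rownames(counts)[rowSums(is.na(counts)) > 0]

# Option 1: remove every tag that has an NA
counts_dropped <- counts[rowSums(is.na(counts)) == 0, , drop = FALSE]

# Option 2: keep all tags and set the NAs to zero
counts_zeroed <- counts
counts_zeroed[is.na(counts_zeroed)] <- 0

print(na_per_sample)
print(tags_with_na)
```

Which option is appropriate depends on why the NAs are there, so it is worth looking at the affected tags before choosing.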
As an aside I also note that you are using an older version of R and
edgeR. I strongly recommend updating to R 2.13 and the corresponding
version of edgeR using biocLite(), which will give you edgeR 2.2.5. We
have done a lot of development and improvement of the package in the
last year.
Best wishes
Davis
> To: "'bioconductor at r-project.org'" <bioconductor at r-project.org>
> Date: Wed, 31 Aug 2011 10:02:26 +1000
> Subject: [BioC] edgeR: handling missing values with Quantile normalisation
> Hi there,
>
> I am analysing RNAseq counts using the edgeR package, but I am running into problems because of 'zero' counts for certain tags in my data.
>
> The code syntax I am using is here:
>
>> targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
>> targets
> files group description
> 1 Sample_xx_count.txt.raw control something
> 2 Sample_xx_count.txt.raw control something
> 3 Sample_xx_count.txt.raw Hi_Pos something
> 4 Sample_xx_count.txt.raw Hi_Pos something
> 5 Sample_xx_count.txt.raw control something
> 6 Sample_xx_count.txt.raw control something
> 7 ................
>
> d <- readDGE(targets, skip = 0, comment.char = "#")
> d
>
> An object of class "DGEList"
> $samples
> files group description lib.size norm.factors
> 1 Sample_xx_count.txt.raw control something 498180513 1
> 2 Sample_xx_count.txt.raw control something 483775405 1
> 3 Sample_xx_count.txt.raw Hi_Pos something 368609647 1
> 4 Sample_xx_count.txt.raw Hi_Pos something 617334315 1
> 5 Sample_xx_count.txt.raw control something 678060765 1
> 13 more rows ...
>
> $counts
> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
> Tag1 15923 20323 14867 23098 32484 17223 51579 29578 17408 24097 34470 31964 17583 17583 39460 0 30359 25416
> Tag2 700 600 200 695 500 1300 1425 1775 700 1974 1300 2371 900 900 1689 0 898 1690
> Tag3 0 0 100 0 0 0 0 0 0 0 0 0 0 0 100 0 100 0
> Tag4 74008 58753 51648 65233 93828 71047 117340 90551 55000 70124 121393 86106 46197 46197 127290 0 98369 79673
> Tag5 19868 19385 25500 31215 56684 24096 51265 37492 27420 24496 32729 24722 24913 24913 50448 0 39755 55829
> 21887 more rows ...
>
>
> d <- calcNormFactors(d)
> Error in quantile.default(x, p = q) :
> missing values and NaN's not allowed if 'na.rm' is FALSE
>
> Could someone please suggest how to handle the missing values with edgeR normalisation methods ?
>
> Thank you
> Sonika
> -------------------
>
>> sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
> [4] LC_NUMERIC=C LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_2.0.5 svIDE_0.9-50
>
> loaded via a namespace (and not attached):
> [1] limma_3.6.9 svMisc_0.9-61 tcltk_2.12.2 tools_2.12.2 XML_3.2-0.2
>
> [[alternative HTML version deleted]]
>
>
>
>
> ---------- Forwarded message ----------
> From: Paul Leo <p.leo at uq.edu.au>
> To: Sonika Tyagi <Sonika.Tyagi at agrf.org.au>
> Date: Wed, 31 Aug 2011 11:07:47 +1000
> Subject: Re: [BioC] edgeR: handling missing values with Quantile normalisation
>
> Hi Sonika
> It is probably not zeros that are causing the problem but NAs.
>
> Check through the counts array
> to see if it contains NAs, with something like:
>
> apply(d$counts,2,function(x) sum(is.na(x)))
>
> This should return all zeros if there are no NAs.
>
> If there are NAs, setting them to 0 is probably appropriate.
>
>
> Cheers
> Paul
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
> ---------- Forwarded message ----------
> From: ALok <foralok at gmail.com>
> To: Paul Leo <p.leo at uq.edu.au>
> Date: Wed, 31 Aug 2011 10:47:22 +0530
> Subject: Re: [BioC] edgeR: handling missing values with Quantile normalisation
> Hi Sonika
>
> You can calculate quantile.default independently with the argument
> quantile(x, p = q, na.rm = TRUE)
> and pass this value to the main function;
> this will automatically take care of zeros.
>
> or alternatively you can try other methods ("TMM", "RLE", "quantile") for
> calcNormFactors, if that fits in your requirements.
>
> cheers
> Alok
>
> --
> ************************************************************
> Alok Kumar Srivastava
> Ph.D scholar
> Centre of Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> JNU, New Delhi
> ************************************************************
--
-----------------------------------------------------------
Davis J McCarthy
DPhil Candidate
University of Oxford
E: davis.mccarthy at balliol.ox.ac.uk
W: sites.google.com/site/davismcc