[BioC] Bioconductor Digest, Vol 102, Issue 29

Davis McCarthy davismcc.lists at gmail.com
Thu Sep 1 01:11:20 CEST 2011


Sonika and Alok

Just to confirm: zeros will not cause the problem that you have
reported (tested on dozens of datasets with zero counts). Like Paul, I
suspect that you have some NAs in your count matrix. This is unusual.
I haven't seen RNA-Seq results with NAs before.
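
To illustrate the difference, here is a minimal base R sketch. It assumes, as the error message suggests, that the failure comes from a call to quantile() on the values for each sample. The two vectors are made up for illustration and are not taken from the data in this thread.

```r
# Made-up values for one sample; not taken from the data in this thread.
x_zero <- c(0, 0, 100, 0, 5)    # contains zeros, no missing values
x_na   <- c(0, NA, 100, 0, 5)   # contains one missing value

# Zeros are ordinary numbers, so the upper quartile is computed normally.
q_zero <- quantile(x_zero, probs = 0.75)

# With an NA and the default na.rm = FALSE, quantile() stops with an error.
# tryCatch() captures the error message instead of halting the script.
msg_na <- tryCatch(
  quantile(x_na, probs = 0.75),
  error = function(e) conditionMessage(e)
)

print(q_zero)
print(msg_na)
```

The captured message should mention 'na.rm', which is the same complaint that calcNormFactors() reported.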

I recommend running the check that Paul proposed. If you find NAs then you can
make a decision about removing the tag or setting NAs to zero. If you
don't find NAs then we can dig deeper.
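
A rough sketch of both options in base R, using a small made-up matrix in place of d$counts. The object names (counts, counts_dropped, counts_zeroed) are hypothetical and only for illustration.

```r
# Toy count matrix standing in for d$counts (values are made up).
counts <- matrix(
  c(15923, 20323, 14867,
      700,    NA,   200,
        0,     0,   100,
    74008, 58753,    NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Tag", 1:4), c("1", "2", "3"))
)

# Number of NAs in each sample (column); all zeros would mean no NAs.
na_per_sample <- colSums(is.na(counts))

# Names of the tags (rows) that contain at least one NA.
na_tags <- rownames(counts)[rowSums(is.na(counts)) > 0]

# Option 1: remove every tag that has an NA in any sample.
counts_dropped <- counts[rowSums(is.na(counts)) == 0, , drop = FALSE]

# Option 2: keep all tags and set the NAs to zero.
counts_zeroed <- counts
counts_zeroed[is.na(counts_zeroed)] <- 0

print(na_per_sample)
print(na_tags)
```

The same indexing should work on d$counts. Whether to then rebuild the DGEList so that the library sizes are recomputed is left open here; that step is an assumption of this sketch and not something tested in this thread.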

As an aside, I also note that you are using an older version of R and
edgeR. I strongly recommend updating to R 2.13 and the corresponding
version of edgeR using biocLite(), which will give you edgeR 2.2.5. We
have done a lot of development and improvement of the package in the
last year.

Best wishes
Davis




> To: "'bioconductor at r-project.org'" <bioconductor at r-project.org>
> Date: Wed, 31 Aug 2011 10:02:26 +1000
> Subject: [BioC] edgeR: handling missing values with Quantile normalisation
> Hi there,
>
> I am analysing RNAseq counts using edgeR package. But I am running into problems because of 'zero' counts for certain tags in my data.
>
> The code syntax I am using is here:
>
>> targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
>> targets
>                     files   group description
> 1 Sample_xx_count.txt.raw control   something
> 2 Sample_xx_count.txt.raw control   something
> 3 Sample_xx_count.txt.raw  Hi_Pos   something
> 4 Sample_xx_count.txt.raw  Hi_Pos   something
> 5 Sample_xx_count.txt.raw control   something
> 6 Sample_xx_count.txt.raw control   something
> 7   ................
>
> d <- readDGE(targets, skip = 0, comment.char = "#")
> d
>
> An object of class "DGEList"
> $samples
>                     files   group description  lib.size norm.factors
> 1 Sample_xx_count.txt.raw control   something 498180513            1
> 2 Sample_xx_count.txt.raw control   something 483775405            1
> 3 Sample_xx_count.txt.raw  Hi_Pos   something 368609647            1
> 4 Sample_xx_count.txt.raw  Hi_Pos   something 617334315            1
> 5 Sample_xx_count.txt.raw control   something 678060765            1
> 13 more rows ...
>
> $counts
>          1     2     3     4     5     6      7     8     9    10     11    12    13    14     15 16    17    18
> Tag1 15923 20323 14867 23098 32484 17223  51579 29578 17408 24097  34470 31964 17583 17583  39460  0 30359 25416
> Tag2   700   600   200   695   500  1300   1425  1775   700  1974   1300  2371   900   900   1689  0   898  1690
> Tag3     0     0   100     0     0     0      0     0     0     0      0     0     0     0    100  0   100     0
> Tag4 74008 58753 51648 65233 93828 71047 117340 90551 55000 70124 121393 86106 46197 46197 127290  0 98369 79673
> Tag5 19868 19385 25500 31215 56684 24096  51265 37492 27420 24496  32729 24722 24913 24913  50448  0 39755 55829
> 21887 more rows ...
>
>
>  d <- calcNormFactors(d)
> Error in quantile.default(x, p = q) :
>  missing values and NaN's not allowed if 'na.rm' is FALSE
>
> Could someone please suggest how to handle the missing values with edgeR normalisation methods ?
>
> Thank you
> Sonika
> -------------------
>
>> sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
> [4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_2.0.5  svIDE_0.9-50
>
> loaded via a namespace (and not attached):
> [1] limma_3.6.9   svMisc_0.9-61 tcltk_2.12.2  tools_2.12.2  XML_3.2-0.2
>
>
>
>

>
> ---------- Forwarded message ----------
> From: Paul Leo <p.leo at uq.edu.au>
> To: Sonika Tyagi <Sonika.Tyagi at agrf.org.au>
> Date: Wed, 31 Aug 2011 11:07:47 +1000
> Subject: Re: [BioC] edgeR: handling missing values with Quantile normalisation
>
> Hi Sonika
> It is probably not zeros that are causing the problem but NAs.
>
> Check through the counts array
> to see if it contains NAs, with something like:
>
> apply(d$counts,2,function(x) sum(is.na(x)))
>
> This should return all zeros if there are no NAs.
>
> If there are NAs, setting them to 0 is probably appropriate.
>
>
> Cheers
> Paul
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
> ---------- Forwarded message ----------
> From: ALok <foralok at gmail.com>
> To: Paul Leo <p.leo at uq.edu.au>
> Date: Wed, 31 Aug 2011 10:47:22 +0530
> Subject: Re: [BioC] edgeR: handling missing values with Quantile normalisation
> Hi Sonika
>
> You can calculate quantile.default independently with the argument
> quantile(x, p = q, na.rm = TRUE)
> and pass this value to the main function
> this will automatically take care of zeros.
>
> or alternatively you can try other methods ("TMM", "RLE", "quantile") for
> calcNormFactors, if that fits in your requirements.
>
> cheers
> Alok
>
>
>
>
>
> --
> ************************************************************
> Alok Kumar Srivastava
> Ph.D scholar
> Centre of Computational Biology and Bioinformatics
> School of Computational and Integrative Sciences
> JNU, New Delhi
> ************************************************************
>



-- 
-----------------------------------------------------------
Davis J McCarthy
DPhil Candidate
University of Oxford
E: davis.mccarthy at balliol.ox.ac.uk
W: sites.google.com/site/davismcc


