[Bioc-sig-seq] extract non-zero rows

Dario Strbenac D.Strbenac at garvan.org.au
Sun Aug 28 09:00:22 CEST 2011


Ah, yes, that method is better. I forgot to use it in my example.
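
For reference, here is a minimal sketch of the whole filter using rowSums(). The small `counts` matrix is made up for illustration and stands in for dup.data$counts; the names `counts` and `positive.apply` are hypothetical, and it only assumes that subsetting rows of the DGEList works like subsetting rows of a plain matrix.

```r
# Toy count matrix standing in for dup.data$counts (values are made up).
counts <- matrix(c(5, 0, 3,
                   2, 1, 4,
                   0, 0, 0,
                   7, 9, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("tag", 1:4), c("s1", "s2", "s3")))

# Number of columns with a positive count in each row, using apply().
positive.apply <- apply(counts, 1, function(a.row) sum(a.row > 0))

# The same numbers from rowSums() on the logical matrix (TRUE counts as 1).
row.positive.counts <- rowSums(counts > 0)

# Keep only the rows that are positive in every column.
filtered <- counts[row.positive.counts == ncol(counts), , drop = FALSE]
```

With this toy matrix, only tag2 and tag4 are kept, since tag1 has a zero in one column and tag3 is zero everywhere.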

- Dario.

---- Original message ----
>Date: Sat, 27 Aug 2011 13:53:16 +1000
>From: davismcc at googlemail.com (on behalf of Davis McCarthy <davis.mccarthy at balliol.ox.ac.uk>)
>Subject: Re: [Bioc-sig-seq] extract non-zero rows  
>To: D.Strbenac at garvan.org.au
>Cc: Estefania Mancini <estefania.mancini at indear.com>, bioc-sig-sequencing at r-project.org
>
>Estefania and Dario
>
>A more efficient way to do this:
>> row.positive.counts <- apply(dup.data$counts, 1, function(a.row) sum(a.row > 0))
>
>would be this:
>row.positive.counts <- rowSums( dup.data$counts > 0 )
>
>You might prefer to use the functions rowSums(), rowMeans(),
>colSums(), colMeans() instead of apply(), where you can. They are much
>faster.
>
>Best wishes
>Davis
>
>
>
>On 26 August 2011 10:00, Dario Strbenac <D.Strbenac at garvan.org.au> wrote:
>> Hi Estefania,
>>
>> If you want all columns to be non-zero, you should do
>>
>> row.positive.counts <- apply(dup.data$counts, 1, function(a.row) sum(a.row > 0))
>> filtered <- dup.data[row.positive.counts == ncol(dup.data$counts), ]
>>
>> For each row, this makes a logical vector and sums it. Because TRUE counts as 1, the sum is the number of columns greater than zero. The rows that have as many positive counts as there are columns in the count matrix are then kept.
>>
>> To find unchanged genes, you might do
>>
>> unchanged <- dup.de.com$table[dup.de.com$table[, "logFC"] > -0.2 & dup.de.com$table[, "logFC"] < 0.2, ]
>>
>> replacing 0.2 with the largest absolute log fold change that you think unchanged genes might have.
>>
>> ---- Original message ----
>>>Date: Thu, 25 Aug 2011 11:39:03 -0300 (ART)
>>>From: bioc-sig-sequencing-bounces at r-project.org (on behalf of Estefania Mancini <estefania.mancini at indear.com>)
>>>Subject: [Bioc-sig-seq] extract non-zero rows
>>>To: bioc-sig-sequencing at r-project.org
>>>
>>>Dear all
>>>I have loaded and properly analyzed 4 454 datasets, corresponding to control and stress samples with their biological replicates.
>>>I would like to know if it is possible to filter, in my DGEList object,
>>>
>>>-which tags don't have zero in any column,
>>>-which of these tags could be considered "housekeeping" (at least with logFC near 0)
>>>
>>>The DGEList object looks like this:
>>>
>>>>dup.data
>>>An object of class "DGEList"
>>>$samples
>>>             group lib.size norm.factors
>>>A8_control control    77953            1
>>>A8_stress   stress   176860            1
>>>mq_control control    98109            1
>>>mq_stress   stress   145839            1
>>>pi_control control   132479            1
>>>pi_stress   stress   142484            1
>>>tj_control control    65827            1
>>>tj_stress   stress   144278            1
>>>
>>>I have tried to filter using the suggested function:
>>>>dup.de.filter <- dup.data[rowSums(dup.data$counts) >= 0, ]
>>>or with
>>>>dup.de.filter <- dup.data[rowSums(dup.data$counts) >= 1, ]
>>>but there are no changes at all. I have many rows with 0 or 1 reads in some column, which should be excluded.
>>>
>>>Also:
>>>dup.de.com
>>>An object of class "DGEExact"
>>>$table
>>>                  logConc       logFC   p.value
>>>Glyma13g11940.8 -2.588833  0.26176050 0.7348221
>>>Glyma13g11900.1 -2.875548  0.03020441 0.9688072
>>>Glyma09g24780.1 -3.501041 -0.12108619 0.8754371
>>>Glyma13g12050.1 -3.224648  0.03036675 0.9691009
>>>Glyma13g12070.1 -3.743064  0.14416487 0.8521188
>>>19860 more rows ...
>>>
>>>$comparison
>>>[1] "control" "stress"
>>>$genes
>>>NULL
>>>
>>>Thanks in advance,
>>>Estefania
>>>
>>>_______________________________________________
>>>Bioc-sig-sequencing mailing list
>>>Bioc-sig-sequencing at r-project.org
>>>https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>> --------------------------------------
>> Dario Strbenac
>> Research Assistant
>> Cancer Epigenetics
>> Garvan Institute of Medical Research
>> Darlinghurst NSW 2010
>> Australia
>>




