[BioC] filter low expression tags
Kasper Daniel Hansen
kasperdanielhansen at gmail.com
Thu Nov 29 05:28:46 CET 2012
You keep the genes where at least 2 samples have a cpm greater than 100.
The samples are not averaged; each sample is compared against the cutoff
on its own.
  rowSums(cpm(d) > 100)
counts, for each gene (row), how many samples have a cpm > 100.
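
As a rough illustration of that counting step, here is a minimal sketch in
base R. The matrix cpm_toy and its values are made up and only stand in for
the output of cpm(d); nothing here comes from the original data.

```r
# Hypothetical CPM values standing in for cpm(d): 4 genes x 3 samples.
cpm_toy <- matrix(
  c(150, 220,  90,
    120,  80,  40,
    100, 100, 100,
     10, 500, 101),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

above   <- cpm_toy > 100     # logical matrix, same dimensions as cpm_toy
n_above <- rowSums(above)    # per gene: number of samples with cpm > 100
keep    <- n_above >= 2      # TRUE for genes above the cutoff in >= 2 samples

# drop = FALSE keeps the result a matrix even if only one gene survives
filtered <- cpm_toy[keep, , drop = FALSE]
```

In this toy case gene1 and gene4 are kept. gene3 is dropped because the
comparison is strict: a cpm of exactly 100 does not count as above the cutoff.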
Kasper
On Wed, Nov 28, 2012 at 10:54 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Wed, Nov 28, 2012 at 10:14 PM, Vittoria Roncalli <roncalli at hawaii.edu> wrote:
>> Hi,
>>
>> I would like to understand how the filter of low expression tags works. If
>> I run the command
>>
>> keep <- rowSums(cpm(d) > 100) >= 2
>> d <- d[keep,]
>> dim(d)
>>
>> as in the user guide, page 32, this means that I am using a cutoff of 100 cpm,
>> but how are the 2 samples treated? Are they averaged and then the low
>> tags are removed?
>> Is each sample considered separately and filtered by itself?
>> Thanks for the help in advance
>
> How many samples (columns) do you have?
>
> You should first look at the output of `cpm(d) > 100` to see what you
> are getting -- this will be a logical (boolean) matrix that has the
> same dimensionality as `cpm(d)`.
>
> rowSums( a logical matrix )
>
> returns a vector that is as long as there are rows in the logical
> matrix, and each value indicates how many columns are TRUE in that
> row.
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor