[BioC] genefilter kOverA filter by range
James W. MacDonald
jmacdon at uw.edu
Thu Jan 23 23:50:13 CET 2014
Hi Kristina,
This part:
filtest<-rowSums(abs(otu_table(comp))>3)>1
should actually be
filtest<-rowSums(abs(otu_table(comp))>3)>0
returns a logical vector, which you would then use as input to
prune_taxa(), not filter_taxa(). I am surprised it worked for you, as
internally filter_taxa is doing something like
apply(OTU, 1, flist)
where flist is supposed to be a function, not a logical vector.
Alternatively you could just create a kOverA() version that does what
you want:
kOverAbsA <- function(k, A, na.rm = TRUE{
function(x) {
if(na.rm) x <- x[!is.na(x)]
sum(abs(x) > A) >= k
}
}
and then do what you tried before
filtered <- filter_taxa(comp, filterfun(kOverAbsA(1, 3)), TRUE)
Best,
Jim
On 1/23/2014 5:15 PM, Kristina M Fontanez wrote:
> Hi James,
>
> Unfortunately, your proposed solution didn’t work for me. I think it’s
> because I am working with objects built with the phyloseq package
> which I am trying to subsequently filter with the genefilter functions.
>
> First, a review of the two phyloseq objects from my last post:
> > comp
> phyloseq-class experiment-level object
> otu_table() OTU Table: [ 2151 taxa and 3 samples ]
> tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ]
> > LFC3
> phyloseq-class experiment-level object
> otu_table() OTU Table: [ 164 taxa and 3 samples ]
> tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ]
>
> Now, your solution:
> > filtest<-rowSums(abs(otu_table(comp))>3)>1
> > filtest[1:5]
> OTU1206 OTU1203 OTU1297 OTU1338 OTU1144
> TRUE TRUE TRUE TRUE TRUE
> > LFC3true=filter_taxa(comp,filtest,TRUE)
> > LFC3true
> phyloseq-class experiment-level object
> otu_table() OTU Table: [ 66 taxa and 3 samples ]
> tax_table() Taxonomy Table: [ 66 taxa by 6 taxonomic ranks ]
> > min(otu_table(comp))
> [1] -7.4
> > min(otu_table(LFC3true))
> [1] -5.5
>
> BUT, as you can see the new LFC3true object is still missing that -7.4
> value and now it contains even less taxa than the LFC3 object. If the
> filter works correctly, I should be getting MORE taxa added to that
> object.
>
> I also tried your solution verbatim but ran into trouble because my
> phyloseq object can’t be subset in the way you suggested:
> > newcomp=comp[filtest,]
> Error in comp[filtest, ] : object of type 'S4' is not subsettable
>
> I believe that in order to use the filter_taxa function to subset the
> phyloseq object, I need a genefilter list object. Pasted below is the
> information in the phyloseq manual from the filter_taxa object.
>
> filter_taxa Filter taxa based on across-sample OTU abundance criteria
>
> Description
>
> This function is directly analogous to the genefilter function for
> microarray filtering, but is used for filtering OTUs from phyloseq
> objects. It applies an arbitrary set of functions — as a function
> list, for instance, created by filterfun — as across-sample criteria,
> one OTU at a time. It takes as input a phyloseq object, and returns a
> logical vector indicating whether or not each OTU passed the criteria.
> Alternatively, if the "prune" option is set to FALSE, it returns the
> already-trimmed version of the phyloseq object.
>
> Usage
>
> Arguments
>
> physeq (Required). A phyloseq-class object that you want to trim/filter.
>
> flist (Required). A function or list of functions that take a vector
> of abundance values and return a logical. Some canned useful function
> types are included in the genefilter-package.
>
> prune (Optional). A logical. Default FALSE. If TRUE, then the function
> returns the pruned phyloseq-class object, rather than the logical
> vector of taxa that passed the filter.
>
> Value
>
> A logical vector equal to the number of taxa in physeq. This can be
> provided directly to prune_taxa as first argument. Alternatively, if
> prune==TRUE, the pruned phyloseq-class object is returned instead.
>
>
> Thanks,
> Kristina
> ------------------------------------------------------------------
> Kristina Fontanez, Postdoctoral Fellow
> fontanez at mit.edu <mailto:fontanez at mit.edu>
> Massachusetts Institute of Technology
> Department of Civil and Environmental Engineering
> 48-120E
> 15 Vassar Street
> Cambridge, MA 02139
>
>
>
> On Jan 23, 2014, at 4:38 PM, James W. MacDonald <jmacdon at uw.edu
> <mailto:jmacdon at uw.edu>> wrote:
>
>> Hi Kristina,
>>
>> On 1/23/2014 4:25 PM, Kristina M Fontanez wrote:
>>> Dear Bioconductors,
>>>
>>> I am trying to use the genefilter package to filter a set of
>>> Log2fold changes so that I can keep those taxa with Log2fold changes
>>> > 3. However, the data itself consists of both positive and negative
>>> values, as is the case with log 2 fold comparisons.
>>
>> You don't need the genefilter package to do this, and in fact
>> genefilter is intended for a completely different task.
>>
>> Instead you just need to use simple R commands.
>>
>> filt <- rowSums(abs(comp) > 3) > 1
>> comp[filt,]
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>> Example data:
>>> OTU Table: [5 taxa and 3 samples]
>>> taxa are rows
>>> LvS DvS LvD
>>> OTU1206 10.3 1.3 9.0
>>> OTU1203 8.3 2.7 5.5
>>> OTU1297 6.8 -0.9 7.7
>>> OTU1338 6.2 -1.4 7.7
>>> OTU1144 7.4 1.6 5.8
>>>
>>> I want to create a filter so that the OTUs with Log2 fold changes >
>>> magnitude 3 in either the positive or negative direction are kept.
>>> However, the documentation for kOverA in the genefilter package
>>> implies that you can only input “values you want to exceed”. As the
>>> code below is currently written, I am only keeping taxa with a log2
>>> fold change > +3 in any one sample. However, taxa with a log2 fold
>>> change of -7 in a particular sample would be left out. I tested
>>> whether I was missing any OTUs by looking for the minimum value in
>>> the original OTU table (comp) and in the filtered OTU table (LFC3).
>>> As you can see the minimum -7.4 log2 fold change value in comp does
>>> not exist in the LFC3 object so it was excluded by my flist2 filter.
>>>
>>> Is there a similar function like kOverA that I can use to get large
>>> magnitude changes in both the positive and negative directions?
>>>
>>> I tried the code:
>>>> comp
>>> phyloseq-class experiment-level object
>>> otu_table() OTU Table: [ 2151 taxa and 3 samples ]
>>> tax_table() Taxonomy Table: [ 2151 taxa by 6 taxonomic ranks ]
>>>
>>>> flist2<-filterfun(kOverA(1,3.0))
>>>> LFC3=filter_taxa(comp,flist2,TRUE)
>>>> LFC3
>>> phyloseq-class experiment-level object
>>> otu_table() OTU Table: [ 164 taxa and 3 samples ]
>>> tax_table() Taxonomy Table: [ 164 taxa by 6 taxonomic ranks ]
>>>> min(otu_table(comp))
>>> [1] -7.4
>>>> min(otu_table(LFC3))
>>> [1] -5.5
>>>
>>> Thank you,
>>> Kristina
>>>
>>>> sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] genefilter_1.44.0 ggplot2_0.9.3.1 scales_0.2.3
>>> phyloseq_1.7.12
>>>
>>> loaded via a namespace (and not attached):
>>> [1] ade4_1.6-2 annotate_1.40.0 AnnotationDbi_1.24.0
>>> [4] ape_3.0-11 Biobase_2.22.0 BiocGenerics_0.8.0
>>> [7] biom_0.3.11 Biostrings_2.30.1 cluster_1.14.4
>>> [10] codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7
>>> [13] DESeq2_1.2.8 dichromat_2.0-0 digest_0.6.4
>>> [16] foreach_1.4.1 GenomicRanges_1.14.4 grid_3.0.2
>>> [19] gtable_0.1.2 igraph_0.6.6 IRanges_1.20.6
>>> [22] iterators_1.0.6 labeling_0.2 lattice_0.20-24
>>> [25] locfit_1.5-9.1 MASS_7.3-29 Matrix_1.1-1.1
>>> [28] multtest_2.18.0 munsell_0.4.2 nlme_3.1-113
>>> [31] parallel_3.0.2 permute_0.8-0 plyr_1.8
>>> [34] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.10.6
>>> [37] RcppArmadillo_0.4.000 reshape2_1.2.2 RJSONIO_1.0-3
>>> [40] RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2
>>> [43] stringr_0.6.2 survival_2.37-4 tools_3.0.2
>>> [46] vegan_2.0-10 XML_3.95-0.2 xtable_1.7-1
>>> [49] XVector_0.2.0
>>>
>>> ------------------------------------------------------------------
>>> Kristina Fontanez, Postdoctoral Fellow
>>> fontanez at mit.edu <mailto:fontanez at mit.edu><mailto:fontanez at mit.edu>
>>> Massachusetts Institute of Technology
>>> Department of Civil and Environmental Engineering
>>> 48-120E
>>> 15 Vassar Street
>>> Cambridge, MA 02139
>>>
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the
>>> archives:http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list