[BioC] summarizeOverlaps with mapq filter?
Valerie Obenchain
vobencha at fhcrc.org
Fri Jul 5 22:22:35 CEST 2013
Hi SangChul,
Unfortuantely ScanBamParam() does not support filtering on specific
values of the sam fields.
You can subset on the TRUE/FALSE flags, such as the QC flag,
ScanBamParam(flag=scanBamFlag(isNotPassingQualityControls=FALSE))
but it sounds like you really want to filter on the map quality. You
could write a little a helper function to filter and count the bams.
filterAndCountBam <- function(bf, features, mode, mapq_cutoff)
{
gal <- readGAlignments(bf, param=ScanBamParam(what="mapq"))
reads <- gal[mcols(gal)$mapq > mapq_cutoff]
summarizeOverlaps(features, reads, mode)
}
lapply'ing over the list of bam files returns a list of
SummarizedExperiments.
gr <- GRanges(....)
lst <- lapply(bamFileList, filterAndCountBam, features=gr,
mode=Union, mapq_cutoff=30)
Then cbind the SummarizedExperiments together to combine counts.
do.call(cbind, lst)
Valerie
On 07/03/2013 06:24 PM, Sang Chul Choi wrote:
> Hi,
>
> I wonder if I could use "summarizeOverlaps" to count reads with mapping quality (or MAPQ) larger than some value, say, 30. I took the example script in the function's help to specify what function signature I want to use. I used to use "readBamGappedAlignments" to read a BAM file, and filter out reads with MAPQ less than a specified value, and use the filtered GappedAlignments object to count reads mapped on genomic features. But, I like the feature of summarizeOverlaps that I can use to read a list of BAM files. But, I do not know use the summarizeOverlaps function to use only reads with MAPQ larger than some specified value.
>
> If this is not possible currently, then I might have to create another BAM file of reads with MAPQ larger than some value and use the BAM file to count overlaps using summarizeOverlaps function. I do not like this two-step procedure.
>
> Thank you for your help!
>
> SangChul
>
> ---------------------------------
> library(Rsamtools)
>
> fls <- list.files(system.file("extdata",package="GenomicRanges"),
> recursive=TRUE, pattern="*bam$", full=TRUE)
> names(fls) <- basename(fls)
> bf <- BamFileList(fls, index=character())
> features <- GRanges(
> seqnames = c(rep("chr2L", 4), rep("chr2R", 5), rep("chr3L", 2)),
> ranges = IRanges(c(1000, 3000, 4000, 7000, 2000, 3000, 3600,
> 4000, 7500, 5000, 5400),
> width=c(rep(500, 3), 600, 900, 500, 300, 900,
> 300, 500, 500)), "-",
> group_id=c(rep("A", 4), rep("B", 5), rep("C", 2)))
>
> se <- summarizeOverlaps(features, bf)
> ---------------------------------
>
> ===============================
> Sang Chul Choi Ph.D.
> Research Associate
>
> Sang Chul Choi (222 WRRB IAB)
> 902 Koyukuk Dr (PO Box 757000)
> 311 Irving I, University of Alaska Fairbanks
> Fairbanks, AK 99775-7000
>
> Office +1-907-474-5071
> Fax +1-907-474-6967
> Google Phone +1-607-542-9362
> ===============================
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list