[BioC] pulling out specific lines from a bam file
Martin Morgan
mtmorgan at fhcrc.org
Sun Jun 30 05:28:58 CEST 2013
On 06/29/2013 05:29 PM, Eric Foss [guest] wrote:
>
> I would like to enter a list of SNPs and a bam file and have Bioconductor return all the lines from the bam file, in human readable form, that contain the SNP in question. Here is my attempt:
>
> [code]library(GenomicRanges)
>
> genes <- GRanges(seqnames = c("13", "1", "12"),
> ranges = IRanges(
> start = c(111942495, 114516752, 116434901),
> width = 1),
> strand = rep("*", 3),
> seqlengths = c("1" = 249250621, "13" = 115169878, "12" = 1
33851895))
>
> alnFile <- "first_10k_PASRTP_RNA_790790_exome6500SNPtolerant_062913_1_clean_sorted.bam"
>
> aln <- readGappedAlignments(alnFile)
here you can specify what you'd like to read in, along the lines of
param = ScanBamParam(what="seq")
readGappedAlignments(alnFile, param=param)
see scanBamWhat() for other things one might read in.
Martin
>
> bamGRanges <- GRanges(seqnames = seqnames(aln),
> ranges = ranges(aln),
> strand = strand(aln),
> seqlengths = seqlengths(aln))
>
> answers <- findOverlaps(query = genes, subject = bamGRanges)
>
> final <- aln[subjectHits(answers)]
> [/code]
>
> my "final" variable consists of almost what I want, but not quite. It has only some of the fields from the bam file. It doesn't, for example, have the sequence. Can someone point me in the right direction?
>
> Thank you.
>
> Eric
>
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Rsamtools_1.12.3 Biostrings_2.28.0 GenomicRanges_1.12.4 IRanges_1.18.1
> [5] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5 stats4_3.0.1 tools_3.0.1 zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list