[BioC] [devteam-bioc] comparing seqnames
Nair, Murlidharan T
mnair at iusb.edu
Sat Jun 15 20:26:04 CEST 2013
chr1.names = seqnames(mate1[isSameCzome])[seqnames(mate1[isSameCzome]) == "chr1"]
The above worked. I guess that should give me the correct subsetting.
Thx./Murli
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
Sent: Saturday, June 15, 2013 1:31 PM
To: guest at bioconductor.org
Cc: Maintainer; bioconductor at r-project.org; Nair, Murlidharan T
Subject: Re: [devteam-bioc] comparing seqnames
On 06/15/2013 10:17 AM, Maintainer wrote:
>
> Hi,
> I am trying to extracts mate pairs that are mapped to a particular chromosome (say chr1). The construction of my which() function is not returning what I need. May be I am missing something in constructing it.
>
> bf.data= readGappedAlignments(bam_files,
> param=ScanBamParam(what=scanBamWhat()))
when working with big data it's usually better to start your query as specifically as possible, so specify early that you're only interested in chr1.
For some bam file
bf <- BamFile("some.bam")
then
chr1gr = GRanges("chr1", IRanges(1L, seqlengths(bf)["chr1"]))
specifies the range we're interested in,
param = ScanBamParam(which=chr1gr)
sets up a ScanBamParam (scanBamWhat() does not doing anything in your code, so I've removed it) and
bf.data= readGappedAlignments(bam_files, param=param)
reads in just chr1 reads, so no need to worry down-stream about non-chr1 reads.
And a little more down below...
> mate.pairs=table(mcols(bf.data)$qname)
> onlyPairs=names(mate.pairs)[mate.pairs==2]
> mappedPairs=bf.data[mcols(bf.data)$qname %in% onlyPairs]
> mate1=mappedPairs[c(T,F)] mate2=mappedPairs[c(F,T)] isSameCzome=
> (seqnames(mate1)==seqnames(mate2))
>
> chrnames=seqnames(mate1[which(seqnames(mate1[isSameCzome])=="chr1")])
here your 'which()' indexes into mate1[isSameCzome], whereas the outer
seqnames() indexs into mate1. Thinking about the expensive (copying mate1) versus cheap (extracting / subsetting seqnames) operations, I might implement this as
chrnames = seqnames(mate1)[seqnames(mate1) == "chr1"][isSameCzome]
Hope that helps,
Martin
>
> With the last statement I am intending to extract mate1 that are mapped to chr1. So I expect chrnames to contain only chr1, but it returns all other chromosomes and not just for chr1. Is there something I am missing in the way I have written the which statement?
>
> Thanks for your help.
>
> Cheers../Murli
>
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] VariantAnnotation_1.6.6
> [2] org.Hs.eg.db_2.9.0
> [3] RSQLite_0.11.4
> [4] DBI_0.2-7
> [5] BSgenome.Hsapiens.UCSC.hg19_1.3.19
> [6] BSgenome_1.28.0
> [7] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2
> [8] GenomicFeatures_1.12.2
> [9] AnnotationDbi_1.22.6
> [10] Biobase_2.20.0
> [11] Rsamtools_1.12.3
> [12] Biostrings_2.28.0
> [13] GenomicRanges_1.12.4
> [14] IRanges_1.18.1
> [15] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.16.0 bitops_1.0-5 RCurl_1.95-4.1 rtracklayer_1.20.2
> [5] stats4_3.0.1 tools_3.0.1 XML_3.96-1.1 zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> ______________________________________________________________________
> __
> devteam-bioc mailing list
> To unsubscribe from this mailing list send a blank email to
> devteam-bioc-leave at lists.fhcrc.org
> You can also unsubscribe or change your personal options at
> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list