[BioC] [devteam-bioc] comparing seqnames

Nair, Murlidharan T mnair at iusb.edu
Sat Jun 15 20:26:04 CEST 2013


chr1.names = seqnames(mate1[isSameCzome])[seqnames(mate1[isSameCzome]) == "chr1"]
The above worked. I guess that should give me the correct subsetting. 

Thx./Murli


-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
Sent: Saturday, June 15, 2013 1:31 PM
To: guest at bioconductor.org
Cc: Maintainer; bioconductor at r-project.org; Nair, Murlidharan T
Subject: Re: [devteam-bioc] comparing seqnames

On 06/15/2013 10:17 AM, Maintainer wrote:
>
> Hi,
> I am trying to extracts mate pairs that are mapped to a particular chromosome (say chr1). The construction of my which() function is not returning what I need. May be I am missing something in constructing it.
>
> bf.data= readGappedAlignments(bam_files, 
> param=ScanBamParam(what=scanBamWhat()))

when working with big data it's usually better to start your query as specifically as possible, so specify early that you're only interested in chr1. 
For some bam file

   bf <- BamFile("some.bam")

then

   chr1gr = GRanges("chr1", IRanges(1L, seqlengths(bf)["chr1"]))

specifies the range we're interested in,

   param = ScanBamParam(which=chr1gr)

sets up a ScanBamParam (scanBamWhat() does not doing anything in your code, so I've removed it) and

   bf.data= readGappedAlignments(bam_files, param=param)

reads in just chr1 reads, so no need to worry down-stream about non-chr1 reads.

And a little more down below...


> mate.pairs=table(mcols(bf.data)$qname)
> onlyPairs=names(mate.pairs)[mate.pairs==2]
> mappedPairs=bf.data[mcols(bf.data)$qname %in% onlyPairs] 
> mate1=mappedPairs[c(T,F)] mate2=mappedPairs[c(F,T)] isSameCzome= 
> (seqnames(mate1)==seqnames(mate2))
>
> chrnames=seqnames(mate1[which(seqnames(mate1[isSameCzome])=="chr1")])

here your 'which()' indexes into mate1[isSameCzome], whereas the outer
seqnames() indexs into mate1. Thinking about the expensive (copying mate1) versus cheap (extracting / subsetting seqnames) operations, I might implement this as

     chrnames = seqnames(mate1)[seqnames(mate1) == "chr1"][isSameCzome]

Hope that helps,

Martin

>
> With the last statement I am intending to extract mate1 that are mapped to chr1. So I expect chrnames to contain only chr1, but it returns all other chromosomes and not just for chr1.  Is there something I am missing in the way I have written the which statement?
>
> Thanks for your help.
>
> Cheers../Murli
>
>
>
>   -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] VariantAnnotation_1.6.6
>   [2] org.Hs.eg.db_2.9.0
>   [3] RSQLite_0.11.4
>   [4] DBI_0.2-7
>   [5] BSgenome.Hsapiens.UCSC.hg19_1.3.19
>   [6] BSgenome_1.28.0
>   [7] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2
>   [8] GenomicFeatures_1.12.2
>   [9] AnnotationDbi_1.22.6
> [10] Biobase_2.20.0
> [11] Rsamtools_1.12.3
> [12] Biostrings_2.28.0
> [13] GenomicRanges_1.12.4
> [14] IRanges_1.18.1
> [15] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.16.0     bitops_1.0-5       RCurl_1.95-4.1     rtracklayer_1.20.2
> [5] stats4_3.0.1       tools_3.0.1        XML_3.96-1.1       zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> ______________________________________________________________________
> __
> devteam-bioc mailing list
> To unsubscribe from this mailing list send a blank email to 
> devteam-bioc-leave at lists.fhcrc.org
> You can also unsubscribe or change your personal options at 
> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list