[Bioc-sig-seq] [BioC] Re(surrecting): Rsamtools countBam labeling
Cook, Malcolm
MEC at stowers.org
Thu Mar 31 21:56:24 CEST 2011
!! Please pardon the line-breaks MS Outlook auto-inserted into my code in a previous version of this post.
Martin,
Hmmm, Thanks .... I think I'm getting it....
Following your lead, I can directly re-order cnt in the OriginalOrder as:
> cnt[sort(unlist(split(values(which2), seqnames(which2)))$OriginalOrder,index.return=TRUE)$ix,]
space start end width file records nucleotides
2 seq2 100 1000 901 ex1.bam 1169 41235
1 seq1 1000 2000 1001 ex1.bam 612 21549
3 seq2 1000 2000 1001 ex1.bam 642 22640
... which can usefully(?) be abstracted as
countBamWhich <- function (file,which,index=file,...) {
### wrapper for countBam that reorders its results to aggree with
### 'which', a required ScanBamParam. ... params are additional
### ScanBamParam options.
###
### ASSUMES: internal implementation detail of ScanBamParam. c.f.
### http://permalink.gmane.org/gmane.science.biology.informatics.conductor/34208)
param <- ScanBamParam(which=which,...)
values(which)[['OriginalOrder']] <- 1:length(which)
CBW = countBam(file,index,param=ScanBamParam(which=which))
CBW[sort(unlist(split(values(which),
seqnames(which)))$OriginalOrder,index.return=TRUE)$ix,]
}
allowing me to write
> countBamWhich(fl, which2)
space start end width file records nucleotides
2 seq2 100 1000 901 ex1.bam 1169 41235
1 seq1 1000 2000 1001 ex1.bam 612 21549
3 seq2 1000 2000 1001 ex1.bam 642 22640
All in favor?
~ Malcolm
Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, March 31, 2011 11:41 AM
> To: Cook, Malcolm
> Cc: 'myoung at wehi.EDU.AU'; 'bioconductor at r-project.org';
> 'Bioc-sig-sequencing at r-project.org'
> Subject: Re: [BioC] Re(surrecting): [Bioc-sig-seq] Rsamtools
> countBam labeling
>
> On 03/31/2011 08:32 AM, Cook, Malcolm wrote:
> > Matt et. al.,
> >
> > I wonder if a satisfactory resolution to the issue of "the order
> > changes between the GRanges object and the countBam data.frame
> >
> >
> http://www.mail-archive.com/bioc-sig-sequencing@r-project.org/
> msg01144.html
> >
> > I am presented with the same issue and poised to tackle it
> but wondr
> > if a generic solution emerged from you inquiries& efforts.
>
> Hi Malcolm --
>
> For a reproducible example,
>
> library(Rsamtools)
> example(countBam)
> which1 <- as(which, "GRanges")
> ## which2 might be where your data actually starts
> which2 <- which1[c(2,1,3)]
> values(which2)[["OriginalOrder"]] <- 1:3
> param <- ScanBamParam(which=which2)
> cnt <- countBam(fl, param=param)
>
> What happens is that ScanBamParam converts its argument to an
> IRangesList, using split(ranges(which2), seqnames(which2)). So do the
> same for the values and unlist
>
> cntVals <- unlist(split(values(which2), seqnames(which2)))
>
> then cbind coerced values
>
> cbind(cnt, as.data.frame(cntVals))
>
> with
>
> > which2
> GRanges with 3 ranges and 1 elementMetadata value
> seqnames ranges strand | OriginalOrder
> <Rle> <IRanges> <Rle> | <integer>
> [1] seq2 [ 100, 1000] * | 1
> [2] seq1 [1000, 2000] * | 2
> [3] seq2 [1000, 2000] * | 3
>
> seqlengths
> seq1 seq2
> NA NA
> > cbind(cnt, as.data.frame(cntVals))
> space start end width file records nucleotides OriginalOrder
> 1 seq1 1000 2000 1001 ex1.bam 612 21549 2
> 2 seq2 100 1000 901 ex1.bam 1169 41235 1
> 3 seq2 1000 2000 1001 ex1.bam 642 22640 3
>
> Martin
>
> >
> > Thanks,
> >
> > Malcolm Cook Stowers Institute for Medical Research -
> > Bioinformatics Kansas City, Missouri USA
> >
> >
> > _______________________________________________ Bioconductor mailing
> > list Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
> > archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
More information about the Bioc-sig-sequencing
mailing list