[Bioc-sig-seq] [BioC] Re(surrecting): Rsamtools countBam labeling

Martin Morgan mtmorgan at fhcrc.org
Thu Mar 31 18:40:30 CEST 2011


On 03/31/2011 08:32 AM, Cook, Malcolm wrote:
> Matt et. al.,
>
> I wonder if a satisfactory resolution to the issue of "the order
> changes between the GRanges object and the countBam data.frame
>
> http://www.mail-archive.com/bioc-sig-sequencing@r-project.org/msg01144.html
>
>  I am presented with the same issue and poised to tackle it but wondr
> if a generic solution emerged from you inquiries&  efforts.

Hi Malcolm --

For a reproducible example,

   library(Rsamtools)
   example(countBam)
   which1 <- as(which, "GRanges")
   ## which2 might be where your data actually starts
   which2 <- which1[c(2,1,3)]
   values(which2)[["OriginalOrder"]] <- 1:3
   param <- ScanBamParam(which=which2)
   cnt <- countBam(fl, param=param)

What happens is that ScanBamParam converts its argument to an 
IRangesList, using split(ranges(which2), seqnames(which2)). So do the 
same for the values and unlist

   cntVals <- unlist(split(values(which2), seqnames(which2)))

then cbind coerced values

   cbind(cnt, as.data.frame(cntVals))

with

 > which2
GRanges with 3 ranges and 1 elementMetadata value
     seqnames       ranges strand | OriginalOrder
        <Rle>    <IRanges>  <Rle> |     <integer>
[1]     seq2 [ 100, 1000]      * |             1
[2]     seq1 [1000, 2000]      * |             2
[3]     seq2 [1000, 2000]      * |             3

seqlengths
  seq1 seq2
    NA   NA
 > cbind(cnt, as.data.frame(cntVals))
   space start  end width    file records nucleotides OriginalOrder
1  seq1  1000 2000  1001 ex1.bam     612       21549             2
2  seq2   100 1000   901 ex1.bam    1169       41235             1
3  seq2  1000 2000  1001 ex1.bam     642       22640             3

Martin

>
> Thanks,
>
> Malcolm Cook Stowers Institute for Medical Research -
> Bioinformatics Kansas City, Missouri  USA
>
>
> _______________________________________________ Bioconductor mailing
> list Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
> archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioc-sig-sequencing mailing list