[Bioc-sig-seq] Rsamtools append

Martin Morgan mtmorgan at fhcrc.org
Sat Apr 3 01:53:19 CEST 2010


Hi Burak --

On 04/02/2010 04:17 PM, burak kutlu wrote:
> Hi 
> Following the example given in this thread:
> http://www.mail-archive.com/bioc-sig-sequencing@r-project.org/msg00805.html
> I am trying to "append" the scanned bam files, but somehow I can't.
> The code runs but I only have reads from the first chromosome
> Any ideas?
> Many thanks in advance
> -burak
> 
> Here's what I am trying to do:
> 
> my.bam.file <- paste(barcodeDir, "/", bfFiles[2], "/output/mapping/solid0444_20100121_bcSample1_F3.bam",sep="")
> lens <- org.Mm.egCHRLENGTHS
> lens <- lens[c(1:19,"X","Y")]
> i=2
> which <- RangesList(IRanges(start=1,end=lens[i]))
> names(which) <- paste("chr",names(lens)[i],sep="")
> params <- ScanBamParam(which=which)
> myReads <<- scanBam(my.bam.file,param=params)
> for(i in c(2:length(lens))){
>       which <- RangesList(IRanges(start=1,end=lens[i]))
>       names(which) <- paste("chr",names(lens)[i],sep="")
>       params <- ScanBamParam(which=which)
>       myReads <<- append(myReads,scanBam(my.bam.file,param=params)[[1]])
> }#i

after running example(scanBam), I do

> x0 = scanBam(fl, param=p2)[[1]]
> str(x0)
List of 4
 $ rname : Factor w/ 2 levels "seq1","seq2": 1 1 1 1 1 1 1 1 1 1 ...
 $ strand: Factor w/ 3 levels "+","-","*": 1 1 1 1 1 1 1 1 2 1 ...
 $ pos   : int [1:3307] 1 3 5 6 9 13 13 15 18 22 ...
 $ qwidth: int [1:3307] 36 35 35 36 35 35 36 35 35 35 ...

so x0 is an ordinary R 'list' and

> x1 = append(x0, scanBam(fl, param=p2)[[1]])
> str(x1)
List of 8
 $ rname : Factor w/ 2 levels "seq1","seq2": 1 1 1 1 1 1 1 1 1 1 ...
 $ strand: Factor w/ 3 levels "+","-","*": 1 1 1 1 1 1 1 1 2 1 ...
 $ pos   : int [1:3307] 1 3 5 6 9 13 13 15 18 22 ...
 $ qwidth: int [1:3307] 36 35 35 36 35 35 36 35 35 35 ...
 $ rname : Factor w/ 2 levels "seq1","seq2": 1 1 1 1 1 1 1 1 1 1 ...
 $ strand: Factor w/ 3 levels "+","-","*": 1 1 1 1 1 1 1 1 2 1 ...
 $ pos   : int [1:3307] 1 3 5 6 9 13 13 15 18 22 ...
 $ qwidth: int [1:3307] 36 35 35 36 35 35 36 35 35 35 ...

append(x0, scanBam(<...>)[[1]]) appends two lists according to R's
standard rules. Likely you are trying x1[["rname"]], and again by R's
standard rules you get the first rname; the others are there, just not
selected. You might x1["rname"]] to get them all, or do.call(c,
x1["rname"]) to concetenate the elments of x1, but...

If memory is a problem then the first thing to do is to read only those
parts of the BAM file that will be useful in downstream analysis. Do
this by using the what= parameter of ScanBamParam; see also ?scanBamWhat.

If you're still needing to read in portions of a BAM file, then probably
you want to be thinking along the lines of

summaryFunction = function(chr, ...) {
   ## set up param=ScanBamParam();
   ##   set what= to those scanBamWhat() that are required
   ## read subset into object x, x = scanBam(..., param=param)
   ## summarize x e.g., with(x, coverage(IRanges(pos, width=qwidth)))
}

and then doing this for each chromosome or other chunk along the lines of

  lapply(seqnames, summaryFunction, <other arguments>)

The above is psuedo-code; if you provide more detailed objectives then
it could be fleshed out more. In the thread you quote Karl is append'ing
two AlignedRead objects, and these are append'ed as described on class ?
AlignedRead.

Martin


> 
> 
> 
> 
> R version 2.11.0 alpha (2010-03-29 r51485) 
> x86_64-unknown-linux-gnu 
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices datasets  utils     methods  
> [8] base     
> 
> other attached packages:
>  [1] rtracklayer_1.7.11                      
>  [2] RCurl_1.3-1                             
>  [3] bitops_1.0-4.1                          
>  [4] Rsamtools_0.2.0                         
>  [5] ggplot2_0.8.7                           
>  [6] digest_0.4.2                            
>  [7] reshape_0.8.3                           
>  [8] plyr_0.1.9                              
>  [9] proto_0.3-8                             
> [10] org.Mm.eg.db_2.4.0                      
> [11] RSQLite_0.8-4                           
> [12] DBI_0.2-5                               
> [13] AnnotationDbi_1.9.6                     
> [14] Biobase_2.7.5                           
> [15] ShortRead_1.5.21                        
> [16] lattice_0.18-3                          
> [17] BSgenome.Mmusculus.UCSC.mm9_1.3.16      
> [18] BSgenome_1.15.20                        
> [19] GenomicFeatures.Mmusculus.UCSC.mm9_0.1.2
> [20] GenomicFeatures_0.5.1                   
> [21] GenomicRanges_0.1.0                     
> [22] Biostrings_2.15.26                      
> [23] IRanges_1.5.73                          
> 
> loaded via a namespace (and not attached):
> [1] biomaRt_2.3.5 hwriter_1.2   tools_2.11.0  XML_2.8-1
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioc-sig-sequencing mailing list