[Bioc-sig-seq] Fwd: Re: Question about scanBam() with AB SOLID data

James MacDonald jmacdon at med.umich.edu
Fri Nov 12 01:38:49 CET 2010


I think it's your BAM file:

> p0 <- ScanBamParam(tag=c("CS"))
> tmp <- scanBam("tmp.bam","tmp", param=p0)
> head(tmp[[1]]$tag$CS)
  [1] "T21321132310232111201223321100233030101122113002232"
  [2] "T20311221101031332001123322101111232013131123112303"
  [3] "T12232200311221101000032001123222100002213223130031"
  [4] "T22200311221101030332001123322102111232013231123112"
  [5] "T12132000331010031313100222322003102211110303320201"

> table(nchar(tmp[[1]]$tag$CS))

 51 
950 

This is with Bfast aligned data. Now for Bioscope data:

> head(tmp2[[1]]$tag$CS)
[1] "T10013200103230133201030001032001032000032001032000"
[2] "T03001002300100230100230000230000230000030100000000"
[3] "T03001002300100230100230000230000230000300100000000"
[4] "T20023010002301002301002301002301002233301310022123"
[5] "T31033012001032001032001032001032000300000010000000"
[6] "T11310022123030300332233003001221210212222220221113"

> table(nchar(tmp2[[1]]$tag$CS))

 51 
973 

Note that with Bioscope's seed-and-extend model, not all of the nucleotide-space reads will be full length, but in this small example the color-space reads all appear to be full length (51 characters each).
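
As a rough sketch of how the 51 characters relate to the 50 bp reads mentioned in the original question: each CS value above starts with a primer base ("T") followed by one digit per color call, so 51 characters would correspond to 50 colors. The helper name split_cs() is made up for illustration and is not part of Rsamtools. The second half assumes a file called tmp.bam like the one used above, that Rsamtools is installed, and that the aligner wrote both CS and CQ tags; it is skipped otherwise.

```r
# Hypothetical helper (not part of Rsamtools): split a CS tag value into
# its leading primer base and the color calls that follow it.
split_cs <- function(cs) {
  data.frame(
    primer  = substr(cs, 1, 1),    # leading base, e.g. "T"
    colors  = substring(cs, 2),    # digits 0-3, one per color call
    ncolors = nchar(cs) - 1L,      # number of color calls
    stringsAsFactors = FALSE
  )
}

# Two CS values copied from the Bfast output above.
cs <- c("T21321132310232111201223321100233030101122113002232",
        "T20311221101031332001123322101111232013131123112303")
parts <- split_cs(cs)
print(table(parts$ncolors))

# Optional check against a real file; "tmp.bam" is an assumed file name.
if (requireNamespace("Rsamtools", quietly = TRUE) && file.exists("tmp.bam")) {
  library(Rsamtools)
  # Ask for the nucleotide-space sequence and both color-space tags at once.
  p <- ScanBamParam(what = "seq", tag = c("CS", "CQ"))
  res <- scanBam("tmp.bam", param = p)[[1]]
  print(table(width(res$seq)))           # nucleotide-space read lengths
  print(table(nchar(res$tag$CS) - 1L))   # color calls per read
  print(table(nchar(res$tag$CQ)))        # color quality string lengths
}
```

Comparing the first table from the file with the second should show whether only the nucleotide-space sequences are shortened while the color-space reads stay at full length.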

Best,

Jim




James W. MacDonald, M.S.
Biostatistician
Douglas Lab
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
>>> Martin Morgan  11/11/10 6:06 PM >>>
On 11/11/2010 02:25 PM, ivan.borozan at utoronto.ca wrote:
> Bioscope pipeline was used to align the data. In order to get reads in
> color space and their quality
> 
> param<- ScanBamParam(tag=c("CQ"))
> 
> and
> 
> param<- ScanBamParam(tag=c("CS"))
> 
> 
> should be specified.

Hi Ivan -- I'm a little confused about where this leaves you. Can you
actually read the sequences / quality strings in to R? Are they
represented in sequence space? Is there a BAM file available somewhere
to experiment with?

Martin

> 
> Best,
> 
> Ivan
> 
> Quoting James MacDonald :
> 
>> What aligner are you using that returns the sequences in color
>> space? The SAM format specifies that:
>>
>> "Color alignments are stored as normal nucleotide alignments with
>> additional tags describing the raw color sequences, ..."
>>
>> So in general I wouldn't expect the seq to be in color space, but in
>> nucleotide space. Depending on the aligner, you may get a CS:Z: tag
>> holding the color space sequence, but I don't believe scanBam will parse that.
>>
>> Best,
>>
>> Jim
>> -- 
>>
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>>
>>
>>>>>  wrote:
>>> Hello list,
>>>
>>> Can scanBam() be used with AB SOLID data (bam files) so that it can
>>> return sequences in color space and with the right lengths?
>>>
>>> My read sequences are 50 bp in length; however, scanBam() is returning
>>> sequences of length between 25 and 27 (they seem to be clipped),
>>> which are not in color space.
>>>
>>> Many thanks for any suggestions,
>>>
>>> Ivan
>>>
>>> _______________________________________________
>>> Bioc-sig-sequencing mailing list
>>> Bioc-sig-sequencing at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>> **********************************************************
>> Electronic Mail is not secure, may not be read every day, and should  
>> not be used for urgent or sensitive issues
>>
>>
> 
> ----- End forwarded message -----
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
