[Bioc-sig-seq] Fwd: "identical" on PhredQuality objects - I'm confused

Janet Young jayoung at fhcrc.org
Wed Jun 15 04:06:58 CEST 2011


maybe I should have included sessionInfo() output:

R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rtracklayer_1.12.4  RCurl_1.6-5         bitops_1.0-4.1     
[4] ShortRead_1.10.4    Rsamtools_1.4.2     lattice_0.19-26    
[7] Biostrings_2.20.1   GenomicRanges_1.4.6 IRanges_1.10.4     

loaded via a namespace (and not attached):
[1] Biobase_2.12.1  BSgenome_1.20.0 grid_2.13.0     hwriter_1.3    
[5] XML_3.4-0      


Begin forwarded message:

> From: Janet Young <jayoung at fhcrc.org>
> Date: June 14, 2011 6:58:01 PM PDT
> To: Bioc-sig-sequencing at r-project.org
> Subject: "identical" on PhredQuality objects - I'm confused
> 
> Hi there,
> 
> I'm investigating a mistake I made when running BWA (a little embarrassing, but it happens).  I had initially failed to use the -I flag on BWA to tell it to convert our Illumina ASCII-64 qualities appropriately.  Now I'm taking a look to see how much differnce it makes in our downstream analysis.
> 
> Anyway, that's not why I'm writing - while delving into this I've found something unexpected about the "identical" function, and I'd love to understand that better if possible.
> 
> I used scanBam to read in my two BWA output files - with and without the "-I" option when I ran BWA (same input seqs, everything else the same). As I expected, with the -I option BWA converted the qualities (my lane1read2_bwaWithI_bamscan object), and in the other it didn't (my lane1read2_bwaNotI_bamscan object).  That makes sense to me - here's how the quals look after reading BWA output in with scanBam.
> 
> ### when I use the -I option with BWA:
>> head(lane1read2_bwaWithI_bamscan[[1]][["qual"]])
>  A PhredQuality instance of length 6
>    width seq
> [1]    50 HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHFHFF
> [2]    36 GFFEEHHFHHHHHHHEFEEEHHHHHFHHHEEFEE at E
> [3]    50 ##################################################
> [4]    31 HHHHHHHHHHHHHHHHHHHHHHEHHHHFHFH
> [5]    24 HHHHHHHHHHHHHHDFHHHHHHHH
> [6]    50 FFFFCD=8BGEBHGHHHHEHGFGHHDHEHFHEHHHHGHHHHHHHHHGHHH
> 
> ### when I don't use the -I option with BWA:
>> head(lane1read2_bwaNotI_bamscan[[1]][["qual"]]) 
>  A PhredQuality instance of length 6
>    width seq
> [1]    50 gggggggggggggggggggggggggggggggggggggggggeggggegee
> [2]    36 feeddggegggggggdedddgggggegggddedd_d
> [3]    50 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> [4]    31 ggggggggggggggggggggggdggggegeg
> [5]    24 ggggggggggggggcegggggggg
> [6]    50 eeeebc\Wafdagfggggdgfefggcgdgegdggggfgggggggggfggg
> 
> 
> The part I don't understand is when I use identical to test whether those two qual objects are the same, it says they are identical:
> 
>> identical( lane1read2_bwaWithI_bamscan[[1]][["qual"]], lane1read2_bwaNotI_bamscan[[1]][["qual"]]) 
> [1] TRUE
> 
> In one sense they're the same (if you convert from Illumina qualities to normal Phred qualities), but how did identical know to do that?  I might be misunderstanding identical - I thought that EVERYTHING about the object had to be identical.  
> 
> I want to understand this as it seems like a dangerous misunderstanding - I use identical a fair amount to make sure I'm not screwing things up.
> 
> Really I should probably have converted the "qual" item in my scamBam-generated list to a SolexaQuality object to keep track of the fact they're offset by 64 and not 33, but that's OK. At this point I just want to take a look at what's changed about the BWA output when I use that -I flag.
> 
> Any thoughts?
> 
> thanks,
> 
> Janet
> 
> 
> 
> 
> 
> 
> 



More information about the Bioc-sig-sequencing mailing list