[BioC] problem in reading bam files with EDASeq
davide risso
risso.davide at gmail.com
Thu Jan 26 19:48:22 CET 2012
Hi Raffaele,
thank you for your message.
Sorry for the late response but I've been traveling.
The lines
> quality <- as(fq[strand=="+"], "matrix")
> quality <- rbind(quality,as(fq[strand=="-"], "matrix")[,NCOL(quality):1])
are needed since SAM/BAM store reads which map onto the reverse strand
in reverse complement and hence the quality is reversed.
Of course, if you have unmapped reads the strand will not be present,
so I changed the function in the devel version of the package to fix
this.
Cheers,
davide
On Wed, Jan 11, 2012 at 3:55 AM, rcaloger <raffaele.calogero at gmail.com> wrote:
> Hi,
> I an trying to use EDASeq for Read-level data.
> My bam file was generated by bowtie mapping of reads against human hg19
> chr22, sam were then converted in bam using picard tool and indexing was
> done with Rsamtools.
> If I try to using plotQuality:
>> bfs
> BamFileList of length 4
>> bfs[[1]]
> class: BamFile
> path:
> H:/data/calogero/bioC-devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1.fastq-breast1_R2.fastq.hs_chr22.bam
> index:
> H:/data/calogero/bioC-devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1.fastq-breast1_R2.fastq.hs_chr22.bam
> isOpen: FALSE
>> plotQuality(bfs[[1]])
> Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) :
> subscript contains NAs
>>
>
> I investigated the problem and I found out that the problems comes out
> at the line of setMethod plotQuality
> quality <- as(fq[strand=="+"], "matrix")
> It seems that the problem is related to the presence of three types of codes
> in mapped strands: +, - and <NA>
> unique(strand)
> [1] + - <NA>
> Levels: + - *
>> length(strand)
> [1] 20044478
>> length(strand[which(strand=="-")])
> [1] 143352
>> length(strand[which(strand=="+")])
> [1] 143352
>> length(strand[is.na(strand)])
> [1] 19757774
>>
> The 19757774 sequences that provide the error are those unmapped where it is
> not possible to assign a strand
>
> #If I remove unmapped quality scores:
>> fq <- fq[!is.na(strand)]
> #and if I remove <NA> from stran
>> strand <- strand[!is.na(strand)]
> #The problem is solved
>> quality <- as(fq[strand=="+"], "matrix")
>>str(quality)
> int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70
>
> So I tried:
> plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE))
> but I still get:
> Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) :
> subscript contains NAs
>
>
> Any suggestion?
>
> Many thanks
> Raffaele
>
> --
>
> ----------------------------------------
> Prof. Raffaele A. Calogero
> Bioinformatics and Genomics Unit
> MBC Centro di Biotecnologie Molecolari
> Via Nizza 52, Torino 10126
> tel. ++39 0116706457
> Fax ++39 0112366457
> Mobile ++39 3333827080
> email: raffaele.calogero at unito.it
> raffaele[dot]calogero[at]gmail[dot]com
> www: http://www.bioinformatica.unito.it
>
>
--
---------------------------------------------
Davide Risso
Department of Statistical Sciences
University of Padova, Italy
e-mail: davide at stat.unipd.it
More information about the Bioconductor
mailing list