[BioC] problem in reading bam files with EDASeq
Raffaele A Calogero
raffaele.calogero at gmail.com
Fri Jan 20 23:43:02 CET 2012
Many thanks for the nice comment I will definetively try this functionality of Rsamtools.
Concerning the problem I had with EDASeq, I solved Filtering with Rsamtools the bam file: removing all unmapped and unpaired PE reads. This bam filtered file works perfectly on EDASeq
cheers
Raffaele
Sent from iPad
Il giorno 20/gen/2012, alle ore 21:40, Valerie Obenchain <vobencha at fhcrc.org> ha scritto:
> Hi Raffaele,
>
> It looks like you've identified a bug in EDASeq and the author is cc'd.
>
> In the meantime, if you're after an assessment of the read quality have you tried using the qa report in ShortRead?
>
> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
> qaSummary <- qa(fl, type="BAM")
> report(qaSummary, dest="/path/to/report_directory")
>
> This report can then be viewed in your browser.
>
>
> Valerie
>
>
> On 01/11/12 03:55, rcaloger wrote:
>> Hi,
>> I an trying to use EDASeq for Read-level data.
>> My bam file was generated by bowtie mapping of reads against human hg19 chr22, sam were then converted in bam using picard tool and indexing was done with Rsamtools.
>> If I try to using plotQuality:
>> > bfs
>> BamFileList of length 4
>> > bfs[[1]]
>> class: BamFile
>> path: H:/data/calogero/bioC-devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1.fastq-breast1_R2.fastq.hs_chr22.bam
>> index: H:/data/calogero/bioC-devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1.fastq-breast1_R2.fastq.hs_chr22.bam
>> isOpen: FALSE
>> > plotQuality(bfs[[1]])
>> Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) :
>> subscript contains NAs
>> >
>>
>> I investigated the problem and I found out that the problems comes out
>> at the line of setMethod plotQuality
>> quality <- as(fq[strand=="+"], "matrix")
>> It seems that the problem is related to the presence of three types of codes in mapped strands: +, - and <NA>
>> unique(strand)
>> [1] + - <NA>
>> Levels: + - *
>> > length(strand)
>> [1] 20044478
>> > length(strand[which(strand=="-")])
>> [1] 143352
>> > length(strand[which(strand=="+")])
>> [1] 143352
>> > length(strand[is.na(strand)])
>> [1] 19757774
>> >
>> The 19757774 sequences that provide the error are those unmapped where it is not possible to assign a strand
>>
>> #If I remove unmapped quality scores:
>> > fq <- fq[!is.na(strand)]
>> #and if I remove <NA> from stran
>> > strand <- strand[!is.na(strand)]
>> #The problem is solved
>> > quality <- as(fq[strand=="+"], "matrix")
>> >str(quality)
>> int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70
>>
>> So I tried:
>> plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE))
>> but I still get:
>> Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) :
>> subscript contains NAs
>>
>>
>> Any suggestion?
>>
>> Many thanks
>> Raffaele
>>
>
More information about the Bioconductor
mailing list