[Bioc-sig-seq] readFastq() error

Martin Morgan mtmorgan at fhcrc.org
Thu Mar 24 18:33:24 CET 2011


On 03/24/2011 10:01 AM, joseph wrote:

> @GAII_0001:6:91:210:160#0/1
> CTCGCGAAGCTTCTCTGGAGGAGAGTGATGTACGATGNCN
> +GAII_0001:6:91:210:160#0/1
> a__a_a__a_ba]abbabXa__a_BBBBBBBBBBBBBB
> boyce-162-119:mRNA_monocyte jdhahbi$

As you can see, the last read has two fewer quality scores (38) than
nucleotides (40). The truncation was introduced somewhere in your
upstream processing path.
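
A minimal sketch of how the damaged record could be located and worked
around in base R, reusing the readLines() approach from earlier in this
thread. The temporary files, the variable names (fq, bad, deficit, keep,
clean) and the decision to simply drop the record are assumptions made
for illustration. The sketch also assumes strict four-line records and
that the whole file fits in memory.

```r
# Build a tiny FASTQ file for illustration. The first record is intact;
# the second is the truncated last record quoted in this thread.
seq1 <- "NGTGANTCNGCTCATTGCGAGTTTTAACCTTTTCTCTATC"
fq <- tempfile(fileext = ".fq")      # hypothetical stand-in for reads.fq
writeLines(c(
  "@GAII_0001:6:1:0:123#0/1",
  seq1,
  "+GAII_0001:6:1:0:123#0/1",
  strrep("B", nchar(seq1)),          # one quality score per nucleotide
  "@GAII_0001:6:91:210:160#0/1",
  "CTCGCGAAGCTTCTCTGGAGGAGAGTGATGTACGATGNCN",
  "+GAII_0001:6:91:210:160#0/1",
  "a__a_a__a_ba]abbabXa__a_BBBBBBBBBBBBBB"
), fq)

x <- readLines(fq)
stopifnot(length(x) %% 4 == 0)       # assumption: strict 4-line records

rd   <- x[c(FALSE, TRUE,  FALSE, FALSE)]   # line 2 of each record: bases
qual <- x[c(FALSE, FALSE, FALSE, TRUE)]    # line 4 of each record: qualities

# Indices of records whose read and quality widths differ, and by how much.
bad     <- which(nchar(rd) != nchar(qual))
deficit <- nchar(rd[bad]) - nchar(qual[bad])

# Drop every line belonging to an inconsistent record, write a cleaned copy.
keep  <- rep(!(seq_along(rd) %in% bad), each = 4)
clean <- tempfile(fileext = ".fq")
writeLines(x[keep], clean)
```

Dropping the record only hides the symptom. Since the truncation came
from an earlier processing step, that step is worth checking as well.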

Martin
>
>
>
> ------------------------------------------------------------------------
> *From:* Martin Morgan <mtmorgan at fhcrc.org>
> *To:* joseph <jdsandjd at yahoo.com>
> *Cc:* bioc-sig-sequencing at r-project.org
> *Sent:* Thu, March 24, 2011 9:39:42 AM
> *Subject:* Re: [Bioc-sig-seq] readFastq() error
>
> On 03/24/2011 09:41 AM, joseph wrote:
>  > I added a new line character at the end of the file
>  > echo >> reads.fq
>  > I got the same numbers when I repeated the analysis
>
> you indicated that there were 16509910 reads in the file, and the test
> indicates it's the last read that causes problems, so what does the last
> read look like? e.g., tail reads.fq
>
> Martin
>  >
>  >
>  > ------------------------------------------------------------------------
>  > *From:* Martin Morgan <mtmorgan at fhcrc.org>
>  > *To:* joseph <jdsandjd at yahoo.com>
>  > *Cc:* bioc-sig-sequencing at r-project.org
>  > *Sent:* Wed, March 23, 2011 7:44:40 PM
>  > *Subject:* Re: [Bioc-sig-seq] readFastq() error
>  >
>  > On 03/23/2011 05:49 PM, joseph wrote:
>  > > Hi Martin
>  > > here is what I got:
>  > > x = readLines('~/myDir/reads.fq')
>  > > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
>  > > qual = x[c(FALSE, FALSE, FALSE, TRUE)]
>  > > > which(nchar(rd) != nchar(qual))
>  > > [1] 16509910
>  > > # that is all the reads in the file
>  > > # When I tried to count the reads with the same number of characters, I
>  > > also got all the reads
>  > > > length(which(nchar(rd) == nchar(qual)))
>  > > [1] 16509909
>  >
>  > I suspect there is a missing end-of-line on the last line of the file.
>  > >
>  > > Joseph
>  > >
>  > >
>  > >
>  > >
> ------------------------------------------------------------------------
>  > > *From:* Martin Morgan <mtmorgan at fhcrc.org>
>  > > *To:* joseph <jdsandjd at yahoo.com>
>  > > *Cc:* bioc-sig-sequencing at r-project.org
>  > > *Sent:* Wed, March 23, 2011 4:21:25 PM
>  > > *Subject:* Re: [Bioc-sig-seq] readFastq() error
>  > >
>  > > On 03/23/2011 04:07 PM, Martin Morgan wrote:
>  > > > On 03/23/2011 03:58 PM, joseph wrote:
>  > > >> Hello
>  > > >> How would you fix a FASTQ file that gives the following error when
>  > > >> read with readFastq()?
>  > > >> Other lanes from the same flow cell are imported fine with
>  > > >> readFastq().
>  > > >>
>  > > >> rfq = readFastq("~/myDir", pattern="reads.fq")
>  > > >> Error: Input/Output
>  > > >> file(s):
>  > > >> ~/myDir/reads.fq
>  > > >> message: IncompatibleTypes
>  > > >> message: invalid class "ShortReadQ" object: some sread and quality
>  > > >> widths differ
>  > > >>
>  > > >
>  > > > you could read the file in
>  > > >
>  > > > x = readLines('~/myDir/reads.fq')
>  > > >
>  > > > split it into reads and qualities
>  > > >
>  > > > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
>  > > > qual = x[c(FALSE, FALSE, TRUE, FALSE)]
>  > >
>  > > oops, x[c(FALSE, FALSE, FALSE, TRUE)]
>  > >
>  > > >
>  > > > and ask which have different numbers of characters
>  > > >
>  > > > which(nchar(rd) != nchar(qual))
>  > > >
>  > > > Martin
>  > > >
>  > > >> head reads.fq
>  > > >> @GAII_0001:6:1:0:101#0/1
>  > > >> NCTCANCATTGTTTGGACGGAACAAAACCGGGGACAATCT
>  > > >> +GAII_0001:6:1:0:101#0/1
>  > > >> BX[_\B_VXGQQU]]]YTPMGWTZZTVQ_X[TGYPZG[WZ
>  > > >> @GAII_0001:6:1:0:123#0/1
>  > > >> NGTGANTCNGCTCATTGCGAGTTTTAACCTTTTCTCTATC
>  > > >> +GAII_0001:6:1:0:123#0/1
>  > > >> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>  > > >> @GAII_0001:6:1:0:168#0/1
>  > > >> NCCAGNCCCAGCAGCCCTTCCTTTTCCCTGCTTACCCTCA
>  > > >>
>  > > >>
>  > > >>
>  > > >> [[alternative HTML version deleted]]
>  > > >>
>  > > >> _______________________________________________
>  > > >> Bioc-sig-sequencing mailing list
>  > > >> Bioc-sig-sequencing at r-project.org
>  > > >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>  > > >
>  > > >
>  > >
>  > >
>  > > --
>  > > Computational Biology
>  > > Fred Hutchinson Cancer Research Center
>  > > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>  > >
>  > > Location: M1-B861
>  > > Telephone: 206 667-2793
>  > >


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


