[Bioc-sig-seq] readFastq() error
Martin Morgan
mtmorgan at fhcrc.org
Thu Mar 24 18:33:24 CET 2011
On 03/24/2011 10:01 AM, joseph wrote:
> @GAII_0001:6:91:210:160#0/1
> CTCGCGAAGCTTCTCTGGAGGAGAGTGATGTACGATGNCN
> +GAII_0001:6:91:210:160#0/1
> a__a_a__a_ba]abbabXa__a_BBBBBBBBBBBBBB
> boyce-162-119:mRNA_monocyte jdhahbi$
As you can see, the last read has two fewer quality scores (38) than
nucleotides (40). This was introduced somewhere in your upstream
processing path.
Martin
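
A minimal sketch, not from the original thread, of how the check discussed
below could be wrapped up and used to drop a truncated record. It uses base R
only and a toy in-memory FASTQ rather than reads.fq. The helper names
bad_records() and drop_bad_records() are hypothetical (they are not part of
ShortRead), and the code assumes strict four-line records with no wrapped
sequence lines.

```r
# Toy FASTQ: the second record's quality string is two characters short,
# mirroring the problem seen in the last read of reads.fq.
fq <- c(
  "@read1", "ACGTACGT", "+read1", "IIIIIIII",
  "@read2", "ACGTACGT", "+read2", "IIIIII"
)

# Hypothetical helper: indices of records whose sequence and quality
# lengths differ. Assumes every record is exactly four lines.
bad_records <- function(lines) {
  stopifnot(length(lines) %% 4 == 0)
  rd   <- lines[c(FALSE, TRUE, FALSE, FALSE)]   # 2nd line of each record
  qual <- lines[c(FALSE, FALSE, FALSE, TRUE)]   # 4th line of each record
  which(nchar(rd) != nchar(qual))
}

# Hypothetical helper: remove all four lines of each offending record.
drop_bad_records <- function(lines) {
  bad <- bad_records(lines)
  if (length(bad) == 0) return(lines)
  # Record i occupies lines 4 * (i - 1) + 1 through 4 * i.
  drop <- as.vector(outer(1:4, 4 * (bad - 1), `+`))
  lines[-drop]
}

bad   <- bad_records(fq)       # index of the malformed record
clean <- drop_bad_records(fq)  # FASTQ lines with that record removed
```

On a real file one would presumably start from readLines() as in the thread
and write the result back with writeLines(). Note that readLines() holds the
whole file in memory, and that dropping the read only hides the symptom: the
truncation still points to a problem upstream.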
>
>
>
> ------------------------------------------------------------------------
> *From:* Martin Morgan <mtmorgan at fhcrc.org>
> *To:* joseph <jdsandjd at yahoo.com>
> *Cc:* bioc-sig-sequencing at r-project.org
> *Sent:* Thu, March 24, 2011 9:39:42 AM
> *Subject:* Re: [Bioc-sig-seq] readFastq() error
>
> On 03/24/2011 09:41 AM, joseph wrote:
> > I added a new line character at the end of the file
> > echo >> reads.fq
> > I got the same numbers when I repeated the analysis
>
> you indicated that there were 16509910 reads in the file, and the test
> indicates it's the last read that causes problems, so what does the last
> read look like? e.g., tail reads.fq
>
> Martin
> >
> >
> > ------------------------------------------------------------------------
> > *From:* Martin Morgan <mtmorgan at fhcrc.org>
> > *To:* joseph <jdsandjd at yahoo.com>
> > *Cc:* bioc-sig-sequencing at r-project.org
> > *Sent:* Wed, March 23, 2011 7:44:40 PM
> > *Subject:* Re: [Bioc-sig-seq] readFastq() error
> >
> > On 03/23/2011 05:49 PM, joseph wrote:
> > > Hi Martin
> > > here is what I got:
> > > x = readLines('~/myDir/reads.fq')
> > > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
> > > qual = x[c(FALSE, FALSE, FALSE, TRUE)]
> > > > which(nchar(rd) != nchar(qual))
> > > [1] 16509910
> > > # that is all the reads in the file
> > > > # When I tried to count the reads with the same number of characters,
> > > > # I also got all the reads
> > > > length(which(nchar(rd) == nchar(qual)))
> > > [1] 16509909
> >
> > I suspect there is a missing end-of-line on the last line of the file.
> > >
> > > Joseph
> > >
> > >
> > >
> > >
> > > > ------------------------------------------------------------------------
> > > > *From:* Martin Morgan <mtmorgan at fhcrc.org>
> > > > *To:* joseph <jdsandjd at yahoo.com>
> > > > *Cc:* bioc-sig-sequencing at r-project.org
> > > *Sent:* Wed, March 23, 2011 4:21:25 PM
> > > *Subject:* Re: [Bioc-sig-seq] readFastq() error
> > >
> > > On 03/23/2011 04:07 PM, Martin Morgan wrote:
> > > > On 03/23/2011 03:58 PM, joseph wrote:
> > > >> Hello
> > > > >> How would you fix a FASTQ file that gives the following error when
> > > > >> read with readFastq()?
> > > > >> Other lanes from the same flow cell are imported fine with
> > > > >> readFastq().
> > > >>
> > > >> rfq = readFastq("~/myDir", pattern="reads.fq")
> > > >> Error: Input/Output
> > > >> file(s):
> > > >> ~/myDir/reads.fq
> > > >> message: IncompatibleTypes
> > > > >> message: invalid class "ShortReadQ" object: some sread and quality
> > > > >> widths differ
> > > >>
> > > >
> > > > you could read the file in
> > > >
> > > > x = readLines('~/myDir/reads.fq')
> > > >
> > > > split it into reads and qualities
> > > >
> > > > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
> > > > qual = x[c(FALSE, FALSE, TRUE, FALSE)]
> > >
> > > oops, x[c(FALSE, FALSE, FALSE, TRUE)]
> > >
> > > >
> > > > and ask which have different numbers of characters
> > > >
> > > > which(nchar(rd) != nchar(qual))
> > > >
> > > > Martin
> > > >
> > > >> head reads.fq
> > > >> @GAII_0001:6:1:0:101#0/1
> > > >> NCTCANCATTGTTTGGACGGAACAAAACCGGGGACAATCT
> > > >> +GAII_0001:6:1:0:101#0/1
> > > >> BX[_\B_VXGQQU]]]YTPMGWTZZTVQ_X[TGYPZG[WZ
> > > >> @GAII_0001:6:1:0:123#0/1
> > > >> NGTGANTCNGCTCATTGCGAGTTTTAACCTTTTCTCTATC
> > > >> +GAII_0001:6:1:0:123#0/1
> > > >> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> > > >> @GAII_0001:6:1:0:168#0/1
> > > >> NCCAGNCCCAGCAGCCCTTCCTTTTCCCTGCTTACCCTCA
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >> _______________________________________________
> > > > >> Bioc-sig-sequencing mailing list
> > > > >> Bioc-sig-sequencing at r-project.org
> > > > >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> > > >
> > > >
> > >
> > >
> > > --
> > > Computational Biology
> > > Fred Hutchinson Cancer Research Center
> > > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> > >
> > > Location: M1-B861
> > > Telephone: 206 667-2793
> > >
> >
> >
> >
>
>
>