[Bioc-sig-seq] readFastq() error

Martin Morgan mtmorgan at fhcrc.org
Thu Mar 24 17:39:42 CET 2011


On 03/24/2011 09:41 AM, joseph wrote:
> I added a newline character at the end of the file with
> echo >> reads.fq
> and got the same numbers when I repeated the analysis.

You indicated that there were 16509910 reads in the file, and the test
indicates it's the last read that causes problems, so what does the last
read look like? e.g., tail reads.fq
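As a rough in-R counterpart to `tail reads.fq`, the sketch below returns the last few lines of the file and checks whether its final byte is a newline, which is the missing end-of-line suspected in the quoted exchange. It is a minimal sketch, not part of ShortRead: the helper name check_fastq_tail() and its n argument are hypothetical, and like the readLines() recipe in the thread it loads the whole file into memory.

```r
## Hypothetical helper (not a ShortRead function): show the last lines of
## a file and report whether the file ends in "\n".
check_fastq_tail <- function(path, n = 8L) {
  ## roughly what `tail -n 8 reads.fq` prints; warn = FALSE silences the
  ## "incomplete final line" warning readLines() would otherwise give
  last_lines <- tail(readLines(path, warn = FALSE), n)

  ## read only the final byte of the file, in binary mode
  size <- file.info(path)$size
  ends_with_newline <- FALSE
  if (size > 0) {
    con <- file(path, open = "rb")
    on.exit(close(con))
    seek(con, where = size - 1)
    last_byte <- readBin(con, what = "raw", n = 1L)
    ends_with_newline <- identical(last_byte, as.raw(0x0a))  # 0x0a is "\n"
  }

  list(last_lines = last_lines, ends_with_newline = ends_with_newline)
}
```

For example, check_fastq_tail('~/myDir/reads.fq') shows the final record and whether the trailing newline is present, so the sequence and quality lengths of the last read can be compared by eye.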

Martin
>
>
> ------------------------------------------------------------------------
> *From:* Martin Morgan <mtmorgan at fhcrc.org>
> *To:* joseph <jdsandjd at yahoo.com>
> *Cc:* bioc-sig-sequencing at r-project.org
> *Sent:* Wed, March 23, 2011 7:44:40 PM
> *Subject:* Re: [Bioc-sig-seq] readFastq() error
>
> On 03/23/2011 05:49 PM, joseph wrote:
>  > Hi Martin
>  > here is what I got:
>  > x = readLines('~/myDir/reads.fq')
>  > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
>  > qual = x[c(FALSE, FALSE, FALSE, TRUE)]
>  > > which(nchar(rd) != nchar(qual))
>  > [1] 16509910
>  > # that is all the reads in the file
>  > # When I tried to count the reads with the same number of characters,
>  > # I also got all the reads
>  > > length(which(nchar(rd) == nchar(qual)))
>  > [1] 16509909
>
> I suspect there is a missing end-of-line on the last line of the file.
>  >
>  > Joseph
>  >
>  >
>  >
>  > ------------------------------------------------------------------------
>  > *From:* Martin Morgan <mtmorgan at fhcrc.org>
>  > *To:* joseph <jdsandjd at yahoo.com>
>  > *Cc:* bioc-sig-sequencing at r-project.org
>  > *Sent:* Wed, March 23, 2011 4:21:25 PM
>  > *Subject:* Re: [Bioc-sig-seq] readFastq() error
>  >
>  > On 03/23/2011 04:07 PM, Martin Morgan wrote:
>  > > On 03/23/2011 03:58 PM, joseph wrote:
>  > >> Hello
>  > >> How would you fix a FASTQ file that gives the following error when
>  > >> read with readFastq()?
>  > >> Other lanes from the same flow cell are imported fine with
>  > >> readFastq().
>  > >>
>  > >> rfq = readFastq("~/myDir", pattern="reads.fq")
>  > >> Error: Input/Output
>  > >> file(s):
>  > >> ~/myDir/reads.fq
>  > >> message: IncompatibleTypes
>  > >> message: invalid class "ShortReadQ" object: some sread and quality widths differ
>  > >>
>  > >
>  > > you could read the file in
>  > >
>  > > x = readLines('~/myDir/reads.fq')
>  > >
>  > > split it into reads and qualities
>  > >
>  > > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
>  > > qual = x[c(FALSE, FALSE, TRUE, FALSE)]
>  >
>  > oops, x[c(FALSE, FALSE, FALSE, TRUE)]
>  >
>  > >
>  > > and ask which have different numbers of characters
>  > >
>  > > which(nchar(rd) != nchar(qual))
>  > >
>  > > Martin
>  > >
>  > >> head reads.fq
>  > >> @GAII_0001:6:1:0:101#0/1
>  > >> NCTCANCATTGTTTGGACGGAACAAAACCGGGGACAATCT
>  > >> +GAII_0001:6:1:0:101#0/1
>  > >> BX[_\B_VXGQQU]]]YTPMGWTZZTVQ_X[TGYPZG[WZ
>  > >> @GAII_0001:6:1:0:123#0/1
>  > >> NGTGANTCNGCTCATTGCGAGTTTTAACCTTTTCTCTATC
>  > >> +GAII_0001:6:1:0:123#0/1
>  > >> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>  > >> @GAII_0001:6:1:0:168#0/1
>  > >> NCCAGNCCCAGCAGCCCTTCCTTTTCCCTGCTTACCCTCA
>  > >>
>  > >> _______________________________________________
>  > >> Bioc-sig-sequencing mailing list
>  > >> Bioc-sig-sequencing at r-project.org
>  > >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>  > >
>  > >
>  >
>  >
>  > --
>  > Computational Biology
>  > Fred Hutchinson Cancer Research Center
>  > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>  >
>  > Location: M1-B861
>  > Telephone: 206 667-2793
>  >
>
>
>
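The diagnostic worked out in this thread can be collected into one function. This is a minimal sketch, assuming strict four-line records (header, sequence, '+' line, quality) with no wrapped sequence lines; find_bad_fastq_records() is a hypothetical name, not a ShortRead function. It uses the corrected quality index c(FALSE, FALSE, FALSE, TRUE), warns when the line count is not a multiple of 4, and returns the indices of records whose read and quality lengths differ, whose quality line is missing, or whose header lines are malformed.

```r
## Hypothetical helper (not a ShortRead function): per-record sanity
## check for a FASTQ file with exactly four lines per record.
find_bad_fastq_records <- function(path) {
  x <- readLines(path, warn = FALSE)
  if (length(x) %% 4L != 0L)
    warning("line count ", length(x), " is not a multiple of 4")

  ## recycled logical indices pick every 4th line:
  ## 1 = '@' header, 2 = read, 3 = '+' line, 4 = quality
  hdr  <- x[c(TRUE, FALSE, FALSE, FALSE)]
  rd   <- x[c(FALSE, TRUE, FALSE, FALSE)]
  plus <- x[c(FALSE, FALSE, TRUE, FALSE)]
  qual <- x[c(FALSE, FALSE, FALSE, TRUE)]

  ## a truncated final record leaves later vectors short; pad with NA
  n <- length(hdr)
  length(rd) <- n
  length(plus) <- n
  length(qual) <- n

  ## which() drops NA, and any padded record has is.na(qual) TRUE
  which(is.na(qual) |
        nchar(rd) != nchar(qual) |
        !startsWith(hdr, "@") |
        !startsWith(plus, "+"))
}
```

Applied to reads.fq, this would be expected to flag the same final record (index 16509910) that the quoted check found; the four lines of that record can then be inspected directly.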

