[Bioc-sig-seq] readFastq() error

Martin Morgan mtmorgan at fhcrc.org
Thu Mar 24 03:44:40 CET 2011


On 03/23/2011 05:49 PM, joseph wrote:
> Hi Martin
> here is what I got:
> x = readLines('~/myDir/reads.fq')
> rd = x[c(FALSE, TRUE, FALSE, FALSE)]
> qual = x[c(FALSE, FALSE, FALSE, TRUE)]
>  > which(nchar(rd) != nchar(qual))
> [1] 16509910
> # that is all the reads in the file
> # When I tried to count the reads with the same number of characters, I
> also got all the reads
>  > length(which(nchar(rd) == nchar(qual)))
> [1] 16509909

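Note that which() returns positions, not a count: only record 16509910,
the last one in the file, has read and quality strings of different
widths; the other 16509909 records are fine. I suspect there is a
missing end-of-line on the last line of the file.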
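
A minimal sketch of how one might confirm that only the tail of the file is
affected, using base R only. check_fastq_lines() is a hypothetical helper
(it is not part of ShortRead), and the toy vector stands in for the output of
readLines() on the real file. It assumes plain 4-line FASTQ records, as in
the head of the file quoted below.

```r
# Hypothetical helper, not part of ShortRead: summarise the 4-line FASTQ
# records held in a character vector of lines.
check_fastq_lines <- function(x) {
    n_complete <- length(x) %/% 4L       # number of complete 4-line records
    idx <- seq_len(n_complete)
    rd   <- x[4L * idx - 2L]             # 2nd line of each record: the read
    qual <- x[4L * idx]                  # 4th line of each record: qualities
    list(
        leftover_lines = length(x) %% 4L,  # non-zero if the last record is cut short
        n_records      = n_complete,
        bad_records    = which(nchar(rd) != nchar(qual))
    )
}

# Toy data: the second (last) record has a truncated quality string.
x <- c("@r1", "ACGT", "+r1", "IIII",
       "@r2", "ACGT", "+r2", "II")
res <- check_fastq_lines(x)
res$bad_records   # 2, i.e. only the last record is affected
```

On the real file, x[(length(x) - 3):length(x)] shows the last four lines
directly, which should make a truncated final record obvious.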
>
> Joseph
>
>
>
> ------------------------------------------------------------------------
> *From:* Martin Morgan <mtmorgan at fhcrc.org>
> *To:* joseph <jdsandjd at yahoo.com>
> *Cc:* bioc-sig-sequencing at r-project.org
> *Sent:* Wed, March 23, 2011 4:21:25 PM
> *Subject:* Re: [Bioc-sig-seq] readFastq() error
>
> On 03/23/2011 04:07 PM, Martin Morgan wrote:
>  > On 03/23/2011 03:58 PM, joseph wrote:
>  >> Hello
>  >> How would you fix a FASTQ file that gives the following error when
>  >> read with
>  >> readFastq()?
>  >> Other lanes from the same flow cell are imported fine with readFastq().
>  >>
>  >> rfq = readFastq("~/myDir", pattern="reads.fq")
>  >> Error: Input/Output
>  >> file(s):
>  >> ~/myDir/reads.fq
>  >> message: IncompatibleTypes
>  >> message: invalid class "ShortReadQ" object: some sread and quality widths differ
>  >>
>  >
>  > you could read the file in
>  >
>  > x = readLines('~/myDir/reads.fq')
>  >
>  > split it into reads and qualities
>  >
>  > rd = x[c(FALSE, TRUE, FALSE, FALSE)]
>  > qual = x[c(FALSE, FALSE, TRUE, FALSE)]
>
> oops, x[c(FALSE, FALSE, FALSE, TRUE)]
>
>  >
>  > and ask which have different numbers of characters
>  >
>  > which(nchar(rd) != nchar(qual))
>  >
>  > Martin
>  >
>  >> head reads.fq
>  >> @GAII_0001:6:1:0:101#0/1
>  >> NCTCANCATTGTTTGGACGGAACAAAACCGGGGACAATCT
>  >> +GAII_0001:6:1:0:101#0/1
>  >> BX[_\B_VXGQQU]]]YTPMGWTZZTVQ_X[TGYPZG[WZ
>  >> @GAII_0001:6:1:0:123#0/1
>  >> NGTGANTCNGCTCATTGCGAGTTTTAACCTTTTCTCTATC
>  >> +GAII_0001:6:1:0:123#0/1
>  >> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
>  >> @GAII_0001:6:1:0:168#0/1
>  >> NCCAGNCCCAGCAGCCCTTCCTTTTCCCTGCTTACCCTCA
>  >>
>  >>
>  >>
>  >> _______________________________________________
>  >> Bioc-sig-sequencing mailing list
>  >> Bioc-sig-sequencing at r-project.org
>  >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>  >
>  >
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>

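To answer the original question of how to fix the file, one possible approach
is sketched here, assuming it is acceptable to discard the damaged record(s)
rather than recover them. drop_bad_records() is a hypothetical helper written
in base R, not a ShortRead function, and the toy vector and temporary file
stand in for the real data.

```r
# Hypothetical helper, not part of ShortRead: keep only complete 4-line
# records whose read and quality strings have the same width.
drop_bad_records <- function(x) {
    n_complete <- length(x) %/% 4L
    idx <- seq_len(n_complete)
    ok <- nchar(x[4L * idx - 2L]) == nchar(x[4L * idx])
    keep <- idx[ok]
    # expand each kept record number to the four line numbers it covers
    lines <- rep(4L * keep, each = 4L) + (-3:0)
    x[lines]
}

# Toy data: the second (last) record has a truncated quality string.
x <- c("@r1", "ACGT", "+r1", "IIII",
       "@r2", "ACGT", "+r2", "II")
fixed <- drop_bad_records(x)

# Write the repaired records; writeLines() ends every line with a newline.
out <- tempfile(fileext = ".fq")
writeLines(fixed, out)
```

The repaired file can then be read with readFastq() in the same way as the
other lanes.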



