[Bioc-sig-seq] ReadFastq error

Hervé Pagès hpages at fhcrc.org
Fri Feb 19 23:39:47 CET 2010


Hi Ramzi,

One thing you can try is loading your fastq file with:

   library(Biostrings)
   bset <- read.BStringSet("path/to/your/file", format="fastq")

Note the use of read.BStringSet() instead of read.DNAStringSet().

Since BString/BStringSet objects are not limited to the DNA alphabet
(see ?DNA_ALPHABET), you should be able to load your file even if
it contains non-DNA letters (unless it has other problems of course).

Then you can do something like:

   ndnaletter_per_string <-
       vcountPDict(BStringSet(DNA_ALPHABET), bset, collapse=2)
   which(ndnaletter_per_string != width(bset))

to get the indices (as an integer vector) of the fastq records that
contain at least one non-DNA letter. (Note that the code above works
only with R-devel + BioC-devel.)

That way you'll be able to know if you have records like this and
where they are.
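
If the devel version is not an option, a rough check is also possible
with base R alone. The sketch below is only an illustration:
find_non_dna_records() is a hypothetical helper (not part of Biostrings
or ShortRead), it assumes a plain 4-lines-per-record fastq file with
unwrapped sequences, and it hard-codes the IUPAC letters plus "-" as the
allowed alphabet, which may differ slightly from DNA_ALPHABET in your
Biostrings version.

```r
# Hypothetical helper, base R only: return the indices of the fastq
# records whose sequence line contains a letter outside the assumed
# DNA alphabet (IUPAC codes plus "-").
find_non_dna_records <- function(path) {
    lines <- readLines(path)
    if (length(lines) < 2L) return(integer(0))
    # Assumption: each record is exactly 4 lines, so the sequences
    # sit on lines 2, 6, 10, ...
    seq_lines <- lines[seq(2L, length(lines), by = 4L)]
    # Upper-case first so that lower-case bases are not flagged.
    bad <- grepl("[^ACGTMRWSYKVHDBN-]", toupper(seq_lines))
    which(bad)
}

# Toy file with one offending record (an "I" in the second sequence).
tmp <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGTN", "+", "IIIII",
             "@r2", "ACGIT", "+", "IIIII",
             "@r3", "GGCCA", "+", "IIIII"), tmp)
bad_records <- find_non_dna_records(tmp)

# Key 73 from the error message maps back to the letter "I".
offending_letter <- intToUtf8(73)
```

On the toy file, bad_records points at the second record, whose
sequence contains an "I".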

readFastq() won't load a fastq file with non-DNA letters in it.

Cheers,
H.


Ramzi TEMANNI wrote:
> Hi,
> I'm encountering the following error when trying to load a fastq file:
> 
> Error in .local(dirPath, pattern, ...) :
>   _DNAencode(): key 73 not in lookup table
> 
> Key 73 in the ASCII table is "I" (capital i).
> 
> Has anyone encountered this error before?
> 
> Thanks in advance for your help
> 
> Regards,
> Ramzi
> 
>> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-pc-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] biomaRt_2.2.0      ShortRead_1.4.0    lattice_0.18-3     BSgenome_1.14.2
> [5] Biostrings_2.14.12 IRanges_1.4.11
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.1 grid_2.10.1   hwriter_1.1   RCurl_1.3-1   XML_2.6-0

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


