[BioC] writeFastq writing dashes instead of dots
Martin Morgan
mtmorgan at fhcrc.org
Tue Feb 26 19:38:35 CET 2013
Hi Thomas --
On 02/25/2013 09:08 AM, Thomas Rensch wrote:
> Hi everyone,
>
> I am reading and writing fastq files and writeFastq just swaps dots ('.') for dashes ('-').
>
> Is this the desired behaviour of writeFastq and if so why? Otherwise could someone better at R development than I modify this?
>
> Example fastq:
>
>
> @HWI-EAS149_3:1:1:0:1956:0:1:0
> .A..................................
> +
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> @HWI-EAS149_3:1:1:0:173:0:1:0
> .T..................................
> +
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> @HWI-EAS149_3:1:1:0:47:0:1:0
> .T..................................
> +
> BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
The substitution actually happens on input, when the file is read, rather than in writeFastq:
  > sread(readFastq("tmp.fastq"))
    A DNAStringSet instance of length 3
      width seq
  [1]    36 -A----------------------------------
  [2]    36 -T----------------------------------
  [3]    36 -T----------------------------------
because '.' is not a valid letter for DNAStringSet (nor in the international
standard, if I understand correctly...)
  > DNAStringSet(".")
  Error in .Call2("new_XString_from_CHARACTER", classname, x, start(solved_SEW), :
    key 46 (char '.') not in lookup table
  > alphabet(DNAStringSet())
   [1] "A" "C" "G" "T" "M" "R" "W" "S" "Y" "K" "V" "H" "D" "B" "N" "-" "+"
  > DNA_ALPHABET
   [1] "A" "C" "G" "T" "M" "R" "W" "S" "Y" "K" "V" "H" "D" "B" "N" "-" "+"
and from ?DNA_ALPHABET:

  This alphabet contains all letters from the IUPAC Extended Genetic
  Alphabet (see '?IUPAC_CODE_MAP') + the gap ('"-"') and the hard
  masking ('"+"') letters.
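To see in advance whether a file will be affected, one could scan its sequence lines for characters outside the alphabet listed above. The following is only a sketch in base R: `unexpected_letters` is a hypothetical helper, not a ShortRead or Biostrings function, the alphabet is copied by hand from the output above, and the file is assumed to use strict four-line records with upper-case sequences.

```r
# Alphabet copied by hand from the DNA_ALPHABET output shown above.
dna_alphabet <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                  "V", "H", "D", "B", "N", "-", "+")

# Hypothetical helper (not part of ShortRead or Biostrings): return the
# characters found on the sequence lines of a FASTQ file that are not in
# the given alphabet.
# Assumption: strict four-line records, so sequences are lines 2, 6, 10, ...
unexpected_letters <- function(path, alphabet = dna_alphabet) {
    lines <- readLines(path)
    idx <- seq_along(lines)
    seqs <- lines[idx %% 4 == 2]
    # Split every sequence into single characters and keep each one once.
    letters_seen <- unique(unlist(strsplit(seqs, "")))
    setdiff(letters_seen, alphabet)
}
```

For a file like the example in the original question, `unexpected_letters("tmp.fastq")` should report the '.' character.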
One possibility would be to add an option such as writeFastq(..., dashesASdots=FALSE),
but is that really a good idea?
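Until something like that exists, a possible workaround is to post-process the file that writeFastq produced. The sketch below uses only base R. `dashes_to_dots` is a hypothetical helper, not a ShortRead function, and it rests on two assumptions: the file has strict four-line records, and every '-' on a sequence line was originally a '.' (a genuine gap character would be rewritten too).

```r
# Hypothetical helper (not part of ShortRead): copy a FASTQ file, turning
# '-' back into '.' on the sequence lines only.
# Assumptions: strict four-line records, and every '-' in a sequence was
# originally a '.'.
dashes_to_dots <- function(infile, outfile) {
    lines <- readLines(infile)
    stopifnot(length(lines) %% 4 == 0)
    idx <- seq_along(lines)
    # Sequences are lines 2, 6, 10, ...; identifier and quality lines
    # (which may legitimately contain '-') are left untouched.
    is_seq <- idx %% 4 == 2
    lines[is_seq] <- chartr("-", ".", lines[is_seq])
    writeLines(lines, outfile)
    invisible(outfile)
}
```

A call such as `dashes_to_dots("out.fastq", "out_dots.fastq")` (both file names are placeholders) would leave identifiers like HWI-EAS149_3 intact because only the second line of each record is changed.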
Martin
>
>
> Thanks a lot,
> Thomas
>
> --
> Thomas Rensch
> PhD Student - Paul Flicek Group
> EMBL-EBI
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793