[Bioc-sig-seq] A modified readFASTA for reading 'quality fasta' files.

Dan Bolser dan.bolser at gmail.com
Tue Aug 26 13:49:08 CEST 2008


Hi all,

I have been using phred/phrap to do some sequence assembly, and I
thought that it might be useful to plot the 'quality scores' that are
output by phred.

In case you are interested, I have data from the ABI_3730, big-dye
terminator (DT3730POP7{BDv3}.mob).

I have adapted the readFASTA function in the Biostrings library to
read in the 'quality fasta' files produced by phred. Here is a diff of
the two functions so you can see what I did on the off chance that it
is useful for somebody else:

*** readFASTA.R
--- readQualityFASTA.R
***************
*** 1,5 ****

! readFASTA <-
    function (file, checkComments = TRUE, strip.desc = TRUE)
  {
    if (missing(strip.desc))
--- 1,5 ----

! readQualityFASTA <-
    function (file, checkComments = TRUE, strip.desc = TRUE)
  {
    if (missing(strip.desc))
***************
*** 37,43 ****
      desc <- s1[descriptions[i]]
      if (strip.desc)
        desc <- substr(desc, 2L, nchar(desc))
!     seq <- paste(s1[dp[i]:end[i]], collapse = "")
      list(desc = desc, seq = seq)
    })
  }
--- 37,44 ----
      desc <- s1[descriptions[i]]
      if (strip.desc)
        desc <- substr(desc, 2L, nchar(desc))
!     seq <- paste(s1[dp[i]:end[i]], collapse = " ")
!     seq <- as.numeric(strsplit(seq, " ", perl=TRUE)[[1]])
      list(desc = desc, seq = seq)
    })
  }


I am relatively new to BioC (and sequencing in general) so I am not
sure if there isn't a better way to do this... please don't hesitate
to point me at the relevant docs! In particular, where can I read
about the sequence assembly capabilities of R?

All the best,

Dan.

----

http://network.nature.com/profile/dan



More information about the Bioc-sig-sequencing mailing list