[Bioc-sig-seq] Illumina CIF format for ShortRead?

Martin Morgan mtmorgan at fhcrc.org
Thu Sep 10 20:40:49 CEST 2009


Michael Muratet wrote:
> Greetings
> 
> I would like to be able to use the ShortRead package on CIF file data
> from the latest version of the Illumina SCS/Pipeline tools. It appears
> that the current version of ShortRead (1.2.1?) doesn't handle this
> format. I have a snippet of an R script that will read these binary files
> and produce the same output as the Illumina cifToTxt tool and I'm
> willing to do the work to incorporate it into ShortRead. Are there
> already plans to do this? Can anyone point me to a document that describes
> the basic syntax and data structures behind R objects of this class?
> I've looked at the ShortRead source and I'm not sure I could figure it
> out just from that.

Hi Michael --

No, ShortRead does not parse CIF format. Is there a specification
somewhere? If you'd like to contribute the relevant parser, that would
be great! You'll probably want to use the development version of R and
of ShortRead (currently 1.3.33).

You're aiming for an object of class AlignedRead, which you would
construct from the bits you parse with a call to

  AlignedRead(<your stuff here>)

There is some 'essential' information, such as the reads and their quality
scores and the chromosome and position of alignment; everything else gets
put in an 'AlignedDataFrame'.
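
As a rough sketch of such a call, assuming a reasonably recent ShortRead:
every value below (sequences, ids, qualities, positions, and the 'lane'
column) is a made-up placeholder standing in for whatever your parser
produces, and strand and alignment quality are simply left at their
defaults. Check the argument names against ?AlignedRead for your version.

```r
library(ShortRead)  # also attaches Biostrings

# Placeholder data; a real parser would fill these in from the file.
reads <- DNAStringSet(c("ACGTACGT", "TTGACCAG"))
quals <- FastqQuality(c("IIIIIIII", "IIIIHHHH"))  # one string per read, same widths

# Anything beyond the 'essential' slots goes in an AlignedDataFrame.
# 'lane' is a hypothetical column chosen only for illustration.
extra <- AlignedDataFrame(
    data = data.frame(lane = c(1L, 1L)),
    metadata = data.frame(labelDescription = "Flow cell lane (made up)",
                          row.names = "lane"))

aln <- AlignedRead(
    sread      = reads,
    id         = BStringSet(c("read1", "read2")),
    quality    = quals,
    chromosome = factor(c("chr1", "chr2")),
    position   = c(100L, 2500L),
    alignData  = extra)
```

The pieces can be pulled back out with sread(aln), quality(aln),
chromosome(aln), position(aln) and alignData(aln).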

See ?AlignedRead and ?"AlignedRead-class" for more. I'm happy to provide
additional guidance, too.

Martin

> 
> Hopefully, the CIF format will be around for a while.
> 
> Thanks
> 
> Mike
> 
> Michael Muratet, Ph.D.
> Senior Scientist
> HudsonAlpha Institute for Biotechnology
> mmuratet at hudsonalpha.org
> (256) 327-0473 (p)
> (256) 327-0966 (f)
> 
> Room 4005
> 601 Genome Way
> Huntsville, Alabama 35806
> 


