[Bioc-sig-seq] Illumina CIF format for ShortRead?

Michael Muratet mmuratet at hudsonalpha.org
Thu Sep 10 20:54:36 CEST 2009


On Sep 10, 2009, at 1:40 PM, Martin Morgan wrote:

> Michael Muratet wrote:
>> Greetings
>>
>> I would like to be able to use the ShortRead package on CIF file data
>> from the latest version of the Illumina SCS/Pipeline tools. It appears
>> that the current version of ShortRead (1.2.1?) doesn't handle this
>> format. I have a snippet of an R script that reads these binary files
>> and produces the same output as the Illumina cifToTxt tool, and I'm
>> willing to do the work to incorporate it into ShortRead. Are there
>> already plans to do this? Can anyone point me to a document that
>> describes the basic syntax and data structures behind R objects of
>> this class? I've looked at the ShortRead source and I'm not sure I
>> could figure it out just from that.
>
> Hi Michael --
>
> No, ShortRead does not parse CIF format. Is there a specification
> somewhere? If you'd like to contribute the relevant parser, that would
> be great! You'll probably want to use the development version of R and
> of ShortRead (currently 1.3.33).
Martin,
There's a spec on page 119 of the v1.4 Pipeline manual. The noise files
(*.cnf) follow the same format. You also have to read the position files
to get the coordinates of the intensity values within a tile.
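A minimal sketch of how the CIF bytes and the position file fit together. It assumes a layout that is not taken from the manual cited above and must be checked against page 119 before use: the three bytes "CIF", a version byte, a byte giving the width of each value (1, 2 or 4), a little-endian uint16 first-cycle number, a uint16 cycle count (assumed to be 1 per file), a uint32 cluster count, and then one signed value per cluster for each of the A, C, G and T channels in turn. The position file is assumed to hold whitespace-separated "X Y" pairs, one cluster per line. The function names parse_cif, parse_positions and to_rows are hypothetical. Python is used only as a neutral illustration; an R version would do the same reads with readBin.

```python
import struct
from typing import Dict, List, Tuple

# Assumed channel order of the data block (hypothetical; check the manual).
CHANNELS = ("A", "C", "G", "T")
# struct codes for signed little-endian integers of 1, 2 or 4 bytes.
SIGNED_CODES = {1: "b", 2: "h", 4: "i"}


def parse_cif(buf: bytes) -> Tuple[dict, Dict[str, List[int]]]:
    """Parse a CIF buffer under the assumed header + per-channel layout."""
    if buf[:3] != b"CIF":
        raise ValueError("missing 'CIF' magic bytes")
    version, size = buf[3], buf[4]
    if size not in SIGNED_CODES:
        raise ValueError(f"unsupported value width: {size} bytes")
    # Bytes 5-12: uint16 first cycle, uint16 cycle count, uint32 cluster count.
    first_cycle, n_cycles, n_clusters = struct.unpack_from("<HHI", buf, 5)
    if n_cycles != 1:
        raise ValueError("this sketch only handles one cycle per file")
    header = {"version": version, "size": size,
              "first_cycle": first_cycle, "n_clusters": n_clusters}
    fmt = f"<{n_clusters}{SIGNED_CODES[size]}"
    values, offset = {}, 13
    for ch in CHANNELS:
        # One signed value per cluster, channel after channel.
        values[ch] = list(struct.unpack_from(fmt, buf, offset))
        offset += n_clusters * size
    return header, values


def parse_positions(text: str) -> List[Tuple[float, float]]:
    """Read assumed whitespace-separated 'X Y' pairs, one cluster per line."""
    coords = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            coords.append((float(fields[0]), float(fields[1])))
    return coords


def to_rows(coords, values):
    """Join coordinates and per-channel intensities, one row per cluster."""
    if len(coords) != len(values["A"]):
        raise ValueError("position and intensity cluster counts differ")
    return [(x, y) + tuple(values[ch][i] for ch in CHANNELS)
            for i, (x, y) in enumerate(coords)]


# Tiny synthetic file: version 1, 2-byte values, cycle 1, 1 cycle, 2 clusters.
demo = b"CIF" + bytes([1, 2]) + struct.pack("<HHI", 1, 1, 2)
demo += struct.pack("<8h", 10, 20, -3, 4, 5, 6, 7, 8)  # A, C, G, T blocks
header, vals = parse_cif(demo)
rows = to_rows(parse_positions("1.5 2.0\n3.0 4.5\n"), vals)
print(rows)  # [(1.5, 2.0, 10, -3, 5, 7), (3.0, 4.5, 20, 4, 6, 8)]
```

Because the noise (*.cnf) files are described as following the same format, the same parse_cif sketch would apply to them unchanged, under the same layout assumptions.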
>
> You're aiming for an object of class AlignedRead, which you would
> construct from the bits you parse with a call to
>
>  AlignedRead(<your stuff here>)
The CIF files contain intensities before crosstalk, offset, and phasing
corrections and before base calling. Is there not a separate structure
for intensity values? I see in the R folder of the source there is a
readIntensities method that accepts 'SolexaIntensity' and
'IparIntensity'. I don't know the data structures well enough yet to
know where the data goes, although I can see how one might add
'CifIntensity' to the code.
>
> there is some 'essential' information, like the reads and their quality
> scores and the chromosome and position of the alignment; other stuff
> gets put in an 'AlignedDataFrame'.
>
> See ?AlignedRead and ?"AlignedRead-class" for more. I'm happy to provide
> additional guidance, too.

I'll download the development versions and see what I can do.

Regards

Mike
>
> Martin
>
>>
>> Hopefully, the CIF format will be around for a while.
>>
>> Thanks
>>
>> Mike
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>


