[Bioc-sig-seq] Illumina CIF format for ShortRead?

Martin Morgan mtmorgan at fhcrc.org
Thu Sep 10 20:53:13 CEST 2009


Martin Morgan wrote:
> Michael Muratet wrote:
>> Greetings
>>
>> I would like to be able to use the ShortRead package on CIF file data
>> from the latest version of the Illumina SCS/Pipeline tools. It appears
>> that the current version of ShortRead (1.2.1?) doesn't handle this
>> format. I have a snippet of a R script that will read these binary files
>> and produce the same output as the Illumina cifToTxt tool and I'm
>> willing to do the work to incorporate it into ShortRead. Are there
>> already plans to do this? Can anyone point me to a document that describes
>> the basic syntax and data structures behind R objects of this class?
>> I've looked at the ShortRead source and I'm not sure I could figure it
>> out just from that.
> 
> Hi Michael --
> 
> No, ShortRead does not parse CIF format. Is there a specification
> somewhere? If you'd like to contribute the relevant parser, that would
> be great! You'll probably want to use the development version of R and
> of ShortRead (currently 1.3.33).
> 
> You're aiming for an object of class AlignedRead, which you would
> construct from the bits you parse with a call to
> 
>   AlignedRead(<your stuff here>)
> 
> There is some 'essential' information, like the reads and their quality
> scores and the chromosome and position of alignment; other information
> goes in an 'AlignedDataFrame'.
> 
> See ?AlignedRead and ?"AlignedRead-class" for more. I'm happy to provide
> additional guidance, too.

Oops, I was a little quick on the draw. I see that 'cif' is 'cluster
intensity file', so it comes earlier in the pipeline than alignment. You
can use the function SolexaIntensity as something to shoot for:

> SolexaIntensity
function (intensity = array(0, c(0, 0, 0)), measurementError = array(0,
    c(0, 0, 0)), readInfo = SolexaIntensityInfo(lane = integer(nrow(intensity))),

The intensities are held in a 3-dimensional array with dimensions (read,
nucleotide, cycle). Some hints are in ?readIntensities, ?"Intensity-class",
and perhaps ShortRead:::.readIntensities_SolexaIntensity.
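
To make the (read, nucleotide, cycle) layout concrete, here is a rough
sketch of how a parser might fill such an array. It is not part of
ShortRead. The function name readCifSketch is hypothetical, and the file
layout is an assumption, not something confirmed in this thread: a 13-byte
little-endian header ("CIF", version, bytes per value, first cycle, number
of cycles, number of clusters) followed by signed integers ordered by
cycle, then channel (A, C, G, T), then cluster. Check it against the
Illumina documentation and the cifToTxt output before relying on it.

```r
## Hypothetical helper, not part of ShortRead. ASSUMED CIF layout (see above):
## 13-byte little-endian header, then signed integers ordered by
## cycle, channel (A, C, G, T), cluster.
readCifSketch <- function(path) {
    con <- file(path, "rb")
    on.exit(close(con))
    magic <- rawToChar(readBin(con, "raw", n = 3L))
    if (magic != "CIF") stop("not a CIF file: ", path)
    version <- readBin(con, "integer", n = 1L, size = 1L, signed = FALSE)
    bytesPerValue <- readBin(con, "integer", n = 1L, size = 1L, signed = FALSE)
    firstCycle <- readBin(con, "integer", n = 1L, size = 2L, signed = FALSE,
                          endian = "little")
    nCycles <- readBin(con, "integer", n = 1L, size = 2L, signed = FALSE,
                       endian = "little")
    nClusters <- readBin(con, "integer", n = 1L, size = 4L, endian = "little")
    ## use a double for the count to avoid integer overflow on large files
    n <- as.numeric(nClusters) * 4 * nCycles
    values <- readBin(con, "integer", n = n, size = bytesPerValue,
                      signed = TRUE, endian = "little")
    if (length(values) != n) stop("truncated CIF file: ", path)
    ## cluster varies fastest, then channel, then cycle, so R's column-major
    ## fill gives read x nucleotide x cycle without any transposition
    array(as.numeric(values), dim = c(nClusters, 4L, nCycles))
}

## Write a tiny synthetic file in the assumed layout: 2 clusters, 3 cycles
tmp <- tempfile(fileext = ".cif")
con <- file(tmp, "wb")
writeBin(charToRaw("CIF"), con)
writeBin(1L, con, size = 1L)                     # version
writeBin(2L, con, size = 1L)                     # bytes per value
writeBin(1L, con, size = 2L, endian = "little")  # first cycle
writeBin(3L, con, size = 2L, endian = "little")  # number of cycles
writeBin(2L, con, size = 4L, endian = "little")  # number of clusters
writeBin(1:24, con, size = 2L, endian = "little")
close(con)

intensity <- readCifSketch(tmp)
dim(intensity)   # reads, nucleotides, cycles

## If ShortRead is installed, the array can be handed to the constructor
if (requireNamespace("ShortRead", quietly = TRUE)) {
    si <- ShortRead::SolexaIntensity(intensity = intensity)
}
```

The sketch leaves readInfo at its default; a real parser would presumably
fill in lane and tile from the file name or run folder.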

Martin

> 
> Martin
> 
>> Hopefully, the CIF format will be around for a while.
>>
>> Thanks
>>
>> Mike
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>>
>>
>>
>>
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
