[Bioc-sig-seq] Illumina CIF format for ShortRead?
Martin Morgan
mtmorgan at fhcrc.org
Thu Sep 10 20:53:13 CEST 2009
Martin Morgan wrote:
> Michael Muratet wrote:
>> Greetings
>>
>> I would like to be able to use the ShortRead package on CIF file data
>> from the latest version of the Illumina SCS/Pipeline tools. It appears
>> that the current version of ShortRead (1.2.1?) doesn't handle this
>> format. I have a snippet of a R script that will read these binary files
>> and produce the same output as the Illumina cifToTxt tool and I'm
>> willing to do the work to incorporate it into ShortRead. Are there
>> already plans to do this? Can anyone point me to a document that describes
>> the basic syntax and data structures behind R objects of this class?
>> I've looked at the ShortRead source and I'm not sure I could figure it
>> out just from that.
>
> Hi Michael --
>
> No, ShortRead does not parse CIF format. Is there a specification
> somewhere? If you'd like to contribute the relevant parser, that would
> be great! You'll probably want to use the development version of R and
> of ShortRead (currently 1.3.33).
>
> You're aiming for an object of class AlignedRead, which you would
> construct from the bits you parse with a call to
>
> AlignedRead(<your stuff here>)
>
> there is some 'essential' information, like the reads and their quality
> scores, the chromosome and position of alignment; other stuff gets put
> in an 'AlignedDataFrame'.
>
> See ?AlignedRead and ?"AlignedRead-class" for more. I'm happy to provide
> additional guidance, too.
Oops, I was a little quick on the draw. I see that 'cif' is 'cluster
intensity file', so it comes earlier in the pipeline than aligned reads.
You can use the function SolexaIntensity as something to shoot for:
> SolexaIntensity
function (intensity = array(0, c(0, 0, 0)),
    measurementError = array(0, c(0, 0, 0)),
    readInfo = SolexaIntensityInfo(lane = integer(nrow(intensity))),
The intensities are stored in a 3-dimensional array indexed by (read,
nucleotide, cycle). Some hints are in ?readIntensities,
?"Intensity-class", and perhaps
ShortRead:::.readIntensities_SolexaIntensity.
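
As a rough illustration of that (read, nucleotide, cycle) layout, here is a
base R sketch that parses a small synthetic binary blob into such an array.
The header fields, byte order, value ordering and A/C/G/T channel order are
all assumptions about CIF, not something taken from a specification in this
thread, and read_cif_sketch is a made-up name, so check everything against
Illumina's documentation before relying on it.

```r
# Sketch only. ASSUMED CIF layout (little-endian), not from this thread:
#   3 bytes "CIF", 1 byte version, 1 byte bytes-per-value,
#   2 bytes first cycle, 2 bytes number of cycles,
#   4 bytes number of clusters, then intensities with cluster varying
#   fastest, then channel (assumed A, C, G, T), then cycle.
read_cif_sketch <- function(con) {  # hypothetical helper name
    magic <- rawToChar(readBin(con, "raw", n = 3L))
    if (magic != "CIF") stop("not a CIF file under the assumed layout")
    # version and first_cycle are read only to advance past those fields
    version     <- readBin(con, "integer", size = 1L, signed = FALSE)
    value_bytes <- readBin(con, "integer", size = 1L, signed = FALSE)
    first_cycle <- readBin(con, "integer", size = 2L, signed = FALSE,
                           endian = "little")
    n_cycles    <- readBin(con, "integer", size = 2L, signed = FALSE,
                           endian = "little")
    n_clusters  <- readBin(con, "integer", size = 4L, endian = "little")
    n <- n_clusters * 4L * n_cycles
    values <- readBin(con, "integer", n = n, size = value_bytes,
                      endian = "little")
    if (length(values) != n) stop("fewer values than the header implies")
    # R fills arrays first-index-fastest, so the assumed on-disk order
    # maps directly onto dimensions (read, nucleotide, cycle).
    array(values, dim = c(n_clusters, 4L, n_cycles),
          dimnames = list(NULL, c("A", "C", "G", "T"), NULL))
}

# Synthetic input: 2 clusters, 3 cycles, 2-byte intensities 1..24
bytes <- c(charToRaw("CIF"), as.raw(1), as.raw(2),
           writeBin(1L, raw(), size = 2L, endian = "little"),   # first cycle
           writeBin(3L, raw(), size = 2L, endian = "little"),   # n cycles
           writeBin(2L, raw(), size = 4L, endian = "little"),   # n clusters
           writeBin(1:24, raw(), size = 2L, endian = "little")) # intensities
con <- rawConnection(bytes, "rb")
arr <- read_cif_sketch(con)
close(con)
dim(arr)
```

With ShortRead loaded, an array like this could then presumably be handed to
SolexaIntensity(intensity = arr), going by the signature printed above.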
Martin
>
> Martin
>
>> Hopefully, the CIF format will be around for a while.
>>
>> Thanks
>>
>> Mike
>>
>> Michael Muratet, Ph.D.
>> Senior Scientist
>> HudsonAlpha Institute for Biotechnology
>> mmuratet at hudsonalpha.org
>> (256) 327-0473 (p)
>> (256) 327-0966 (f)
>>
>> Room 4005
>> 601 Genome Way
>> Huntsville, Alabama 35806
>>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing