[Bioc-sig-seq] ShortRead support for "id", "paired read number" and "multiplex index" when reading an Illumina export file

Martin Morgan mtmorgan at fhcrc.org
Sat Feb 27 17:42:59 CET 2010


On 02/24/2010 06:19 AM, Martin Morgan wrote:
> Hi Nicolas --
> 
> These sound like very useful additions, and I'll try to incorporate
> them over the next day or so.
> 
> Thank you very much for the contribution!

In ShortRead v. 1.5.7 (available in bioc-devel in the next day or so, or
through svn now), readAligned with type="SolexaExport" has four
additional options, all defaulting to FALSE: withId, withMultiplexIndex,
withPairedReadNumber and withAll. withMultiplexIndex and
withPairedReadNumber read the corresponding fields from the _export file
into columns of alignData(aln); withId constructs an identifier from the
machine, run, tile, ... information (accessible with id(aln)); withAll is
a convenience that turns all of these flags on. See ?readAligned and
news(Version>=1.5, "ShortRead").
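
A minimal sketch of how the new options might be used, assuming ShortRead
>= 1.5.7 is installed. The wrapper name read_export, the directory and
the file pattern are hypothetical placeholders, not part of ShortRead;
only readAligned, its type="SolexaExport" and the withAll flag come from
the description above.

```r
# Sketch only: read Illumina export files with all optional fields on.
# `exportDir` and `pattern` are placeholders; adjust them to your run folder.
read_export <- function(exportDir, pattern = "_export\\.txt$") {
    if (!requireNamespace("ShortRead", quietly = TRUE))
        stop("ShortRead (>= 1.5.7) is required")
    # withAll=TRUE is the convenience flag that switches on withId,
    # withMultiplexIndex and withPairedReadNumber together.
    ShortRead::readAligned(exportDir, pattern = pattern,
                           type = "SolexaExport", withAll = TRUE)
}

# Usage (not run; the path is hypothetical):
# aln <- read_export("path/to/export/dir")
# ShortRead::id(aln)         # identifiers constructed by withId
# ShortRead::alignData(aln)  # holds the multiplex index / read number columns
```

To turn on only some of the fields, pass the individual flags (for
example withId=TRUE) to readAligned instead of withAll.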

Martin

> 
> Martin
> 
> On 02/24/2010 02:55 AM, Nicolas Delhomme wrote:
>> Hi Martin, everyone,
>>
>> I've been looking forward to doing this for a long time now, and I
>> finally got the time. So, I dove into the ShortRead C code to add
>> some functionality when loading Illumina export files. I've added an
>> option to the readAligned method, specifically for the type
>> "SolexaExport", that will, in addition to the default information,
>> retrieve the multiplex barcode and the paired read number (the 6th and
>> 7th columns of the export file, which were ignored so far). Additionally,
>> using this option will create the sequence identifier (i.e. the one you
>> get in a fastq file extracted from an export file) and populate the id
>> slot of the AlignedRead object.
>>
>> I've attached the diff of my local working copy with the revision 44842
>> of ShortRead (the current one, as of this morning), two example export
>> files (one from a single-end (SE) and one from a paired-end (PE)
>> sequencing experiment) and a small R script showing the modified usage.
>>
>> I think that this functionality is very useful for people, like
>> me, who have to analyze PE, multiplexed data, and I'd be glad if it
>> got integrated.
>>
>> Finally, I'm far from being a C expert, so you might wish (or need) to
>> optimize what I've written.
>>
>> Best,
>>
>> ---------------------------------------------------------------
>> Nicolas Delhomme
>>
>> High Throughput Functional Genomics Center
>>
>> European Molecular Biology Laboratory
>>
>> Tel: +49 6221 387 8426
>> Email: nicolas.delhomme at embl.de
>> Meyerhofstrasse 1 - Postfach 10.2209
>> 69102 Heidelberg, Germany
>> ---------------------------------------------------------------
> 
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


