[Bioc-sig-seq] ShortRead support for "id", "paired read number" and "multiplex index" when reading an Illumina export file

Wed Feb 24 11:55:24 CET 2010

Hi Martin, everyone,

I've been looking forward to doing this for a long time, and I finally
got the time. I dove into the ShortRead C code to add some
functionality for loading Illumina export files. I've added an option
to the readAligned method, specific to the type "SolexaExport", that,
in addition to the default information, retrieves the multiplex
barcode and the paired read number (the 6th and 7th columns of the
export file, which were ignored so far). Using this option also
creates the sequence identifier (i.e. the one you get in a fastq file
extracted from an export file) and populates the id slot of the
AlignedRead object.
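For anyone curious what building the identifier amounts to, here is a
minimal C sketch. It is not the code in the attached diff. It assumes
the export column layout machine, run number, lane, tile, X, Y, index
string, read number, ... (so the barcode and read number are fields 6
and 7 when counting from zero). It also assumes an identifier of the
form machine_run:lane:tile:x:y#index/read. Both the layout and the
identifier format are assumptions, and the function names split_tabs
and export_line_to_id are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED export layout (0-based): 0 machine, 1 run, 2 lane, 3 tile,
   4 X, 5 Y, 6 multiplex index, 7 paired read number, ... */
#define EXPORT_ID_FIELDS 8

/* Hypothetical helper: split a tab-separated line in place, keeping
   empty fields (unlike strtok). Returns the number of fields stored. */
static int split_tabs(char *line, char *fields[], int max_fields)
{
    int n = 0;
    char *p = line;

    assert(line != NULL);
    while (n < max_fields) {
        fields[n++] = p;
        p = strchr(p, '\t');
        if (p == NULL)
            break;
        *p++ = '\0';            /* terminate the current field */
    }
    return n;
}

/* Hypothetical: build "machine_run:lane:tile:x:y#index/read" (ASSUMED
   format) from one export line. Returns 0 on success, -1 if the line
   is too long, has too few fields, or the id does not fit in out. */
int export_line_to_id(const char *line, char *out, size_t out_size)
{
    char buf[1024];
    char *f[EXPORT_ID_FIELDS];
    int n, written;

    assert(line != NULL && out != NULL);
    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);
    buf[strcspn(buf, "\r\n")] = '\0';   /* drop trailing newline */

    n = split_tabs(buf, f, EXPORT_ID_FIELDS);
    if (n < EXPORT_ID_FIELDS)
        return -1;

    written = snprintf(out, out_size, "%s_%s:%s:%s:%s:%s#%s/%s",
                       f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7]);
    if (written < 0 || (size_t) written >= out_size)
        return -1;
    return 0;
}
```

With these assumptions, a line starting
"HWI-EAS1<TAB>42<TAB>3<TAB>17<TAB>1024<TAB>2048<TAB>ACGTAC<TAB>1<TAB>..."
yields the identifier "HWI-EAS1_42:3:17:1024:2048#ACGTAC/1". The
splitter keeps empty fields so that a missing barcode does not shift
the later columns.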

I've attached the diff of my local working copy against revision
44842 of ShortRead (the current one as of this morning), two example
export files (one from a single-end (SE) and one from a paired-end
(PE) sequencing experiment), and a small R script showing the modified
usage.

I think this functionality will be very useful for people who, like
me, have to analyze multiplexed PE data, and I'd be glad to see it
integrated.

Finally, I'm far from a C expert, so you may want (or need) to
optimize what I've written.

Best,

---------------------------------------------------------------
Nicolas Delhomme

High Throughput Functional Genomics Center

European Molecular Biology Laboratory

Tel: +49 6221 387 8426
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: my_copy_vs_revision_44842.diff
Type: application/octet-stream
Size: 4942 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioc-sig-sequencing/attachments/20100224/6e54618d/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_SE_export.txt.gz
Type: application/x-gzip
Size: 2140 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioc-sig-sequencing/attachments/20100224/6e54618d/attachment-0002.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_PE_export.txt.gz
Type: application/x-gzip
Size: 19136 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioc-sig-sequencing/attachments/20100224/6e54618d/attachment-0003.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.R
Type: application/octet-stream
Size: 581 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioc-sig-sequencing/attachments/20100224/6e54618d/attachment-0003.obj>