[Bioc-sig-seq] Getting file names from list.files in a more useful order

Michael Muratet mmuratet at hudsonalpha.org
Thu Oct 8 23:17:03 CEST 2009


On Oct 7, 2009, at 10:08 PM, Martin Morgan wrote:

> Hi Michael --
>
> Michael Muratet wrote:
>> Greetings
>>
>> I am working on adapting readIntensities from ShortRead to handle the
>> new Illumina intensity file format, *.cif. Illumina has dropped the
>> leading zeros from the file name so that if you use list.files to get
>> file names from the old style you get:
>>
>>
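To illustrate the problem, here is a minimal sketch using a few hypothetical new-style names without zero padding. It sorts with `sort(..., method = "radix")`, which collates in the C locale so the result is reproducible. The order `list.files` returns follows the session's locale and may differ in detail. In either case, names are compared character by character rather than numerically, so tiles 10 and 100 typically land ahead of tile 2.

```r
# Hypothetical new-style names: the tile number is no longer zero-padded
files <- c("s_1_2.cif", "s_1_10.cif", "s_1_1.cif", "s_1_100.cif")

# Character sorting compares the names position by position, so
# "s_1_10.cif" and "s_1_100.cif" come before "s_1_2.cif".
# method = "radix" collates in the C locale, independent of Sys.getlocale().
lexical <- sort(files, method = "radix")
print(lexical)
```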
>
>
> you could extract the lane and tile information along the lines of
>
>  files = c("s_1_1.cif", "s_1_10.cif")
>  lanes = as.integer(sub("s_([[:digit:]]+).*", "\\1", files))
>  tiles = as.integer(sub(".*_([[:digit:]]+)\\.cif$", "\\1", files))
>
> and then order the files with
>
>  files[order(lanes, tiles)]
>
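The lane/tile extraction and `order()` call can be wrapped into a reusable function. This is a minimal sketch, assuming new-style names of the form `s_<lane>_<tile>.cif` as in the example above. `orderCifFiles` is a hypothetical name, not a ShortRead function. Anchoring the pattern and escaping the dot makes the function reject unexpected names instead of silently producing `NA` lane or tile numbers. Calling `basename()` lets it accept full paths as well.

```r
# Hypothetical helper (not part of ShortRead): order *.cif file names of the
# assumed form s_<lane>_<tile>.cif numerically by lane, then by tile.
orderCifFiles <- function(files) {
    ## Match on the base name so full paths (list.files(full.names = TRUE)) work
    base <- basename(files)
    pattern <- "^s_([[:digit:]]+)_([[:digit:]]+)\\.cif$"
    ok <- grepl(pattern, base)
    if (!all(ok))
        stop("unexpected file name(s): ", paste(base[!ok], collapse = ", "))
    lanes <- as.integer(sub(pattern, "\\1", base))
    tiles <- as.integer(sub(pattern, "\\2", base))
    ## Numeric ordering: lane first, tile breaks ties
    files[order(lanes, tiles)]
}

files <- c("run/s_2_1.cif", "run/s_1_10.cif", "run/s_1_2.cif", "run/s_1_1.cif")
ordered <- orderCifFiles(files)
print(ordered)
```

In practice the input would come from something like `list.files(dir, pattern = "\\.cif$", full.names = TRUE)`, where `dir` stands in for the directory holding the intensity files.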
> In earlier versions, I think the file name was configurable by the
> pipeline software and recorded in the XML configuration files; few
> people seemed to actually do this, though.
>

Martin

Thanks for the suggestions. Everything appears to be working
satisfactorily now. Give me a few days to test and verify, update
the docs and tests, and I'll send you diffs to incorporate.

Regards

Mike

>
>
> -- 
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806


