[Bioc-sig-seq] re-ordering the sequence from ShortRead

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 1 01:28:18 CEST 2011


On 05/31/2011 04:13 PM, Kunbin Qu wrote:
> Hi, I need to do some computation based on subset sampling from the
> original fastq file. For example, the file I have from one lane has
> 30 million reads, and I'd like to sample 3 subsets from it: 20, 10,
> and 5 million reads. The bigger sets should contain the smaller
> ones, i.e., all 5 million reads are within the 10 and 20 million
> sets, and the 10 M set is within the 20 M set.
>
> It appears to me that I need to generate a series of random numbers
> and re-order the original read file according to those random numbers
> to ensure the enclosure. Could somebody tell me a way to re-order
> the reads based on a set of random numbers? Thanks.

Hi Kunbin --

After rfq = readFastq(...), do rfq[sample(length(rfq),
number_to_sample)] to draw a random subset of the reads. You could
also use FastqSampler (from ShortRead, make sure to use v. 1.10.4) to
generate the largest sample, then subset it to get the smaller ones.
Save each subset with writeFastq.
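
A minimal sketch of one way to guarantee the nesting, assuming the reads fit in memory: draw the largest sample once, then take prefixes of that same draw for the smaller sets. The helper name nested_sample_indices and the file names in the comments are hypothetical, and the index logic is shown on a toy size of 30 rather than 30 million.

```r
# Hypothetical helper: nested random index sets built from ONE random draw.
# n_total: number of reads available; sizes: requested subset sizes.
nested_sample_indices <- function(n_total, sizes, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)        # optional, for reproducibility
  sizes <- sort(sizes, decreasing = TRUE)   # largest subset first
  stopifnot(sizes[1] <= n_total)
  # sample() without replacement returns indices in random order,
  # so any prefix of 'perm' is itself a random sample.
  perm <- sample(n_total, sizes[1])
  # Each smaller subset is a prefix of the largest, hence contained in it.
  lapply(sizes, function(k) perm[seq_len(k)])
}

# Toy example: 30 "reads", nested subsets of 20, 10 and 5.
idx <- nested_sample_indices(30, c(20, 10, 5), seed = 1)

# With ShortRead loaded, the same indices would be applied as (not run):
#   rfq <- readFastq("lane1.fastq")              # hypothetical file name
#   writeFastq(rfq[idx[[1]]], "sub20.fastq")     # largest subset
#   writeFastq(rfq[idx[[2]]], "sub10.fastq")
#   writeFastq(rfq[idx[[3]]], "sub5.fastq")
```

Calling sample() separately for each size would give independent subsets with no guarantee of containment, which is why the sketch reuses a single draw.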

Martin

>
> -Kunbin


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


