[Bioc-sig-seq] How to generate random nucleotide sequences?

Purnachander purna at atc.tcs.com
Tue Sep 7 07:24:04 CEST 2010


Hello All,

I generated random nucleotide sequences whose trinucleotide frequencies 
are almost equal to those of a query sequence, using the "sample" 
function in the following way:

seq1 <- paste(sample(alpha, 333, replace=TRUE, prob=freq), collapse="")

where "alpha" is a vector of the 64 possible trinucleotides over the set 
c("A","G","C","T") and "freq" is a vector of the frequencies of those 64 
trinucleotides in a given query sequence.
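
For concreteness, here is a minimal base R sketch of how "alpha" and 
"freq" could be built. The query string is made up for illustration, and 
counting overlapping trinucleotides is an assumption; counting by codon 
frame would be an equally plausible reading of the setup above.

```r
set.seed(1)  # only to make the illustration reproducible

bases <- c("A", "G", "C", "T")

# All 64 trinucleotides over the four bases
alpha <- as.vector(outer(as.vector(outer(bases, bases, paste0)), bases, paste0))

# Hypothetical query sequence (a stand-in for a real one)
query <- paste(sample(bases, 999, replace = TRUE,
                      prob = c(0.4, 0.1, 0.2, 0.3)), collapse = "")

# Relative frequencies of overlapping trinucleotides in the query
# (assumption: overlapping windows, not codon frames)
n    <- nchar(query)
tri  <- substring(query, 1:(n - 2), 3:n)
freq <- as.vector(table(factor(tri, levels = alpha))) / length(tri)

# 333 trinucleotides drawn independently give a 999-base random sequence
seq1 <- paste(sample(alpha, 333, replace = TRUE, prob = freq), collapse = "")
```

Because the 333 draws are independent, nothing links the last base of 
one trinucleotide to the first base of the next.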

Consider a random sequence generated as described above. Does it 
preserve the mono- and dinucleotide frequencies of the query sequence? 
That is, are the mono- and dinucleotide frequencies of the random 
sequence similar to those of the query sequence?
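
One way to check this empirically is sketched below in base R. The 
helper "kmer_freq" and the deliberately artificial query are 
hypothetical, chosen only to make the effect easy to see. The reasoning 
behind the sketch: dinucleotides that fall inside a sampled trinucleotide 
follow the query, but dinucleotides that span the junction between two 
independently sampled trinucleotides need not.

```r
# Hypothetical helper: relative frequencies of overlapping k-mers in a string
kmer_freq <- function(s, k) {
  n <- nchar(s)
  kmers <- substring(s, 1:(n - k + 1), k:n)
  table(kmers) / length(kmers)
}

set.seed(1)  # only to make the illustration reproducible

# Artificial query with strong dinucleotide structure
query <- paste(rep("ACGT", 250), collapse = "")

# Sample trinucleotides with the query's trinucleotide frequencies.
# Sampling only the observed trinucleotides is equivalent to sampling
# all 64 with probability zero for the unobserved ones.
tri  <- kmer_freq(query, 3)
rand <- paste(sample(names(tri), 333, replace = TRUE, prob = as.vector(tri)),
              collapse = "")

# Mononucleotide frequencies: query vs. random sequence
mono_q <- kmer_freq(query, 1)
mono_r <- kmer_freq(rand, 1)
print(rbind(query = mono_q, random = mono_r[names(mono_q)]))

# Dinucleotide frequencies: query vs. random sequence
di_q <- kmer_freq(query, 2)
di_r <- kmer_freq(rand, 2)

# Dinucleotides that occur in the random sequence but never in the query
new_di <- setdiff(names(di_r), names(di_q))
print(new_di)
```

Running the same comparison on a real query and its random counterpart 
would show how large the differences are in practice.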

In one of the cases I worked with, the answer to the above question was 
"No". If that is the case in general, how can I generate a random 
sequence that preserves the mono-, di- and trinucleotide frequencies of 
the query sequence?

Regards,
Purnachander G


