[BioC] how to operate on a DNAStringSet object
Chris Seidel
seidel at phaget4.org
Thu Mar 21 21:48:01 CET 2013
I can't figure out how to operate on a DNAStringSet object, and get a
DNAStringSet object back. I have a DNAStringSet object for which I'd
like to randomize the order of nucleotides in the sequences. Here is a
simple example:
# function to randomly re-order a string
randomizeSeq <- function(mystring){
# split the string into characters
myChars <- unlist(strsplit(as.character(mystring),''))
# randomize the order of characters
random.myChars <- sample(myChars,length(myChars))
# concatenate them back into a string and return the result
DNAStringSet(paste(random.myChars,collapse=''))
}
# create some sequences
seq1 <- paste(DNA_BASES[sample(1:4,5,replace=T)], collapse="")
seq2 <- paste(DNA_BASES[sample(1:4,5,replace=T)], collapse="")
# make a DNAStringSet object
myseqs <- DNAStringSet(c(seq1,seq2))
names(myseqs) <- c("s1", "s2")
# apply the randomize function to the DNAStringSet object
myRandomizedseqs <- sapply(myseqs,randomizeSeq)
# The result is a list, try to make a DNAStringSetObject out of it
DNAStringSet(unlist(myRandomizedseqs))
# that doesn't work, try something else
DNAStringSet(do.call(c,unlist(myRandomizedseqs)))
# that doesn't work.
I gave up (after several hours) and wrote a for loop, which works but
takes forever on my sequence set, and makes me feel stupid.
What's odd, is that this actually works:
DNAStringSet(do.call(c,unlist(myRandomizedseqs)))
*IF* the sequences are NOT NAMED.
How does one operate on the sequences of a DNAStringSet object without
getting a list back, or without a for loop? I'm sure there's some
elegant one-liner that completely escapes me.
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Dmelanogaster.UCSC.dm3_1.3.19 BSgenome_1.26.1
Biostrings_2.26.2 GenomicRanges_1.10.5
[5] IRanges_1.16.4 BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] parallel_2.15.2 stats4_2.15.2 tools_2.15.2
More information about the Bioconductor
mailing list