[BioC] Import sequences from MacClade 4.*
Brian
zenlines at gmail.com
Thu Feb 16 22:52:51 CET 2012
Hi again,
so after a bit of R-programming fun I wrote the following. I realize now
that that was actually a sequence alignment and not a typical sequence
file. Nonetheless, should anyone be tasked with rooting up old data from
a macclade file then, as promised:
read.macclade <- function(file, raw.sequences=TRUE,
return.metadata=FALSE,...){
## file - path to file
## return.metadata - return the metadata from the mcclade file
## ... - Additional arguments to "readLines" function
seqs <- readLines(file)
seqs.metadata.ind <- grep("^BEGIN|^MATRIX", seqs)
seqs.metadata <- seqs[(seqs.metadata.ind[1]+1):(seqs.metadata.ind[2]-1)]
## Get the sequences
seqs.modal_index <- grep("^\\[Modal",seqs)
seq_end <- grep("END",seqs)[1]
seqs_sub <- seqs[(seqs.modal_index[1]+2):(seq_end-2)]
seqs_sub.modal_index <- grep("^\\[Modal",seqs_sub)
seqs.modal <- seqs[seqs.modal_index]
## Get the lines
seqs_sub <- grep("^[[:alnum:]]",seqs_sub,value=T)
## Cut it up
seqs_list_raw <-
sapply(seq(seqs_sub),
function(x)grep(".",
unlist(strsplit(x=seqs_sub[x],
split="[[:blank:]]")),value=TRUE))
## Put everything together
seqs.unique <- unique(seqs_list_raw[1,])
seqs_index <-
sapply(seqs.unique,function(seq)grep(seq,seqs_list_raw[1,]))
seqs_vect <- sapply(seqs.unique,function(seq)
paste(seqs_list_raw[2,seqs_index[,seq]],collapse=""))
seqs_lengths <- sapply(seqs.unique,function(seq)
gsub(pattern="\\[|\\]", replacement="",
x=seqs_list_raw[3,max(seqs_index[,seq])]))
if(raw.sequences)
{
seqs_raw <- sapply(seqs_vect,function(seq)
gsub("[[:punct:]]",replacement="",seq))
return(seqs_raw)
} else {
if(return.metadata) {return(list(seqs.metadata,
seqs_vect,seqs_lengths))
} else { print(seqs_lengths);
print(seqs.metadata);
seqs_vect}
}
}
Sorry if it looks ugly... but it works
Cheers,
Brian
More information about the Bioconductor
mailing list