[BioC] Import sequences from MacClade 4.*

Brian zenlines at gmail.com
Thu Feb 16 22:52:51 CET 2012


Hi again,

so after a bit of R-programming fun I wrote the following. I realize now 
that that was actually a sequence alignment and not a typical sequence 
file. Nonetheless, should anyone be tasked with rooting up old data from 
a macclade file then, as promised:

read.macclade <- function(file, raw.sequences=TRUE, 
return.metadata=FALSE,...){
   ## file - path to file
   ## return.metadata - return the metadata from the mcclade file
   ## ... - Additional arguments to "readLines" function
   seqs <- readLines(file)
   seqs.metadata.ind <- grep("^BEGIN|^MATRIX", seqs)
   seqs.metadata <- seqs[(seqs.metadata.ind[1]+1):(seqs.metadata.ind[2]-1)]
   ## Get the sequences
   seqs.modal_index <- grep("^\\[Modal",seqs)
   seq_end <- grep("END",seqs)[1]
   seqs_sub <- seqs[(seqs.modal_index[1]+2):(seq_end-2)]
   seqs_sub.modal_index <- grep("^\\[Modal",seqs_sub)
   seqs.modal <- seqs[seqs.modal_index]
   ## Get the lines
   seqs_sub <- grep("^[[:alnum:]]",seqs_sub,value=T)
   ## Cut it up
   seqs_list_raw <-
     sapply(seq(seqs_sub),
            function(x)grep(".",
                            unlist(strsplit(x=seqs_sub[x],
                                            
split="[[:blank:]]")),value=TRUE))
   ## Put everything together
   seqs.unique <- unique(seqs_list_raw[1,])
   seqs_index <- 
sapply(seqs.unique,function(seq)grep(seq,seqs_list_raw[1,]))

   seqs_vect <- sapply(seqs.unique,function(seq)
                       paste(seqs_list_raw[2,seqs_index[,seq]],collapse=""))

   seqs_lengths <- sapply(seqs.unique,function(seq)
                          gsub(pattern="\\[|\\]", replacement="",
                               x=seqs_list_raw[3,max(seqs_index[,seq])]))
   if(raw.sequences)
     {
       seqs_raw <- sapply(seqs_vect,function(seq)
                          gsub("[[:punct:]]",replacement="",seq))
       return(seqs_raw)
     } else {
       if(return.metadata) {return(list(seqs.metadata, 
seqs_vect,seqs_lengths))
                          } else { print(seqs_lengths); 
print(seqs.metadata);
                                   seqs_vect}
     }

}

Sorry if it looks ugly... but it works

Cheers,
Brian



More information about the Bioconductor mailing list