[R] counting sequence mismatches

Martin Morgan mtmorgan at fhcrc.org
Sat Feb 23 03:41:41 CET 2008


One (admittedly ugly) solution:

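 > d.f=data.frame(seq1, seq2, stringsAsFactors=FALSE)
 > d.f[["nMismatch"]] <- with(d.f, {
 +   ## split each string into single characters and compare position-wise;
 +   ## for equal-length sequences mapply() simplifies to a logical matrix
 +   m <- mapply("!=", strsplit(seq1, ""), strsplit(seq2, ""))
 +   colSums(m)   # number of mismatching positions, one per row of d.f
 + })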
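The mapply() call above only yields a matrix when every pair of sequences has the same length; with ragged input it returns a list and colSums() fails. The sketch below does the same position-by-position count (no alignment, equal-length pairs assumed) with vapply(), so the result is always an integer vector. The name `count_mismatches` is hypothetical, and the as.character() calls are there because data.frame() converted strings to factors by default before R 4.0.0, as in the original question.

```r
## Hypothetical helper: count position-wise mismatches for each pair a[i], b[i].
## Assumes the two strings in each pair have equal length (no alignment/indels).
count_mismatches <- function(a, b) {
  a <- as.character(a)  # handles factor columns from older data.frame() defaults
  b <- as.character(b)
  stopifnot(length(a) == length(b))
  vapply(seq_along(a), function(i) {
    x <- strsplit(a[i], "")[[1]]
    y <- strsplit(b[i], "")[[1]]
    if (length(x) != length(y))
      stop("pair ", i, " has sequences of different lengths")
    sum(x != y)  # TRUE counts as 1, so this is the number of mismatches
  }, integer(1))
}

seq1 <- c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2 <- c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
d.f <- data.frame(seq1, seq2)
d.f$nMismatch <- count_mismatches(d.f$seq1, d.f$seq2)
d.f$nMismatch
## [1] 1 2 0
```

Stopping on unequal lengths is a deliberate choice: silently recycling or truncating would give a count that does not mean "mismatches" any more, and sequences of different lengths really call for an alignment.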

Check out the Bioconductor Biostrings package, especially the version 
available with the development version of R, for DNA string algorithms.

Martin

joseph wrote:
> Hello
> I have 2 columns of short sequences that I would like to compare, count the number of mismatches, and record that number in a new column. The sequences are part of a data frame that looks like this:
> seq1=c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT", "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
> seq2=c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT", "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
> d.f=data.frame(seq1, seq2)
> Thank you for your help.
> Joseph
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


