Erik,
Judging from your data, I gather that you are not interested in
indels. Is that correct? If so, take a look at the neditStartingAt
function in Biostrings, which by default (with.indels = FALSE) counts
mismatches only. Something like the following may meet your needs:
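    library(Biostrings)
    N <- length(myStrings)
    myDists <- matrix(0, nrow = N, ncol = N)
    # Fill the upper triangle and mirror it; the diagonal stays 0.
    for (i in seq_len(N - 1)) {
        for (j in (i + 1):N) {
            myDists[i, j] <- myDists[j, i] <-
                neditStartingAt(myStrings[[i]], myStrings[[j]])
        }
    }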
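If the double loop is still too slow, a rough alternative is to compare one sequence against all later ones in a single vectorized step. The sketch below uses base R only and assumes the sequences are equal width, already aligned, and available as a plain character vector (for a DNAStringSet, as.character(myStrings) should provide one). The name hammingMatrix is made up for this illustration and is not a Biostrings function; letters are compared literally, so ambiguity codes count as mismatches.

```r
# Hypothetical helper (not part of Biostrings): pairwise mismatch counts
# for equal-width, pre-aligned sequences given as a character vector.
hammingMatrix <- function(seqs) {
    # All sequences must have the same width.
    stopifnot(length(unique(nchar(seqs))) == 1L)
    # One row per sequence, one column per alignment position.
    chars <- do.call(rbind, strsplit(seqs, "", fixed = TRUE))
    N <- nrow(chars)
    d <- matrix(0, nrow = N, ncol = N)
    for (i in seq_len(N - 1L)) {
        others <- (i + 1L):N
        # Repeat row i so it lines up with every later row.
        ref <- matrix(chars[i, ], nrow = length(others),
                      ncol = ncol(chars), byrow = TRUE)
        # Count mismatching positions for row i against all later rows.
        mism <- rowSums(chars[others, , drop = FALSE] != ref)
        d[i, others] <- d[others, i] <- mism
    }
    d
}

# "ACAC" vs "ACAG" differ at one position, "ACAC" vs "TCAG" at two.
hammingMatrix(c("ACAC", "ACAG", "TCAG"))
```

This swaps the inner loop for a matrix comparison, at the cost of holding the sequences as an N x width character matrix in memory (about 4 million single-character cells for 500 sequences of length 8000).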
Patrick
On 3/25/10 2:57 PM, erikwright@comcast.net wrote:
> I have 500 DNAStrings, all of length 8000. I need the entire N x N
> distance matrix.
> Thanks,
> Erik
> ----- Original Message -----
> From: "Patrick Aboyoun"
> To: erikwright@comcast.net
> Cc: bioconductor@stat.math.ethz.ch
> Sent: Thursday, March 25, 2010 4:45:29 PM GMT -06:00 US/Canada Central
> Subject: Re: [BioC] Count differences between sequences
> Erik,
> Could you provide more details on your data? How long is each of the
> strings and how many strings do you have? Also, do you need the entire
> N x N distance matrix for downstream analysis or are you just looking
> for closest relatives?
>
> Patrick
> On 3/25/10 2:29 PM, erikwright@comcast.net wrote:
> > Hello all,
> >
> > I have a large DNAStringSet and I am trying to calculate its
> > distance matrix. My DNAStrings are equal width and they are already
> > aligned.
> >
> > I have tried using the stringDist() function, but it is very slow
> > for large DNAStringSets. Is there a way to quickly calculate the
> > number of differences between two DNAString instances?
> >
> > For example, let's say I have two DNAStrings: "ACAC" and "ACAG". I
> > would like to know if there is a function other than stringDist() that
> > will tell me the distance between them is 1.
> >
> > Thanks in advance for any help.
> >
> > - Erik
