[R] Need a vectorized way to avoid two nested FOR loops

Thu Oct 8 15:28:25 CEST 2009

Another approach is:

n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n*6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

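# build one key string per row from the columns of interest
values <- do.call(paste, c(x[x.col], sep = "\r"))
# for each row, find the rows with the same key, then drop the row itself
out <- lapply(seq_along(values), function (i) {
     ind <- which(values == values[i])
     ind[!ind %in% i]
})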
out
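
The lapply() above still compares the key vector against itself once per row. A possible variation, only a sketch under the same test data, groups the row indices by key once with split() and then looks each row up in its group. The names key, groups and out2 are made up for this illustration, and the sketch assumes there are no missing values in the selected columns.

```r
n <- 20
set.seed(2)
x <- as.data.frame(matrix(sample(1:2, n * 6, TRUE), nrow = n))
x.col <- c(1, 3, 5)

# one key string per row, built from the columns of interest
key <- do.call(paste, c(x[x.col], sep = "\r"))

# row indices grouped by key, computed once for the whole data frame
groups <- split(seq_along(key), key)

# for each row: the members of its group, minus the row itself
out2 <- lapply(seq_along(key), function(i) setdiff(groups[[key[i]]], i))
out2
```

Rows without any other match get integer(0), as in the lapply() version.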

Best,
Dimitris

joris meys wrote:
> Neat piece of code, Jim, but it still uses a nested loop. If you order
> the matrix first, you only need one pass through the whole matrix
> to find the information you need.
> 
> Of course, I am not counting the cost of the ordering itself. If the
> ordering algorithm doesn't run in linear time, then the ordering becomes
> the limiting step, so I guess it doesn't really matter.
> 
> Kind regards
> Joris
> 
> 
> 
> On Thu, Oct 8, 2009 at 2:24 PM, jim holtman <jholtman at gmail.com> wrote:
>> I answered the wrong question.  Here is the code to find all the
>> matches for each row:
>>
>> n <- 20
>> set.seed(2)
>> # create test dataframe
>> x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
>> x
>> x.col <- c(1,3,5)
>>
>> # match against all the other rows
>> x.match1 <- apply(x[, x.col], 1, function(a){
>>    which(apply(x[, x.col], 1, function(z){
>>        all(a == z)
>>    }))
>> })
>>
>> # remove matches to itself
>> x.match2 <- lapply(seq_along(x.match1), function(z){
>>    x.match1[[z]][!(x.match1[[z]] %in% z)]
>> })
>> # x.match2 contains the indices of the matching rows
>>
>>
>> On Wed, Oct 7, 2009 at 3:52 PM, Rama Ramakrishnan <rama at alum.mit.edu> wrote:
>>> Hi Friends,
>>>
>>> I have a data frame d. Let vars be the column indices for a subset of the
>>> columns in d (e.g., vars <- c(1,3,4,8))
>>>
>>> For each row r in d, I want to collect all the other rows in d that match
>>> the values in row r for just the columns in vars.
>>>
>>> The naive way to do this is to have a for loop stepping through each row in
>>> d, and within the loop have another loop going through all the rows again,
>>> checking for equality. This is quadratic in the number of rows and takes way
>>> too long. Is there a better, "vectorized" way to do this?
>>>
>>> Thanks in advance!
>>>
>>> Rama Ramakrishnan
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
> 

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014