[R] Need a vectorized way to avoid two nested FOR loops
jim holtman
jholtman at gmail.com
Thu Oct 8 14:04:34 CEST 2009
Here is one way of doing it:
> n <- 20
> set.seed(2)
> # create test dataframe
> x <- as.data.frame(matrix(sample(1:2,n*6, TRUE), nrow=n))
> x
V1 V2 V3 V4 V5 V6
1 1 2 2 2 1 1
2 2 1 1 2 2 1
3 2 2 1 2 1 2
4 1 1 1 1 1 2
5 2 1 2 2 1 1
6 2 1 2 1 2 2
7 1 1 2 1 2 2
8 2 1 1 1 1 1
9 1 2 2 1 2 1
10 2 1 2 1 1 1
11 2 1 1 1 2 1
12 1 1 1 1 1 2
13 2 2 2 1 1 1
14 1 2 2 1 2 2
15 1 2 1 1 1 2
16 2 2 2 2 1 2
17 2 2 2 1 1 2
18 1 1 2 2 1 1
19 1 2 2 1 1 2
20 1 1 2 2 1 2
> x.col <- c(1,3,5)
> # flag rows where the selected columns agree with each other,
> # by testing the first selected column against the others
> x.match <- x[, x.col[1]] == x[, x.col[-1]]
> # print the rows where every comparison is TRUE
> x[apply(x.match, 1, all),]
V1 V2 V3 V4 V5 V6
4 1 1 1 1 1 2
6 2 1 2 1 2 2
12 1 1 1 1 1 2
15 1 2 1 1 1 2
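The session above compares the selected columns with each other within each row. If the aim is instead, as the question below reads, to find for each row the other rows that share its values in the chosen columns, one possible sketch is to build a key per row and group the row indices by that key. The data frame here is made up for illustration, the names d and vars are taken from the question, and the "\r" separator is an assumption that this character never occurs in the data.

```r
# Hypothetical example data; `d` and `vars` follow the names used in the question.
d <- data.frame(
  a = c(1, 2, 1, 2, 1),
  b = c("x", "y", "x", "y", "z"),
  c = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)
vars <- c(1, 2)

# Build one character key per row from the columns in vars.
# Assumption: the separator "\r" does not appear in the data, and numeric
# columns survive conversion to character without losing precision.
key <- do.call(paste, c(d[vars], sep = "\r"))

# Group the row indices by key in a single pass over the rows.
groups <- split(seq_len(nrow(d)), key)

# For each row r, the other rows that match it on the columns in vars.
matches <- lapply(seq_len(nrow(d)), function(r) setdiff(groups[[key[r]]], r))
```

With this toy data, matches[[1]] is 3 and matches[[5]] is empty, since row 5 has no partner. Building the keys and groups avoids comparing every pair of rows, although the result itself can still be large when many rows share a key.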
On Wed, Oct 7, 2009 at 3:52 PM, Rama Ramakrishnan <rama at alum.mit.edu> wrote:
>
> Hi Friends,
>
> I have a data frame d. Let vars be the column indices for a subset of the
> columns in d (e.g., vars <- c(1,3,4,8))
>
> For each row r in d, I want to collect all the other rows in d that match
> the values in row r for just the columns in vars.
>
> The naive way to do this is to have a for loop stepping through each row in
> d, and within the loop have another loop going through all the rows again,
> checking for equality. This is quadratic in the number of rows and takes way
> too long. Is there a better, "vectorized" way to do this?
>
> Thanks in advance!
>
> Rama Ramakrishnan
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?