[R] matching each row
Marc Schwartz
marc_schwartz at me.com
Wed Jul 8 20:10:32 CEST 2009
On Jul 8, 2009, at 10:09 AM, tathta wrote:
>
> I have two dataframes, the first column of each dataframe is a unique id
> number (the rest of the columns are data variables).
> I would like to figure out how many times each id number appears in each
> dataframe.
>
> So far I can use:
> length( match (dataframeA$unique.id[1], dataframeB$unique.id) )
>
> but this only works on each row of dataframe A one-at-a-time.
>
> I would like to do this for all of the rows in dataframe A, and then put the
> results in a new variable: dataframeA$count
>
>
> I'm new to R, so please be patient with me!
>
>
> Sorry if this question has already been answered, my search of the archives
> only brought up one relevant post, and I didn't understand the answer to
> it.... http://www.nabble.com/match-to20799206.html#a20799206
If I am correctly understanding what you are looking for, you could do
something like the following:
# Create some simple data. Note that only a subset of the IDs (3:5)
# will match across the two DFs:
set.seed(1)
DF.A <- data.frame(ID = sample(1:5, 10, replace = TRUE))
DF.B <- data.frame(ID = sample(3:7, 10, replace = TRUE))
> DF.A
   ID
1   2
2   2
3   3
4   5
5   2
6   5
7   5
8   4
9   4
10  1
> DF.B
   ID
1   4
2   3
3   6
4   4
5   6
6   5
7   6
8   7
9   4
10  6
Now, create counts of the IDs in each, coercing the results to data
frames and setting the count column name for each:
TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A")
TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B")
> TAB.A
  Var1 Count.A
1    1       1
2    2       3
3    3       1
4    4       2
5    5       3
> TAB.B
  Var1 Count.B
1    3       1
2    4       3
3    5       1
4    6       4
5    7       1
Now, use merge() to join each of the two above. 'all = TRUE' will
include non-matching keys:
> merge(TAB.A, TAB.B, by = "Var1", all = TRUE)
  Var1 Count.A Count.B
1    1       1      NA
2    2       3      NA
3    3       1       1
4    4       2       3
5    5       3       1
6    6      NA       4
7    7      NA       1
Note that you will get NAs for any non-matching IDs (Var1).
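If a count of zero is more useful than NA for an ID that is missing from one data frame, the NAs can be replaced after the merge. The following is only a sketch: the name MERGED is introduced here for illustration, and the IDs are typed in literally from the output shown above, since sample() with set.seed(1) can return different values in R 3.6.0 and later.

```r
# Literal copies of the IDs printed above, so the result does not
# depend on the R version's sample() behaviour.
DF.A <- data.frame(ID = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
DF.B <- data.frame(ID = c(4, 3, 6, 4, 6, 5, 6, 7, 4, 6))

# Same steps as above: tabulate each ID column, then merge the tables.
TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A")
TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B")
MERGED <- merge(TAB.A, TAB.B, by = "Var1", all = TRUE)

# Treat "ID not present in this data frame" as a count of zero.
MERGED$Count.A[is.na(MERGED$Count.A)] <- 0L
MERGED$Count.B[is.na(MERGED$Count.B)] <- 0L

MERGED
```

Whether zero or NA is the better representation depends on what you do next with the counts; NA keeps "absent" distinct from "counted and found none".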
See ?table, ?as.data.frame and ?merge for more information.
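If what you want is literally a new column in the first data frame giving, for each row, how many times that row's ID appears in the second data frame, one possible sketch is below. The column name "count" comes from your post; the helper names ids and counts.in.B are made up for illustration, and the IDs are again typed in from the output above. Note that length(match(x, y)) with a single x always returns 1, because match() returns one position (or NA) per element of x, so it cannot be used as a count.

```r
# Literal copies of the IDs printed above.
DF.A <- data.frame(ID = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1))
DF.B <- data.frame(ID = c(4, 3, 6, 4, 6, 5, 6, 7, 4, 6))

# Tabulate DF.B's IDs over the set of IDs that occur in DF.A.
# Using factor() with explicit levels gives a zero count for IDs
# that are in DF.A but never appear in DF.B.
ids <- sort(unique(DF.A$ID))
counts.in.B <- table(factor(DF.B$ID, levels = ids))

# Look up the count for each row of DF.A by ID (table names are characters).
DF.A$count <- as.vector(counts.in.B[as.character(DF.A$ID)])

DF.A
```

For small data, sapply(DF.A$ID, function(id) sum(DF.B$ID == id)) should give the same counts and may be easier to read, at the cost of scanning DF.B once per row.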
HTH,
Marc Schwartz