[R] Compare two data sets
David Winsemius
dwinsemius at comcast.net
Wed Mar 26 02:50:59 CET 2008
<amarkey at uiuc.edu> wrote in
news:20080325101909.BDK93111 at expms2.cites.uiuc.edu:
> I would like to compare two data sets saved as text files (example
> below) to determine if both sets are identical (or if dat2 is missing
> information that is included in dat1) and, if they are not identical,
> list what information is different between the two sets (i.e., output
> "a1", "a3" as the differing information). The overall purpose would
> be to remove "a1" and "a3" from dat1 so both dat1 and dat2 are the
> same. My R abilities are somewhat limited so any suggestions are
> greatly appreciated.
I do not understand what it would mean to remove elements so "they
would look the same". Why wouldn't you just use the smaller set?
>
> Alysta
>
> dat1
> a1
> a2
> a3
> a4
> a5
> a6
>
> dat2
> a2
> a4
> a5
> a6
You might want to look at the %in% operator. These examples were
created with neither dat1 nor dat2 being a proper subset of the other.
dat1 <- paste('a', 1:6, sep='')
dat2 <- paste('a', c(2,4:6,8,9,10), sep='')
dat1
#[1] "a1" "a2" "a3" "a4" "a5" "a6"
dat2
#[1] "a2" "a4" "a5" "a6" "a8" "a9" "a10"
dat2 %in% dat1
#[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE
dat1 %in% dat2
#[1] FALSE TRUE FALSE TRUE TRUE TRUE
### And then use the logical vectors as index arguments
### to first get the common elements
dat1[dat1 %in% dat2]
#[1] "a2" "a4" "a5" "a6"
dat2[dat2 %in% dat1]
#[1] "a2" "a4" "a5" "a6"
### And then to find the non-shared elements
dat2[!(dat2 %in% dat1)]
#[1] "a8" "a9" "a10"
dat1[!(dat1 %in% dat2)]
#[1] "a1" "a3"
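
Base R also has setdiff(), intersect() and setequal(), which should give
the same answers without building the logical vectors by hand. A minimal
sketch, assuming each text file holds one value per line; the temporary
files and the names only1, only2, common and dat1.trimmed are made up
for illustration and stand in for the poster's real files:

```r
# Assumption: one identifier per line in each file. The temporary
# files below are hypothetical stand-ins for the real text files.
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(paste("a", 1:6, sep = ""), f1)
writeLines(paste("a", c(2, 4:6), sep = ""), f2)

dat1 <- readLines(f1)
dat2 <- readLines(f2)

setequal(dat1, dat2)             # FALSE: the two sets differ
only1  <- setdiff(dat1, dat2)    # in dat1 but not in dat2: "a1" "a3"
only2  <- setdiff(dat2, dat1)    # in dat2 but not in dat1: character(0)
common <- intersect(dat1, dat2)  # shared: "a2" "a4" "a5" "a6"

# Drop the extras from dat1 so that it matches dat2
dat1.trimmed <- dat1[dat1 %in% dat2]
identical(dat1.trimmed, dat2)    # TRUE
```

Note that setdiff() and intersect() drop duplicated values, whereas
indexing with %in% keeps them, so the %in% form is probably the safer
choice if the files may contain repeats.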
--
David Winsemius