[R] identify duplicate from more than one column

jour4life jour4life at gmail.com
Sun Nov 13 05:16:03 CET 2011


Hi all,

I've searched everywhere to try to find out how to do this and have had no
luck. I am trying to construct identifiers for couples in a dataset.
Essentially, I want to identify couples using more than one column as
identifiers. Take for instance:

obs  unit    home  z  sex  age
1    015029  18    1  1    053
2    015029  18    1  2    049
3    015029  01    1  1    038
4    015029  01    1  2    033
5    015029  02    1  1    036
6    015029  02    1  2    033
7    015029  03    1  1    023
8    015029  03    1  2    019
9    015029  04    1  2    045
10   015029  05    1  2    047

Here unit is the housing unit and home is the household. Of course, there are
more values for unit, although these first ten observations all share the same
unit (which could possibly be an apartment complex). I want to construct an
identifier for couples where unit and home match, but only if both a male and
a female are within the same household. Taking the example data above, I want
to see this:

obs  unit    home  z  sex  age  couple
1    015029  18    1  1    053  1
2    015029  18    1  2    049  1
3    015029  01    1  1    038  2
4    015029  01    1  2    033  2
5    015029  02    1  1    036  3
6    015029  02    1  2    033  3
7    015029  03    1  1    023  4
8    015029  03    1  2    019  4
9    015029  04    1  2    045  0
10   015029  05    1  2    047  0

As you can see, for the last two observations no male was identified within
the same household, so they would not receive a couple identifier but rather
some other identifier (the same one for both) so that I can detect and remove
them later. I've tried using the duplicated() function, but it was not very
useful.
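
To make the rule concrete, here is a rough base R sketch of what I am after. It is only an illustration under some assumptions: sex is taken to be coded 1 = male and 2 = female (inferred from the example, not from a codebook), a "couple" is any unit/home combination containing at least one of each, and the names dat, key, has_male, has_female, is_couple and couple_keys are made up for this example.

```r
# Example data from above; unit, home and age are kept as character
# so that the leading zeros are preserved.
dat <- data.frame(
  obs  = 1:10,
  unit = rep("015029", 10),
  home = c("18", "18", "01", "01", "02", "02", "03", "03", "04", "05"),
  z    = rep(1L, 10),
  sex  = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L),
  age  = c("053", "049", "038", "033", "036", "033", "023", "019", "045", "047"),
  stringsAsFactors = FALSE
)

# One household key built from both identifying columns.
key <- paste(dat$unit, dat$home, sep = "_")

# Assumption: sex == 1 is male, sex == 2 is female.
# For each household, check whether it contains at least one of each.
has_male   <- tapply(dat$sex == 1, key, any)
has_female <- tapply(dat$sex == 2, key, any)

# Map the per-household result back onto the rows.
is_couple <- as.vector((has_male & has_female)[key])

# Number the qualifying households in order of first appearance;
# rows in any other household get 0.
couple_keys <- unique(key[is_couple])
dat$couple  <- match(key, couple_keys, nomatch = 0L)

print(dat)
```

Note that under this rule a household with more than two people would get a single identifier for all of its members, so it would need refining if households can contain more than one couple.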

Any help would be greatly appreciated!!! 

Thanks,

Carlos

--
View this message in context: http://r.789695.n4.nabble.com/identify-duplicate-from-more-than-one-column-tp4035888p4035888.html
Sent from the R help mailing list archive at Nabble.com.


