[R] Subset dataframe with loop searching for unique values in two columns
arun
smartpink111 at yahoo.com
Sun Jun 9 01:47:45 CEST 2013
Hi,
You could try this:
dat2<- read.table(text='
case pin some_data
"A" "1" "data"
"A" "2" "data"
"A" "1" "data"
"A" "2" "data"
"B" "1" "data"
"B" "2" "data"
',sep="",header=TRUE,stringsAsFactors=FALSE)
dat2[!duplicated(dat2[,1:2]),]
# case pin some_data
#1 A 1 data
#2 A 2 data
#5 B 1 data
#6 B 2 data
#or, assuming the third column differs for the duplicated `case` and `pin`:
dat2[row.names(unique(dat2[,1:2])),]
# case pin some_data
#1 A 1 data
#2 A 2 data
#5 B 1 data
#6 B 2 data
#If `some_data` is the same for the duplicated rows:
unique(dat2)
# case pin some_data
#1 A 1 data
#2 A 2 data
#5 B 1 data
#6 B 2 data
A.K.
Hello,
First off, I'm sure this has been posted somewhere, but I've not
been able to find what I'm looking for. Please forgive the duplication
and thank you for your help!!!!
I have a crime dataset of over 500k observations in one file. To
simplify my problem, I have a dataframe that has a "case" ID in one
column, a personal ID number (pin) in another, and associated "data" in
subsequent columns.
Example:
case pin some_data
[1,] "A" "1" "data"
[2,] "A" "2" "data"
[3,] "A" "1" "data"
[4,] "A" "2" "data"
[5,] "B" "1" "data"
[6,] "B" "2" "data"
I would like to subset the data so that only rows with unique PIN/CASE combinations are left, along with the subsequent data:
case pin some_data
[1,] "A" "1" "data"
[2,] "A" "2" "data"
[5,] "B" "1" "data"
[6,] "B" "2" "data"
I'm teaching myself how to program in R, and I'm thinking that I want a loop that says something like (see the sketch after this list):
- find and keep the first row of each unique PIN & CASE
- if the PIN is a duplicate but the CASE is different, keep the first row of the duplicate PIN & new CASE
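For what it's worth, a minimal loop-based sketch of those two rules is below, assuming the data frame is called `dat2` as in the example; the vectorised duplicated()/unique() calls shown above will be far faster on 500k+ rows.
keep <- logical(nrow(dat2))            # rows to retain
seen <- character(0)                   # CASE/PIN combinations already kept
for (i in seq_len(nrow(dat2))) {
    key <- paste(dat2$case[i], dat2$pin[i], sep = "_")
    if (!(key %in% seen)) {            # first time this CASE/PIN pair appears
        keep[i] <- TRUE
        seen <- c(seen, key)
    }
}
dat2[keep, ]                           # same rows as dat2[!duplicated(dat2[,1:2]),]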
Longer Explanation:
The PIN identifies an arrested offender. I want to check whether
there was recidivism (repeat offenses and arrests) for each
offender/PIN. The way I can do that is by checking whether a PIN has
multiple CASE numbers. I also want to keep the single arrests in the
dataset. I have over 6 million cases spanning several years.
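As a rough illustration of that check in base R, assuming the deduplicated data from the example above (`dat3`, `cases_per_pin`, and `recidivists` are just names made up for this sketch):
dat3 <- dat2[!duplicated(dat2[, 1:2]), ]   # one row per CASE/PIN pair
cases_per_pin <- tapply(dat3$case, dat3$pin, function(x) length(unique(x)))
cases_per_pin
#1 2
#2 2
recidivists <- names(cases_per_pin)[cases_per_pin > 1]   # PINs seen with more than one CASE
recidivists
#[1] "1" "2"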
I hope this makes sense; I've been banging my head on this one for a while and would really appreciate the help!!