[R] Choose between duplicated rows
Tyler Rinker
tyler_rinker at hotmail.com
Sat Apr 14 22:15:36 CEST 2012
My solution:
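SP <- split(df, df[, 1:2], drop = TRUE)   # one group per id/A pair; empty combinations dropped
minner <- function(x, col = 'numMiss') { x[which.min(unlist(x[, col])), , drop = FALSE] }   # row with the smallest 'col' (first one on ties)
NEW <- do.call('rbind', lapply(SP, minner))   # step 1: fewest missing values per id/A
SP2 <- split(NEW, NEW[, 'id'])
do.call('rbind', lapply(SP2, function(x) minner(x, 'A')))   # step 2: smallest A per id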
Cheers,
Tyler
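For comparison, here is a minimal sketch of a different base-R technique (not the one above): sort the rows so that the preferred row of each group comes first, then drop the later rows of each group with duplicated(). It assumes the df from the quoted message below, redefined so the snippet runs on its own; the names s1, s2, step1 and step2 are illustrative. Because order() leaves ties in their original order, it keeps the first row on ties, as which.min() does.

```r
# Illustrative alternative, not from the thread: sort, then keep the first
# row of each group using duplicated().
df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"),
                 A = c(11905, 11907, 11907, 11829, 11829, 11829),
                 v1 = c(NA, 3, NA, 1, 2, NA), v2 = c(NA, 2, NA, 2, NA, NA),
                 v3 = c(NA, 1, NA, 1, NA, NA),
                 v4 = c("N", "Y", "N", "Y", "N", "N"),
                 v5 = c(0, 0, 0, 1, 0, 0), numMiss = c(3, 0, 3, 0, 2, 3))

# Step 1: within each id/A pair, put the row with the fewest NAs first,
# then drop every later row of the same pair.
s1 <- df[order(df$id, df$A, df$numMiss), ]
step1 <- s1[!duplicated(s1[, c("id", "A")]), ]

# Step 2: within each id, put the smallest A first and keep only that row.
s2 <- step1[order(step1$id, step1$A), ]
step2 <- s2[!duplicated(s2$id), ]
step2
```

On the example data this keeps rows 1, 2 and 4 after step 1 and rows 1 and 4 after step 2, matching the two tables in the question.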
> Date: Sat, 14 Apr 2012 12:03:36 -0700
> From: francy.casalino at gmail.com
> To: r-help at r-project.org
> Subject: [R] Choose between duplicated rows
>
> Dear R experts,
>
> Sorry for this basic question, but I can't seem to find a solution…
>
> I have this data frame:
> df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"),
>                  A = c(11905, 11907, 11907, 11829, 11829, 11829),
>                  v1 = c(NA, 3, NA, 1, 2, NA), v2 = c(NA, 2, NA, 2, NA, NA),
>                  v3 = c(NA, 1, NA, 1, NA, NA),
>                  v4 = c("N", "Y", "N", "Y", "N", "N"),
>                  v5 = c(0, 0, 0, 1, 0, 0), numMiss = c(3, 0, 3, 0, 2, 3))
>
> > df
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 2 id1 11907  3  2  1  Y  0       0
> 3 id1 11907 NA NA NA  N  0       3
> 4 id2 11829  1  2  1  Y  1       0
> 5 id2 11829  2 NA NA  N  0       2
> 6 id2 11829 NA NA NA  N  0       3
>
>
> Among the rows that share the same id and the same value of "A", I need to
> keep only the one with the fewest missing values across all variables
> (i.e., min(numMiss)), to get this:
>
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 2 id1 11907  3  2  1  Y  0       0
> 4 id2 11829  1  2  1  Y  1       0
>
> Then, among the rows with the same id, I need to keep the record with the
> smallest value of "A", like this:
>    id     A v1 v2 v3 v4 v5 numMiss
> 1 id1 11905 NA NA NA  N  0       3
> 4 id2 11829  1  2  1  Y  1       0
>
> I have used the "plyr" package for grouping before, but this would involve a
> kind of double grouping, by id and by duplicated values of A. Could you
> please help me understand how this can be done?
>
> Thank you very much.
> -f
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Choose-between-duplicated-rows-tp4557833p4557833.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
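The two steps in the question can also be sketched with base R's ave(), which returns each row's group minimum so that rows equal to it can be kept. This is an illustrative alternative rather than anything from the thread (plyr's ddply() could also express the same double grouping), and it differs in one respect: if two rows tie for the minimum, both are kept. Grouping on interaction(..., drop = TRUE) is an assumption made here so that min() is never applied to an id/A combination that does not occur; the names grp, minMiss, minA, step1 and step2 are hypothetical.

```r
# Illustrative sketch, not from the thread: keep rows equal to their group minimum.
df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"),
                 A = c(11905, 11907, 11907, 11829, 11829, 11829),
                 v1 = c(NA, 3, NA, 1, 2, NA), v2 = c(NA, 2, NA, 2, NA, NA),
                 v3 = c(NA, 1, NA, 1, NA, NA),
                 v4 = c("N", "Y", "N", "Y", "N", "N"),
                 v5 = c(0, 0, 0, 1, 0, 0), numMiss = c(3, 0, 3, 0, 2, 3))

# Step 1: minimum numMiss for each id/A pair. drop = TRUE keeps only the
# combinations that actually occur, so min() never sees an empty group.
grp <- interaction(df$id, df$A, drop = TRUE)
minMiss <- ave(df$numMiss, grp, FUN = min)
step1 <- df[df$numMiss == minMiss, ]

# Step 2: minimum A for each id, then keep the rows that match it.
minA <- ave(step1$A, step1$id, FUN = min)
step2 <- step1[step1$A == minA, ]
step2
```

On the example data, step1 holds rows 1, 2 and 4 and step2 holds rows 1 and 4, as in the question.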