[R] Removing & generating data by category

jim holtman jholtman at gmail.com
Thu Oct 29 03:03:41 CET 2009


Here is one way of doing it:

> a <-
+ data.frame(id=c(c("A1","A2","A3","A4","A5"),
+     c("A3","A2","A3","A4","A5")),loc=c("B1","B2","B3","B4","B5"),
+     clm=c(rep(("General"),6),rep("Life",4)))
> # split the indices based on 'id' & 'loc'
> a.indx <- split(seq(nrow(a)), paste(a$id, a$loc))
> # now keep a group's rows only if they all share the same 'clm' (not sure
> # what you want to do when a group has more than 2 rows)
> result <- lapply(a.indx, function(.indx){
+     if (length(.indx) == 1) return(.indx)
+     if (any(a$clm[.indx[1]] != a$clm[.indx])) return(NULL)
+     .indx
+ })
> # output the matches
> a[unlist(result),,drop=FALSE]
  id loc     clm
1 A1  B1 General
6 A3  B1 General
>
>
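For the ~800,000-row case, here is a minimal sketch of a fully vectorized variant of the same rule: keep an (id, loc) group only if all its rows share one 'clm' value. It avoids calling an R function once per group. It marks the distinct (id, loc, clm) combinations with duplicated(), counts them per (id, loc) key with table(), and keeps the rows whose key has exactly one category. keep_single_category() is a hypothetical helper name. The "\r" separator assumes that character never occurs inside the data. With the default " " separator used by paste(), two different id/loc pairs could collide if the long strings contain spaces.

```r
# Example data from the question
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),
                clm = c(rep("General", 6), rep("Life", 4)))

# Hypothetical helper: keep the rows whose (id, loc) key maps to exactly one 'clm' value
keep_single_category <- function(df, sep = "\r") {
  # Combined key; assumes 'sep' never appears inside id or loc
  key <- paste(df$id, df$loc, sep = sep)
  # TRUE for repeats of an already-seen (key, clm) combination
  dup <- duplicated(paste(key, df$clm, sep = sep))
  # Number of distinct 'clm' values observed for each key
  n_clm <- table(key[!dup])
  # Look up each row's count by key name; keep rows whose key has one category
  keep <- as.vector(n_clm[key]) == 1
  df[keep, , drop = FALSE]
}

res <- keep_single_category(a)
print(res)
```

On the example data this keeps rows 1 and 6, the same result as the lapply() version above, and it preserves the original row order.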


On Wed, Oct 28, 2009 at 9:30 PM, Steven Kang <stochastickang at gmail.com> wrote:
> Dear R users,
>
>
> Basically, from the following arbitrary data set:
>
> a <- data.frame(id = c(c("A1","A2","A3","A4","A5"), c("A3","A2","A3","A4","A5")),
>                 loc = c("B1","B2","B3","B4","B5"),
>                 clm = c(rep(("General"),6), rep("Life",4)))
>
>> a
>    id   loc  clm
> 1  A1  B1 General
> 2  A2  B2 General
> 3  A3  B3 General
> 4  A4  B4 General
> 5  A5  B5 General
> 6  A3  B1 General
> 7  A2  B2    Life
> 8  A3  B3    Life
> 9  A4  B4    Life
> 10 A5  B5    Life
>
> I want to remove the records (rows 2-5 and 7-10, highlighted in the original
> HTML message) that have identical values in both fields ("id" and "loc") but
> different values of "clm" (i.e. they differ only by category).
> For example, tabulating by id and category gives:
>> categ <- table(a$id,a$clm)
>> categ
>
>     General Life
>  A1       1    0
>  A2       1    1
>  A3       2    1
>  A4       1    1
>  A5       1    1
>
> The desired output is
>
>    id   loc  clm
> 1  A1  B1 General
> 6  A3  B1 General
>
> Because the data set I am working on is quite big (~800,000 x 20), with
> most field values being long strings, looping over the rows to compare
> them individually turned out to be very inefficient.
>
> Are there any more efficient alternative ways to implement this?
>
> I greatly appreciate your expertise.
>
>
>
> Steven
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
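Steven's table(a$id, a$clm) attempt counts categories per id only, so it mixes A3/B3 with A3/B1. Here is a hedged sketch of the same cross-tab idea that tabulates the combined (id, loc) key against 'clm' instead. It keeps the rows whose key has a nonzero count in exactly one category column. It assumes 'clm' has only a handful of levels, because the cross-tab has one row per distinct key and one column per category. The "\r" separator assumes that character never occurs inside id or loc.

```r
# Example data from the question
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),
                clm = c(rep("General", 6), rep("Life", 4)))

# Combined (id, loc) key; assumes "\r" never occurs inside id or loc
key <- paste(a$id, a$loc, sep = "\r")
# Cross-tabulate key against category: one row per key, one column per 'clm' level
categ <- table(key, a$clm)
# Number of categories with a nonzero count for each key
n_cat <- rowSums(categ > 0)
# Keep rows whose key occurs under exactly one category
kept <- a[n_cat[key] == 1, , drop = FALSE]
print(kept)
```

On the example data this also returns rows 1 and 6.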



