[R] Removing & generating data by category

Fri Oct 30 10:04:33 CET 2009

Hmm, so if I read correctly, you want to remove exact duplicate rows. As a
first step, maybe try the following:

  duplicated(newdf[ , c("id", "loc", "clm")])
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

Then you can drop the duplicated rows before proceeding with what was
suggested earlier.
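A minimal sketch of that step, assuming a small made-up newdf (the values and the extra amt column are hypothetical, not your data): index with the negated duplicated() result to keep the first occurrence of each id/loc/clm combination while retaining every column.

```r
# Hypothetical example data; 'amt' stands in for any other column you want to keep
newdf <- data.frame(
  id  = c("A1", "A2", "A2", "A3", "A3"),
  loc = c("B1", "B2", "B2", "B3", "B3"),
  clm = c("General", "Life", "Life", "General", "Life"),
  amt = c(100, 250, 250, 80, 40),
  stringsAsFactors = FALSE
)

# TRUE for rows whose (id, loc, clm) combination already appeared earlier
dup <- duplicated(newdf[ , c("id", "loc", "clm")])

# Keep only the first occurrence of each combination; all columns are retained
dedup <- newdf[!dup, ]
print(dedup)
```

Here row 3 repeats row 2 on id/loc/clm, so dedup keeps rows 1, 2, 4 and 5 with amt still attached.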

Alternatively, try unique(newdf[ , c("id", "loc", "clm")]) if you do not
need to carry over the other columns.
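For comparison, a sketch of the unique() route on the same kind of hypothetical newdf (made-up values; amt is an illustrative extra column). The result has the same distinct rows, but only the three key columns survive.

```r
# Hypothetical example data; 'amt' is an extra column that unique() on the subset drops
newdf <- data.frame(
  id  = c("A1", "A2", "A2", "A3", "A3"),
  loc = c("B1", "B2", "B2", "B3", "B3"),
  clm = c("General", "Life", "Life", "General", "Life"),
  amt = c(100, 250, 250, 80, 40),
  stringsAsFactors = FALSE
)

key_cols <- c("id", "loc", "clm")

# Distinct id/loc/clm combinations; other columns such as 'amt' are not carried over
u <- unique(newdf[ , key_cols])
print(u)
```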

See help(duplicated) and help(unique).

Regards, Adai

David Winsemius wrote:
> Color me puzzled. Can you express the rule more clearly in Boolean logic?
> 
> If someone has five policies: 3 Life and 2 General ...  is he in or out?
> 
> Applying the alternate strategy to that data set I get:
> > out <- tapply(dat$clm, dat$uid, paste, collapse = ",")
> > out
>                          A1.B1                          A2.B2                          A3.B1
>                      "General"                 "General,Life"                      "General"
>                          A3.B3                          A4.B4                          A5.B5
> "General,Life,General,General"         "General,Life,General"                 "General,Life"
> 
> Please explain why you want A3.B3.
>
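To make the tapply() approach and David's question concrete, here is a sketch on a hypothetical dat (made-up rows; uid is assumed to be id and loc joined with ".", which is only a guess from labels like A1.B1). Counting categories per uid with table() makes it easier to state an in/out rule in Boolean terms; the "has both Life and General" rule below is purely illustrative, not the rule you asked for.

```r
# Hypothetical claims data; 'uid' is assumed to be id and loc joined with "."
dat <- data.frame(
  id  = c("A1", "A2", "A2", "A3", "A3", "A3"),
  loc = c("B1", "B2", "B2", "B3", "B3", "B3"),
  clm = c("General", "General", "Life", "Life", "Life", "General"),
  stringsAsFactors = FALSE
)
dat$uid <- paste(dat$id, dat$loc, sep = ".")

# Collapse each uid's categories into one comma-separated string
out <- tapply(dat$clm, dat$uid, paste, collapse = ",")
print(out)

# Count how many claims of each category every uid has
counts <- table(dat$uid, dat$clm)
print(counts)

# Illustrative (hypothetical) rule: uid is "in" if it has at least one of each category
has_both <- counts[, "Life"] > 0 & counts[, "General"] > 0
print(has_both)
```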