[R] Removing & generating data by category
Adaikalavan Ramasamy
a.ramasamy at imperial.ac.uk
Thu Oct 29 12:30:43 CET 2009
Here is another way based on pasting ids as hinted below:
a <- data.frame(id  = c(c("A1","A2","A3","A4","A5"),
                        c("A3","A2","A3","A4","A5")),
                loc = c("B1","B2","B3","B4","B5"),   # recycled to 10 rows
                clm = c(rep("General", 6), rep("Life", 4)))
a$uid <- paste(a$id, a$loc, sep=".")
out <- tapply( a$clm, a$uid, paste )   # can also add collapse=","
> out
$A1.B1
[1] "General"
$A2.B2
[1] "General" "Life"
$A3.B1
[1] "General"
$A3.B3
[1] "General" "Life"
$A4.B4
[1] "General" "Life"
$A5.B5
[1] "General" "Life"
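As a rough sketch of the collapse="," variant mentioned in the comment above: tapply then returns one string per id/location pair instead of a list, and pairs with a single policy are those without a comma. The names out2 and single are only illustrative, and this assumes no "clm" value itself contains a comma.

```r
# Rebuild the example data so the sketch is self-contained
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = rep(c("B1","B2","B3","B4","B5"), 2),
                clm = c(rep("General", 6), rep("Life", 4)))
a$uid <- paste(a$id, a$loc, sep=".")

# One comma-separated string of policies per id/location pair
out2 <- tapply(a$clm, a$uid, paste, collapse=",")

# Pairs whose string has no comma hold exactly one policy
single <- names(out2)[!grepl(",", out2, fixed=TRUE)]
print(single)
```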
The id/location pairs with a single policy are then:
> out[ which( sapply(out, length) == 1 ) ]
$A1.B1
[1] "General"
$A3.B1
[1] "General"
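To get back to the rows of the original data frame (the output Steven asked for), the single-policy keys can be matched against a$uid. The sketch below also shows a vectorised alternative with duplicated(), which may scale better to ~800,000 rows since it avoids building a list; I have not timed either on data of that size. Note that both versions treat any id/location pair occurring more than once as "multiple policies", even if the "clm" values happened to be the same. The names singles, res, key, dup and res2 are only illustrative.

```r
# Rebuild the example data so the sketch is self-contained
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = rep(c("B1","B2","B3","B4","B5"), 2),
                clm = c(rep("General", 6), rep("Life", 4)))
a$uid <- paste(a$id, a$loc, sep=".")

# Approach 1: keys with a single policy, matched back to the rows of a
out     <- tapply(a$clm, a$uid, paste)
singles <- names(out)[sapply(out, length) == 1]
res     <- a[a$uid %in% singles, c("id", "loc", "clm")]

# Approach 2: flag every key that occurs more than once, scanning from
# both ends so the first occurrence is flagged as well as the later ones
key  <- paste(a$id, a$loc, sep=".")
dup  <- duplicated(key) | duplicated(key, fromLast=TRUE)
res2 <- a[!dup, c("id", "loc", "clm")]

print(res)
print(res2)
```

On the example data both approaches keep rows 1 and 6 (A1/B1 and A3/B1).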
David Winsemius wrote:
> On Oct 28, 2009, at 9:30 PM, Steven Kang wrote:
>
>> Dear R users,
>>
>>
>> Basically, from the following arbitrary data set:
>>
>> a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),c("A3","A2","A3","A4","A5")),
>>                 loc=c("B1","B2","B3","B4","B5"),
>>                 clm=c(rep(("General"),6),rep("Life",4)))
>>
>>> a
>>    id loc     clm
>> 1  A1  B1 General
>> 2  A2  B2 General
>> 3  A3  B3 General
>> 4  A4  B4 General
>> 5  A5  B5 General
>> 6  A3  B1 General
>> 7  A2  B2    Life
>> 8  A3  B3    Life
>> 9  A4  B4    Life
>> 10 A5  B5    Life
>>
>> I would like to remove the records (highlighted above) that have
>> identical values in both fields ("id" and "loc") but different values
>> of "clm" (i.e. according to category).
>
> Take a look at this merge operation on separate rows of "a".
>
> > merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id",
> "loc"), all=T)
> id loc clm.x clm.y
> 1 A1 B1 <NA> General
> 2 A2 B2 Life General
> 3 A3 B1 <NA> General
> 4 A3 B3 Life General
> 5 A4 B4 Life General
> 6 A5 B5 Life General
>
> Assignment of that object and selection with is.na should complete the
> process.
>
> > a2m <- merge( a[a$clm=="Life", ], a[a$clm=="General", ] ,
> by=c("id", "loc"), all=T)
>
> > a2m[ is.na(a2m$clm.x) | is.na(a2m$clm.y), ]
> id loc clm.x clm.y
> 1 A1 B1 <NA> General
> 3 A3 B1 <NA> General
>
> Alternate methods might include paste-ing id to loc and removing
> duplicates.
>
>
>> i.e
>>> categ <- table(a$id,a$clm)
>>> categ
>>      General Life
>>   A1       1    0
>>   A2       1    1
>>   A3       2    1
>>   A4       1    1
>>   A5       1    1
>>
>> The desired output is
>>
>> id loc clm
>> 1 A1 B1 General
>> 6 A3 B1 General
>>
>> Because the data set I am working on is quite big (~ 800,000 x 20),
>> with most field values being long strings, looping turned out to be
>> very inefficient for comparing individual rows.
>>
>> Are there any alternative, efficient methods for solving this
>> problem?
>> Steven