[R] union data in column
Hadley Wickham
hadley at rice.edu
Sat Jul 24 14:53:23 CEST 2010
On Sat, Jul 24, 2010 at 2:23 AM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> Fahim Md wrote:
>>
>> Is there any function/way to merge/unite the following data
>>
>> GENEID col1 col2 col3 col4
>> G234064 1 0 0 0
>> G234064 1 0 0 0
>> G234064 1 0 0 0
>> G234064 0 1 0 0
>> G234065 0 1 0 0
>> G234065 0 1 0 0
>> G234065 0 1 0 0
>> G234065 0 0 1 0
>> G234065 0 0 1 0
>> G234065 0 0 0 1
>>
>>
>> into
>> GENEID col1 col2 col3 col4
>> G234064 1 1 0 0
>> // 1 appears in col1 and col2 above, rest are zero
>> G234065 0 1 1 1
>> // 1 appears in col2 , 3 and 4 above.
>>
>>
>> Thanks
>
> Warning on terminology: there is a "merge" function in R that lines up rows
> from different tables to make a new set of longer rows (more columns). The
> usual term for combining column values from multiple rows is "aggregation".
>
> In addition to the example offered by Jim Holtman, here are some other
> options in no particular order:
>
> x <- read.table(textConnection(" GENEID col1 col2 col3 col4
> G234064 1 0 0 0
> G234064 1 0 0 0
> G234064 1 0 0 0
> G234064 0 1 0 0
> G234065 0 1 0 0
> G234065 0 1 0 0
> G234065 0 1 0 0
> G234065 0 0 1 0
> G234065 0 0 1 0
> G234065 0 0 0 1
> "), header=TRUE, as.is=TRUE, row.names=NULL)
> closeAllConnections()
>
> # syntactic repackaging of Jim's basic approach
> library(plyr)
> ddply( x, .(GENEID), function(df)
> # with() needs df as its data argument; + 0L converts logical to integer and keeps the names
> {with(df, c(col1=any(col1),col2=any(col2),col3=any(col3),col4=any(col4)) + 0L)}
> )
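You can do this a little more succinctly with colwise (here numcolwise,
which applies the function to every numeric column within each group):

# 1 if any value in the column is non-zero, otherwise 0
any_1 <- function(x) as.integer(any(x))
ddply(x, "GENEID", numcolwise(any_1))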
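For comparison, here is a minimal base R sketch of the same aggregation that
does not need plyr. It assumes the indicator columns only ever contain 0 and 1,
so that max() within a group gives the same answer as any(); the name res is
just a placeholder for illustration.

```r
# Rebuild the example data from the original question
x <- read.table(text = "GENEID col1 col2 col3 col4
G234064 1 0 0 0
G234064 1 0 0 0
G234064 1 0 0 0
G234064 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 1 0 0
G234065 0 0 1 0
G234065 0 0 1 0
G234065 0 0 0 1
", header = TRUE, as.is = TRUE)

# Take the per-group maximum of every non-grouping column;
# for 0/1 data this is 1 exactly when the group contains a 1
res <- aggregate(. ~ GENEID, data = x, FUN = max)
print(res)
```

On the example data this should give one row per GENEID, with 1 1 0 0 for
G234064 and 0 1 1 1 for G234065. If the columns could hold values other than
0 and 1, any_1 would be the safer choice.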
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/