[R] Consolidate column contents of equally "named" columns

David Winsemius dwinsemius at comcast.net
Sun Apr 29 15:05:25 CEST 2012


On Apr 28, 2012, at 10:46 AM, Daniel Malter wrote:

> Hi,
>
> I have a data frame whose first row (not the header) contains the true
> column names. The same column name can  occur multiple times in the  
> dataset.
> Columns with equal names are not adjacent, and for each observation  
> only one
> of the equally named columns contains the actual data (see the example
> below). I am looking for an easy method consolidate these columns  
> into one
> column for each unique column name. Say,
>
> x1<-c("x",1,NA,NA)
> x2<-c("x",NA,2,NA)
> x3<-c("x",NA,NA,3)
> y1<-c("y",3,NA,NA)
> y2<-c("y",NA,1,NA)
> y3<-c("y",NA,NA,2)
> d<-data.frame(x1,y1,x2,y2,x3,y3)
> d
>

It would avoid problems with manipulating factors it these were  
created (or converted to) character columns, choose one of:

d=data.frame(x1,y1,x2,y2,x3,y3, stringsAsFactors=FALSE)

d[]<-lapply(d, as.character)


> # d looks like:
>
>    x1   y1   x2   y2   x3   y3
> 1    x    y    x    y    x    y
> 2    1    3 <NA> <NA> <NA> <NA>
> 3 <NA> <NA>    2    1 <NA> <NA>
> 4 <NA> <NA> <NA> <NA>    3    2
>
>> From this, I want to create the table or data frame
>
> x y
> 1 3
> 2 1
> 3 2

na.omit(
     data.frame(
         X=stack(d[-1,grep("x", names(d))]),
         Y=stack(d[-1,grep("y", names(d))]),
         stringsAsFactors=FALSE)[ c(1,3) ])

   X.values Y.values
1        1        3
5        2        1
9        3        2


If it were less regular you might need to merge with the "source"  
columns that stack generates.


-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT



More information about the R-help mailing list