[R] How to build combinations of columns of a data frame

David Winsemius dwinsemius at comcast.net
Thu Apr 16 16:59:35 CEST 2009


On Apr 16, 2009, at 10:14 AM, Juergen Rose wrote:

> Hi,
>
> as an R newcomer I would like to create some new data frames from a given
> data frame. The first new data frame should contain all pairs of the
> columns of the original data frame, the second all triples of the columns,
> and the last the quadruple of columns. The values in the new data frames
> should be the products of two, three, or four original single field values.
> For pairs and triples I could accomplish that task with the following R
> script:
>
> Lines <- "a    b    c    d
>    13     0    15   16
>    23    24    25    0
>    33    34     0   36
>     0    44    45   46
>    53    54     0   55"
>
> DF <- read.table(textConnection(Lines), header = TRUE)
>
> nrow <-length(rownames(DF))
> cnames <- colnames(DF)
> nc <-length(DF)
>
> nc.pairs <- nc*(nc-1)/2
> #  initialize vector
> cnames.new <- c(rep("",nc.pairs))
> ind <- 1
> print(sprintf("nc=%d",nc))
> for (i in 1:(nc-1)) {
>  if (i+1 <= nc ) {
>    for (j in (i+1):nc) {
>      cnames.new[ind] <- paste(cnames[i],cnames[j],sep="")
>      ind <- ind+1
>    }
>  }
> }
>
> ind <- 1
> #  initialize data.frame
> pairs <- data.frame(matrix(c(rep(0,nc.pairs*nrow)),ncol=nc.pairs))
> for (i in 1:nc) {
>  if (i+1 <= nc ) {
>    for (j in (i+1):nc) {
>      t <- DF[,i] * DF[,j]
>      pairs[[ind]] <- t
>      ind <- ind+1
>    }
>  }
> }
> colnames(pairs) <- cnames.new
> print("pairs=");   print(pairs)

apply(combn(colnames(DF),2), 2, function(x) DF[,x[1]]*DF[,x[2]] )
      [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0  195  208    0    0  240
[2,]  552  575    0  600    0    0
[3,] 1122    0 1188    0 1224    0
[4,]    0    0    0 1980 2024 2070
[5,] 2862    0 2915    0 2970    0
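
The matrix above has no column names. A minimal sketch of how the labels from
the original script ("ab", "ac", ...) could be put back, assuming the same DF;
the names idx, prods and DF.pairs are only illustrative:

```r
# Same data as the quoted script, built directly so the block runs on its own
DF <- data.frame(a = c(13, 23, 33, 0, 53),
                 b = c(0, 24, 34, 44, 54),
                 c = c(15, 25, 0, 45, 0),
                 d = c(16, 0, 36, 46, 55))

idx <- combn(colnames(DF), 2)   # 2 x 6 character matrix, one pair per column
prods <- apply(idx, 2, function(x) DF[, x[1]] * DF[, x[2]])

# Paste each pair of names together to label the product columns
colnames(prods) <- apply(idx, 2, paste, collapse = "")
DF.pairs <- as.data.frame(prods)
print(DF.pairs)
```

as.data.frame() is only needed if a data frame rather than a matrix is wanted.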
>
>
> nc.tripels <- nc*(nc-1)*(nc-2)/6
> #  initialize vector
> cnames.new <- c(rep("",nc.tripels))
> ind <- 1
> print(sprintf("nc=%d",nc))
> for (i in 1:nc) {
>  if (i+1 <= nc ) {
>    for (j in (i+1):nc) {
>      if (j+1 <= nc ) {
>        for (k in (j+1):nc) {
>          cnames.new[ind] <- paste(cnames[i],cnames[j],cnames[k],sep="")
>          ind <- ind+1
>        }
>      }
>    }
>  }
> }
>
> ind <- 1
> #  initialize data.frame
> tripels <- data.frame(matrix(c(rep(0,nc.tripels*nrow)),ncol=nc.tripels))
> for (i in 1:(nc-1)) {
>  if (i+1 <= nc ) {
>    for (j in (i+1):nc) {
>      if (j+1 <= nc ) {
>        for (k in (j+1):nc) {
>          t <- DF[,i] * DF[,j] * DF[,k]
>          tripels[[ind]] <- t
>          ind <- ind+1
>        }
>      }
>    }
>  }
> }
> colnames(tripels) <-  cnames.new
> print("tripels=");   print(tripels)

apply(combn(colnames(DF),3), 2, function(x) DF[,x[1]]*DF[,x[2]]*DF[,x[3]])
       [,1]   [,2] [,3]  [,4]
[1,]     0      0 3120     0
[2,] 13800      0    0     0
[3,]     0  40392    0     0
[4,]     0      0    0 91080
[5,]     0 157410    0     0
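
The same pattern should extend to the quadruple, or to any number of columns,
without writing out x[1], x[2], ... by hand. A rough sketch, assuming the same
DF; col_products is a made-up helper name, not an existing R function:

```r
# Same data as the quoted script, built directly so the block runs on its own
DF <- data.frame(a = c(13, 23, 33, 0, 53),
                 b = c(0, 24, 34, 44, 54),
                 c = c(15, 25, 0, 45, 0),
                 d = c(16, 0, 36, 46, 55))

# Hypothetical helper: row-wise products for every k-column combination
col_products <- function(df, k) {
  # list of character vectors, one per combination of k column names
  cols <- combn(colnames(df), k, simplify = FALSE)
  # multiply the selected columns together element by element
  out <- lapply(cols, function(x) Reduce(`*`, df[x]))
  # label each result by pasting its column names together
  names(out) <- vapply(cols, paste, character(1), collapse = "")
  as.data.frame(out)
}

pairs.df   <- col_products(DF, 2)   # columns ab, ac, ad, bc, bd, cd
tripels.df <- col_products(DF, 3)   # columns abc, abd, acd, bcd
quads.df   <- col_products(DF, 4)   # single column abcd
```

With this particular DF every row has at least one zero, so the abcd column
is zero throughout.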

>
>
> I suppose that there is a much shorter way to get the same results. Any
> hint is very much appreciated.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT