[R] How to build combinations of columns of a data frame
David Winsemius
dwinsemius at comcast.net
Thu Apr 16 16:59:35 CEST 2009
On Apr 16, 2009, at 10:14 AM, Juergen Rose wrote:
> Hi,
>
> as an R newcomer I would like to create some new data frames from a
> given data frame. The first new data frame should contain all pairs of
> the columns of the original data frame. The second new data frame
> should contain all triples of the columns of the original data frame,
> and the last the quadruple of columns. The values in the new data
> frames should be the product of two, three or four original single
> field values. For pairs and triples I could accomplish that task with
> the following R script:
>
> Lines <- "a b c d
> 13 0 15 16
> 23 24 25 0
> 33 34 0 36
> 0 44 45 46
> 53 54 0 55"
>
> DF <- read.table(textConnection(Lines), header = TRUE)
>
> nrow <-length(rownames(DF))
> cnames <- colnames(DF)
> nc <-length(DF)
>
> nc.pairs <- nc*(nc-1)/2
> # initialize vector
> cnames.new <- c(rep("",nc.pairs))
> ind <- 1
> print(sprintf("nc=%d",nc))
> for (i in 1:(nc-1)) {
>   if (i+1 <= nc ) {
>     for (j in (i+1):nc) {
>       cnames.new[ind] <- paste(cnames[i],cnames[j],sep="")
>       ind <- ind+1
>     }
>   }
> }
>
> ind <- 1
> # initialize data.frame
> pairs <- data.frame(matrix(c(rep(0,nc.pairs*nrow)),ncol=nc.pairs))
> for (i in 1:nc) {
>   if (i+1 <= nc ) {
>     for (j in (i+1):nc) {
>       t <- DF[,i] * DF[,j]
>       pairs[[ind]] <- t
>       ind <- ind+1
>     }
>   }
> }
> colnames(pairs) <- cnames.new
> print("pairs="); print(pairs)
apply(combn(colnames(DF),2), 2, function(x) DF[,x[1]]*DF[,x[2]] )
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0  195  208    0    0  240
[2,]  552  575    0  600    0    0
[3,] 1122    0 1188    0 1224    0
[4,]    0    0    0 1980 2024 2070
[5,] 2862    0 2915    0 2970    0
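
The matrix above has no column names, whereas the loop version labels each column with the pasted names of its two source columns. If those labels matter, one possible way to get them is sketched below. The names `idx` and `pairs.df` are made up for this illustration, and `read.table(text = ...)` is used in place of `textConnection()`, which assumes a reasonably recent R.

```r
# Example data from the question
DF <- read.table(text = "a b c d
13 0 15 16
23 24 25 0
33 34 0 36
0 44 45 46
53 54 0 55", header = TRUE)

# One column of idx per pair of column names: ab, ac, ad, bc, bd, cd
idx <- combn(colnames(DF), 2)

# Multiply the two columns of each pair, then convert the matrix to a data frame
pairs.df <- as.data.frame(apply(idx, 2, function(x) DF[, x[1]] * DF[, x[2]]))

# Label each result column with the pasted names of its source columns
colnames(pairs.df) <- apply(idx, 2, paste, collapse = "")

print(pairs.df)
```

The result is called `pairs.df` rather than `pairs` only to avoid masking the `pairs()` plotting function.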
>
>
> nc.tripels <- nc*(nc-1)*(nc-2)/6
> # initialize vector
> cnames.new <- c(rep("",nc.tripels))
> ind <- 1
> print(sprintf("nc=%d",nc))
> for (i in 1:nc) {
>   if (i+1 <= nc ) {
>     for (j in (i+1):nc) {
>       if (j+1 <= nc ) {
>         for (k in (j+1):nc) {
>           cnames.new[ind] <-
>             paste(cnames[i],cnames[j],cnames[k],sep="")
>           ind <- ind+1
>         }
>       }
>     }
>   }
> }
>
> ind <- 1
> # initialize data.frame
> tripels <-
> data.frame(matrix(c(rep(0,nc.tripels*nrow)),ncol=nc.tripels))
> for (i in 1:(nc-1)) {
>   if (i+1 <= nc ) {
>     for (j in (i+1):nc) {
>       if (j+1 <= nc ) {
>         for (k in (j+1):nc) {
>           t <- DF[,i] * DF[,j] * DF[,k]
>           tripels[[ind]] <- t
>           ind <- ind+1
>         }
>       }
>     }
>   }
> }
> colnames(tripels) <- cnames.new
> print("tripels="); print(tripels)
apply(combn(colnames(DF),3), 2, function(x)
      DF[,x[1]]*DF[,x[2]]*DF[,x[3]])
      [,1]   [,2] [,3]  [,4]
[1,]     0      0 3120     0
[2,] 13800      0    0     0
[3,]     0  40392    0     0
[4,]     0      0    0 91080
[5,]     0 157410    0     0
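
The same pattern should extend to quadruples, or to any number of columns, without writing out each factor by hand. Below is a rough sketch rather than a tested recipe: `col.products` is a hypothetical helper name, and it assumes all columns are numeric and that the pasted column names are acceptable as labels.

```r
# Example data from the question
DF <- read.table(text = "a b c d
13 0 15 16
23 24 25 0
33 34 0 36
0 44 45 46
53 54 0 55", header = TRUE)

# Hypothetical helper: row-wise products for every combination of m columns
col.products <- function(df, m) {
  # List of character vectors, one per combination of m column names
  idx <- combn(colnames(df), m, simplify = FALSE)
  # Multiply the selected columns together element by element
  out <- lapply(idx, function(x) Reduce(`*`, df[x]))
  # Name each result after its source columns, e.g. "abd"
  names(out) <- vapply(idx, paste, character(1), collapse = "")
  as.data.frame(out)
}

triples <- col.products(DF, 3)
quads   <- col.products(DF, 4)
print(triples)
print(quads)
```

Using `lapply()` on a list of combinations, instead of `apply()` on the matrix that `combn()` returns by default, sidesteps the question of how `apply()` simplifies its result when there is only one combination, as with m = 4 here.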
>
>
> I suppose that there is a much shorter way to get the same results. Any
> hint is very much appreciated.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT