[R] How to build combinations of columns of a data frame
Juergen Rose
rose at rz.uni-potsdam.de
Thu Apr 16 18:33:07 CEST 2009
On Thursday, 16.04.2009, at 10:59 -0400, David Winsemius wrote:
Thanks David,
is there also a shorter way to get the column names of the new data
frames?
Juergen
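
One possible answer to the column-name question, offered as a sketch and not as something posted in the thread: keep the result of combn() in a variable (here called idx, a name chosen only for illustration) and paste each combination of names together. The read.table(text = ...) call merely rebuilds the example data from the original post.

```r
# Rebuild the example data frame from the original post.
DF <- read.table(text = "a b c d
13 0 15 16
23 24 25 0
33 34 0 36
0 44 45 46
53 54 0 55", header = TRUE)

# All pairs of column names; each column of idx holds one pair.
idx <- combn(colnames(DF), 2)

# Products of each pair, following the apply() call quoted below.
pairs <- as.data.frame(apply(idx, 2, function(x) DF[, x[1]] * DF[, x[2]]))

# Paste the two names of each pair together to label the new columns.
colnames(pairs) <- apply(idx, 2, paste, collapse = "")
print(pairs)
```

With the example data this labels the six columns ab, ac, ad, bc, bd and cd, the same names the nested loops produce.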
> On Apr 16, 2009, at 10:14 AM, Juergen Rose wrote:
>
> > Hi,
> >
> > as an R newcomer I would like to create some new data frames from a
> > given data frame. The first new data frame should contain all pairs
> > of the columns of the original data frame. The second new data frame
> > should contain all triples of the columns of the original data frame
> > and the last the quadruple of columns. The values in the new data
> > frames should be the product of two, three or four of the original
> > single field values. For pairs and triples I could accomplish that
> > task with the following R script:
> >
> > Lines <- "a b c d
> > 13 0 15 16
> > 23 24 25 0
> > 33 34 0 36
> > 0 44 45 46
> > 53 54 0 55"
> >
> > DF <- read.table(textConnection(Lines), header = TRUE)
> >
> > nrow <-length(rownames(DF))
> > cnames <- colnames(DF)
> > nc <-length(DF)
> >
> > nc.pairs <- nc*(nc-1)/2
> > # initialize vector
> > cnames.new <- c(rep("",nc.pairs))
> > ind <- 1
> > print(sprintf("nc=%d",nc))
> > for (i in 1:(nc-1)) {
> >   if (i+1 <= nc ) {
> >     for (j in (i+1):nc) {
> >       cnames.new[ind] <- paste(cnames[i],cnames[j],sep="")
> >       ind <- ind+1
> >     }
> >   }
> > }
> >
> > ind <- 1
> > # initialize data.frame
> > pairs <- data.frame(matrix(c(rep(0,nc.pairs*nrow)),ncol=nc.pairs))
> > for (i in 1:nc) {
> >   if (i+1 <= nc ) {
> >     for (j in (i+1):nc) {
> >       t <- DF[,i] * DF[,j]
> >       pairs[[ind]] <- t
> >       ind <- ind+1
> >     }
> >   }
> > }
> > colnames(pairs) <- cnames.new
> > print("pairs="); print(pairs)
>
> apply(combn(colnames(DF),2), 2, function(x) DF[,x[1]]*DF[,x[2]] )
>      [,1] [,2] [,3] [,4] [,5] [,6]
> [1,]    0  195  208    0    0  240
> [2,]  552  575    0  600    0    0
> [3,] 1122    0 1188    0 1224    0
> [4,]    0    0    0 1980 2024 2070
> [5,] 2862    0 2915    0 2970    0
> >
> >
> > nc.tripels <- nc*(nc-1)*(nc-2)/6
> > # initialize vector
> > cnames.new <- c(rep("",nc.tripels))
> > ind <- 1
> > print(sprintf("nc=%d",nc))
> > for (i in 1:nc) {
> >   if (i+1 <= nc ) {
> >     for (j in (i+1):nc) {
> >       if (j+1 <= nc ) {
> >         for (k in (j+1):nc) {
> >           cnames.new[ind] <-
> >             paste(cnames[i],cnames[j],cnames[k],sep="")
> >           ind <- ind+1
> >         }
> >       }
> >     }
> >   }
> > }
> >
> > ind <- 1
> > # initialize data.frame
> > tripels <-
> > data.frame(matrix(c(rep(0,nc.tripels*nrow)),ncol=nc.tripels))
> > for (i in 1:(nc-1)) {
> >   if (i+1 <= nc ) {
> >     for (j in (i+1):nc) {
> >       if (j+1 <= nc ) {
> >         for (k in (j+1):nc) {
> >           t <- DF[,i] * DF[,j] * DF[,k]
> >           tripels[[ind]] <- t
> >           ind <- ind+1
> >         }
> >       }
> >     }
> >   }
> > }
> > colnames(tripels) <- cnames.new
> > print("tripels="); print(tripels)
>
> apply(combn(colnames(DF),3), 2, function(x)
>       DF[,x[1]]*DF[,x[2]]*DF[,x[3]])
>       [,1]   [,2] [,3]  [,4]
> [1,]     0      0 3120     0
> [2,] 13800      0    0     0
> [3,]     0  40392    0     0
> [4,]     0      0    0 91080
> [5,]     0 157410    0     0
>
> >
> >
> > I suppose that there is a much shorter way to get the same results.
> > Any hint is very much appreciated.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
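
The two apply() calls above differ only in the size of the combination, so the pairs, triples and the quadruple asked for in the original post could be handled by one small helper. The following is a minimal sketch under that assumption; col_products is a hypothetical name, not a function from R or from the thread, and the data frame is rebuilt from the original post so the block runs on its own.

```r
# Rebuild the example data frame from the original post.
DF <- read.table(text = "a b c d
13 0 15 16
23 24 25 0
33 34 0 36
0 44 45 46
53 54 0 55", header = TRUE)

# Hypothetical helper: row-wise products of every combination of k columns,
# returned as a data frame whose column names are the pasted source names.
col_products <- function(df, k) {
  # List of character vectors, one per combination of k column names.
  combos <- combn(colnames(df), k, simplify = FALSE)
  # Multiply the selected columns element by element.
  out <- lapply(combos, function(nm) Reduce(`*`, df[nm]))
  names(out) <- vapply(combos, paste, character(1), collapse = "")
  as.data.frame(out)
}

pairs   <- col_products(DF, 2)
triples <- col_products(DF, 3)
quads   <- col_products(DF, 4)
print(triples)
```

Because every row of the example data contains a zero, the single quadruple column abcd consists only of zeros. Note that integer columns can overflow for large products; converting the input with as.numeric first would avoid that.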