[R] How to build combinations of columns of a data frame
David Winsemius
dwinsemius at comcast.net
Thu Apr 16 19:17:45 CEST 2009
Those are not actually data frames; they are matrices. If you want to
make them into data frames, use a coercion function such as
as.data.frame(). The names can be generated from the original column
names using the same combn() construction that created the columns:
> apply(combn(colnames(DF),2), 2, paste, collapse="*")
[1] "a*b" "a*c" "a*d" "b*c" "b*d" "c*d"
> apply(combn(colnames(DF),3), 2, paste, collapse="*")
[1] "a*b*c" "a*b*d" "a*c*d" "b*c*d"
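
Putting the two pieces together (coercion plus generated names), a minimal sketch for the pairs case might look like the following. The object names cmb, prod.mat and pairs.df are only illustrative and do not appear in the original script; DF is rebuilt here with data.frame() so the example runs on its own.

```r
# Sample data from the original question.
DF <- data.frame(a = c(13, 23, 33, 0, 53),
                 b = c(0, 24, 34, 44, 54),
                 c = c(15, 25, 0, 45, 0),
                 d = c(16, 0, 36, 46, 55))

# All pairs of column names, one pair per column of 'cmb'.
cmb <- combn(colnames(DF), 2)

# Row-wise products for each pair; the result is a numeric matrix.
prod.mat <- apply(cmb, 2, function(x) DF[, x[1]] * DF[, x[2]])

# Coerce the matrix to a data frame and label its columns.
pairs.df <- as.data.frame(prod.mat)
colnames(pairs.df) <- apply(cmb, 2, paste, collapse = "*")
print(pairs.df)
```

The result is deliberately not called "pairs", since that would mask the base graphics function of the same name.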
--
David
On Apr 16, 2009, at 12:33 PM, Juergen Rose wrote:
> On Thursday, 16.04.2009, at 10:59 -0400, David Winsemius wrote:
>
> Thanks David,
>
> is there also a shorter way to get the column names of the new data
> frames?
>
> Juergen
>
>> On Apr 16, 2009, at 10:14 AM, Juergen Rose wrote:
>>
>>> Hi,
>>>
>>> as an R newcomer I would like to create some new data frames from a
>>> given data frame. The first new data frame should contain all pairs
>>> of the columns of the original data frame. The second new data frame
>>> should contain all triples of the columns of the original data frame,
>>> and the last the quadruple of columns. The values in the new data
>>> frames should be the product of two, three, or four original single
>>> field values. For pairs and triples I could accomplish that task with
>>> the following R script:
>>>
>>> Lines <- "a b c d
>>> 13 0 15 16
>>> 23 24 25 0
>>> 33 34 0 36
>>> 0 44 45 46
>>> 53 54 0 55"
>>>
>>> DF <- read.table(textConnection(Lines), header = TRUE)
>>>
>>> nrow <-length(rownames(DF))
>>> cnames <- colnames(DF)
>>> nc <-length(DF)
>>>
>>> nc.pairs <- nc*(nc-1)/2
>>> # initialize vector
>>> cnames.new <- c(rep("",nc.pairs))
>>> ind <- 1
>>> print(sprintf("nc=%d",nc))
>>> for (i in 1:(nc-1)) {
>>>   if (i+1 <= nc ) {
>>>     for (j in (i+1):nc) {
>>>       cnames.new[ind] <- paste(cnames[i],cnames[j],sep="")
>>>       ind <- ind+1
>>>     }
>>>   }
>>> }
>>>
>>> ind <- 1
>>> # initialize data.frame
>>> pairs <- data.frame(matrix(c(rep(0,nc.pairs*nrow)),ncol=nc.pairs))
>>> for (i in 1:nc) {
>>>   if (i+1 <= nc ) {
>>>     for (j in (i+1):nc) {
>>>       t <- DF[,i] * DF[,j]
>>>       pairs[[ind]] <- t
>>>       ind <- ind+1
>>>     }
>>>   }
>>> }
>>> colnames(pairs) <- cnames.new
>>> print("pairs="); print(pairs)
>>
>> apply(combn(colnames(DF),2), 2, function(x) DF[,x[1]]*DF[,x[2]] )
>>      [,1] [,2] [,3] [,4] [,5] [,6]
>> [1,]    0  195  208    0    0  240
>> [2,]  552  575    0  600    0    0
>> [3,] 1122    0 1188    0 1224    0
>> [4,]    0    0    0 1980 2024 2070
>> [5,] 2862    0 2915    0 2970    0
>>>
>>>
>>> nc.tripels <- nc*(nc-1)*(nc-2)/6
>>> # initialize vector
>>> cnames.new <- c(rep("",nc.tripels))
>>> ind <- 1
>>> print(sprintf("nc=%d",nc))
>>> for (i in 1:nc) {
>>>   if (i+1 <= nc ) {
>>>     for (j in (i+1):nc) {
>>>       if (j+1 <= nc ) {
>>>         for (k in (j+1):nc) {
>>>           cnames.new[ind] <-
>>>             paste(cnames[i],cnames[j],cnames[k],sep="")
>>>           ind <- ind+1
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> ind <- 1
>>> # initialize data.frame
>>> tripels <-
>>> data.frame(matrix(c(rep(0,nc.tripels*nrow)),ncol=nc.tripels))
>>> for (i in 1:(nc-1)) {
>>>   if (i+1 <= nc ) {
>>>     for (j in (i+1):nc) {
>>>       if (j+1 <= nc ) {
>>>         for (k in (j+1):nc) {
>>>           t <- DF[,i] * DF[,j] * DF[,k]
>>>           tripels[[ind]] <- t
>>>           ind <- ind+1
>>>         }
>>>       }
>>>     }
>>>   }
>>> }
>>> colnames(tripels) <- cnames.new
>>> print("tripels="); print(tripels)
>>
>> apply(combn(colnames(DF),3), 2, function(x)
>>       DF[,x[1]]*DF[,x[2]]*DF[,x[3]])
>>       [,1]   [,2] [,3]  [,4]
>> [1,]     0      0 3120     0
>> [2,] 13800      0    0     0
>> [3,]     0  40392    0     0
>> [4,]     0      0    0 91080
>> [5,]     0 157410    0     0
>>
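
The pair and triple calls above follow the same pattern, so they can probably be folded into one helper that also covers the quadruple case from the original question. This is only a sketch under stated assumptions: combo_products() is a hypothetical name, not a function from base R or from this thread, and it assumes every column of the data frame is numeric.

```r
# Sample data from the original question.
DF <- data.frame(a = c(13, 23, 33, 0, 53),
                 b = c(0, 24, 34, 44, 54),
                 c = c(15, 25, 0, 45, 0),
                 d = c(16, 0, 36, 46, 55))

# Hypothetical helper: row-wise products of every k-column combination,
# returned as a data frame with columns named like "a*b*c".
combo_products <- function(df, k) {
  # One character vector of column names per combination.
  sets <- combn(colnames(df), k, simplify = FALSE)
  # Multiply the selected columns element by element.
  cols <- lapply(sets, function(s) Reduce(`*`, df[s]))
  # Bind the product vectors into a matrix, then coerce and label.
  res <- as.data.frame(do.call(cbind, cols))
  colnames(res) <- vapply(sets, paste, character(1), collapse = "*")
  res
}

print(combo_products(DF, 2))  # pairs
print(combo_products(DF, 3))  # triples
print(combo_products(DF, 4))  # the single quadruple
```

With this sample data every row contains at least one zero, so the single quadruple column a*b*c*d is zero throughout.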
David Winsemius, MD
Heritage Laboratories
West Hartford, CT