[R] converting a data.frame into a different table

arun smartpink111 at yahoo.com
Sat May 31 04:16:59 CEST 2014


Hi,

You may try:
##Assuming the dataset is a matrix

mat <- structure(c("FBgn0037249", "FBgn0036389", "FBgn0014002", "FBgn0034201", 
"FBgn0029860", "FBgn0028526", "FBgn0003486", "FBpp0312226", "FBpp0312225", 
"FBpp0312224", "FBpp0312223", "FBpp0312222", "FBpp0312221", "FBpp0312220", 
"FBtr0346646", "FBtr0346645", "FBtr0346644", "FBtr0346643", "FBtr0346642", 
"FBtr0346641", "FBtr0346640", "FBgn0266186", "FBgn0037894", "FBgn0025712", 
"FBgn0025712", "FBgn0261597", "FBgn0263050", "FBgn0263051", "FBpp0312219", 
"FBpp0312218", "FBpp0312183", "FBpp0312182", "FBpp0312181", "FBpp0312180", 
"FBpp0312179", "FBtr0346639", "FBtr0346638", "FBtr0346593", "FBtr0346592", 
"FBtr0346591", "FBtr0346589", "FBtr0346588", "FBgn0010100", "FBgn0026577", 
"FBpp0312178", "FBpp0312177", "FBtr0346587", "FBtr0346586", "FBpp0312219"
), .Dim = c(7L, 7L), .Dimnames = list(c("1", "2", "3", "4", "5", 
"6", "7"), c("1", "2", "3", "4", "5", "6", "7")))


res <-  data.frame(start=rownames(mat)[col(mat)], start.1=colnames(mat)[row(mat)], Value= c(t(mat)))



##Comparing the speed with other methods:
###For easy comparison across methods, converted the columns to factors
fun1 <- function(mat) {
    start <- rownames(mat)[col(mat)]
    start.1 <- paste0("X", colnames(mat)[row(mat)])
    Value <- c(t(mat))
    data.frame(start = factor(start, levels = unique(start)), start.1 = factor(start.1, 
        levels = unique(start.1)), Value)
}


fun2 <- function(mat) {
    colnames(mat) <- paste0("X", colnames(mat))
    my.df <- setNames(as.data.frame.table(mat), c("start", "start.1", "Value"))
    my.df <- my.df[with(my.df, order(start, start.1)), ]
    row.names(my.df) <- 1:nrow(my.df)
    my.df
}

library(reshape2)

fun3 <- function(mat) {
    colnames(mat) <- paste0("X", colnames(mat))
    my.df <- transform(setNames(melt(mat), c("start", "start.1", "Value")), start = as.factor(start))
    my.df <- my.df[with(my.df, order(start, start.1)), ]
    row.names(my.df) <- 1:nrow(my.df)
    my.df
}

set.seed(481)
mat1 <- matrix(sample(mat, 4.5e3*4.5e3, replace=TRUE), ncol=4.5e3, dimnames=list(1:4.5e3, 1:4.5e3))
#system.time(res1 <- fun1(mat1))
#   user  system elapsed 
#  7.914   0.836   8.750 
 system.time(res2 <- fun2(mat1))
#   user  system elapsed 
# 28.257   1.336  29.578 
system.time(res3 <- fun3(mat1))
#   user  system elapsed 
# 27.213   1.027  28.224 
 
 identical(res1,res2)
#[1] TRUE
 identical(res1,res3)
#[1] TRUE
A.K.




On Friday, May 30, 2014 6:10 PM, Assa Yeroslaviz <frymor at gmail.com> wrote:
Hi,

I have a matrix of 4.5Kx4.5K elements with column- and row names

I need to convert this matrix into a table, where one column is the name of
the row for the element, the second column is the name of the column for
the same element and the third column is the element itself.

The way I do it at the moment is with a double for-loop.
With this way though it takes ages for the loop to finish.

I was wondering whether there is a faster way of doing the same conversion.

This is how I am doing it now:
my.df <-data.frame()
for (i in 1:(nrow(out5.df)-1)){
    for (j in i:ncol(out5.df)) {
#        print(paste(" I am at position: row-", i, " and col-", j, sep=""))
        a<- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
Value=out5.df[i,j])
        my.df <- rbind(my.df, a)
        }
    }

this is an example for the data I have:
    1    2    3    4    5    6    7
1    FBgn0037249    FBpp0312226    FBtr0346646    FBgn0266186
FBpp0312219    FBtr0346639    FBgn0010100
2    FBgn0036389    FBpp0312225    FBtr0346645    FBgn0037894
FBpp0312218    FBtr0346638    FBgn0026577
3    FBgn0014002    FBpp0312224    FBtr0346644    FBgn0025712
FBpp0312183    FBtr0346593    FBpp0312178
4    FBgn0034201    FBpp0312223    FBtr0346643    FBgn0025712
FBpp0312182    FBtr0346592    FBpp0312177
5    FBgn0029860    FBpp0312222    FBtr0346642    FBgn0261597
FBpp0312181    FBtr0346591    FBtr0346587
6    FBgn0028526    FBpp0312221    FBtr0346641    FBgn0263050
FBpp0312180    FBtr0346589    FBtr0346586
7    FBgn0003486    FBpp0312220    FBtr0346640    FBgn0263051
FBpp0312179    FBtr0346588    FBpp0312219

What I would like to get at the end is something like that:
> my.df
   start start.1       Value
1      1      X1 FBgn0037249
2      1      X2 FBpp0312226
3      1      X3 FBtr0346646
4      1      X4 FBgn0266186
5      1      X5 FBpp0312219
6      1      X6 FBtr0346639
7      1      X7 FBgn0010100
8      2      X2 FBpp0312225
9      2      X3 FBtr0346645
10     2      X4 FBgn0037894
11     2      X5 FBpp0312218
12     2      X6 FBtr0346638
13     2      X7 FBgn0026577
14     3      X3 FBtr0346644
15     3      X4 FBgn0025712
16     3      X5 FBpp0312183
17     3      X6 FBtr0346593
18     3      X7 FBpp0312178
19     4      X4 FBgn0025712
20     4      X5 FBpp0312182
21     4      X6 FBtr0346592
22     4      X7 FBpp0312177
23     5      X5 FBpp0312181
24     5      X6 FBtr0346591
25     5      X7 FBtr0346587
26     6      X6 FBtr0346589
27     6      X7 FBtr0346586


Sp I would like to know if there is a better way of ding it than a double
for loop.

thanks
Assa

    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list