[R] converting a data.frame into a different table
arun
smartpink111 at yahoo.com
Sat May 31 04:16:59 CEST 2014
Hi,
You may try:
##Assuming the dataset is a matrix
mat <- structure(c("FBgn0037249", "FBgn0036389", "FBgn0014002", "FBgn0034201",
"FBgn0029860", "FBgn0028526", "FBgn0003486", "FBpp0312226", "FBpp0312225",
"FBpp0312224", "FBpp0312223", "FBpp0312222", "FBpp0312221", "FBpp0312220",
"FBtr0346646", "FBtr0346645", "FBtr0346644", "FBtr0346643", "FBtr0346642",
"FBtr0346641", "FBtr0346640", "FBgn0266186", "FBgn0037894", "FBgn0025712",
"FBgn0025712", "FBgn0261597", "FBgn0263050", "FBgn0263051", "FBpp0312219",
"FBpp0312218", "FBpp0312183", "FBpp0312182", "FBpp0312181", "FBpp0312180",
"FBpp0312179", "FBtr0346639", "FBtr0346638", "FBtr0346593", "FBtr0346592",
"FBtr0346591", "FBtr0346589", "FBtr0346588", "FBgn0010100", "FBgn0026577",
"FBpp0312178", "FBpp0312177", "FBtr0346587", "FBtr0346586", "FBpp0312219"
), .Dim = c(7L, 7L), .Dimnames = list(c("1", "2", "3", "4", "5",
"6", "7"), c("1", "2", "3", "4", "5", "6", "7")))
res <- data.frame(start=rownames(mat)[col(mat)], start.1=colnames(mat)[row(mat)], Value= c(t(mat)))
##Comparing the speed with other methods:
###For easy comparison across methods, converted the columns to factors
fun1 <- function(mat) {
start <- rownames(mat)[col(mat)]
start.1 <- paste0("X", colnames(mat)[row(mat)])
Value <- c(t(mat))
data.frame(start = factor(start, levels = unique(start)), start.1 = factor(start.1,
levels = unique(start.1)), Value)
}
fun2 <- function(mat) {
colnames(mat) <- paste0("X", colnames(mat))
my.df <- setNames(as.data.frame.table(mat), c("start", "start.1", "Value"))
my.df <- my.df[with(my.df, order(start, start.1)), ]
row.names(my.df) <- 1:nrow(my.df)
my.df
}
library(reshape2)
fun3 <- function(mat) {
colnames(mat) <- paste0("X", colnames(mat))
my.df <- transform(setNames(melt(mat), c("start", "start.1", "Value")), start = as.factor(start))
my.df <- my.df[with(my.df, order(start, start.1)), ]
row.names(my.df) <- 1:nrow(my.df)
my.df
}
set.seed(481)
mat1 <- matrix(sample(mat, 4.5e3*4.5e3, replace=TRUE), ncol=4.5e3, dimnames=list(1:4.5e3, 1:4.5e3))
#system.time(res1 <- fun1(mat1))
# user system elapsed
# 7.914 0.836 8.750
system.time(res2 <- fun2(mat1))
# user system elapsed
# 28.257 1.336 29.578
system.time(res3 <- fun3(mat1))
# user system elapsed
# 27.213 1.027 28.224
identical(res1,res2)
#[1] TRUE
identical(res1,res3)
#[1] TRUE
A.K.
On Friday, May 30, 2014 6:10 PM, Assa Yeroslaviz <frymor at gmail.com> wrote:
Hi,
I have a matrix of 4.5Kx4.5K elements with column- and row names
I need to convert this matrix into a table, where one column is the name of
the row for the element, the second column is the name of the column for
the same element and the third column is the element itself.
The way I do it at the moment is with a double for-loop.
With this way though it takes ages for the loop to finish.
I was wondering whether there is a faster way of doing the same conversion.
This is how I am doing it now:
my.df <-data.frame()
for (i in 1:(nrow(out5.df)-1)){
for (j in i:ncol(out5.df)) {
# print(paste(" I am at position: row-", i, " and col-", j, sep=""))
a<- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
Value=out5.df[i,j])
my.df <- rbind(my.df, a)
}
}
this is an example for the data I have:
1 2 3 4 5 6 7
1 FBgn0037249 FBpp0312226 FBtr0346646 FBgn0266186
FBpp0312219 FBtr0346639 FBgn0010100
2 FBgn0036389 FBpp0312225 FBtr0346645 FBgn0037894
FBpp0312218 FBtr0346638 FBgn0026577
3 FBgn0014002 FBpp0312224 FBtr0346644 FBgn0025712
FBpp0312183 FBtr0346593 FBpp0312178
4 FBgn0034201 FBpp0312223 FBtr0346643 FBgn0025712
FBpp0312182 FBtr0346592 FBpp0312177
5 FBgn0029860 FBpp0312222 FBtr0346642 FBgn0261597
FBpp0312181 FBtr0346591 FBtr0346587
6 FBgn0028526 FBpp0312221 FBtr0346641 FBgn0263050
FBpp0312180 FBtr0346589 FBtr0346586
7 FBgn0003486 FBpp0312220 FBtr0346640 FBgn0263051
FBpp0312179 FBtr0346588 FBpp0312219
What I would like to get at the end is something like that:
> my.df
start start.1 Value
1 1 X1 FBgn0037249
2 1 X2 FBpp0312226
3 1 X3 FBtr0346646
4 1 X4 FBgn0266186
5 1 X5 FBpp0312219
6 1 X6 FBtr0346639
7 1 X7 FBgn0010100
8 2 X2 FBpp0312225
9 2 X3 FBtr0346645
10 2 X4 FBgn0037894
11 2 X5 FBpp0312218
12 2 X6 FBtr0346638
13 2 X7 FBgn0026577
14 3 X3 FBtr0346644
15 3 X4 FBgn0025712
16 3 X5 FBpp0312183
17 3 X6 FBtr0346593
18 3 X7 FBpp0312178
19 4 X4 FBgn0025712
20 4 X5 FBpp0312182
21 4 X6 FBtr0346592
22 4 X7 FBpp0312177
23 5 X5 FBpp0312181
24 5 X6 FBtr0346591
25 5 X7 FBtr0346587
26 6 X6 FBtr0346589
27 6 X7 FBtr0346586
Sp I would like to know if there is a better way of ding it than a double
for loop.
thanks
Assa
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list