[R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R
David L Carlson
dc@rl@on @ending from t@mu@edu
Mon Dec 17 21:55:40 CET 2018
There are some problems with your example. Your code does not produce anything like your example data frame because you draw only 9 values without replacement. Your code produces 10 columns, each with the same permutation of the values 1:9.
Then your desired output does not make sense in terms of your example data. The first entry is V2:9 but 9 does not appear in row 1.
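The recycling is easy to check directly. The lines below are only a sketch: the names m, m2 and DF2 are mine, and drawing 90 values with replacement is my guess at what you intended, not something stated in your message.

```r
set.seed(1)

# The original call: only 9 values are drawn, so matrix() recycles them
# to fill all 10 columns.
m <- matrix(sample(1:9, 9), ncol = 10, nrow = 9)

# Every column is identical to the first one, so each row holds a single
# repeated value.
print(all(m == m[, 1]))

# Assumption: independent draws for all 90 cells were intended.
m2 <- matrix(sample(1:9, 90, replace = TRUE), ncol = 10, nrow = 9)
DF2 <- as.data.frame(m2)
print(dim(DF2))
```

With independent draws a row can contain repeated values, so ties among the top 5 are to be expected.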
Using your posted example:
DF <- structure(list(V1 = c(3L, 1L, 2L, 3L, 6L, 8L, 1L, 9L, 1L),
V2 = c(2L, 4L, 3L, 8L, 2L, 2L, 5L, 3L, 2L), V3 = c(5L, 7L, 4L, 3L,
3L, 4L, 3L, 5L, 4L), V4 = c(6L, 8L, 7L, 4L, 7L, 8L, 6L, 8L, 8L),
V5 = c(5L, 7L, 5L, 5L, 2L, 3L, 8L, 4L, 3L), V6 = c(2L, 7L, 8L, 6L,
1L, 2L, 3L, 9L, 2L), V7 = c(6L, 3L, 9L, 7L, 8L, 9L, 8L, 7L, 1L),
V8 = c(8L, 4L, 1L, 4L, 3L, 7L, 9L, 8L, 2L), V9 = c(1L, 2L, 3L, 6L,
2L, 6L, 1L, 1L, 5L), V10 = c(3L, 9L, 5L, 5L, 4L, 5L, 3L, 2L, 6L)),
class = "data.frame", row.names = c(NA, -9L))
Your code produces:
     V1    V2   V3    V4    V5
1  V8:8  V4:6 V7:6  V3:5  V5:5
2 V10:9  V4:8 V3:7  V5:7  V6:7
3  V7:9  V6:8 V4:7  V5:5 V10:5
4  V2:8  V7:7 V6:6  V9:6  V5:5
5  V7:8  V4:7 V1:6 V10:4  V3:3
6  V7:9  V1:8 V4:8  V8:7  V9:6
7  V8:9  V5:8 V7:8  V4:6  V2:5
8  V1:9  V6:9 V4:8  V8:8  V7:7
9  V4:8 V10:6 V9:5  V3:4  V5:3
That seems to be what you wanted.
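If the number of columns to keep may change, the same idea can be wrapped in a small function. This is only a sketch: top_n_by_row, its argument n, the Top1, Top2, ... column names and the tiny data frame small are all hypothetical names of mine, not part of your code. Ties are kept in column order, as order(-x) does in your version.

```r
# Hypothetical helper: for each row, return the n largest values as
# "column:value" strings, largest first.
top_n_by_row <- function(df, n = 5) {
  m <- as.matrix(df)
  picks <- lapply(seq_len(nrow(m)), function(i) {
    x <- m[i, ]
    o <- head(order(-x), n)   # positions of the n largest values
    paste0(colnames(m)[o], ":", x[o])
  })
  # rbind() keeps one row per input row, even when n is 1
  out <- as.data.frame(do.call(rbind, picks), stringsAsFactors = FALSE)
  names(out) <- paste0("Top", seq_len(ncol(out)))
  out
}

# Small made-up data frame, used only to show the call
small <- data.frame(V1 = c(3L, 1L), V2 = c(2L, 4L),
                    V3 = c(5L, 7L), V4 = c(6L, 8L))
res <- top_n_by_row(small, n = 2)
print(res)
```

Calling top_n_by_row(DF) with the default n = 5 should give the same entries as the table above, apart from the column names.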
---------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University
-----Original Message-----
From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of Tom Woolman
Sent: Monday, December 17, 2018 11:34 AM
To: r-help using r-project.org
Subject: [R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R
I have a data frame with 10 integer variables describing various
attributes of each row, and I need to find the 5 highest-valued
variables for each row and output them to a new data frame. In addition to
the 5 highest variable names, I also need the corresponding 5
highest variable values for each row.
A simple code example to generate a sample data frame for this is:
set.seed(1)
DF <- matrix(sample(1:9,9),ncol=10,nrow=9)
DF <- as.data.frame.matrix(DF)
This would result in an example data frame like this:
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
# 1  3  2  5  6  5  2  6  8  1   3
# 2  1  4  7  8  7  7  3  4  2   9
# 3  2  3  4  7  5  8  9  1  3   5
# 4  3  8  3  4  5  6  7  4  6   5
# 5  6  2  3  7  2  1  8  3  2   4
# 6  8  2  4  8  3  2  9  7  6   5
# 7  1  5  3  6  8  3  8  9  1   3
# 8  9  3  5  8  4  9  7  8  1   2
# 9  1  2  4  8  3  2  1  2  5   6
My ideal output would be something like this:
#      V1   V2   V3   V4   V5
# 1  V2:9 V7:8 V8:7 V4:6 V3:5
# 2  V9:9 V3:8 V5:7 V7:6 V4:5
# 3  V5:9 V3:8 V2:7 V9:6 V7:5
# 4  V8:9 V4:8 V2:7 V5:6 V9:5
# 5  V9:9 V1:8 V6:7 V3:6 V5:5
# 6  V8:9 V1:8 V5:7 V9:6 V4:5
# 7  V2:9 V8:8 V7:7 V5:6 V9:5
# 8  V4:9 V7:8 V9:7 V2:6 V8:5
# 9  V3:9 V7:8 V8:7 V4:6 V5:5
# 10 V6:9 V8:8 V1:7 V9:6 V4:5
I was trying to use this code, but it doesn't seem to work:
out <- t(apply(DF, 1, function(x){
  o <- head(order(-x), 5)
  paste0(names(x[o]), ':', x[o])
}))
as.data.frame(out)
Thanks everyone!