[R] Sorting vector based on pairs of comparisons
William Dunlap
wdun|@p @end|ng |rom t|bco@com
Thu Mar 14 16:07:11 CET 2019
This is called topological sorting in some circles. The function below
will give you one ordering that is consistent with the contraints but not
all possible orderings. I couldn't find such a function in core R so I
wrote one a while back based on Kahn's algorithm, as described in Wikipedia.
> Smaller <- c("ASD", "DFE", "ASD", "SDR", "EDF", "ASD")
> Larger <- c("SDR", "EDF", "KLM", "KLM", "SDR", "EDF")
> matComp <- cbind(Smaller, Larger)
> sortTopologically(matComp, unique(as.vector(matComp)))
[1] "ASD" "DFE" "EDF" "SDR" "KLM"
Bill Dunlap
TIBCO Software
wdunlap tibco.com
sortTopologically <- function(edgeMatrix, V)
{
# edgeMatrix is 2-column matrix which describes a partial
# ordering of a set of vertices. The first column is the 'from'
vertex,
# the second the 'to' vertex.
# V is the vector of all the vertices in the graph.
#
# Return a vector, L, consisting of the vertices in
# V in an order consistent with the partial ordering
# described by edgeMatrix.
# Throw an error if such an ordering is not possible.
#
# Use Kahn's algorithm (
https://en.wikipedia.org/wiki/Topological_sorting).
#
# Note that disconnected vertices will not be mentioned in edgeMatrix,
# but will be in V.
stopifnot(is.matrix(edgeMatrix),
ncol(edgeMatrix)==2,
!any(is.na(edgeMatrix)),
!any(is.na(V)),
all(as.vector(edgeMatrix) %in% V))
L <- V[0] # match the type of the input
S <- setdiff(V, edgeMatrix[, 2])
V <- setdiff(V, S)
while(length(S) > 0) {
n <- S[1]
# cat("Adding", n, "to L", "\n")
L <- c(L, n)
S <- S[-1]
mRow <- edgeMatrix[,1] == n
edgeMatrix <- edgeMatrix[ !mRow, , drop=FALSE ]
S <- c(S, setdiff(V, edgeMatrix[,2]))
V <- setdiff(V, S)
}
if (nrow(edgeMatrix) > 0) {
stop("There are cycles in the dependency graph")
}
L
}
On Thu, Mar 14, 2019 at 4:30 AM Pedro Conte de Barros <pbarros using ualg.pt>
wrote:
> Dear All,
>
> This should be a quite established algorithm, but I have been searching
> for a couple days already without finding any satisfactory solution.
>
> I have a matrix defining pairs of Smaller-Larger arbitrary character
> values, like below
>
> Smaller <- c("ASD", "DFE", "ASD", "SDR", "EDF", "ASD")
>
> Larger <- c("SDR", "EDF", "KLM", "KLM", "SDR", "EDF"
>
> matComp <- cbind(Smaller, Larger)
>
> so that matComp looks like this
>
> Smaller Larger
> [1,] "ASD" "SDR"
> [2,] "DFE" "EDF"
> [3,] "ASD" "KLM"
> [4,] "SDR" "KLM"
> [5,] "EDF" "SDR"
> [6,] "ASD" "EDF"
>
> This matrix establishes six pairs of "larger than" relationships that
> can be used to sort the unique values in the matrix,
>
> > unique(as.vector(matComp))
> [1] "ASD" "DFE" "SDR" "EDF" "KLM"
>
> Specifically, I would like to get this:
>
> sorted <- c("ASD", "DFE", "EDF", "SDR", "KLM")
>
> or, equally valid (my matrix does not have the full information):
>
> sorted <- c("DFE", "ASD", "EDF", "SDR", "KLM")
>
> Preferably, I would get the different combinations of the unique values
> that satisfy the "larger than" conditions in the matrix...
>
>
> I am sure this is a trivial problem, but I could not find any algorithm
> to solve it.
>
> Any help would be highly appreciated
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list