Type: | Package |
Title: | Classification and Clustering of Preference Rankings |
Version: | 1.0.2 |
Date: | 2025-04-15 |
Maintainer: | Antonio D'Ambrosio <antdambr@unina.it> |
Depends: | ConsRank |
Imports: | janitor, methods, pracma, rlist, proxy, smacof, gtools |
Description: | Tree-based classification and soft-clustering methods for preference rankings, with tools for external validation of fuzzy clustering, and Kemeny-equivalent augmented unfolding. It contains the recursive partitioning algorithm for preference rankings, a non-parametric tree-based method for a matrix of preference rankings as a response variable. It also contains the distribution-free soft clustering method for preference rankings, namely the K-median cluster component analysis (CCA). The package depends on the 'ConsRank' R package. Options for validating the tree-based method are both the test-set procedure and V-fold cross-validation. The package contains the routines to compute the adjusted concordance index (a fuzzy version of the adjusted Rand index) and the normalized degree of concordance (the corresponding fuzzy version of the Rand index). The package also contains routines to perform the Kemeny-equivalent augmented unfolding. The MDS engine is the function 'smacofSym' from the package 'smacof'. Essential references: D'Ambrosio, A., Vera, J.F., and Heiser, W.J. (2021) <doi:10.1080/00273171.2021.1899892>; D'Ambrosio, A., Amodio, S., Iorio, C., Pandolfo, G., and Siciliano, R. (2021) <doi:10.1007/s00357-020-09367-0>; D'Ambrosio, A., and Heiser, W.J. (2019) <doi:10.1007/s41237-018-0069-5>; D'Ambrosio, A., and Heiser, W.J. (2016) <doi:10.1007/s11336-016-9505-1>; Hullermeier, E., Rifqi, M., Henzgen, S., and Senge, R. (2012) <doi:10.1109/TFUZZ.2011.2179303>; Marden, J.I. <ISBN:0412995212>. |
License: | GPL-3 |
Encoding: | UTF-8 |
URL: | https://www.r-project.org/ |
Repository: | CRAN |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-04-15 15:23:55 UTC; Antonio |
Author: | Antonio D'Ambrosio [aut, cre] |
Date/Publication: | 2025-04-17 07:40:09 UTC |
European Values Studies (EVS) data
Description
Random sub-sample of 3584 cases of the survey conducted in 1999 in 32 countries analyzed by Vermunt (2003).
Usage
data("EVS")
Format
The format is: List of 3
$ data:'data.frame': 1911 obs. of 11 variables:
country, gender, yearbird (year of birth), mstatus (marital status), eduage (age of education completion), employment (employment status: ordinal scale 1-8), householdinc (household income: ordinal scale 1-10), (A) Maintain order in Nation, (B) Give people more say in Government decisions, (C) Fight rising prices, (D) Protect freedom of speech.
$ predictors:'data.frame' with all the predictors
$ rankings : matrix with the preferences for "A" (Maintain order in Nation), "B" (Give people more say in Government decisions), "C" (Fight rising prices), "D" (Protect freedom of speech).
Details
Rankings were obtained by applying the post-materialism scale developed by Inglehart (1977). The scale is based upon an experiment of the type "pick the 2 most important out of 4" political goals for your government. For this reason, replace the NAs with 3 before using the rankings with the functions 'ranktree' or 'cca' (see D'Ambrosio and Heiser, 2016). As for the predictors, the coding of the countries is: G1 (Austria, Denmark, Netherlands, Sweden), G2 (Belgium, Croatia, France, Greece, Ireland, Northern Ireland, Spain), G3 (Bulgaria, Czechia, East Germany, Finland, Iceland, Luxembourg, Malta, Portugal, Romania, Slovenia, West Germany), G4 (Belarus, Estonia, Hungary, Latvia, Lithuania, Poland, Russia, Slovakia, Ukraine). Coding of the predictor "mstatus" is: mar (married), wid (widowed), div (divorced), sep (separated), nevm (never married).
Source
http://statisticalinnovations.com/technicalsupport/choice_datasets.html
References
Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33(1), 213–239.
Inglehart, R. (1977). The silent revolution: Changing values and political styles among Western Publics. Princeton, NJ: Princeton University Press.
D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94.
Examples
data(EVS)
# EVS$rankings[is.na(EVS$rankings)] <- 3 #place unranked objects in a tie at the third position
# ccares <- cca(EVS$rankings,4) #solution with 4 components
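# A further illustrative sketch (not run): grow a prediction tree on the EVS data
# after placing unranked objects in a tie at the third position, as suggested in
# the Details section; 'num' is documented in ranktreecontrol.
# EVS$rankings[is.na(EVS$rankings)] <- 3
# tree <- ranktree(EVS$rankings, EVS$predictors, num=50)
# summary(tree)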
Irish Election data set
Description
An opinion poll conducted by Irish Marketing Surveys one month prior to the election in 1997. Interviews were conducted with about 1100 respondents, drawn from 100 sampling areas. Interviews took place at randomly located homes, with respondents selected according to a socioeconomic quota. A range of sociological questions was asked of each respondent, as was their voting preference, if any, for each of the candidates.
Usage
data("Irish")
Format
The format is: List of 3
$ IrishElection: 'data.frame': 1083 obs. of 11 variables: Gender (male, housewife, nonhousewife), marital status (single, married, separated), age, socialclass (five unordered categories), Area (rural, city, town), government satisfaction (no opinion, satisfied, dissatisfied), Bano, Roch, McAl, Nall, Scal
$ predictors :'data.frame' with all the predictors
$ rankings : matrix with the preferences for "Bano", "Roch", "McAl", "Nall", "Scal"
Details
In the original version of the data, the ranking matrix contains NAs. Here, NAs are replaced with the number 7, to indicate that all the non-stated preferences are in a tie at the last position (see D'Ambrosio and Heiser, 2016). For details about the data set see Gormley and Murphy, 2008.
Source
https://projecteuclid.org/journals/annals-of-applied-statistics/volume-2/issue-4/A-mixture-of-experts-model-for-rank-data-with/10.1214/08-AOAS178.full?tab=ArticleLinkSupplemental
References
Gormley, I.C., and Murphy, T.B. (2008). A mixture of experts model for rank data with applications in election studies. Annals of Applied Statistics, 2(4): 1452-1477. DOI: 10.1214/08-AOAS178
D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94. DOI: 10.1007/s11336-016-9505-1.
Examples
data(Irish)
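# Illustrative sketch: inspect the rankings matrix. As stated in the Details,
# non-stated preferences were already recoded to 7 (tied at the last position),
# so no NAs are expected.
str(Irish$rankings)
sum(is.na(Irish$rankings))   # expected to be 0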
University rankings dataset.
Description
The University rankings dataset was analysed by Dittrich, Hatzinger and Katzenbeisser (1998) to investigate paired comparison data concerning European universities and students' characteristics, with the goal of showing that university rankings differ across groups of students. Here both the raw data (with paired comparisons) and the version with rankings are presented (see details). A survey of 303 students studying at the Vienna University of Economics was carried out to examine the students' preferences among six universities, namely London, Paris, Milan, St. Gallen, Barcelona and Stockholm. The data set contains 23 variables. The first 15 digits in each row indicate the preferences of a student. For a given comparison, responses were coded 1 if the first university was preferred, 2 if the second university was preferred, and 3 if the universities were tied. All rows containing missing ranked universities were skipped.
Usage
data("Univranks")
Format
The format is: List of 3
$ rawdata: 'data.frame': 212 obs. of 23 variables: the first 15 are the paired comparisons coded as follows (1: the first is preferred to the second; 2: the second is preferred to the first; 3: tied)
$ LP : comparison of London to Paris
$ LM : comparison of London to Milan
$ PM : comparison of Paris to Milan
$ LSg : comparison of London to St. Gallen
$ PSg : comparison of Paris to St. Gallen
$ MSg : comparison of Milan to St. Gallen
$ LB : comparison of London to Barcelona
$ PB : comparison of Paris to Barcelona
$ MB : comparison of Milan to Barcelona
$ SgB : comparison of St. Gallen to Barcelona
$ LSt : comparison of London to Stockholm
$ PSt : comparison of Paris to Stockholm
$ MSt : comparison of Milan to Stockholm
$ SgSt: comparison of St. Gallen to Stockholm
$ BSt : comparison of Barcelona to Stockholm
$ Stud: Factor w/ 2 levels "commerce","other"
$ Eng : Factor w/ 2 levels "good","poor"
$ Fra : Factor w/ 2 levels "good","poor"
$ Spa : Factor w/ 2 levels "good","poor"
$ Ita : Factor w/ 2 levels "good","poor"
$ Wor : Factor w/ 2 levels "no","yes"
$ Deg : Factor w/ 2 levels "no","yes"
$ Sex : Factor w/ 2 levels "female","male"
$ predictors:'data.frame': 212 obs. of 8 variables (the last 8 variables of the "rawdata" data frame)
$ rankings : matrix of preference rankings. The columns are: "L" (London), "P" (Paris), "M" (Milan), "Sg" (St. Gallen), "B" (Barcelona), "St" (Stockholm)
Details
To obtain the preference rankings from the paired comparisons, the procedure was the following. The first row of the raw data is [1 3 2 1 2 1 1 2 1 1 1 2 1 1 2]: London is preferred to Paris, St. Gallen, Barcelona and Stockholm (LP, LSg, LB and LSt are all equal to 1), and there is no preference between London and Milan (LM = 3, they are tied); Milan is preferred to Paris (PM = 2), St. Gallen, Barcelona and Stockholm; and so on. The first ordering is then <{L M} Sg St B P>, corresponding to the ranking [1, 5, 1, 2, 4, 3], where the columns indicate L, P, M, Sg, B, St.
Source
http://www.blackwellpublishers.co.uk/rss
References
Dittrich, R., Hatzinger, R., and Katzenbeisser, W. (1998). Modelling the effect of subject-specific covariates in paired comparison studies with an application to university rankings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(4), 511-525. DOI: 10.1111/1467-9876.00125
D'Ambrosio, A. (2008). Tree based methods for data editing and preference rankings. Ph.D. thesis, University of Naples Federico II. https://www.doi.org/10.6092/UNINA/FEDOA/2746
Examples
data(Univranks)
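# Illustrative sketch: compare the first row of the paired comparisons with the
# corresponding row of the rankings matrix, following the worked example in the
# Details section.
Univranks$rawdata[1, 1:15]   # per the Details: 1 3 2 1 2 1 1 2 1 1 1 2 1 1 2
Univranks$rankings[1, ]      # per the Details, should correspond to 1 5 1 2 4 3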
Kemeny-equivalent augmented dissimilarity matrix
Description
Kemeny-equivalent augmented dissimilarity matrix
Usage
augmatrix(X)
Arguments
X |
An n by m data matrix, in which there are n judges and m objects to be judged. Each row is a ranking of the objects, which are represented by the columns. |
Details
First the matrix is transformed with the tau_X rank correlation coefficient, then it is normalized. The output contains:
Delta | the augmented dissimilarity matrix | ||
Interaction | the submatrix containing the individual-item interactions | ||
Objects | the submatrix containing the within-items proximities | ||
Indiv | the submatrix containing the within-individuals proximities | ||
beta | the beta parameter | ||
alpha | the alpha parameter | ||
csi | the csi parameter | ||
res | the summary of the augmentation in terms of: | ||
TauX | tau_x rank correlation coefficient | ||
Kendall | Kendall rank correlation coefficient | ||
Spearman | Spearman correlation coefficient |
Value
A list containing the dissimilarity matrix and other information about the augmented matrix. See details for detailed information.
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D'Ambrosio, A., Vera, J. F., & Heiser, W. J. (2022). Avoiding degeneracies in ordinal unfolding using Kemeny-equivalent dissimilarities for two-way two-mode preference rank data. Multivariate Behavioral Research, 57(4), 679-699.
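Examples
# Illustrative sketch: compute the Kemeny-equivalent augmented dissimilarity
# matrix for the 'breakfast' preference data shipped with the 'smacof' package
# (the same data used in the kunfolding examples).
data("breakfast", package="smacof")
aug <- augmatrix(as.matrix(breakfast))
aug$beta        # the beta parameter
str(aug$Delta)  # the augmented dissimilarity matrix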
K-Median Cluster Component Analysis
Description
K-Median Cluster Component Analysis, a distribution-free soft-clustering method for preference rankings.
Usage
cca(X, k, control = ccacontrol(...), ...)
Arguments
X |
An n by m data matrix containing preference rankings, in which there are n judges and m objects to be judged. Each row is a ranking of the objects, which are represented by the columns. |
k |
The number of cluster components |
control |
a list of options that control details of the cca algorithm, governed by the function ccacontrol |
... |
arguments passed directly, bypassing ccacontrol |
Details
The user can use any algorithm implemented in the consrank function from the ConsRank package. All algorithms allow the user to set the option 'full=TRUE' if the median ranking(s) must be searched in the unconstrained universe of rankings of n items including all possible ties, instead of in the restricted space of permutations.
There are two classification uncertainty measures: Us and Uprods. "Us" is the geometric
mean of the membership probabilities of each individual, normalized in such a way that
in the case of maximum uncertainty Us=1. "Ucca" is the average of all the "Us".
"Uprods" is the product of the membership probabilities of each individual, normalized in such a way that
in the case of maximum uncertainty Uprods=1. "Uprodscca" is the average of all the "Uprods".
Value
An object of the class "cca". It contains:
pk | the membership probability matrix | |
clc | cluster centers | |
oclc | cluster centers in terms of orderings | |
idc | crisp partition: id of the cluster component associated with the highest membership probability | |
Hcca | Global homogeneity measure (tau_X rank correlation coefficient) | |
hk | Homogeneity within cluster | |
props | estimated proportion of cases within cluster | |
Us | Uncertainty measure per-individual (see details) | |
Ucca | Global uncertainty measure | |
Uprods | Uncertainty measure per-individual (see details) | |
Uprodscca | Global uncertainty measure | |
consrankout | complete output of the rank aggregation algorithm, possibly containing multiple median rankings |
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D'Ambrosio, A. and Heiser, W.J. (2019). A Distribution-free Soft Clustering Method for Preference Rankings. Behaviormetrika , vol. 46(2), pp. 333–351, DOI: 10.1007/s41237-018-0069-5
Heiser, W.J., and D'Ambrosio, A. (2013). Clustering and Prediction of Rankings within a Kemeny Distance Framework. In Lausen, B., Van den Poel, D., Ultsch, A. (eds), Algorithms from and for Nature and Life, pp. 19-31. Springer International. DOI: 10.1007/978-3-319-00035-0_2.
Ben-Israel, A., and Iyigun, C. (2008). Probabilistic d-clustering. Journal of Classification, 25(1), pp.5-26. DOI: 10.1007/s00357-008-9002-z
See Also
ccacontrol
ranktree
Examples
data(Irish)
set.seed(135) #for reproducibility
# CCA with four components
ccares <- cca(Irish$rankings, 4, itercca=10)
summary(ccares)
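# Illustrative sketch: inspect some components of the fitted object, following
# the Value section above.
ccares$clc                    # cluster centers
round(ccares$pk[1:5, ], 3)    # membership probabilities of the first five judges
ccares$Ucca                   # global uncertainty measure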
Utility function
Description
Utility function to set the control arguments of cca
Usage
ccacontrol(
algorithm = "quick",
full = FALSE,
itercca = 1,
consrankitermax = 10,
np = 15,
gl = 100,
ff = 0.4,
cr = 0.9,
proc = FALSE,
ps = FALSE
)
Arguments
algorithm |
The algorithm used to compute the median ranking. One among "BB", "quick" (default), "fast" and "decor" |
full |
Specifies if the median ranking must be searched in the universe of rankings including all the possible ties. Default: FALSE |
itercca |
Number of iterations of cca |
consrankitermax |
Number of iterations for the "fast" and "decor" algorithms. consrankitermax=10 is the default option. |
np |
(for "decor" only) the number of population individuals. np=15 is the default option. |
gl |
(for "decor" only) generations limit, maximum number of consecutive generations without improvement. gl=100 is the default option. |
ff |
(for "decor" only) the scaling rate for mutation. Must be in [0,1]. ff=0.4 is the default option. |
cr |
(for "decor" only) the crossover range. Must be in [0,1]. cr=0.9 is the default option. |
proc |
(for "BB" only) proc=TRUE allows the branch and bound algorithm to work in difficult cases, i.e. when the number of objects is larger than 15 or 25. proc=FALSE is the default option |
ps |
If ps=TRUE, information about how many branches are being processed is displayed on the screen. Default value: FALSE |
Value
A list containing all the control parameters
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
cca
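Examples
# Illustrative sketch: set the control parameters explicitly and pass them to
# cca; the chosen values are arbitrary.
data(Irish)
set.seed(135)
ctrl <- ccacontrol(algorithm = "fast", itercca = 5)
ccares <- cca(Irish$rankings, 4, control = ctrl)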
Normalized Degree of Concordance (NDC) and Adjusted Concordance Index (ACI)
Description
Given two fuzzy (Ruspini) partitions, it computes the NDC and the ACI. The NDC is the fuzzy version of the Rand index, and the ACI is the fuzzy version of the Adjusted Rand index.
Usage
fuzzyconcordance(P, Q, nperms = 1000)
Arguments
P |
A fuzzy partition. It has to be a matrix with n rows and k columns. Each row expresses the degrees of membership of the i-th observation in the k clusters (see details). |
Q |
A fuzzy partition. It has to be a matrix with n rows and h columns. Each row expresses the degrees of membership of the i-th observation in the h clusters (see details). |
nperms |
number of permutations necessary to compute ACI. Default: 1000 |
Details
Both P and Q, or only one of them, can be crisp (or hard) partitions. In this case, each row must contain either 0 or 1, and the sum of the i-th row must be 1. In other words, either P or Q (or both) are expressed in terms of dummy coding. If both partitions are crisp, then NDC is equal to the Rand index and ACI is equal to the Adjusted Rand index. This function can be used to externally validate the output of any fuzzy clustering method.
Value
A list containing:
ACI | the Adjusted Concordance Index | |
NDC | the Normalized Degree of Concordance |
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D’Ambrosio, A., Amodio, S., Iorio, C., Pandolfo, G. and Siciliano, R. (2021). Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions. Journal of Classification vol. 38(1), pp. 112–128 (2021). DOI: 10.1007/s00357-020-09367-0
Hullermeier, E., Rifqi, M., Henzgen, S., and Senge, R. (2012). Comparing fuzzy partitions: a generalization of the Rand index and related measures. IEEE Transactions on Fuzzy Systems, 20(3), 546–556. DOI: 10.1109/TFUZZ.2011.2179303
See Also
Examples
#two random fuzzy partitions
P = rbind(c(0.5259, 0.1656, 0.3085),
c(0.5623, 0.1036, 0.3341),
c(0.2508, 0.1849, 0.5643),
c(0.5654, 0.1934, 0.2413),
c(0.4529, 0.1679, 0.3792),
c(0.2390, 0.1758, 0.5852),
c(0.3114, 0.1743, 0.5143),
c(0.4188, 0.1392, 0.4420),
c(0.5830, 0.1655, 0.2514),
c(0.5860, 0.1171, 0.2969),
c(0.2630, 0.1706, 0.5664),
c(0.5882, 0.1032, 0.3086),
c(0.5829, 0.1277, 0.2894),
c(0.3942, 0.1046, 0.5012),
c(0.5201, 0.1097, 0.3702),
c(0.2568, 0.1823, 0.5609),
c(0.3687, 0.1695, 0.4618),
c(0.5663, 0.1317, 0.3020),
c(0.5169, 0.1950, 0.2881),
c(0.5838, 0.1034, 0.3128))
Q = rbind(c(0.4494, 0.3755, 0.1751),
c(0.5219, 0.3526, 0.1255),
c(0.3432, 0.5062, 0.1506),
c(0.3120, 0.5181, 0.1699),
c(0.5362, 0.2747, 0.1891),
c(0.4082, 0.3959, 0.1959),
c(0.4670, 0.3782, 0.1547),
c(0.4276, 0.4585, 0.1139),
c(0.4013, 0.4837, 0.1149),
c(0.3724, 0.5019, 0.1258),
c(0.5055, 0.3104, 0.1841),
c(0.4027, 0.4719, 0.1254),
c(0.3565, 0.4620, 0.1814),
c(0.6106, 0.2650, 0.1244),
c(0.5595, 0.2476, 0.1929),
c(0.4657, 0.3993, 0.1350),
c(0.2964, 0.5839, 0.1197),
c(0.5387, 0.3362, 0.1251),
c(0.4043, 0.4341, 0.1616),
c(0.5631, 0.2895, 0.1473))
ci <- fuzzyconcordance(P,Q)
#generate a random fuzzy partition with two components (clusters)
Q2 <- matrix(runif(20),ncol=1)
Q2 <- cbind(Q2,1-Q2)
ci2 <- fuzzyconcordance(P,Q2)
#generate a random crisp partition
P2 <- t(rmultinom(20,1,c(0.3,0.3,0.4)))
ci3 <- fuzzyconcordance(P2,Q)
#--------------------
## Not run:
# install.packages("Rankcluster")
library("Rankcluster") # model-based clustering algorithm for
# ranking data by Biernacki and Jacques (2013)
# <doi:10.1016/j.csda.2012.08.008>
data(APA)
set.seed(136) #for reproducibility
rcres <- rankclust(APA$data,K=3) # solution with 3 centers, it takes about 75 seconds
##
ccares <- cca(APA$data,k=3) #solution with 3 components, it takes about 7 seconds
##
ci <- fuzzyconcordance(rcres[3]@tik,ccares$pk)
ci$ACI # 0.0226 means that the two partitions are similar (see NDC below),
# but their similarity is mainly due to chance
ci$NDC
## End(Not run)
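# Illustrative sketch: with two crisp partitions (dummy coding), NDC reduces to
# the Rand index and ACI to the Adjusted Rand index, as stated in the Details.
set.seed(1)
P3 <- t(rmultinom(20, 1, c(0.5, 0.5)))
Q3 <- t(rmultinom(20, 1, c(0.4, 0.6)))
ci4 <- fuzzyconcordance(P3, Q3)
ci4$NDC
ci4$ACI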
Determine a tree from the main tree-based structure
Description
Given a tree belonging to the class "ranktree", determine a subtree with a given number of terminal nodes
Usage
getsubtree(Tree, cut, tokeep = NULL)
Arguments
Tree |
An object of the class "ranktree" coming from the function ranktree |
cut |
The maximum number of terminal nodes that the returned subtree must have |
tokeep |
parameter invoked by other internal functions |
Details
If the pruning sequence returns a series of subtrees with, say, 1, 2, 4, 7, 9 terminal nodes and the user sets cut=8, the function extracts the subtree with 7 terminal nodes.
Value
An object of the class "ranktree", containing the same information as the output of the function ranktree
Author(s)
Antonio D'Ambrosio antdambr@unina.it
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
#see how many terminal nodes the trees composing the nested sequence of subtrees have
infoprun <- tree$pruneinfo$termnodes
#select the tree with, say, 6 terminal nodes
tree6 <- getsubtree(tree,6)
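# Illustrative follow-up sketch: inspect and plot the extracted subtree with the
# S3 methods documented elsewhere in this manual.
summary(tree6)
plot(tree6, dispclass=TRUE)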
Kemeny-equivalent augmented unfolding
Description
Kemeny-equivalent augmented unfolding.
Usage
kunfolding(X, p = 2, control = mdscontrol(...), ...)
Arguments
X |
An n by m data matrix, in which there are n judges and m objects to be judged. Each row is a ranking of the objects, which are represented by the columns. |
p |
the dimensionality of the solution. Default p=2 |
control |
a list of options that control details of the kunfolding algorithm, governed by the function mdscontrol |
... |
arguments passed directly, bypassing mdscontrol |
Details
The MDS engine is smacofSym from the smacof package. In a future release, other MDS algorithms will be implemented.
The output consists of an object of the class "kunfolding". It contains:
rawstress | raw stress | ||
nrawstress | normalized raw stress | ||
stress1 | Stress-1 | ||
rowcoord | row (individuals) coordinates | ||
colcord | column (items) coordinates | ||
dhat | dhat | ||
dij | configuration distance | ||
shepardD | DeSarbo I Index | ||
kendallfit | Kendall tau_b between transformed and fitted proximities | ||
tauxfit | Tau_X between transformed and fitted proximities | ||
avgrecov | Averaged recovery measure between raw preference data and fitted proximities | ||
avgedpearson | Averaged Pearson correlation between raw preference data and fitted proximities | ||
avgspearman | Averaged Spearman rho between raw preference data and fitted proximities | ||
avgkendall | Averaged Kendall taub between raw preference data and fitted proximities | ||
avgtaux | Averaged Tau_X between raw preference data and fitted proximities | ||
resume | summary measures | ||
resumerec | summary of recovery measures | ||
resumeaug | summary of the augmentation matrix | ||
kDelta | Kemeny equivalent dissimilarity matrix | ||
beta | beta parameter | ||
alpha | alpha parameter | ||
interactions | n x m interaction submatrix | ||
csi | csi parameter | ||
mdssol | MDS solution as returned by the smacof package | ||
n_i | number of individuals | ||
n_c | number of items | ||
tots | total | ||
model | mds model | ||
transf | transformation used |
Value
An object of the class kunfolding. See details for detailed information.
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D'Ambrosio, A., Vera, J. F., & Heiser, W. J. (2022). Avoiding degeneracies in ordinal unfolding using Kemeny-equivalent dissimilarities for two-way two-mode preference rank data. Multivariate Behavioral Research, 57(4), 679-699.
See Also
augmatrix
Examples
data("breakfast", package="smacof")
unfout <- kunfolding(breakfast)
itemsl <- colnames(breakfast)
plot(unfout,labs=itemsl)
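# Illustrative sketch: inspect some goodness-of-fit measures listed in the
# output description above.
unfout$stress1      # Stress-1
unfout$nrawstress   # normalized raw stress
unfout$resume       # summary measures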
Utility function
Description
A utility function completing the output of the function ranktree.
Usage
layouttree(Tree)
Arguments
Tree |
an object of the class "ranktree" |
Value
an object of the class "ranktree" completing the output of the function ranktree
Author(s)
Antonio D'Ambrosio antdambr@unina.it
Utility function
Description
Utility function to set the control arguments of kunfolding
Usage
mdscontrol(
model = "ordinal",
init = "torgerson",
transf = "primary",
userinit = NULL,
w = NULL,
minstress = 1e-05,
itermax = 500,
printscr = TRUE,
spline.degree = 2,
spline.intKnots = 2,
relax = FALSE,
modulus = 1
)
Arguments
model |
Specifies the MDS model. One among "ordinal" (default) or "metric" |
init |
Initial configuration. One among "torgerson" (default), "random" or "user" |
transf |
The transformation. One among "primary" (default), "secondary","tertiary","spline","ratio","interval","none" |
userinit |
The user initial configuration if "init" has been set as "user" |
w |
The set of weights. Default: NULL |
minstress |
the minimum stress (for stress method). Default 1e-5 |
itermax |
Maximum number of iterations. Default 500 |
printscr |
Display the summary of the model. Default TRUE |
spline.degree |
Degree of spline transformation. Default 2 |
spline.intKnots |
Interior knots. default 2 |
relax |
Relax the solution. Default FALSE |
modulus |
Modulus. Default 1 |
Value
A list containing all the control parameters
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
kunfolding
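Examples
# Illustrative sketch: change a couple of the documented control parameters and
# pass the resulting list to kunfolding; the chosen values are arbitrary.
data("breakfast", package="smacof")
ctrl <- mdscontrol(itermax = 1000, printscr = FALSE)
unfout <- kunfolding(breakfast, control = ctrl)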
Path of a terminal node
Description
Given an object of the class "ranktree", it visualizes the path leading to the terminal node
Usage
nodepath(termnode, Tree)
Arguments
termnode |
The terminal node of which the path has to be extracted |
Tree |
An object of the class "ranktree" |
Value
The path leading to the terminal node
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
, treepaths
, getsubtree
Examples
data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#get information about all the paths leading to terminal nodes
paths <- treepaths(tree)
#see the path for terminal node number 8
nodepath(termnode=8,tree)
Plot Kemeny equivalent augmented unfolding solution
Description
Plot the Kemeny equivalent augmented unfolding coming from kunfolding
Usage
## S3 method for class 'kunfolding'
plot(
x,
labs = NULL,
labsrow = NULL,
main = NULL,
cols = NULL,
cexind = 1,
cexitems = 1,
pchcol = 15,
...
)
Arguments
x |
An object of the class "kunfolding" |
labs |
The labels of the items. Default is NULL. If not provided, a sequence "o1,...,on" is printed, with n = number of items |
labsrow |
The labels of the individuals. Default is NULL. If not provided, a sequence "1,...,m" is printed, with m = number of individuals |
main |
Main title of the plot. Default NULL |
cols |
Color of the individuals. It must be numeric. Default is NULL (dark gray). |
cexind |
cex of the individuals. Default 1 |
cexitems |
cex of the items. Default 1 |
pchcol |
pch parameter for items points. Default 15 |
... |
System reserved (No specific usage) |
Value
the plot of unfolding solution
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
kunfolding
Examples
data("breakfast", package="smacof")
unfout <- kunfolding(breakfast)
itemsl <- colnames(breakfast)
plot(unfout,labs=itemsl,cexitems=0.8)
Plot tree-based structure or pruning sequence of ranktree
Description
Plot the tree coming from ranktree, or the pruning sequence of the ranktree.
Usage
## S3 method for class 'ranktree'
plot(
x,
plot.type = "tree",
dispclass = FALSE,
valtree = NULL,
taos = TRUE,
...
)
Arguments
x |
An object of the class "ranktree" |
plot.type |
One among "tree" or "pruningseq" |
dispclass |
Display the median ranking above terminal nodes. Default option: FALSE |
valtree |
If plot.type="pruningseq", it shows the Tau_x rank correlation coefficient or the error along the pruning sequence on the training set. If valtree is the output of the function validatetree, the validated values along the pruning sequence are shown. |
taos |
If plot.type="pruningseq", it plots the Tau_x rank correlation coefficient along the pruning sequence. If taos=FALSE, it plots the error. |
... |
System reserved (No specific usage) |
Value
the plot of either the tree or the pruning sequence
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
plot(tree,dispclass=TRUE)
data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
plot(tree,dispclass=TRUE)
#test set validation
vtreetest <- validatetree(tree,testX=EVS$predictors[-training,],EVS$rankings[-training,])
dtree <- getsubtree(tree,vtreetest$best_tau)
plot(dtree,dispclass=TRUE)
#see the global weighted tau_X rank correlation coefficients
plot(tree,plot.type="pruningseq",valtree=vtreetest)
#see the error rates
plot(tree,plot.type="pruningseq",valtree=vtreetest, taos=FALSE)
Predict the median rankings for new observations
Description
Predict the median rankings in a tree-based structure built with ranktree
for new observations
Usage
## S3 method for class 'ranktree'
predict(object, newx, ...)
Arguments
object |
An object of the class "ranktree" |
newx |
A data frame of the same nature as the predictor data frame with which the tree has been built |
... |
System reserved (No specific usage) |
Value
A list containing:
rankings | the fit in terms of rankings | |
orderings | the fit in terms of orderings | |
info | data frame containing the terminal nodes into which the new observations fall, then the new x and the fit (in terms of rankings)
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
Examples
data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
#use the function predict to predict rankings for new predictors
rankfit <- predict(tree,newx=EVS$predictors[-training,])
#fit in terms of rankings
rankfit$rankings
#fit in terms of orderings
rankfit$orderings
# information about the fit (terminal node, predictor and fit (in terms of rankings))
rankfit$info
S3 methods for cca
Description
Print methods for objects of class cca
Usage
## S3 method for class 'cca'
print(x, ...)
Arguments
x |
An object of the class "cca" |
... |
not used |
Value
print a brief summary of the CCA
S3 methods for kunfolding
Description
Print methods for objects of class kunfolding
Usage
## S3 method for class 'kunfolding'
print(x, ...)
Arguments
x |
An object of the class "kunfolding" |
... |
not used |
Value
print a brief summary of the Kemeny equivalent augmented unfolding
S3 methods for ranktree
Description
Print methods for objects of class ranktree
Usage
## S3 method for class 'ranktree'
print(x, ...)
Arguments
x |
An object of the class "ranktree" |
... |
not used |
Value
print a brief summary of the prediction tree
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
tree
Recursive partitioning method for the prediction of preference rankings based upon Kemeny distances
Description
Recursive partitioning method for the prediction of preference rankings based upon Kemeny distances.
Usage
ranktree(Y, X, prunplot = FALSE, control = ranktreecontrol(...), ...)
Arguments
Y |
An n by m data matrix, in which there are n judges and m objects to be judged. Each row is a ranking of the objects, which are represented by the columns. |
X |
A data frame containing the predictors; it must have n rows. |
prunplot |
prunplot=TRUE returns the plot of the pruning sequence. Default value: FALSE |
control |
a list of options that control details of the ranktree algorithm, governed by the function ranktreecontrol |
... |
arguments passed directly, bypassing ranktreecontrol |
Details
The user can use any algorithm implemented in the consrank function from the ConsRank package. All algorithms allow the user to set the option 'full=TRUE' if the median ranking(s) must be searched in the unconstrained universe of rankings of n items including all possible ties, instead of in the restricted space of permutations.
The output consists of an object of the class "ranktree". It contains:
X | the predictors: it must be a dataframe | ||
Y | the response variable: the matrix of the rankings | ||
node | a list containing the tree-based structure: | ||
number | node number | ||
terminal | logical: TRUE if terminal node | ||
father | father node number of the current node | ||
idfather | id of the father node of the current node | ||
size | sample size within node | ||
impur | impurity at node | ||
wimpur | weighted impurity at node | ||
idatnode | id of the observations within node | ||
class | median ranking within node in terms of orderings | ||
nclass | median ranking within node in terms of rankings | ||
mclass | possible multiple median rankings | ||
tau | Tau_x rank correlation coefficient at node | ||
wtau | weighted Tau_x rank correlation coefficient at node | ||
error | error at node | ||
werror | weighted error at node | ||
varsplit | variables generating split | ||
varsplitid | id of variables generating split | ||
cutspli | splitting point | ||
children | children nodes generated by current node | ||
idchildren | id of children nodes generated by current node | ||
... | other info about node | ||
control | parameters used to build the tree | ||
numnodes | number of nodes of the tree | ||
tsynt | list containing the synthesis of the tree: | ||
children | list containing all information about leaves | ||
parents | list containing all information about parent nodes | ||
genealogy | data frame containing information about all nodes | ||
idgenealogy | data frame containing information about all nodes in terms of nodes id | ||
idparents | id of the parents of all the nodes | ||
goodness | goodness (and badness) of fit measures of the tree: Tau_X, error, impurity | ||
nomin | information about nature of the predictors | ||
alpha | alpha parameter for pruning sequence | ||
pruneinfo | list containing information about the pruning sequence: | ||
prunelist | information about the pruning | ||
tau | tau_X rank correlation coefficient of each subtree | ||
error | error of each subtree | ||
termnodes | number of terminal nodes of each subtree | ||
subtrees | list of each subtree created with the cost-complexity pruning procedure |
Value
An object of the class ranktree. See details for detailed information.
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94.
See Also
ranktreecontrol
, plot.ranktree
, summary.ranktree
, getsubtree
, validatetree
, treepaths
, nodepath
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#plot the tree
plot(tree,dispclass=TRUE)
#visualize information
summary(tree)
#get information about the paths leading to terminal nodes (all the paths)
infopaths <- treepaths(tree)
#the terminal nodes
infopaths$leaves
#sample size within each terminal node
infopaths$size
#visualize the path of the second leaf (terminal node number 8)
infopaths$paths[[2]]
#alternatively
nodepath(termnode=8,tree)
set.seed(132) #for reproducibility
#validation of the tree via v-fold cross-validation (default value of V=5)
vtree <- validatetree(tree,method="cv")
#extract the "best" tree
dtree <- getsubtree(tree,vtree$best_tau)
summary(dtree)
#plot the validated tree
plot(dtree,dispclass=TRUE)
#predicted rankings
rankfit <- predict(dtree,newx=Irish$predictors)
#fit of rankings
rankfit$rankings
#fit in terms of orderings
rankfit$orderings
#all info about the fit (id of the leaf, predictor values, and fit)
rankfit$info
Utility function
Description
Utility function to set the control arguments of ranktree
Usage
ranktreecontrol(
num = NULL,
decrmin = 0.01,
algorithm = "quick",
full = FALSE,
itermax = 10,
np = 15,
gl = 100,
ff = 0.4,
cr = 0.9,
proc = FALSE,
ps = FALSE
)
Arguments
num |
The maximum number of observations in a node to be split: default, 10% of the sample size |
decrmin |
Minimum decrease in impurity |
algorithm |
The algorithm used to compute the median ranking. One among "BB", "quick" (default), "fast" and "decor" |
full |
Specifies if the median ranking must be searched in the universe of rankings including all the possible ties. Default: FALSE |
itermax |
Number of iterations for "fast" and "decor" algorithms. itermax=10 is the default option. |
np |
(for "decor" only) the number of population individuals. np=15 is the default option. |
gl |
(for "decor" only) generations limit, maximum number of consecutive generations without improvement. gl=100 is the default option. |
ff |
(for "decor" only) the scaling rate for mutation. Must be in [0,1]. ff=0.4 is the default option. |
cr |
(for "decor" only) the crossover range. Must be in [0,1]. cr=0.9 is the default option. |
proc |
(for "BB" only) proc=TRUE allows the branch and bound algorithm to work in difficult cases, i.e. when the number of objects is larger than 15 or 25. proc=FALSE is the default option |
ps |
If ps=TRUE, information about how many branches are being processed is displayed on the screen. Default value: FALSE |
Value
A list containing all the control parameters
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
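Examples
# Illustrative sketch: build the control list explicitly instead of passing the
# single options through '...'; the chosen values are arbitrary.
data(Irish)
ctrl <- ranktreecontrol(num = 50, decrmin = 0.005)
tree <- ranktree(Irish$rankings, Irish$predictors, control = ctrl)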
S3 methods for cca
Description
Summary methods for objects of class cca
Usage
## S3 method for class 'cca'
summary(object, ...)
Arguments
object |
An object of the class "cca" |
... |
not used |
Value
it shows the summary of the CCA
S3 methods for kunfolding
Description
Summary methods for objects of class kunfolding
Usage
## S3 method for class 'kunfolding'
summary(object, ...)
Arguments
object |
An object of the class "kunfolding" |
... |
not used |
Value
it shows the summary of the Kemeny equivalent augmented unfolding
S3 methods for ranktree
Description
Summary methods for objects of class ranktree
Usage
## S3 method for class 'ranktree'
summary(object, ...)
Arguments
object |
An object of the class "ranktree" |
... |
not used |
Value
it shows the summary of the prediction tree
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
summary(tree)
Paths of the terminal nodes
Description
Given an object of the class "ranktree", it extracts the paths of all terminal nodes
Usage
treepaths(Tree)
Arguments
Tree |
An object of the class "ranktree" |
Value
A list containing:
leaves | the number of the terminal nodes | |
size | the sample size within each terminal node | |
paths | a list containing all the paths |
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
, nodepath
, getsubtree
Examples
data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#get information about all the paths leading to terminal nodes
paths <- treepaths(tree)
#
#the terminal nodes
paths$leaves
#
#sample size within each terminal node
paths$size
#
#visualize the path of the second leaf (terminal node number 8)
paths$paths[[2]]
Validation of the tree for preference rankings
Description
Validation of the tree either with a test-set procedure or with V-fold cross-validation
Usage
validatetree(
Tree,
testX = NULL,
testY = NULL,
method = "test",
V = 5,
plotting = TRUE
)
Arguments
Tree |
An object of the class "ranktree" coming from the function ranktree |
testX |
The data frame containing the test set (predictors) |
testY |
The matrix containing the test set (response) |
method |
One between "test" (default) and "cv" |
V |
The cross-validation parameter. Default V=5 |
plotting |
With the default option plotting=TRUE, the pruning sequence plot is visualized |
Value
A list containing:
tau | the Tau_x rank correlation coefficient of the sequence of the trees | |
error | the error of the sequence of the trees | |
termnodes | the number of terminal nodes of the sequence of the trees | |
best_tau | the best tree in terms of Tau_x rank correlation coefficient | |
best_error | the best tree in terms of error (it is the same) | |
validation | information about the validation procedure |
Author(s)
Antonio D'Ambrosio antdambr@unina.it
Examples
data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
#test set validation
vtreetest <- validatetree(tree,testX=EVS$predictors[-training,],EVS$rankings[-training,])
#cross-validation
vtreecv <- validatetree(tree,method="cv",V=10)
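# Illustrative sketch: inspect the cross-validation output (see the Value
# section) and extract the corresponding best subtree.
vtreecv$best_tau
dtreecv <- getsubtree(tree, vtreecv$best_tau)
summary(dtreecv)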
Tau_x rank correlation coefficient for vectors
Description
Tau_x rank correlation coefficient for large vectors
Usage
vecTaux(X, Y)
Arguments
X |
A vector of length n |
Y |
A vector of length n |
Value
The tau_x rank correlation coefficient between the 2 vectors
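Examples
# Illustrative sketch: Tau_x rank correlation between two rankings of five objects.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 3, 5, 4)
vecTaux(x, y)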