Type: | Package |
Title: | Classification and Clustering of Preference Rankings |
Version: | 1.0.2 |
Date: | 2025-04-15 |
Maintainer: | Antonio D'Ambrosio <antdambr@unina.it> |
Depends: | ConsRank |
Imports: | janitor, methods, pracma, rlist, proxy, smacof, gtools |
Description: | Tree-based classification and soft-clustering methods for preference rankings, with tools for external validation of fuzzy clustering, and Kemeny-equivalent augmented unfolding. It contains the recursive partitioning algorithm for preference rankings, a non-parametric tree-based method for a matrix of preference rankings as a response variable. It also contains the distribution-free soft clustering method for preference rankings, namely the K-median cluster component analysis (CCA). The package depends on the 'ConsRank' R package. Options for validating the tree-based method are both the test-set procedure and V-fold cross-validation. The package contains the routines to compute the adjusted concordance index (a fuzzy version of the adjusted Rand index) and the normalized degree of concordance (the corresponding fuzzy version of the Rand index). The package also contains routines to perform the Kemeny-equivalent augmented unfolding. The MDS engine is the function 'smacofSym' from the package 'smacof'. Essential references: D'Ambrosio, A., Vera, J.F., and Heiser, W.J. (2021) <doi:10.1080/00273171.2021.1899892>; D'Ambrosio, A., Amodio, S., Iorio, C., Pandolfo, G., and Siciliano, R. (2021) <doi:10.1007/s00357-020-09367-0>; D'Ambrosio, A., and Heiser, W.J. (2019) <doi:10.1007/s41237-018-0069-5>; D'Ambrosio, A., and Heiser, W.J. (2016) <doi:10.1007/s11336-016-9505-1>; Hullermeier, E., Rifqi, M., Henzgen, S., and Senge, R. (2012) <doi:10.1109/TFUZZ.2011.2179303>; Marden, J.I. <ISBN:0412995212>. |
License: | GPL-3 |
Encoding: | UTF-8 |
URL: | https://www.r-project.org/ |
Repository: | CRAN |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-04-15 15:23:55 UTC; Antonio |
Author: | Antonio D'Ambrosio [aut, cre] |
Date/Publication: | 2025-04-17 07:40:09 UTC |
European Values Studies (EVS) data
Description
Random sub-sample of 3584 cases of the survey conducted in 1999 in 32 countries analyzed by Vermunt (2003).
Usage
data("EVS")
Format
The format is: List of 3
$ data:'data.frame': 1911 obs. of 11 variables:
country, gender, yearbird (year of birth), mstatus (marital status), eduage (age of education completion), employment (employment status: ordinal scale 1-8), householdinc (household income: ordinal scale 1-10), (A) Maintain order in Nation, (B) Give people more say in Government decisions, (C) Fight rising prices, (D) Protect freedom of speech.
$ predictors:'data.frame' with all the predictors
$ rankings : matrix with the preferences for "A" (Maintain order in Nation), "B" (Give people more say in Government decisions), "C" (Fight rising prices), "D" (Protect freedom of speech).
Details
Rankings were obtained by applying the post-materialism scale developed by Inglehart (1977). The scale is based upon an experiment of the type "pick the 2 most important out of 4" political goals for your government. For this reason, replace the NAs with 3 before using the rankings with the functions 'ranktree' or 'cca' (see D'Ambrosio and Heiser, 2016). As for the predictors, the coding of the countries is: G1 (Austria, Denmark, Netherlands, Sweden), G2 (Belgium, Croatia, France, Greece, Ireland, Northern Ireland, Spain), G3 (Bulgaria, Czechia, East Germany, Finland, Iceland, Luxembourg, Malta, Portugal, Romania, Slovenia, West Germany), G4 (Belarus, Estonia, Hungary, Latvia, Lithuania, Poland, Russia, Slovakia, Ukraine). Coding of the predictor "mstatus" is: mar (married), wid (widowed), div (divorced), sep (separated), nevm (never married).
Source
http://statisticalinnovations.com/technicalsupport/choice_datasets.html
References
Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33(1), 213–239.
Inglehart, R. (1977). The silent revolution: Changing values and political styles among Western Publics. Princeton, NJ: Princeton University Press.
D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94.
Examples
data(EVS)
# EVS$rankings[is.na(EVS$rankings)] <- 3 #place unranked objects in a tie at the third position
# ccares <- cca(EVS$rankings,4) #solution with 4 components
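# A further illustrative sketch (not run): grow a prediction tree on the EVS data
# after placing unranked objects in a tie at the third position, as suggested in
# the Details section; 'num' is documented in ranktreecontrol.
# EVS$rankings[is.na(EVS$rankings)] <- 3
# tree <- ranktree(EVS$rankings, EVS$predictors, num=50)
# summary(tree)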
Irish Election data set
Description
An opinion poll conducted by Irish Marketing Surveys one month prior to the election in 1997. Interviews were conducted with about 1100 respondents, drawn from 100 sampling areas. Interviews took place at randomly located homes, with respondents selected according to a socioeconomic quota. A range of sociological questions was asked of each respondent, as was their voting preference, if any, for each of the candidates.
Usage
data("Irish")
Format
The format is: List of 3
$ IrishElection: 'data.frame': 1083 obs. of 11 variables: Gender (male, housewife, nonhousewife), marital status (single, married, separated), age, socialclass (five unordered categories), Area (rural, city, town), government satisfaction (no opinion, satisfied, dissatisfied), Bano, Roch, McAl, Nall, Scal
$ predictors :'data.frame' with all the predictors
$ rankings : matrix with the preferences for "Bano", "Roch", "McAl", "Nall", "Scal"
Details
In the original version of the data, the ranking matrix contains NAs. Here, NAs are replaced with the number 7, to indicate that all the non-stated preferences are in a tie at the last position (see D'Ambrosio and Heiser, 2016). For details about the data set see Gormley and Murphy, 2008.
Source
https://projecteuclid.org/journals/annals-of-applied-statistics/volume-2/issue-4/A-mixture-of-experts-model-for-rank-data-with/10.1214/08-AOAS178.full?tab=ArticleLinkSupplemental
References
Gormley, I.C., and Murphy, T.B. (2008). A mixture of experts model for rank data with applications in election studies. Annals of Applied Statistics, 2(4): 1452-1477. DOI: 10.1214/08-AOAS178
D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94. DOI: 10.1007/s11336-016-9505-1.
Examples
data(Irish)
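# Illustrative sketch: inspect the rankings matrix. As stated in the Details,
# non-stated preferences were already recoded to 7 (tied at the last position),
# so no NAs are expected.
str(Irish$rankings)
sum(is.na(Irish$rankings))   # expected to be 0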
University rankings dataset.
Description
The University rankings dataset was analysed by Dittrich, Hatzinger and Katzenbeisser (1998) to investigate paired comparison data concerning European universities and students' characteristics, with the goal of showing that university rankings differ across groups of students. Here both the raw data (with paired comparisons) and the version with rankings are presented (see details). A survey of 303 students studying at the Vienna University of Economics was carried out to examine the students' preferences among six universities, namely London, Paris, Milan, St. Gallen, Barcelona and Stockholm. The data set contains 23 variables. The first 15 digits in each row indicate the preferences of a student. For a given comparison, responses were coded 1 if the first university was preferred, 2 if the second university was preferred, and 3 if the universities were tied. All rows containing missing ranked universities were skipped.
Usage
data("Univranks")
Format
The format is: List of 3
$ rawdata: 'data.frame': 212 obs. of 23 variables: the first 15 are the paired comparisons coded as follows (1: the first is preferred to the second; 2: the second is preferred to the first; 3: tied)
$ LP : comparison of London to Paris
$ LM : comparison of London to Milan
$ PM : comparison of Paris to Milan
$ LSg : comparison of London to St. Gallen
$ PSg : comparison of Paris to St. Gallen
$ MSg : comparison of Milan to St. Gallen
$ LB : comparison of London to Barcelona
$ PB : comparison of Paris to Barcelona
$ MB : comparison of Milan to Barcelona
$ SgB : comparison of St. Gallen to Barcelona
$ LSt : comparison of London to Stockholm
$ PSt : comparison of Paris to Stockholm
$ MSt : comparison of Milan to Stockholm
$ SgSt: comparison of St. Gallen to Stockholm
$ BSt : comparison of Barcelona to Stockholm
$ Stud: Factor w/ 2 levels "commerce","other"
$ Eng : Factor w/ 2 levels "good","poor"
$ Fra : Factor w/ 2 levels "good","poor"
$ Spa : Factor w/ 2 levels "good","poor"
$ Ita : Factor w/ 2 levels "good","poor"
$ Wor : Factor w/ 2 levels "no","yes"
$ Deg : Factor w/ 2 levels "no","yes"
$ Sex : Factor w/ 2 levels "female","male"
$ predictors:'data.frame': 212 obs. of 8 variables (the last 8 variables of the "rawdata" data frame)
$ rankings : matrix of preference rankings. The columns are: "L" (London), "P" (Paris), "M" (Milan), "Sg" (St. Gallen), "B" (Barcelona), "St" (Stockholm)
Details
To obtain the preference rankings from the paired comparisons, the procedure was the following. The first row of the raw data is [1 3 2 1 2 1 1 2 1 1 1 2 1 1 2]: London is preferred to Paris, St. Gallen, Barcelona and Stockholm (LP, LSg, LB and LSt are all equal to 1), and there is no preference between London and Milan (LM = 3, they are tied); Milan is preferred to Paris (PM = 2), St. Gallen, Barcelona and Stockholm; and so on. The first ordering is then <{L M} Sg St B P>, corresponding to the ranking [1, 5, 1, 2, 4, 3], where the columns indicate L, P, M, Sg, B, St.
Source
http://www.blackwellpublishers.co.uk/rss
References
Dittrich, R., Hatzinger, R., and Katzenbeisser, W. (1998). Modelling the effect of subject-specific covariates in paired comparison studies with an application to university rankings. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(4), 511-525. DOI: 10.1111/1467-9876.00125
D'Ambrosio, A. (2008). Tree based methods for data editing and preference rankings. Ph.D. thesis, University of Naples Federico II. https://www.doi.org/10.6092/UNINA/FEDOA/2746
Examples
data(Univranks)
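# Illustrative sketch: compare the first row of the paired comparisons with the
# corresponding row of the rankings matrix, following the worked example in the
# Details section.
Univranks$rawdata[1, 1:15]   # per the Details: 1 3 2 1 2 1 1 2 1 1 1 2 1 1 2
Univranks$rankings[1, ]      # per the Details, should correspond to 1 5 1 2 4 3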
Kemeny-equivalent augmented dissimilarity matrix
Description
Kemeny-equivalent augmented dissimilarity matrix
Usage
augmatrix(X)
Arguments
X |
An n by m data matrix, in which there are n judges and m objects to be judged. Each row is a ranking of the objects, which are represented by the columns. |
Details
First the matrix is transformed with the tau_X rank correlation coefficient, then it is normalized. The output contains:
Delta | the augmented dissimilarity matrix | ||
Interaction | the submatrix containing the individual-item interactions | ||
Objects | the submatrix containing the within-items proximities | ||
Indiv | the submatrix containing the within-individuals proximities | ||
beta | the beta parameter | ||
alpha | the alpha parameter | ||
csi | the csi parameter | ||
res | the summary of the augmentation in terms of: | ||
TauX | tau_x rank correlation coefficient | ||
Kendall | Kendall rank correlation coefficient | ||
Spearman | Spearman correlation coefficient |
Value
A list containing the dissimilarity matrix and other information about the augmented matrix. See details for detailed information.
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D'Ambrosio, A., Vera, J. F., & Heiser, W. J. (2022). Avoiding degeneracies in ordinal unfolding using Kemeny-equivalent dissimilarities for two-way two-mode preference rank data. Multivariate Behavioral Research, 57(4), 679-699.
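Examples
# Illustrative sketch: compute the Kemeny-equivalent augmented dissimilarity
# matrix for the 'breakfast' preference data shipped with the 'smacof' package
# (the same data used in the kunfolding examples).
data("breakfast", package="smacof")
aug <- augmatrix(as.matrix(breakfast))
aug$beta        # the beta parameter
str(aug$Delta)  # the augmented dissimilarity matrix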
K-Median Cluster Component Analysis
Description
K-Median Cluster Component Analysis, a distribution-free soft-clustering method for preference rankings.
Usage
cca(X, k, control = ccacontrol(...), ...)
Arguments
X |
An n by m data matrix containing preference rankings, in which there are n judges and m objects to be judged. Each row is a ranking of the objects, which are represented by the columns. |
k |
The number of cluster components |
control |
a list of options that control details of the cca algorithm, governed by the function ccacontrol |
... |
arguments passed directly, bypassing ccacontrol |
Details
The user can use any algorithm implemented in the consrank function from the ConsRank package. All algorithms allow the user to set the option 'full=TRUE' if the median ranking(s) must be searched in the unconstrained universe of rankings of n items including all possible ties, instead of in the restricted space of permutations.
There are two classification uncertainty measures: Us and Uprods. "Us" is the geometric
mean of the membership probabilities of each individual, normalized in such a way that
in the case of maximum uncertainty Us=1. "Ucca" is the average of all the "Us".
"Uprods" is the product of the membership probabilities of each individual, normalized in such a way that
in the case of maximum uncertainty Uprods=1. "Uprodscca" is the average of all the "Uprods".
Value
An object of the class "cca". It contains:
pk | the membership probability matrix | |
clc | cluster centers | |
oclc | cluster centers in terms of orderings | |
idc | crisp partition: id of the cluster component associated with the highest membership probability | |
Hcca | Global homogeneity measure (tau_X rank correlation coefficient) | |
hk | Homogeneity within cluster | |
props | estimated proportion of cases within cluster | |
Us | Uncertainty measure per-individual (see details) | |
Ucca | Global uncertainty measure | |
Uprods | Uncertainty measure per-individual (see details) | |
Uprodscca | Global uncertainty measure | |
consrankout | complete output of the rank aggregation algorithm, possibly containing multiple median rankings |
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D'Ambrosio, A. and Heiser, W.J. (2019). A Distribution-free Soft Clustering Method for Preference Rankings. Behaviormetrika , vol. 46(2), pp. 333–351, DOI: 10.1007/s41237-018-0069-5
Heiser, W.J., and D'Ambrosio, A. (2013). Clustering and Prediction of Rankings within a Kemeny Distance Framework. In Lausen, B., Van den Poel, D., Ultsch, A. (eds), Algorithms from and for Nature and Life, pp. 19-31. Springer International. DOI: 10.1007/978-3-319-00035-0_2.
Ben-Israel, A., and Iyigun, C. (2008). Probabilistic d-clustering. Journal of Classification, 25(1), pp.5-26. DOI: 10.1007/s00357-008-9002-z
See Also
ccacontrol
ranktree
Examples
data(Irish)
set.seed(135) #for reproducibility
# CCA with four components
ccares <- cca(Irish$rankings, 4, itercca=10)
summary(ccares)
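# Illustrative sketch: inspect some components of the fitted object, following
# the Value section above.
ccares$clc                    # cluster centers
round(ccares$pk[1:5, ], 3)    # membership probabilities of the first five judges
ccares$Ucca                   # global uncertainty measure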
Utility function
Description
Utility function to set the control arguments of cca
Usage
ccacontrol(
algorithm = "quick",
full = FALSE,
itercca = 1,
consrankitermax = 10,
np = 15,
gl = 100,
ff = 0.4,
cr = 0.9,
proc = FALSE,
ps = FALSE
)
Arguments
algorithm |
The algorithm used to compute the median ranking. One among "BB", "quick" (default), "fast" and "decor" |
full |
Specifies if the median ranking must be searched in the universe of rankings including all the possible ties. Default: FALSE |
itercca |
Number of iterations of cca |
consrankitermax |
Number of iterations for the "fast" and "decor" algorithms. consrankitermax=10 is the default option. |
np |
(for "decor" only) the number of population individuals. np=15 is the default option. |
gl |
(for "decor" only) generations limit, maximum number of consecutive generations without improvement. gl=100 is the default option. |
ff |
(for "decor" only) the scaling rate for mutation. Must be in [0,1]. ff=0.4 is the default option. |
cr |
(for "decor" only) the crossover range. Must be in [0,1]. cr=0.9 is the default option. |
proc |
(for "BB" only) proc=TRUE allows the branch and bound algorithm to work in difficult cases, i.e. when the number of objects is larger than 15 or 25. proc=FALSE is the default option |
ps |
If ps=TRUE, information about how many branches are being processed is displayed on the screen. Default value: FALSE |
Value
A list containing all the control parameters
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
cca
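Examples
# Illustrative sketch: set the control parameters explicitly and pass them to
# cca; the chosen values are arbitrary.
data(Irish)
set.seed(135)
ctrl <- ccacontrol(algorithm = "fast", itercca = 5)
ccares <- cca(Irish$rankings, 4, control = ctrl)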
Normalized Degree of Concordance (NDC) and Adjusted Concordance Index (ACI)
Description
Given two fuzzy (Ruspini) partitions, it computes the NDC and the ACI. The NDC is the fuzzy version of the Rand index, and the ACI is the fuzzy version of the Adjusted Rand index.
Usage
fuzzyconcordance(P, Q, nperms = 1000)
Arguments
P |
A fuzzy partition. It has to be a matrix with n rows and k columns. Each row expresses the degrees of membership of the i-th observation in the k clusters (see details). |
Q |
A fuzzy partition. It has to be a matrix with n rows and h columns. Each row expresses the degrees of membership of the i-th observation in the h clusters (see details). |
nperms |
number of permutations necessary to compute ACI. Default: 1000 |
Details
Both P and Q, or only one of them, can be crisp (or hard) partitions. In this case, each row must contain either 0 or 1, and the sum of the i-th row must be 1. In other words, either P or Q (or both) are expressed in terms of dummy coding. If both partitions are crisp, then NDC is equal to the Rand index and ACI is equal to the Adjusted Rand index. This function can be used to externally validate the output of any fuzzy clustering method.
Value
A list containing:
ACI | the Adjusted Concordance Index | |
NDC | the Normalized Degree of Concordance |
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D’Ambrosio, A., Amodio, S., Iorio, C., Pandolfo, G. and Siciliano, R. (2021). Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions. Journal of Classification vol. 38(1), pp. 112–128 (2021). DOI: 10.1007/s00357-020-09367-0
Hullermeier, E., Rifqi, M., Henzgen, S., and Senge, R. (2012). Comparing fuzzy partitions: a generalization of the Rand index and related measures. IEEE Transactions on Fuzzy Systems, 20(3), 546–556. DOI: 10.1109/TFUZZ.2011.2179303
See Also
Examples
#two random fuzzy partitions
P = rbind(c(0.5259, 0.1656, 0.3085),
c(0.5623, 0.1036, 0.3341),
c(0.2508, 0.1849, 0.5643),
c(0.5654, 0.1934, 0.2413),
c(0.4529, 0.1679, 0.3792),
c(0.2390, 0.1758, 0.5852),
c(0.3114, 0.1743, 0.5143),
c(0.4188, 0.1392, 0.4420),
c(0.5830, 0.1655, 0.2514),
c(0.5860, 0.1171, 0.2969),
c(0.2630, 0.1706, 0.5664),
c(0.5882, 0.1032, 0.3086),
c(0.5829, 0.1277, 0.2894),
c(0.3942, 0.1046, 0.5012),
c(0.5201, 0.1097, 0.3702),
c(0.2568, 0.1823, 0.5609),
c(0.3687, 0.1695, 0.4618),
c(0.5663, 0.1317, 0.3020),
c(0.5169, 0.1950, 0.2881),
c(0.5838, 0.1034, 0.3128))
Q = rbind(c(0.4494, 0.3755, 0.1751),
c(0.5219, 0.3526, 0.1255),
c(0.3432, 0.5062, 0.1506),
c(0.3120, 0.5181, 0.1699),
c(0.5362, 0.2747, 0.1891),
c(0.4082, 0.3959, 0.1959),
c(0.4670, 0.3782, 0.1547),
c(0.4276, 0.4585, 0.1139),
c(0.4013, 0.4837, 0.1149),
c(0.3724, 0.5019, 0.1258),
c(0.5055, 0.3104, 0.1841),
c(0.4027, 0.4719, 0.1254),
c(0.3565, 0.4620, 0.1814),
c(0.6106, 0.2650, 0.1244),
c(0.5595, 0.2476, 0.1929),
c(0.4657, 0.3993, 0.1350),
c(0.2964, 0.5839, 0.1197),
c(0.5387, 0.3362, 0.1251),
c(0.4043, 0.4341, 0.1616),
c(0.5631, 0.2895, 0.1473))
ci <- fuzzyconcordance(P,Q)
#generate a random fuzzy partition with two components (clusters)
Q2 <- matrix(runif(20),ncol=1)
Q2 <- cbind(Q2,1-Q2)
ci2 <- fuzzyconcordance(P,Q2)
#generate a random crisp partition
P2 <- t(rmultinom(20,1,c(0.3,0.3,0.4)))
ci3 <- fuzzyconcordance(P2,Q)
#--------------------
## Not run:
# install.packages("Rankcluster")
library("Rankcluster") # model-based clustering algorithm for
# ranking data by Biernacki and Jacques (2013)
# <doi:10.1016/j.csda.2012.08.008>
data(APA)
set.seed(136) #for reproducibility
rcres <- rankclust(APA$data,K=3) # solution with 3 centers, it takes about 75 seconds
##
ccares <- cca(APA$data,k=3) #solution with 3 components, it takes about 7 seconds
##
ci <- fuzzyconcordance(rcres[3]@tik,ccares$pk)
ci$ACI # 0.0226 means that the two partitions are similar (see NDC below),
# but their similarity is mainly due to chance
ci$NDC
## End(Not run)
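# Illustrative sketch: with two crisp partitions (dummy coding), NDC reduces to
# the Rand index and ACI to the Adjusted Rand index, as stated in the Details.
set.seed(1)
P3 <- t(rmultinom(20, 1, c(0.5, 0.5)))
Q3 <- t(rmultinom(20, 1, c(0.4, 0.6)))
ci4 <- fuzzyconcordance(P3, Q3)
ci4$NDC
ci4$ACI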
Determine a tree from the main tree-based structure
Description
Given a tree belonging to the class "ranktree", determine a subtree with a given number of terminal nodes
Usage
getsubtree(Tree, cut, tokeep = NULL)
Arguments
Tree |
An object of the class "ranktree" coming from the function ranktree |
cut |
The maximum number of terminal nodes that the returned subtree must have |
tokeep |
parameter invoked by other internal functions |
Details
If the pruning sequence returns a series of subtrees with, say, 1, 2, 4, 7, 9 terminal nodes and the user sets cut=8, the function extracts the subtree with 7 terminal nodes.
Value
An object of the class "ranktree", containing the same information as the output of the function ranktree
Author(s)
Antonio D'Ambrosio antdambr@unina.it
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
#see how many terminal nodes the trees composing the nested sequence of subtrees have
infoprun <- tree$pruneinfo$termnodes
#select the tree with, say, 6 terminal nodes
tree6 <- getsubtree(tree,6)
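# Illustrative follow-up sketch: inspect and plot the extracted subtree with the
# S3 methods documented elsewhere in this manual.
summary(tree6)
plot(tree6, dispclass=TRUE)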
Kemeny-equivalent augmented unfolding
Description
Kemeny-equivalent augmented unfolding.
Usage
kunfolding(X, p = 2, control = mdscontrol(...), ...)
Arguments
X |
An n by m data matrix, in which there are n judges and m objects to be judged. Each row is a ranking of the objects, which are represented by the columns. |
p |
the dimensionality of the solution. Default p=2 |
control |
a list of options that control details of the kunfolding algorithm, governed by the function mdscontrol |
... |
arguments passed directly, bypassing mdscontrol |
Details
The MDS engine is smacofSym from the smacof package. In a future release, other MDS algorithms will be implemented.
The output consists of an object of the class "kunfolding". It contains:
rawstress | raw stress | ||
nrawstress | normalized raw stress | ||
stress1 | Stress-1 | ||
rowcoord | row (individuals) coordinates | ||
colcord | column (items) coordinates | ||
dhat | dhat | ||
dij | configuration distance | ||
shepardD | DeSarbo I Index | ||
kendallfit | Kendall tau_b between transformed and fitted proximities | ||
tauxfit | Tau_X between transformed and fitted proximities | ||
avgrecov | Averaged recovery measure between raw preference data and fitted proximities | ||
avgedpearson | Averaged Pearson correlation between raw preference data and fitted proximities | ||
avgspearman | Averaged Spearman rho between raw preference data and fitted proximities | ||
avgkendall | Averaged Kendall taub between raw preference data and fitted proximities | ||
avgtaux | Averaged Tau_X between raw preference data and fitted proximities | ||
resume | summary measures | ||
resumerec | summary of recovery measures | ||
resumeaug | summary of the augmentation matrix | ||
kDelta | Kemeny equivalent dissimilarity matrix | ||
beta | beta parameter | ||
alpha | alpha parameter | ||
interactions | n x m interaction submatrix | ||
csi | csi parameter | ||
mdssol | MDS solution as returned by the smacof package | ||
n_i | number of individuals | ||
n_c | number of items | ||
tots | total | ||
model | mds model | ||
transf | transformation used |
Value
An object of the class kunfolding. See details for detailed information.
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D'Ambrosio, A., Vera, J. F., & Heiser, W. J. (2022). Avoiding degeneracies in ordinal unfolding using Kemeny-equivalent dissimilarities for two-way two-mode preference rank data. Multivariate Behavioral Research, 57(4), 679-699.
See Also
augmatrix
Examples
data("breakfast", package="smacof")
unfout <- kunfolding(breakfast)
itemsl <- colnames(breakfast)
plot(unfout,labs=itemsl)
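# Illustrative sketch: inspect some goodness-of-fit measures listed in the
# output description above.
unfout$stress1      # Stress-1
unfout$nrawstress   # normalized raw stress
unfout$resume       # summary measures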
Utility function
Description
A utility function completing the output of the function ranktree.
Usage
layouttree(Tree)
Arguments
Tree |
an object of the class "ranktree" |
Value
an object of the class "ranktree" completing the output of the function ranktree
Author(s)
Antonio D'Ambrosio antdambr@unina.it
Utility function
Description
Utility function to set the control arguments of kunfolding
Usage
mdscontrol(
model = "ordinal",
init = "torgerson",
transf = "primary",
userinit = NULL,
w = NULL,
minstress = 1e-05,
itermax = 500,
printscr = TRUE,
spline.degree = 2,
spline.intKnots = 2,
relax = FALSE,
modulus = 1
)
Arguments
model |
Specifies the MDS model. One among "ordinal" (default) or "metric" |
init |
Initial configuration. One among "torgerson" (default), "random" or "user" |
transf |
The transformation. One among "primary" (default), "secondary","tertiary","spline","ratio","interval","none" |
userinit |
The user initial configuration if "init" has been set as "user" |
w |
The set of weights. Default: NULL |
minstress |
the minimum stress (for stress method). Default 1e-5 |
itermax |
Maximum number of iterations. Default 500 |
printscr |
Display the summary of the model. Default TRUE |
spline.degree |
Degree of spline transformation. Default 2 |
spline.intKnots |
Interior knots. default 2 |
relax |
Relax the solution. Default FALSE |
modulus |
Modulus. Default 1 |
Value
A list containing all the control parameters
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
kunfolding
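Examples
# Illustrative sketch: change a couple of the documented control parameters and
# pass the resulting list to kunfolding; the chosen values are arbitrary.
data("breakfast", package="smacof")
ctrl <- mdscontrol(itermax = 1000, printscr = FALSE)
unfout <- kunfolding(breakfast, control = ctrl)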
Path of a terminal node
Description
Given an object of the class "ranktree", it visualizes the path leading to the terminal node
Usage
nodepath(termnode, Tree)
Arguments
termnode |
The terminal node of which the path has to be extracted |
Tree |
An object of the class "ranktree" |
Value
The path leading to the terminal node
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
, treepaths
, getsubtree
Examples
data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#get information about all the paths leading to terminal nodes
paths <- treepaths(tree)
#see the path for terminal node number 8
nodepath(termnode=8,tree)
Plot Kemeny equivalent augmented unfolding solution
Description
Plot the Kemeny equivalent augmented unfolding coming from kunfolding
Usage
## S3 method for class 'kunfolding'
plot(
x,
labs = NULL,
labsrow = NULL,
main = NULL,
cols = NULL,
cexind = 1,
cexitems = 1,
pchcol = 15,
...
)
Arguments
x |
An object of the class "kunfolding" |
labs |
The labels of the items. Default is NULL. If not provided, a sequence "o1,...,on" is printed, with n = number of items |
labsrow |
The labels of the individuals. Default is NULL. If not provided, a sequence "1,...,m" is printed, with m = number of individuals |
main |
Main title of the plot. Default NULL |
cols |
Color of the individuals. It must be numeric. Default is NULL (dark gray). |
cexind |
cex of the individuals. Default 1 |
cexitems |
cex of the items. Default 1 |
pchcol |
pch parameter for items points. Default 15 |
... |
System reserved (No specific usage) |
Value
the plot of unfolding solution
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
kunfolding
Examples
data("breakfast", package="smacof")
unfout <- kunfolding(breakfast)
itemsl <- colnames(breakfast)
plot(unfout,labs=itemsl,cexitems=0.8)
Plot tree-based structure or pruning sequence of ranktree
Description
Plot the tree coming from ranktree, or the pruning sequence of the ranktree.
Usage
## S3 method for class 'ranktree'
plot(
x,
plot.type = "tree",
dispclass = FALSE,
valtree = NULL,
taos = TRUE,
...
)
Arguments
x |
An object of the class "ranktree" |
plot.type |
One among "tree" or "pruningseq" |
dispclass |
Display the median ranking above terminal nodes. Default option: FALSE |
valtree |
If plot.type="pruningseq", it shows the Tau_x rank correlation coefficient or the error along the pruning sequence on the training set. If valtree is the output of the function validatetree, the validated values along the pruning sequence are shown. |
taos |
If plot.type="pruningseq", it plots the Tau_x rank correlation coefficient along the pruning sequence. If taos=FALSE, it plots the error. |
... |
System reserved (No specific usage) |
Value
the plot of either the tree or the pruning sequence
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
plot(tree,dispclass=TRUE)
data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
plot(tree,dispclass=TRUE)
#test set validation
vtreetest <- validatetree(tree,testX=EVS$predictors[-training,],EVS$rankings[-training,])
dtree <- getsubtree(tree,vtreetest$best_tau)
plot(dtree,dispclass=TRUE)
#see the global weighted tau_X rank correlation coefficients
plot(tree,plot.type="pruningseq",valtree=vtreetest)
#see the error rates
plot(tree,plot.type="pruningseq",valtree=vtreetest, taos=FALSE)
Predict the median rankings for new observations
Description
Predict the median rankings in a tree-based structure built with ranktree
for new observations
Usage
## S3 method for class 'ranktree'
predict(object, newx, ...)
Arguments
object |
An object of the class "ranktree" |
newx |
A data frame of the same nature as the predictor data frame with which the tree has been built |
... |
System reserved (No specific usage) |
Value
A list containing:
rankings | the fit in terms of rankings | |
orderings | the fit in terms of orderings | |
info | data frame containing the terminal nodes into which the new observations fall, then the new x and the fit (in terms of rankings)
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
Examples
data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
#use the function predict to predict rankings for new predictors
rankfit <- predict(tree,newx=EVS$predictors[-training,])
#fit in terms of rankings
rankfit$rankings
#fit in terms of orderings
rankfit$orderings
# information about the fit (terminal node, predictor and fit (in terms of rankings))
rankfit$info
S3 methods for cca
Description
Print methods for objects of class cca
Usage
## S3 method for class 'cca'
print(x, ...)
Arguments
x |
An object of the class "cca" |
... |
not used |
Value
print a brief summary of the CCA
S3 methods for kunfolding
Description
Print methods for objects of class kunfolding
Usage
## S3 method for class 'kunfolding'
print(x, ...)
Arguments
x |
An object of the class "kunfolding" |
... |
not used |
Value
print a brief summary of the Kemeny equivalent augmented unfolding
S3 methods for ranktree
Description
Print methods for objects of class ranktree
Usage
## S3 method for class 'ranktree'
print(x, ...)
Arguments
x |
An object of the class "ranktree" |
... |
not used |
Value
print a brief summary of the prediction tree
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
tree
Recursive partitioning method for the prediction of preference rankings based upon Kemeny distances
Description
Recursive partitioning method for the prediction of preference rankings based upon Kemeny distances.
Usage
ranktree(Y, X, prunplot = FALSE, control = ranktreecontrol(...), ...)
Arguments
Y |
An n by m data matrix, in which there are n judges and m objects to be judged. Each row is a ranking of the objects, which are represented by the columns. |
X |
A data frame containing the predictors; it must have n rows. |
prunplot |
prunplot=TRUE returns the plot of the pruning sequence. Default value: FALSE |
control |
a list of options that control details of the ranktree algorithm, governed by the function ranktreecontrol |
... |
arguments passed directly, bypassing ranktreecontrol |
Details
The user can use any algorithm implemented in the consrank function from the ConsRank package. All algorithms allow the user to set the option 'full=TRUE' if the median ranking(s) must be searched in the unconstrained universe of rankings of n items including all possible ties, instead of in the restricted space of permutations.
The output consists of an object of the class "ranktree". It contains:
X | the predictors: it must be a dataframe | ||
Y | the response variable: the matrix of the rankings | ||
node | a list containing the tree-based structure: | ||
number | node number | ||
terminal | logical: TRUE if terminal node | ||
father | father node number of the current node | ||
idfather | id of the father node of the current node | ||
size | sample size within node | ||
impur | impurity at node | ||
wimpur | weighted impurity at node | ||
idatnode | id of the observations within node | ||
class | median ranking within node in terms of orderings | ||
nclass | median ranking within node in terms of rankings | ||
mclass | possible multiple median rankings | ||
tau | Tau_x rank correlation coefficient at node | ||
wtau | weighted Tau_x rank correlation coefficient at node | ||
error | error at node | ||
werror | weighted error at node | ||
varsplit | variables generating split | ||
varsplitid | id of variables generating split | ||
cutspli | splitting point | ||
children | children nodes generated by current node | ||
idchildren | id of children nodes generated by current node | ||
... | other info about node | ||
control | parameters used to build the tree | ||
numnodes | number of nodes of the tree | ||
tsynt | list containing the synthesis of the tree: | ||
children | list containing all information about leaves | ||
parents | list containing all information about parent nodes | ||
genealogy | data frame containing information about all nodes | ||
idgenealogy | data frame containing information about all nodes in terms of nodes id | ||
idparents | id of the parents of all the nodes | ||
goodness | goodness (and badness) of fit measures of the tree: Tau_X, error, impurity | ||
nomin | information about nature of the predictors | ||
alpha | alpha parameter for pruning sequence | ||
pruneinfo | list containing information about the pruning sequence: | ||
prunelist | information about the pruning | ||
tau | tau_X rank correlation coefficient of each subtree | ||
error | error of each subtree | ||
termnodes | number of terminal nodes of each subtree | ||
subtrees | list of each subtree created with the cost-complexity pruning procedure |
Value
An object of the class ranktree. See details for detailed information.
Author(s)
Antonio D'Ambrosio antdambr@unina.it
References
D'Ambrosio, A., and Heiser W.J. (2016). A recursive partitioning method for the prediction of preference rankings based upon Kemeny distances. Psychometrika, vol. 81 (3), pp.774-94.
See Also
ranktreecontrol
, plot.ranktree
, summary.ranktree
, getsubtree
, validatetree
, treepaths
, nodepath
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#plot the tree
plot(tree,dispclass=TRUE)
#visualize information
summary(tree)
#get information about the paths leading to terminal nodes (all the paths)
infopaths <- treepaths(tree)
#the terminal nodes
infopaths$leaves
#sample size within each terminal node
infopaths$size
#visualize the path of the second leaf (terminal node number 8)
infopaths$paths[[2]]
#alternatively
nodepath(termnode=8,tree)
set.seed(132) #for reproducibility
#validation of the tree via v-fold cross-validation (default value of V=5)
vtree <- validatetree(tree,method="cv")
#extract the "best" tree
dtree <- getsubtree(tree,vtree$best_tau)
summary(dtree)
#plot the validated tree
plot(dtree,dispclass=TRUE)
#predicted rankings
rankfit <- predict(dtree,newx=Irish$predictors)
#fit of rankings
rankfit$rankings
#fit in terms of orderings
rankfit$orderings
#all info about the fit (id of the leaf, predictor values, and fit)
rankfit$info
Utility function
Description
Utility function to set the control arguments of ranktree
Usage
ranktreecontrol(
num = NULL,
decrmin = 0.01,
algorithm = "quick",
full = FALSE,
itermax = 10,
np = 15,
gl = 100,
ff = 0.4,
cr = 0.9,
proc = FALSE,
ps = FALSE
)
Arguments
num |
The maximum number of observations in a node to be split: default, 10% of the sample size |
decrmin |
Minimum decrease in impurity |
algorithm |
The algorithm used to compute the median ranking. One among "BB", "quick" (default), "fast" and "decor" |
full |
Specifies if the median ranking must be searched in the universe of rankings including all the possible ties. Default: FALSE |
itermax |
Number of iterations for "fast" and "decor" algorithms. itermax=10 is the default option. |
np |
(for "decor" only) the number of population individuals. np=15 is the default option. |
gl |
(for "decor" only) generations limit, maximum number of consecutive generations without improvement. gl=100 is the default option. |
ff |
(for "decor" only) the scaling rate for mutation. Must be in [0,1]. ff=0.4 is the default option. |
cr |
(for "decor" only) the crossover range. Must be in [0,1]. cr=0.9 is the default option. |
proc |
(for "BB" only) proc=TRUE allows the branch and bound algorithm to work in difficult cases, i.e. when the number of objects is larger than 15 or 25. proc=FALSE is the default option |
ps |
If ps=TRUE, information about how many branches are being processed is displayed on the screen. Default value: FALSE |
Value
A list containing all the control parameters
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
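Examples
# Illustrative sketch: build the control list explicitly instead of passing the
# single options through '...'; the chosen values are arbitrary.
data(Irish)
ctrl <- ranktreecontrol(num = 50, decrmin = 0.005)
tree <- ranktree(Irish$rankings, Irish$predictors, control = ctrl)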
S3 methods for cca
Description
Summary methods for objects of class cca
Usage
## S3 method for class 'cca'
summary(object, ...)
Arguments
object |
An object of the class "cca" |
... |
not used |
Value
it shows the summary of the CCA
S3 methods for kunfolding
Description
Summary methods for objects of class kunfolding
Usage
## S3 method for class 'kunfolding'
summary(object, ...)
Arguments
object |
An object of the class "kunfolding" |
... |
not used |
Value
it shows the summary of the Kemeny equivalent augmented unfolding
S3 methods for ranktree
Description
Summary methods for objects of class ranktree
Usage
## S3 method for class 'ranktree'
summary(object, ...)
Arguments
object |
An object of the class "ranktree" |
... |
not used |
Value
it shows the summary of the prediction tree
Examples
data("Univranks")
tree <- ranktree(Univranks$rankings,Univranks$predictors,num=50)
summary(tree)
Paths of the terminal nodes
Description
Given an object of the class "ranktree", it extracts the paths of all terminal nodes
Usage
treepaths(Tree)
Arguments
Tree |
An object of the class "ranktree" |
Value
A list containing:
leaves | the number of the terminal nodes | |
size | the sample size within each terminal node | |
paths | a list containing all the paths |
Author(s)
Antonio D'Ambrosio antdambr@unina.it
See Also
ranktree
, nodepath
, getsubtree
Examples
data(Irish)
#build the tree with default options
tree <- ranktree(Irish$rankings,Irish$predictors)
#get information about all the paths leading to terminal nodes
paths <- treepaths(tree)
#
#the terminal nodes
paths$leaves
#
#sample size within each terminal node
paths$size
#
#visualize the path of the second leaf (terminal node number 8)
paths$paths[[2]]
Validation of the tree for preference rankings
Description
Validation of the tree either with a test-set procedure or with V-fold cross-validation
Usage
validatetree(
Tree,
testX = NULL,
testY = NULL,
method = "test",
V = 5,
plotting = TRUE
)
Arguments
Tree |
An object of the class "ranktree" coming from the function ranktree |
testX |
The data frame containing the test set (predictors) |
testY |
The matrix containing the test set (response) |
method |
One between "test" (default) and "cv" |
V |
The cross-validation parameter. Default V=5 |
plotting |
With the default option plotting=TRUE, the pruning sequence plot is visualized |
Value
A list containing:
tau | the Tau_x rank correlation coefficient of the sequence of the trees | |
error | the error of the sequence of the trees | |
termnodes | the number of terminal nodes of the sequence of the trees | |
best_tau | the best tree in terms of Tau_x rank correlation coefficient | |
best_error | the best tree in terms of error (it is the same) | |
validation | information about the validation procedure |
Author(s)
Antonio D'Ambrosio antdambr@unina.it
Examples
data(EVS)
EVS$rankings[is.na(EVS$rankings)] <- 3
set.seed(654)
training=sample(1911,1434)
tree <- ranktree(EVS$rankings[training,],EVS$predictors[training,],decrmin=0.001,num=50)
#test set validation
vtreetest <- validatetree(tree,testX=EVS$predictors[-training,],EVS$rankings[-training,])
#cross-validation
vtreecv <- validatetree(tree,method="cv",V=10)
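# Illustrative sketch: inspect the cross-validation output (see the Value
# section) and extract the corresponding best subtree.
vtreecv$best_tau
dtreecv <- getsubtree(tree, vtreecv$best_tau)
summary(dtreecv)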
Tau_x rank correlation coefficient for vectors
Description
Tau_x rank correlation coefficient for large vectors
Usage
vecTaux(X, Y)
Arguments
X |
A vector of length n |
Y |
A vector of length n |
Value
The tau_x rank correlation coefficient between the 2 vectors
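Examples
# Illustrative sketch: Tau_x rank correlation between two rankings of five objects.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 3, 5, 4)
vecTaux(x, y)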