Effect of ordinal variable transformations on hierarchical clustering results: A case study on the Big Data phenomenon
Hana Řezanková, Richard Novák
Available Online October 2019.
- https://doi.org/10.2991/amse-19.2019.9
- cluster analysis, ordinal variables, hierarchical clustering, data transformation, distance measures, Big Data phenomenon, New Digital Divide
- The aim of the paper is to show some possible transformations of ordinal variables in cluster analysis and to discuss their effect on hierarchical clustering results. Although several papers comparing different approaches to clustering objects characterized by ordinal variables have been published, the comparisons are not complete and also include variables other than ordinal ones (e.g. nominal variables). This paper considers the following ways of representing ordinal variables in clustering: “original” values (from one to the number of categories), standardized values, transformed values based on the range, ranks of the original values (averaged in case of ties), standardized ranks, and transformed ranks based on the range (the usually recommended option). The results of the complete linkage method obtained with the Manhattan and Euclidean distances are compared for different numbers of clusters. These results are further compared with those obtained by the TwoStep algorithm. The case study is based on the answers of 481 respondents concerning their awareness of problems related to the “Big Data Phenomenon” and the “New Digital Divide”.
- Open Access
- This is an open access article distributed under the CC BY-NC license.
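The six representations of ordinal variables listed in the abstract, combined with complete linkage under the Manhattan and Euclidean distances, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names (`ordinal_variants`, `cluster_complete`) are hypothetical, "transformed values based on the range" is assumed to mean min-max scaling to [0, 1], standardization is assumed to use the sample standard deviation, and the demo data are random toy answers, not the survey of 481 respondents. The TwoStep algorithm is not covered here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from scipy.stats import rankdata


def _standardize(v):
    # z-scores per column; sample standard deviation is an assumption
    return (v - v.mean(axis=0)) / v.std(axis=0, ddof=1)


def _range_scale(v):
    # min-max scaling per column to [0, 1]; assumed meaning of
    # "transformed values based on the range"
    return (v - v.min(axis=0)) / (v.max(axis=0) - v.min(axis=0))


def ordinal_variants(X):
    """Return the six representations of an (objects x variables) matrix
    of ordinal codes 1..k. Columns must not be constant."""
    X = np.asarray(X, dtype=float)
    # rankdata averages the ranks of tied values by default
    R = np.apply_along_axis(rankdata, 0, X)
    return {
        "original": X,
        "standardized": _standardize(X),
        "range": _range_scale(X),
        "ranks": R,
        "standardized_ranks": _standardize(R),
        "range_ranks": _range_scale(R),
    }


def cluster_complete(X, metric, n_clusters):
    """Complete linkage on a precomputed distance vector;
    metric is 'cityblock' (Manhattan) or 'euclidean'."""
    Z = linkage(pdist(X, metric=metric), method="complete")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    # Toy data only: 30 objects, 4 ordinal variables with 5 categories
    rng = np.random.default_rng(0)
    X = rng.integers(1, 6, size=(30, 4))
    for name, V in ordinal_variants(X).items():
        for metric in ("cityblock", "euclidean"):
            labels = cluster_complete(V, metric, n_clusters=3)
            print(name, metric, np.bincount(labels)[1:])
```

Comparing the printed cluster sizes across the twelve combinations gives a rough sense of how much the chosen representation and distance can change the partition for a fixed number of clusters.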
Cite this article
TY - CONF
AU - Hana Řezanková
AU - Richard Novák
PY - 2019/10
DA - 2019/10
TI - Effect of ordinal variable transformations on hierarchical clustering results: A case study on the Big Data phenomenon
PB - Atlantis Press
SP - 81
EP - 90
SN - 2589-6644
UR - https://doi.org/10.2991/amse-19.2019.9
DO - https://doi.org/10.2991/amse-19.2019.9
ID - Řezanková2019/10
ER -