Comparison of cluster validation indices with missing data
Year of publication
2018
Authors
Niemelä, Marko; Äyrämö, Sami; Kärkkäinen, Tommi
Abstract
Clustering is an unsupervised machine learning technique that aims to divide a given set of data into subsets. The number of hidden groups in cluster analysis is not always obvious, and various cluster validation indices have been suggested for estimating it. Some recent studies have reviewed validation indices, but no experiments with missing data are yet available. In this paper, the performance of ten well-known indices is measured on ten synthetic data sets with various ratios of missing values, using clustering based on squared Euclidean and city block distances. The original indices are modified for the city block distance in a novel way. The experiments illustrate that the indices differ in their degree of stability with respect to missing data.
Publication type
Publication format
Article
Parent publication type
Conference
Article type
Other article
Audience
Scientific
Peer-reviewed
Peer-Reviewed
MINEDU's publication type classification code
A4 Article in conference proceedings
Publication channel information
Parent publication name
Conference
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Publisher
Pages
461-466
ISBN
Publication forum
Publication forum level
1
Open access
Open access in the publisher’s service
No
Self-archived
Yes
Other information
Fields of science
Computer and information sciences
Publication country
Belgium
Internationality of the publisher
International
Language
English
International co-publication
No
Co-publication with a company
No
The publication is included in the Ministry of Education and Culture’s Publication data collection
Yes