Comparison of cluster validation indices with missing data

Year of publication

2018

Authors

Niemelä, Marko; Äyrämö, Sami; Kärkkäinen, Tommi

Abstract

Clustering is an unsupervised machine learning technique that aims to divide a given set of data into subsets. The number of hidden groups in cluster analysis is not always obvious, and various cluster validation indices have been suggested for determining it. Several studies reviewing validation indices have recently been published, but no experiments with missing data are yet available. In this paper, the performance of ten well-known indices is measured on ten synthetic data sets with various ratios of missing values, using clustering based on the squared Euclidean and city block distances. The original indices are modified for the city block distance in a novel way. The experiments illustrate the differing degrees of stability of the indices with respect to missing data.
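The two distances named in the abstract can be illustrated with a minimal Python sketch. It assumes one common way of handling missing values, the available-data approach, in which each distance is computed only over the coordinates actually observed in a data point (missing entries are marked as NaN). This is an illustrative assumption, not necessarily the strategy used in the paper, and the function names `available_sq_euclidean`, `available_city_block`, and `assign` are hypothetical. The sketch also shows a nearest-center assignment step of the kind used in prototype-based clustering; it does not implement any of the validation indices.

```python
import numpy as np

# Hypothetical helpers (not from the paper): distances restricted to the
# coordinates observed in x. Missing values in x are encoded as NaN;
# cluster centers c are assumed to be complete (no NaN).

def available_sq_euclidean(x, c):
    """Squared Euclidean distance over the observed coordinates of x."""
    mask = ~np.isnan(x)
    diff = x[mask] - c[mask]
    return float(np.sum(diff ** 2))

def available_city_block(x, c):
    """City block (L1) distance over the observed coordinates of x."""
    mask = ~np.isnan(x)
    return float(np.sum(np.abs(x[mask] - c[mask])))

def assign(X, centers, dist):
    """Index of the nearest center for each row of X under the distance `dist`."""
    return np.array([
        min(range(len(centers)), key=lambda k: dist(x, centers[k]))
        for x in X
    ])

if __name__ == "__main__":
    X = np.array([[1.0, np.nan], [5.0, 5.0], [np.nan, 4.0]])
    centers = np.array([[0.0, 0.0], [5.0, 5.0]])
    print(assign(X, centers, available_city_block))
    print(assign(X, centers, available_sq_euclidean))
```

In this toy example the first point is compared only on its first coordinate and the third only on its second, so both distances assign the points to clusters 0, 1 and 1. With more missing values, fewer coordinates contribute to each distance, which is one reason validation indices built on these distances can become less stable.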

Organizations and authors

University of Jyväskylä

Niemelä Marko

Äyrämö Sami

Kärkkäinen Tommi

Publication type

Publication format

Article

Parent publication type

Conference

Audience

Scientific

Peer-reviewed

Peer-Reviewed

MINEDU's publication type classification code

A4 Article in conference proceedings

Publication channel information

Pages

461-466

ISBN

978-2-87587-047-6

Publication forum

55877

Open access

Self-archived

Yes

Other information

Fields of science

Computer and information sciences

Publication country

Belgium

Internationality of the publisher

International

Language

English

The publication is included in the Ministry of Education and Culture’s Publication data collection

Yes