Improving Scalable K-Means++
Year of publication
2021
Authors
Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo
Abstract
Two new initialization methods for K-means clustering are proposed. Both apply a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces, produced by the random projection method, for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared with the K-means++ and K-means‖ methods on an extensive set of reference and synthetic large-scale datasets. For the synthetic datasets, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular K-means++ initialization behaves like random initialization in very high-dimensional cases.
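For orientation, the two building blocks the abstract names, K-means++ seeding and random projection, can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's divide-and-conquer algorithms: the function names (`kmeanspp_indices`, `random_projection`, `seed_in_subspace`), the Gaussian projection matrix, and the choice to seed in a single subspace are all hypothetical and chosen only to keep the example short.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_indices(points, k, rng):
    """Return k indices chosen by D^2 sampling (standard K-means++ seeding)."""
    n = len(points)
    chosen = [rng.randrange(n)]  # first center: uniform at random
    # Squared distance from each point to its nearest chosen center.
    d2 = [sq_dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        if sum(d2) > 0:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(range(n), weights=d2, k=1)[0]
        else:
            # Degenerate case: every point coincides with a chosen center.
            nxt = rng.randrange(n)
        chosen.append(nxt)
        d2 = [min(d, sq_dist(p, points[nxt])) for d, p in zip(d2, points)]
    return chosen


def random_projection(points, target_dim, rng):
    """Project points to target_dim dimensions with a Gaussian random matrix.

    Assumption: dense Gaussian entries scaled by 1/sqrt(target_dim); the
    article may use a different projection scheme.
    """
    dim = len(points[0])
    scale = 1.0 / target_dim ** 0.5
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
              for _ in range(dim)]
    return [[sum(p[i] * matrix[i][j] for i in range(dim))
             for j in range(target_dim)]
            for p in points]


def seed_in_subspace(points, k, target_dim, seed=0):
    """Seed in a random low-dimensional subspace, return original-space centers."""
    rng = random.Random(seed)
    projected = random_projection(points, target_dim, rng)
    idx = kmeanspp_indices(projected, k, rng)
    return [points[i] for i in idx]
```

Seeding in the projected space and mapping the chosen indices back to the original points keeps the distance computations cheap while still returning full-dimensional initial centers.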
Organizations and authors
Publication type
Publication format
Article
Parent publication type
Journal
Article type
Original article
Audience
Scientific
Peer-reviewed
Peer-Reviewed
MINEDU's publication type classification code
A1 Journal article (refereed), original research
Publication channel information
Journal/Series
Publisher
Volume
14
Issue
1
Article number
6
ISSN
Publication forum
Publication forum level
1
Open access
Open access in the publisher’s service
Yes
Open access of publication channel
Fully open publication channel
Self-archived
Yes
Article processing fee (EUR)
829
Year of payment for the open publication fee
2020
Other information
Fields of science
Computer and information sciences
Publication country
Switzerland
Internationality of the publisher
International
Language
English
International co-publication
No
Co-publication with a company
No
DOI
10.3390/a14010006
The publication is included in the Ministry of Education and Culture’s Publication data collection
Yes