Improving Scalable K-Means++

Year of publication

2021

Authors

Hämäläinen, Joonas; Kärkkäinen, Tommi; Rossi, Tuomo

Abstract

Two new initialization methods for K-means clustering are proposed. Both proposals apply a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared with the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state-of-the-art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular K-means++ initialization behaves like the random one in very high-dimensional cases.

Organizations and authors

University of Jyväskylä

Hämäläinen Joonas

Kärkkäinen Tommi

Rossi Tuomo

Publication type

Publication format

Article

Parent publication type

Journal

Article type

Original article

Audience

Scientific

Peer-reviewed

Peer-Reviewed

MINEDU's publication type classification code

A1 Journal article (refereed), original research

Publication channel information

Journal/Series

Algorithms

Publisher

MDPI AG

Volume

14

Issue

1

Article number

6

Publication forum

75024

Publication forum level

1

Open access

Open access in the publisher’s service

Yes

Open access of publication channel

Fully open publication channel

Self-archived

Yes

Article processing fee (EUR)

829

Year of payment for the open publication fee

2020

Other information

Fields of science

Computer and information sciences

Publication country

Switzerland

Internationality of the publisher

International

Language

English

International co-publication

No

Co-publication with a company

No

DOI

10.3390/a14010006

The publication is included in the Ministry of Education and Culture’s Publication data collection

Yes