How Fast Is the k-Means Method?




Sariel Har-Peled and Bardia Sadri

In this paper, we present polynomial upper and lower bounds on the number of iterations performed by the k-means method (a.k.a. Lloyd's method). k-means method is a widely popular heuristic method for k-means clustering. The popularity of the method is due to its simplicity and ease of implementation. Our upper bounds on the number of rounds of the k-means are polynomial in the number of points, number of clusters, and the spread of the point set. We also present a lower bound, showing that in the worst case the k-means heuristic needs to perform Ω(n) iterations, for n points on the real line and two centers. Surprisingly, the spread of the point set in this construction is polynomial. This is the first construction showing that the k-means heuristic requires more than a polylogarithmic number of iterations. Furthermore, we present two alternative algorithms, with guaranteed performance, which are simple variants of the k-means method. Results of our experimental studies on these algorithms are also presented.

ACM-SIAM Symposium on Discrete Mathematics (SODA), 2005 [PDF]
Journal version in Algorithmica (2005) 41:185-202 [PDF]