On the Number of Steps of Lloyds k-Means Method

Bardia Sadri

This thesis presents polynomial upper and lower bounds on the number of iterations performed by Lloyd's method for k-means clustering.  The upper bounds are polynomial in the number of points, number of clusters, and the spread of the point set. The presented lower bound shows that in the worst case the k-means heuristic needs to perform Ω(n) iterations, for n points on the real line and two centers.  Surprisingly, the spread of this lower bound construction is polynomial. This is the first construction showing that the k-means heuristic requires more than a poly-logarithmic number of iterations. Furthermore, two alternative algorithms with guaranteed performances are presented, which are simple variants of Lloyd's method. Results of experimental studies on these algorithms are also presented.

Masters thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, Sariel Har-Peled, Adviser, 2004 [PDF]