|Title||Efficient data clustering algorithms: Improvements over Kmeans|
This paper presents a new approach to overcoming two of the best-known disadvantages of the Kmeans clustering algorithm: the random initialization of prototypes and the requirement that the number of clusters in the dataset be specified in advance. Randomly initialized prototypes often cause the algorithm to converge to a local rather than the global optimum, so Kmeans typically has to be run many times to obtain satisfactory results. The proposed algorithms are based on a new definition of the density of data points, which in turn is based on the k-nearest neighbor method. With this definition we detect noise and outliers, which strongly affect Kmeans, and obtain good initial prototypes in a single run, with the number of clusters K determined automatically. This algorithm is referred to as Efficient Initialization of Kmeans (EI-Kmeans). Even so, Kmeans is suited only to clustering data with convex shapes and similar sizes and densities. We therefore develop a new clustering algorithm, called Efficient Data Clustering Algorithm (EDCA), that uses our new definition of the density of data points. The results show that the proposed algorithms improve on data clustering by Kmeans, and that EDCA is able to detect clusters with non-convex shapes and with different sizes and densities.
|Published in||International Journal of Intelligent Systems and Applications|
|Series||Volume: 5, Number: 3|
|Publisher||Modern Education and Computer Science Press|
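The abstract's idea of seeding Kmeans from k-nearest-neighbor densities can be illustrated with a minimal sketch. This record does not give the paper's density definition, so the code below assumes a common stand-in: density as the inverse of the mean distance to the k nearest neighbors. The names `knn_density` and `density_seeds`, the `noise_quantile` threshold, and the density-times-distance scoring rule are all hypothetical and are not taken from EI-Kmeans. The number of seeds is also passed in by hand here, whereas the paper reports determining K automatically.

```python
import numpy as np


def knn_density(X, k=5):
    """Assumed density: inverse of the mean distance to the k nearest neighbours."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0), so skip it.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    return 1.0 / (knn_dist.mean(axis=1) + 1e-12)


def density_seeds(X, n_seeds, k=5, noise_quantile=0.05):
    """Pick dense, mutually distant points as initial prototypes (hypothetical helper)."""
    density = knn_density(X, k)
    # Treat the least dense points as noise/outliers and exclude them from seeding.
    keep = density > np.quantile(density, noise_quantile)
    candidates = np.flatnonzero(keep)
    # First seed: the densest retained point.
    seeds = [int(candidates[np.argmax(density[candidates])])]
    while len(seeds) < n_seeds:
        # Distance from every candidate to its closest already-chosen seed.
        gaps = X[candidates][:, None, :] - X[seeds][None, :, :]
        nearest = np.linalg.norm(gaps, axis=-1).min(axis=1)
        # Favour points that are both dense and far from the existing seeds.
        seeds.append(int(candidates[np.argmax(nearest * density[candidates])]))
    return X[seeds], ~keep
```

The returned prototypes could then be passed to any Kmeans implementation that accepts explicit initial centers, replacing repeated random restarts with a single deterministic run. The boolean mask marks the points this sketch treated as noise.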