|Title||Efficient Data Clustering Algorithms|
Data clustering is one of the most important problems in data mining and machine learning. Clustering is the task of discovering homogeneous groups among the studied objects. Recently, many researchers have shown significant interest in developing clustering algorithms. The main problem in clustering is that we have no prior knowledge about the given dataset. Moreover, the choice of input parameters, such as the number of clusters, the number of nearest neighbors and other factors in these algorithms, makes clustering an even more challenging topic: any incorrect choice of these parameters yields bad clustering results. Furthermore, these algorithms suffer from unsatisfactory accuracy when the dataset contains clusters with different complex shapes, densities and sizes, or contains noise and outliers. In this thesis, we propose a new approach to the unsupervised clustering task. Our approach consists of three phases. In the first phase, we use the most widely used clustering technique, the Kmeans algorithm, for its simplicity and speed in practice. We rely on just one run of Kmeans, despite its limited accuracy, to explore and analyze the given dataset by capturing preliminary clusters that ensure closely grouped sets. The second phase processes these initial groups in parallel, using shrinking based on the convex hull of each initial group. The second phase yields a set of sub-clusters of the given dataset. The third phase then merges these sub-clusters based on the Delaunay triangulation. The new algorithm is named the Kmeans-Based Convex Hull Triangulation clustering algorithm (KBCHT). We present experiments that demonstrate the strength of our new algorithm in discovering clusters with different non-convex shapes, sizes and densities, in the presence of noise and outliers, even under the bad initial conditions used in its first phase. These experiments show the superiority of our proposed algorithm over most competing algorithms.
|Publisher||The Islamic University|