|Title||A Comparative Study on Serial Decision Tree Classification Algorithms in Text Mining|
Text mining refers to the process of deriving high-quality information from text. It is used in search engines, digital libraries, fraud detection, and other applications that handle text data. Text mining tasks include text classification, text clustering, entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling. Classification of objects into pre-defined categories based on their features is a widely studied problem. It aims to use a labeled training data set to build a model that predicts the class label from the other attributes, so that the model can be used to classify new data. Decision tree-based classification is one of the most practical and effective methods that use inductive learning. It is implemented serially or in parallel, depending on data set size. Some classifiers, such as SLIQ, SPRINT and RainForest, can be implemented serially or in parallel. ID3, CART and C4.5 are serial classifiers. In this paper, we review various decision tree algorithms with their limitations, and conduct a comparative study to evaluate their performance regarding accuracy, learning time and tree size, using four sample datasets. We found that the Random Forest classifier is the most accurate among the classifiers compared. However, the larger the dataset and the more attributes it has, the greater the learning time and tree size, and vice versa.
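The three measures named in the abstract (accuracy, learning time and tree size) can be illustrated with a minimal sketch of an ID3-style serial tree learner. This is not the paper's implementation or data: the four-document toy dataset, the word-presence features `free` and `meeting`, and all function names are hypothetical, and accuracy is measured on the training data only to keep the example short.

```python
import math
import time
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def build_tree(rows, labels, features):
    """Grow an ID3-style tree by splitting on the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf node: a class label

    def gain(feature):
        remainder = 0.0
        for value in set(r[feature] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest
        )
    return {"feature": best, "branches": branches, "default": majority}

def predict(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree

def tree_size(tree):
    """Count all nodes (internal nodes plus leaves)."""
    if not isinstance(tree, dict):
        return 1
    return 1 + sum(tree_size(b) for b in tree["branches"].values())

# Hypothetical toy data: does a document contain the words "free" / "meeting"?
rows = [
    {"free": True, "meeting": False},
    {"free": True, "meeting": True},
    {"free": False, "meeting": True},
    {"free": False, "meeting": False},
]
labels = ["spam", "spam", "ham", "ham"]

start = time.perf_counter()
tree = build_tree(rows, labels, ["free", "meeting"])
learning_time = time.perf_counter() - start

# Training-set accuracy only; a real comparison would use held-out data.
accuracy = sum(predict(tree, r) == l for r, l in zip(rows, labels)) / len(rows)
size = tree_size(tree)
print(f"accuracy={accuracy:.2f} learning_time={learning_time:.6f}s tree_size={size}")
```

On data this small the learning time is negligible; the point is only to show where each of the three quantities would be measured in a serial learner.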
|Published in||International Journal of Intelligent Computing Research (IJICR)|
|Series||Volume: 7, Number: 4|
|Files in this item|
|Maghari, Ashraf Y. A._13.pdf||924.5Kb|