Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.12358/18770
Title | The Impact of Text Preprocessing and Term Weighting on Arabic Text Classification |
---|---|
Abstract |
This research presents and compares the impact of text preprocessing, which has not been addressed before, on Arabic text classification using popular text classification algorithms: Decision Tree, K-Nearest Neighbors, Support Vector Machines, and Naïve Bayes and its variations. Text preprocessing includes applying different term weighting schemes and Arabic morphological analysis (stemming and light stemming). We implemented and integrated Arabic morphological analysis tools within the leading open-source machine learning tools Weka and RapidMiner. The text classification algorithms are applied to seven Arabic corpora (three collected in-house and four existing corpora). Experimental results show that: (1) light stemming with term pruning is the best feature reduction technique; (2) Support Vector Machines and the Naïve Bayes variations outperform the other algorithms; (3) weighting schemes affect the performance of distance-based classifiers. |
Authors | |
Supervisors | |
Type | Master's thesis |
Date | 2010 |
Language | English |
Publisher | The Islamic University |
Citation | |
Collections | |
Files in this item | ||
---|---|---|
file_1.pdf | 3.290 MB |