
Please use this identifier to cite or link to this item:

http://hdl.handle.net/20.500.12358/20142
Title: A High Performance Parallel Classifier for Large-Scale Arabic Text
Abstract

Text classification has become one of the most important techniques in text mining. It is the process of assigning documents to predefined categories or classes based on their content. A number of machine learning algorithms have been introduced for automatic text classification. One of the most common is the k-Nearest Neighbor (k-NN) algorithm, which is known to be one of the best classifiers for many languages, including Arabic, and is included in numerous experiments as a baseline for comparison. It is also simple and very easy to implement, since it does not require the training phase that most classification algorithms need. However, k-NN is computationally inefficient: classifying a test document requires computing its similarity to every training document and then sorting those similarities. This drawback makes it unsuitable for handling large volumes of high-dimensional text documents, particularly in Arabic. In this research, we propose a parallel classifier for large-scale Arabic text that achieves improved speedup, scalability, and accuracy. The proposed parallel classifier is based on the sequential k-NN algorithm. We test it on the Open Source Arabic Corpus (OSAC), the largest freely available public Arabic corpus of text documents, and study its performance on a multicomputer cluster of 14 computers. We report both timing and classification results. These results indicate that the proposed parallel classifier has very good speedup and scalability and is capable of handling large document collections. The classification results also show that it achieves accuracy, precision, recall, and F-measure above 95%.
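The cost described above, and the usual way to parallelize it, can be illustrated with a small sketch. It assumes whitespace tokenization, raw term-frequency vectors, cosine similarity, and majority voting; the abstract does not state the thesis's actual preprocessing, term weighting, or similarity measure, and all function names here are hypothetical. The parallel variant partitions the training set across workers, lets each worker compute its local top-k neighbors, and merges them; this is exact because the global top-k is always contained in the union of the local top-k lists. Threads are used only to keep the example self-contained: under CPython's GIL they will not yield real speedup, which would require separate processes or machines, as in the cluster setting the abstract describes.

```python
import heapq
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tf_vector(text):
    """Term-frequency vector using naive whitespace tokenization (an assumption)."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def local_top_k(query, partition, k):
    """Score the query against every document in one partition; keep the k best."""
    scored = ((cosine(query, vec), label) for vec, label in partition)
    return heapq.nlargest(k, scored, key=lambda s: s[0])


def vote(neighbors):
    """Majority vote over (similarity, label) pairs; ties broken by summed similarity."""
    counts, totals = Counter(), Counter()
    for sim, label in neighbors:
        counts[label] += 1
        totals[label] += sim
    return max(counts, key=lambda lab: (counts[lab], totals[lab]))


def knn_classify(query, training, k):
    """Sequential k-NN: one similarity computation per training document."""
    return vote(local_top_k(query, training, k))


def parallel_knn_classify(query, training, k, workers=4):
    """Data-parallel k-NN: split training set, compute local top-k per worker, merge."""
    chunk = max(1, math.ceil(len(training) / workers))
    parts = [training[i:i + chunk] for i in range(0, len(training), chunk)]
    n = len(parts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local = pool.map(local_top_k, [query] * n, parts, [k] * n)
        candidates = [nb for part in local for nb in part]
    # The global k nearest neighbors are the k best among all local candidates.
    return vote(heapq.nlargest(k, candidates, key=lambda s: s[0]))


if __name__ == "__main__":
    training = [
        (tf_vector("goal match team player"), "sport"),
        (tf_vector("team coach match league"), "sport"),
        (tf_vector("market price bank stock"), "economy"),
        (tf_vector("bank interest price inflation"), "economy"),
        (tf_vector("player transfer league goal"), "sport"),
    ]
    query = tf_vector("match goal league")
    print(knn_classify(query, training, k=3))                       # sport
    print(parallel_knn_classify(query, training, k=3, workers=2))   # sport
```

Speedup in such experiments is commonly reported as S_p = T_1 / T_p, the ratio of sequential to parallel running time on p workers; the sketch above only shows the decomposition, not a timing methodology.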

Author: Abu Tair, Mohammad M.
Supervisor: Baraka, Rebhi S.
Type: Master's thesis
Date: 2013
Language: English
Publisher: Islamic University of Gaza
Collection: PhD and MSc Theses - Faculty of Information Technology
Files in this item: file_1.pdf (2.207 MB)

The institutional repository of the Islamic University of Gaza was established as part of the ROMOR project that has been co-funded with support from the European Commission under the ERASMUS + European programme. This publication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
