Please use this identifier to cite or link to this item:

http://hdl.handle.net/20.500.12358/18966
Title: Evaluating the Effect of Preprocessing in Arabic Documents Clustering
Abstract

Clustering of text documents is an important technique for document retrieval; it aims to organize documents into meaningful groups, or clusters. Text preprocessing plays a major role in improving the clustering of Arabic documents. This research examines and compares text preprocessing techniques for Arabic document clustering. It studies the effectiveness of term pruning, term weighting using TF-IDF, morphological analysis (root-based stemming, light stemming, and raw, unstemmed text), and normalization. The experiments compare K-means, the most widely used partitional clustering algorithm, with another partitional algorithm, Expectation Maximization (EM). The Euclidean and Manhattan distance functions were also compared to determine which produces the best document clustering results. Clustering quality was evaluated across many combinations of preprocessing techniques using precision and recall, the most common basic measures in text mining evaluation, together with the F-measure, which combines the two. Experimental results show that clustering quality improves when TF-IDF term weighting is applied and terms are pruned with a small minimum term frequency. For morphological analysis, light stemming was found to be more appropriate than root-based stemming or raw text. Normalization also improved the clustering of Arabic documents, raising the evaluation scores. Finally, K-means was found to be more efficient than EM for document clustering, and the Euclidean distance function outperformed the Manhattan distance function.
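To illustrate the normalization, light-stemming, term-pruning, and TF-IDF steps named in the abstract, here is a minimal Python sketch under stated assumptions. The record does not give the thesis's rule sets, so the normalization map, the prefix/suffix lists in `light_stem`, and the reading of "minimum term frequency" as total occurrences across the collection are all assumptions; `normalize`, `tokenize`, `light_stem`, and `tfidf_vectors` are hypothetical helper names. Weights use tf × log(N / df).

```python
import math
import re
from collections import Counter

# Assumed normalization rules: drop diacritics (U+064B-U+0652) and tatweel (U+0640),
# unify alef variants, map taa marbuta to haa and alef maqsura to yaa.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
_LETTER_MAP = str.maketrans({"آ": "ا", "أ": "ا", "إ": "ا", "ة": "ه", "ى": "ي"})

# Hypothetical light-stemming affix lists (already in normalized form), longest first.
PREFIXES = ("بال", "كال", "فال", "لل", "ال")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ه", "ي")


def normalize(text: str) -> str:
    return _DIACRITICS.sub("", text).translate(_LETTER_MAP)


def tokenize(text: str) -> list[str]:
    # Runs of Arabic letters (hamza U+0621 through yaa U+064A).
    return re.findall("[\u0621-\u064A]+", text)


def light_stem(word: str) -> str:
    """Strip a leading waw, one prefix, then suffixes, keeping >= 2 letters (assumed rule)."""
    if word.startswith("و") and len(word) >= 4:  # conjunction "and"
        word = word[1:]
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    stripped = True
    while stripped:  # repeat until no suffix can be removed
        stripped = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= 2:
                word = word[: -len(s)]
                stripped = True
                break
    return word


def tfidf_vectors(docs: list[list[str]], min_tf: int = 2):
    """Prune terms with total frequency < min_tf, then weight by tf * log(N / df)."""
    total = Counter(t for d in docs for t in d)
    vocab = sorted(t for t, c in total.items() if c >= min_tf)
    kept = set(vocab)
    n = len(docs)
    df = Counter(t for d in docs for t in set(d) if t in kept)
    vectors = []
    for d in docs:
        tf = Counter(t for t in d if t in kept)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


docs = ["الكِتَابُ في المكتبات", "كتابها في المدرسة", "المدرسة والكتاب"]
tokens = [[light_stem(w) for w in tokenize(normalize(d))] for d in docs]
vocab, vectors = tfidf_vectors(tokens, min_tf=2)
```

With `min_tf=2`, `مكتب` (one occurrence) is pruned, and `كتاب`, which appears in every document, gets IDF log(3/3) = 0, so it contributes nothing to any vector.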

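The clustering and evaluation steps in the abstract can likewise be sketched in plain Python: K-means with a pluggable distance (Euclidean or Manhattan) and a clustering F-measure built from per-class precision and recall. This is an illustrative sketch, not the thesis's implementation, and EM is not shown. Seeding centroids with the first k points, using the coordinate-wise median as the centroid under Manhattan distance (it minimizes summed L1 distance), and aggregating F as the class-size-weighted best-match F-score are assumptions; `kmeans` and `clustering_f_measure` are hypothetical names.

```python
import statistics
from collections import Counter


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, k, distance=euclidean, max_iter=100):
    """Lloyd-style k-means; returns a cluster index for each point."""
    # Assumption: seed with the first k points so the demo is deterministic.
    centroids = [list(p) for p in points[:k]]
    # Median minimizes summed L1 distance; mean minimizes summed squared L2.
    center = statistics.median if distance is manhattan else statistics.fmean
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [center(col) for col in zip(*members)]
    return labels


def clustering_f_measure(classes, labels):
    """Class-size-weighted best F-score over clusters (assumed aggregation)."""
    n = len(classes)
    class_sizes, cluster_sizes = Counter(classes), Counter(labels)
    joint = Counter(zip(classes, labels))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij:
                precision, recall = n_ij / n_j, n_ij / n_i
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


points = [(0, 0), (5, 5), (0, 1), (6, 5), (1, 0), (5, 6)]
classes = ["a", "b", "a", "b", "a", "b"]
for dist in (euclidean, manhattan):
    labels = kmeans(points, 2, distance=dist)
    print(dist.__name__, labels, clustering_f_measure(classes, labels))
```

On this well-separated toy data both distances yield the same partition; the Euclidean-versus-Manhattan comparison only becomes informative on real TF-IDF document vectors.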
Authors
Ghanem, Osama Abedl Fattah
Supervisors
Alhanjouri, Mohammed
Type: Master's thesis
Date: 2014
Language: English
Publisher: Islamic University of Gaza
Collections
  • PhD and MSc Theses- Faculty of Engineering [641]
Files in this item
file_1.pdf (3.286 MB)

The institutional repository of the Islamic University of Gaza was established as part of the ROMOR project that has been co-funded with support from the European Commission under the ERASMUS + European programme. This publication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
