• العربية
    • English
  • English 
    • العربية
    • English
  • Login
Home
Publisher PoliciesTerms of InterestHelp Videos
Submit Thesis
IntroductionIUGSpace Policies
JavaScript is disabled for your browser. Some features of this site may not work without it.
View Item 
  •   Home
  • Faculty of Engineering
  • PhD and MSc Theses- Faculty of Engineering
  • View Item
  •   Home
  • Faculty of Engineering
  • PhD and MSc Theses- Faculty of Engineering
  • View Item

Please use this identifier to cite or link to this item:

http://hdl.handle.net/20.500.12358/18964
TitleImproving Arabic Light Stemming in Information Retrieval Systems
Untitled
Abstract

Information retrieval refers to the retrieval of textual documents such as newsprint and magazine articles or Web documents. Due to extensive research in the IR field, there are many retrieval techniques that have been developed for Arabic language. The main objective of this research to improve Arabic information retrieval by enhancing light stemming and preprocessing stage and to contribute to the open source community, also establish a guideline for Arabic normalization and stop-word removal. To achieve these objectives, we create a GUI toolkit that implements preprocessing stage that is necessary for information retrieval. One of these steps is normalizing, which we improved and introduced a set of rules to be standardized and improved by other researchers. The next preprocessing step we improved is stop-word removal, we introduced two different stop-word lists, the first one is intensive stop-word list for reducing the size of the index and ambiguous words, and the other is light stop-word list for better results with recall in information retrieval applications. We improved light stemming by update a suffix rule, and introduce the use of Arabized words, 100 words manually collected, these words should not follow the stemming rules since they came to Arabic language from other languages, and show how this improve results compared to two popular stemming algorithms like Khoja and Larkey stemmers. The proposed toolkit was integrated into a popular IR platform known as Terrier IR platform. We implemented Arabic language support into the Terrier IR platform. We used TF-IDF scoring model from Terrier IR platform. We tested our results using OSAC datasets. We used java programming language and Terrier IR platform for the proposed systems. The infrastructure we used consisted of CORE I7 CPU ran speed at 3.4 GHZ and 8 GB RAM.

Authors
Almusaddar, Mohammed Yahya
Supervisors
Abuhiba, Ibrahim S.I.
Typeرسالة ماجستير
Date2014
LanguageEnglish
Publisherالجامعة الإسلامية - غزة
Citation
License
Collections
  • PhD and MSc Theses- Faculty of Engineering [641]
Files in this item
file_1.pdf1.859Mb
Thumbnail

The institutional repository of the Islamic University of Gaza was established as part of the ROMOR project that has been co-funded with support from the European Commission under the ERASMUS + European programme. This publication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Contact Us | Send Feedback
 

 

Browse

All of IUGSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsSupervisorsThis CollectionBy Issue DateAuthorsTitlesSubjectsSupervisors

My Account

LoginRegister

Statistics

View Usage Statistics

The institutional repository of the Islamic University of Gaza was established as part of the ROMOR project that has been co-funded with support from the European Commission under the ERASMUS + European programme. This publication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Contact Us | Send Feedback