Please use this identifier to cite or link to this item:

http://hdl.handle.net/20.500.12358/20138
Title: Automatic Arabic Domain-Relevant Term Extraction
Abstract

Term extraction from a text corpus is an important step in knowledge acquisition and the first step in many Natural Language Processing (NLP) methods and computational linguistics systems. For Arabic, some work exists on term extraction, but little of it attempts to extract domain-relevant terms. This research proposes a model for automatic Arabic domain-relevant term extraction from a text corpus. The model uses a hybrid approach that combines linguistic and statistical methods to extract terms relevant to specific domains, based on a prevalence and tendency term-ranking mechanism. To realize the model, a multi-domain corpus divided into 10 domains (Economic, History, Education and Family, Religious and Fatwas, Sport, Health, Astronomy, Law, Stories, and Cooking Recipes) was used. The corpus was preprocessed by removing non-Arabic letters, punctuation, diacritics, and stop words. A vector of candidate terms was then extracted using a variable-length sliding window, dropping any window that contains a stop word. The candidate terms were ranked using the Termhood method, a statistical method that measures the distributional behavior of a candidate term within a domain and across the rest of the corpus. Each candidate term was then assigned to the domain in which it ranked highest, producing a domain-term matrix. This matrix was used in a simple classifier applied to the testing corpus. The resulting confusion matrix indicates that the domain-term matrix worked well as a classifier, reaching an accuracy of 100% for some domains and very good accuracy in the others. The overall accuracy of the classifier was 95%.
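The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the stop-word list, the function names (`preprocess`, `candidate_terms`, `rank_by_domain`), the maximum window length, and the Unicode ranges are all assumptions made for illustration. In particular, the score below is a simple share-of-occurrences stand-in, not the Termhood measure, whose formula the abstract does not give. The sketch also keeps stop words until the windowing step so that windows containing them can be dropped.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only;
# the list used in the thesis is not given in the abstract.
STOP_WORDS = {"في", "من", "على", "إلى", "عن"}

# Assumed Unicode ranges: harakat, dagger alif, and tatweel.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
# Anything that is not an Arabic letter or whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")


def preprocess(text):
    """Remove diacritics, then non-Arabic characters; return tokens."""
    text = DIACRITICS.sub("", text)   # delete marks without splitting words
    text = NON_ARABIC.sub(" ", text)  # punctuation, digits, Latin -> space
    return text.split()


def candidate_terms(tokens, max_len=3):
    """Slide windows of length 1..max_len; drop windows with a stop word."""
    terms = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if not any(tok in STOP_WORDS for tok in window):
                terms.append(" ".join(window))
    return terms


def rank_by_domain(corpus, max_len=3):
    """corpus maps domain name -> list of documents.

    Returns term -> (best_domain, score), where score is the share of the
    term's occurrences that fall in that domain. This is a stand-in for
    the Termhood ranking, not a reproduction of it.
    """
    counts = {domain: Counter() for domain in corpus}
    for domain, docs in corpus.items():
        for doc in docs:
            counts[domain].update(candidate_terms(preprocess(doc), max_len))

    totals = Counter()
    for domain_counts in counts.values():
        totals.update(domain_counts)

    best = {}
    for term, total in totals.items():
        domain = max(counts, key=lambda d: counts[d][term])
        best[term] = (domain, counts[domain][term] / total)
    return best
```

Calling `rank_by_domain` on a dictionary of per-domain documents yields, for each candidate term, the domain where it is most concentrated. Collecting those assignments per domain gives a rough analogue of the domain-term matrix the abstract mentions.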

Authors
Fayyad, Manar Saed Abdel-mohsen
Supervisors
Baraka, Rebhi Soliman
Type: Master's thesis
Date: 2012
Language: English
Publisher: The Islamic University
Collections
  • PhD and MSc Theses- Faculty of Information Technology [124]
Files in this item
file_1.pdf (3.231 MB)

The institutional repository of the Islamic University of Gaza was established as part of the ROMOR project that has been co-funded with support from the European Commission under the ERASMUS + European programme. This publication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
