
Please use this identifier to cite or link to this item:

http://hdl.handle.net/20.500.12358/20061
Title: Tag Recommendation for Short Arabic Text by Using Latent Semantic Analysis of Wikipedia
Title in Arabic: اقتراح أوسمة للنصوص العربية القصيرة باستخدام تحليل الدلالات الكامنة على الويكيبيديا العربية (Tag suggestion for short Arabic texts using latent semantic analysis of the Arabic Wikipedia)
Abstract

Social media sites enable users to share items, such as texts and images, and to annotate them with freely chosen keywords called tags. However, this freedom comes at a cost: an uncontrolled vocabulary can lead to tag redundancy, ambiguity, sparsity, misspelling, and idiosyncrasy, which impedes the effective organization and retrieval of resources in tagging systems. This work proposes an Arabic-language tag recommender system that exploits the Arabic Wikipedia as background knowledge. Latent semantic analysis (LSA) was employed to discover hidden semantic relations between the short text and Wikipedia articles. Apache Spark was used to handle the large volume of Wikipedia content and the heavy computations of LSA, which decomposes the term-by-article matrix built from Wikipedia into three matrices. Given an Arabic short text as input, the system compares it with the bodies of the articles and scores each article by its relevance to the short text. Candidate tags are then derived from the titles and categories of the top-scored articles. The proposed system was evaluated on a dataset of 100 tweets covering three different domains, and the generated tags were rated by two human experts in each domain. The system achieved a mean average precision (MAP) of 84.39% and a mean reciprocal rank (MRR) of 96.53%, indicating that it is adequate and accurate for tagging Arabic short texts, although it still has difficulties related to the Arabic language and is affected by the frequencies of rare terms. A thorough analysis and discussion of the evaluation results is also presented, covering the system's strengths and limitations as well as recommendations for future improvement.

Keywords: Short text, tag recommender, Arabic Language, Wikipedia, Latent Semantic Analysis, Spark
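The retrieval step summarized in the abstract (LSA over articles, scoring articles against a short text, then taking tags from titles and categories) can be sketched as follows. This is a minimal, hypothetical illustration, not the thesis implementation: it assumes TF-IDF weighting, truncated SVD via NumPy instead of Apache Spark, cosine similarity in the latent space, and a tiny in-memory toy corpus in place of the Arabic Wikipedia. The article data, the function names (`build_tfidf`, `score_articles`, `recommend_tags`) and the parameters `k`, `top_articles` and `top_tags` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

import numpy as np

# Hypothetical toy stand-in for Wikipedia: (title, categories, body text).
ARTICLES = [
    ("Football", ["Sports"], "team player match goal"),
    ("Olympic Games", ["Sports", "Events"], "team player match medal"),
    ("Inflation", ["Economics"], "bank money rate price"),
    ("Central bank", ["Economics", "Banking"], "bank money rate interest"),
    ("Smartphone", ["Technology"], "phone device app screen"),
    ("Tablet computer", ["Technology", "Computing"], "phone device app touch"),
]


def tokenize(text):
    # Whitespace split only (assumption); real Arabic text needs normalization/stemming.
    return text.lower().split()


def build_tfidf(bodies):
    """Term-by-article TF-IDF matrix A, plus the term index and idf weights."""
    docs = [Counter(tokenize(b)) for b in bodies]
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    df = Counter(t for d in docs for t in d)
    idf = np.array([math.log(len(docs) / df[t]) for t in vocab])
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t, c in d.items():
            A[index[t], j] = c * idf[index[t]]
    return index, idf, A


def score_articles(text, index, idf, U_k, doc_vecs):
    """Cosine similarity between the short text and each article in latent space."""
    q = np.zeros(len(index))
    for t, c in Counter(tokenize(text)).items():
        if t in index:  # out-of-vocabulary words are simply ignored
            q[index[t]] = c * idf[index[t]]
    q_k = U_k.T @ q  # project the short text onto the k latent dimensions
    denom = np.linalg.norm(doc_vecs, axis=0) * np.linalg.norm(q_k)
    return (q_k @ doc_vecs) / np.where(denom == 0, 1.0, denom)


def recommend_tags(text, articles, k=3, top_articles=2, top_tags=3):
    index, idf, A = build_tfidf([body for _, _, body in articles])
    # Truncated SVD: A ~= U_k S_k V_k^T, the three matrices of LSA.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    doc_vecs = np.diag(s_k) @ Vt_k  # article j in latent space (equals U_k^T A)
    scores = score_articles(text, index, idf, U_k, doc_vecs)
    # Candidate tags: titles and categories of the top-scored articles,
    # each weighted by the score of the article it came from.
    tag_scores = defaultdict(float)
    for j in np.argsort(scores)[::-1][:top_articles]:
        title, categories, _ = articles[j]
        for tag in [title, *categories]:
            tag_scores[tag] += float(scores[j])
    return sorted(tag_scores, key=tag_scores.get, reverse=True)[:top_tags]


print(recommend_tags("the bank raised the interest rate", ARTICLES))
```

With this toy data the call should rank `Economics` first, because both economics articles contribute to it. The latent projection is what makes LSA useful here: a query consisting only of the word `interest`, which appears in the Central bank article alone, still scores the Inflation article highly, since both articles fall on the same latent dimension.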

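The two evaluation metrics reported in the abstract, MAP and MRR, can be computed as below. This sketch assumes a binary relevant/not-relevant judgment for each recommended tag and normalizes average precision by the number of relevant tags in each list; the thesis may have used a different normalization or combined the two experts' ratings differently. The judgment lists in the example are made-up illustrative values, not the thesis data.

```python
from statistics import mean


def average_precision(relevance):
    """AP of one ranked tag list; relevance[i] is True if the i-th tag was judged relevant.

    Assumption: AP is normalized by the number of relevant tags in the list.
    """
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant position
    return mean(precisions) if precisions else 0.0


def reciprocal_rank(relevance):
    """1 / rank of the first relevant tag, or 0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def evaluate(judgments):
    """Return (MAP, MRR) over a list of per-text relevance lists."""
    return (mean(average_precision(r) for r in judgments),
            mean(reciprocal_rank(r) for r in judgments))


# Hypothetical judgments for two short texts, three recommended tags each.
print(evaluate([[True, False, True], [False, True, True]]))
```

For these two toy lists, AP is 5/6 and 7/12, so MAP is 17/24 (about 0.708), while MRR is (1 + 1/2) / 2 = 0.75.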
Authors: Samra, Yousef K. Abu
Supervisors: Alagha, Iyad M.
Type: Master's thesis (رسالة ماجستير)
Date: 2017
Language: English
Publisher: Islamic University of Gaza (الجامعة الإسلامية - غزة)
Collections: PhD and MSc Theses - Faculty of Information Technology
Files in this item: file_1.pdf (2.052 MB)

The institutional repository of the Islamic University of Gaza was established as part of the ROMOR project that has been co-funded with support from the European Commission under the ERASMUS+ European programme. This publication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
