|Title||Building an Effective Stemmer for Arabic Language to Improve Search Effectiveness|
The need for good stemming rules for Arabic follows from the importance of Arabic as the sixth most used language in the world. Stemming is essential in information retrieval, data mining, and language processing. Many linguistic and light stemmers have been developed for Arabic, but they still have many weaknesses and problems. This thesis proposes an efficient stemming algorithm designed to solve problems found in several stemming approaches, such as ambiguity, broken plurals, irregular words, and confusion between nouns and verbs. The proposed algorithm combines two stemming approaches: root stemming for verbs and light stemming for nouns. It separates nouns from verbs by adding classification rules and handles each word class with its own strategy in order to increase stemming efficiency. These rules help resolve word ambiguity, and such an algorithm can contribute to more efficient and faster information retrieval and search engines. The new Arabic stemmer was developed in Java (JDK 1.6) and applied in WEKA (Waikato Environment for Knowledge Analysis) for text preprocessing and classification. WEKA is a suitable environment for evaluating most stemmers: it allows the user to load any data set, choose any of the included stemmers, select any included classifier, and explore classification results such as recall and precision. WEKA was used to test the proposed stemmer and compare it with the Khoja and Light10 stemmers on the OSAC (Open Source Arabic Corpus) and CNN (Cable News Network) corpora. The results show that the proposed stemmer increases text classification accuracy to an average of 90.1%, compared with average accuracies of 88.2% and 85.17% for Light10 and Khoja, respectively.
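The noun/verb split described in the abstract can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the class name `HybridStemmerSketch`, the classification rule in `isLikelyNoun`, and the short affix lists are hypothetical and are not the thesis's actual rules, which the abstract does not give. The verb branch also only strips common affixes as a simplified stand-in for true root extraction, which would normally match words against morphological patterns.

```java
import java.util.List;

// Hypothetical sketch: route each word to a noun or verb strategy, then strip affixes.
class HybridStemmerSketch {
    // Illustrative affix lists only (assumed, not taken from the thesis).
    // Longer affixes come first so they are tried before their shorter forms.
    private static final List<String> NOUN_PREFIXES = List.of("وال", "بال", "كال", "فال", "ال", "لل");
    private static final List<String> NOUN_SUFFIXES = List.of("ات", "ون", "ين", "ان", "ية", "ة");
    private static final List<String> VERB_PREFIXES = List.of("سي", "ي", "ت", "ن");
    private static final List<String> VERB_SUFFIXES = List.of("ون", "وا", "ان", "ت");

    // Assumed classification rule: a definite article or a typical nominal
    // ending marks the word as a noun; everything else is treated as a verb.
    static boolean isLikelyNoun(String word) {
        return word.startsWith("ال") || word.endsWith("ة") || word.endsWith("ات");
    }

    // Remove the first matching prefix, keeping at least minLen letters.
    private static String stripPrefix(String word, List<String> prefixes, int minLen) {
        for (String p : prefixes) {
            if (word.startsWith(p) && word.length() - p.length() >= minLen) {
                return word.substring(p.length());
            }
        }
        return word;
    }

    // Remove the first matching suffix, keeping at least minLen letters.
    private static String stripSuffix(String word, List<String> suffixes, int minLen) {
        for (String s : suffixes) {
            if (word.endsWith(s) && word.length() - s.length() >= minLen) {
                return word.substring(0, word.length() - s.length());
            }
        }
        return word;
    }

    // Light stemming for nouns; simplified affix stripping for verbs.
    static String stem(String word) {
        if (isLikelyNoun(word)) {
            return stripSuffix(stripPrefix(word, NOUN_PREFIXES, 3), NOUN_SUFFIXES, 3);
        }
        return stripSuffix(stripPrefix(word, VERB_PREFIXES, 3), VERB_SUFFIXES, 3);
    }

    public static void main(String[] args) {
        System.out.println(stem("المكتبات")); // noun path
        System.out.println(stem("يكتبون"));   // verb path
    }
}
```

With these assumed lists, the noun "المكتبات" loses the article "ال" and the plural ending "ات", giving "مكتب", while the verb "يكتبون" loses the prefix "ي" and the ending "ون", giving "كتب". Keeping the two strategies behind a single `stem` entry point mirrors the idea of treating each word class with its own rules.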
|Publisher||Islamic University of Gaza (الجامعة الإسلامية - غزة)|