|Title||An Approach for Detecting Spam in Arabic Opinion Reviews|
The amount of information available on the Internet is increasing rapidly, yet there is little quality control over it, especially over user-generated content (opinion reviews, Internet forums, discussion groups, and blogs). Manually scanning large amounts of user-generated content is time-consuming and sometimes impossible, which makes opinion mining a better alternative. Although opinion reviews are recognized as a valuable source of information for a variety of applications, the lack of quality control attracts spammers, who have found many ways to profit from spamming. Moreover, spam detection is a complex problem because spammers constantly invent new methods that cannot be recognized easily. There is therefore a need for a new approach to identifying spam in opinion reviews. Such approaches exist for English; one is needed for Arabic in order to identify Arabic spam reviews. To the best of our knowledge, there is still no published study on detecting spam in Arabic reviews, because Arabic has a very complex morphology compared to English. In this research, we propose a new approach to spam detection in Arabic opinion reviews that merges methods from data mining and text mining into a single classification approach. Our work builds on state-of-the-art spam detection techniques developed for Latin-based languages while keeping in mind the specific nature of the Arabic language. In addition, we overcome the drawbacks of the class imbalance problem by using sampling techniques. Our approach is implemented in RapidMiner, an open-source machine learning tool, and exploits machine learning methods to identify spam in Arabic opinion reviews. The experimental results show that the proposed approach is effective in identifying Arabic spam opinion reviews and achieves significant improvements: in the best case, the F-measure reaches 99.59%. We also compared our approach with other approaches and found that it achieves the best F-measure in most cases.
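The two quantitative ideas in the abstract, rebalancing the classes by sampling and scoring with the F-measure, can be illustrated with a minimal Python sketch. It is not the thesis's RapidMiner process. The abstract does not say which sampling technique was used, so random oversampling of the minority class is an assumption here, and the function names and the `"spam"`/`"ham"` labels are hypothetical.

```python
import random


def f_measure(y_true, y_pred, positive="spam"):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_oversample(rows, labels, minority="spam", seed=0):
    """Duplicate randomly chosen minority rows until both classes match.

    Assumption: a two-class problem; the thesis may use a different
    sampling technique.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(labels) if label == minority]
    majority_count = len(labels) - len(minority_idx)
    extra = max(0, majority_count - len(minority_idx))
    if not minority_idx:
        return list(rows), list(labels)
    picks = [rng.choice(minority_idx) for _ in range(extra)]
    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [labels[i] for i in picks]
    return new_rows, new_labels
```

In practice the oversampling would be applied to the training split only, so that duplicated reviews never leak into the data used to compute the F-measure.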
|Publisher||Islamic University of Gaza (الجامعة الإسلامية - غزة)|