|Title||Author Attribution from Arabic Texts|
|Title in Arabic||تحديد هوية المؤلف من النصوص العربية|
Author attribution is the problem of assigning an author to a text of unknown authorship. We propose a new approach to this problem based on an enhanced language model. The model extends the probabilistic context-free grammar (PCFG) language model with additional syntactic and lexical information: besides the probabilities of the production rules generated by the PCFG, we add probabilities for terminals, non-terminals, and punctuation marks. The new language model is also augmented with a scoring function that assigns a score to each production rule. Because the model combines several features, weights are added to govern how much each feature contributes to classification. Using many features helps capture the different writing styles of authors, and the scoring function helps by identifying the most discriminative rules and ignoring general rules that can hurt performance. The weights support capturing different authors' styles, and setting them properly can increase the classifier's performance. The new model is tested on 9 authors with 20 Arabic documents each, where training and testing are done using the leave-one-out method. The model achieves 95% accuracy, an improvement of 3.5% over the PCFG. The search for the best weights is implemented with a genetic algorithm over a new corpus of 10 documents per author, which increases the accuracy to 96%.
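The weighted combination of features described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes each document has already been parsed into four bags of items (production rules, terminals, non-terminals, punctuation marks), that each feature is modeled as a unigram distribution with add-one smoothing, and that an author's score is the weighted sum of per-feature log-probabilities. All function names and the data layout are hypothetical, and the per-rule scoring function and the genetic search for weights are left out.

```python
import math
from collections import Counter

# Feature names are illustrative; the abstract lists these four sources.
FEATURES = ("rules", "terminals", "nonterminals", "punctuation")

def train(docs):
    """Count feature items over one author's documents.

    Each doc is a dict mapping a feature name to a list of observed items.
    """
    counts = {f: Counter() for f in FEATURES}
    for doc in docs:
        for f in FEATURES:
            counts[f].update(doc[f])
    return counts

def vocab_sizes(models):
    """Number of distinct items per feature across all authors."""
    return {f: len(set().union(*(m[f] for m in models.values())))
            for f in FEATURES}

def log_prob(counts, items, vocab_size):
    """Log-probability of items under add-one smoothing (an assumption)."""
    total = sum(counts.values())
    return sum(math.log((counts[x] + 1) / (total + vocab_size))
               for x in items)

def score(model, doc, weights, sizes):
    """Weighted sum of per-feature log-probabilities for one author."""
    return sum(weights[f] * log_prob(model[f], doc[f], sizes[f])
               for f in FEATURES)

def classify(models, doc, weights):
    """Return the author whose model gives the highest weighted score."""
    sizes = vocab_sizes(models)
    return max(models, key=lambda a: score(models[a], doc, weights, sizes))
```

With equal weights every feature contributes equally. Setting a weight to zero removes that feature from the decision, which is the kind of trade-off a weight search would explore.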
|Publisher||Islamic University of Gaza (الجامعة الإسلامية - غزة)|