|Title||Spoken Arabic News Classification Based on Speech Features|
One of the most important consequences of the so-called "Internet era" is the wide spread of varied electronic data. This growth urgently requires automated systems that classify the data so that material on a given topic is easier to search and access. Such systems are commonly applied to written text. Because the number of spoken files has increased hugely, there is now an acute need for an automatic system that classifies spoken files by topic. Previous research has addressed this problem for spoken English, but spoken Arabic has rarely been considered, because the Arabic language is challenging and its datasets are scarce and unsuitable for topic classification. To address this challenge, a new dataset is built by converting the written-text corpus ALJ-NEWS, which is widely used in research on written-text classification. Next, a keyword extraction method based on DTW is implemented to extract the keywords that represent each class. Finally, a topic identification system is built with MFCC and PLP-RASTA as speech features and DTW and HMM as classifiers. This technique differs from the traditional approach, which uses ASR to extract transcriptions. The system is evaluated with F1-measure, precision, and recall. The proposed system shows positive results in topic classification: the topic identification system achieves an average F1-measure of 90.26% with the DTW classifier and 91.36% with the HMM classifier. In addition, the system achieves a keyword identification accuracy of 89.65%.
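The DTW matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dtw_distance` and `nearest_template` are hypothetical, the frames are toy one-dimensional vectors standing in for MFCC or PLP-RASTA features, and a Euclidean local cost with the basic symmetric step pattern is assumed.

```python
import numpy as np

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of shape (frames, dims).

    Assumption: Euclidean local cost and the basic step pattern
    (insertion, deletion, match), with no path constraints.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Local cost: distance between every frame of a and every frame of b.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j] = cheapest alignment cost of a[:i] against b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # a advances, b stays
                acc[i, j - 1],      # b advances, a stays
                acc[i - 1, j - 1],  # both advance
            )
    return acc[n, m]

def nearest_template(query, templates):
    """Return the label whose template has the lowest DTW cost to query.

    templates: dict mapping a label to a (frames, dims) feature sequence.
    """
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Toy data only: hypothetical class templates with one feature per frame.
templates = {
    "sports": [[0.0], [1.0], [2.0]],
    "economy": [[5.0], [6.0], [7.0]],
}
query = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # a time-stretched "sports" pattern
print(nearest_template(query, templates))
```

Because DTW lets frames repeat along the alignment path, the stretched query still aligns with the shorter template, which is why the method suits spoken keywords uttered at different speeds.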
|Published in||International Journal for Research in Applied Science and Engineering Technology (IJRASET)|
|Series||Volume: 5, Number: VIII|
|Publisher||International Journal for Research in Applied Science and Engineering Technology, ISSN : 2321-9653|