Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.12358/25161
Title | Alignment of comparable documents: Comparison of similarity measures on French–English–Arabic data |
---|---|
Abstract |
This article addresses the comparability of documents that are extracted from different sources and written in different languages. These documents are not necessarily translations of each other. Such material is referred to as multilingual comparable corpora. These language resources are useful for multilingual natural language processing applications, especially for low-resourced language pairs. In this paper, we collect data in Arabic, English, and French. Two corpora are built using the hyperlinks available in Wikipedia and Euronews. Euronews is an aligned multilingual (Arabic, English, and French) corpus of 34k documents collected from the Euronews website. A more challenging issue is to build a comparable corpus from two different and independent media outlets with distinct editorial lines, such as the British Broadcasting Corporation (BBC) and Al Jazeera (JSC). To build … |
Type | Journal Article |
Date | 2018 |
Published in | Natural Language Engineering |
Series | Volume: 1, Number: 1 |
Publisher | Cambridge University Press |
Files in this item | ||
---|---|---|
Saad, Motaz K_16.pdf | 338.3Kb |