From Abstract Syntax to Natural Language: Addressing Natural Language Generation Challenges in Arabic Using GFWordNet as Lexical Resources
Abstract
This thesis explores the development and evaluation of Arabic natural language generation using the Grammatical Framework (GF) within GFPedia, a framework that generates multilingual content from predefined abstract syntax trees (ASTs) with dynamic placeholders for lexical entries drawn from GFWordNet. The primary goal is to assess how effectively GF can generate grammatically correct Arabic sentences from the ASTs available in GFPedia. The research involves building Arabic lexical resources and integrating them into GFPedia. The system’s output is evaluated (a) automatically, using the Levenshtein distance to measure deviations from reference texts, and (b) manually, by analyzing its grammatical and morphological correctness. The results highlight significant challenges in Arabic sentence generation, including issues with word structure, definiteness, and syntactic alignment, as well as the need for context-aware translations. To address these challenges, the thesis proposes introducing a semantic layer into the GFPedia framework. By leveraging ontological and contextual information from resources such as Wikidata, this layer can select appropriate words, word order, sentence types, and other linguistic features based on the semantic content of the information being expressed. This approach aims to reduce dependence on deep knowledge of the Resource Grammar Library (RGL) and of language-specific grammar, making content development more efficient and scalable. Additionally, the thesis suggests using Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to assist in generating lexical resources.
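To make the automatic evaluation concrete, here is a minimal Python sketch of the Levenshtein distance between a generated sentence and its reference. The abstract does not say whether the distance is computed over characters or over tokens, so this sketch accepts any sequence and works for both; the `normalized_distance` helper (dividing by the longer length) is a hypothetical addition for comparing sentences of different lengths, not necessarily the metric used in the thesis.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def levenshtein(a: Sequence[T], b: Sequence[T]) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b.

    Works on strings (character level) or lists of tokens (word level).
    """
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # cost of deleting the first i items of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(generated: str, reference: str) -> float:
    """Hypothetical normalization: distance / longer length (0.0 means identical)."""
    longest = max(len(generated), len(reference))
    return levenshtein(generated, reference) / longest if longest else 0.0
```

As an illustrative pair (not taken from the thesis data), a definiteness error such as generating `قطة سوداء` where the reference is `القطة سوداء` gives `levenshtein("قطة سوداء", "القطة سوداء") == 2` at the character level (the missing article `ال`), but `levenshtein("قطة سوداء".split(), "القطة سوداء".split()) == 1` at the word level. Which granularity is used changes how heavily such morphological errors are penalized.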