From Abstract Syntax to Natural Language: Addressing Natural Language Generation Challenges in Arabic Using GFWordnet as Lexical Resources
Abstract
This thesis explores the development and evaluation of Arabic natural language generation
using the Grammatical Framework (GF) within GFPedia, a framework that generates
multilingual content from predefined abstract syntax trees (ASTs) with dynamic
placeholders for lexical entries drawn from GFWordNet. The primary goal is to assess
how effectively GF can generate grammatically correct Arabic sentences from the ASTs
available in GFPedia. The research involves building Arabic lexical resources and
integrating them into GFPedia.
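To make the idea of a predefined tree with lexical placeholders concrete, the following Python sketch fills the holes of one template tree from a small lexicon and prints the result in GF's prefix expression syntax. The tree is built from the RGL constructors `mkCl`, `mkNP`, `mkCN`, `the_Det` and `a_Det`; the `Placeholder` and `Node` classes, the slot names, and the lexical function names `city_1_N` and `settlement_1_N` are hypothetical and only imitate GF WordNet's naming style. It is an illustration under those assumptions, not GFPedia's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Placeholder:
    """A hole in a predefined tree, to be filled with a lexical function."""
    slot: str  # hypothetical slot name, e.g. "entity"


@dataclass
class Node:
    """An application of an abstract-syntax function to argument trees."""
    fun: str
    args: List["Tree"] = field(default_factory=list)


Tree = Union[Node, Placeholder]


def fill(tree: Tree, lexicon: dict) -> Node:
    """Replace every Placeholder with a leaf naming the lexical function for its slot."""
    if isinstance(tree, Placeholder):
        if tree.slot not in lexicon:
            raise KeyError(f"no lexical entry for slot {tree.slot!r}")
        return Node(lexicon[tree.slot])
    return Node(tree.fun, [fill(arg, lexicon) for arg in tree.args])


def show(tree: Node) -> str:
    """Render a tree in GF's prefix syntax, bracketing non-leaf arguments."""
    parts = [tree.fun]
    for arg in tree.args:
        parts.append(f"({show(arg)})" if arg.args else show(arg))
    return " ".join(parts)


# Template for "the <entity> is a <class>", built from RGL constructors.
TEMPLATE = Node("mkCl", [
    Node("mkNP", [Node("the_Det"), Node("mkCN", [Placeholder("entity")])]),
    Node("mkNP", [Node("a_Det"), Node("mkCN", [Placeholder("class")])]),
])

# Hypothetical lexicon entries in the style of GF WordNet function names.
lexicon = {"entity": "city_1_N", "class": "settlement_1_N"}

rendered = show(fill(TEMPLATE, lexicon))
print(rendered)
# prints: mkCl (mkNP the_Det (mkCN city_1_N)) (mkNP a_Det (mkCN settlement_1_N))
```

Swapping in a different lexicon reuses the same template without touching the grammar; in this sketch a missing entry surfaces as a `KeyError` rather than as a malformed tree.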
The system’s output is evaluated (a) automatically, using the Levenshtein distance to
measure deviations from reference texts, and (b) manually, by analyzing grammatical
and morphological correctness. The results highlight significant challenges in Arabic
sentence generation, including issues with word structure, definiteness, syntactic
alignment, and the need for context-aware translations.
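The automatic evaluation rests on the Levenshtein distance: the minimum number of insertions, deletions and substitutions needed to turn one sequence into another. The abstract does not say whether distances were computed over characters or words, so the Python sketch below accepts any sequence and shows both. The length normalization in `normalized_distance` is an assumption added for comparing outputs of different lengths, not necessarily the metric used in the thesis.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two sequences (strings, token lists, ...).

    Row-by-row dynamic programming: before processing item i of a,
    prev[j] holds the distance between a[:i-1] and b[:j].
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x by y (free when equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: Sequence, b: Sequence) -> float:
    """Distance divided by the longer length: 0.0 for identical inputs, at most 1.0."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Character level: a missing definite article (ال) costs two edits.
print(levenshtein("الكتاب", "كتاب"))  # 2

# Word level: compare whitespace-separated tokens instead of characters.
ref = "ذهب الولد إلى المدرسة".split()
hyp = "ذهب ولد إلى المدرسة".split()
print(levenshtein(ref, hyp))  # 1
```

The character-level view is sensitive to morphological slips such as a dropped article or a wrong affix, while the word-level view counts each wrong word form once regardless of how much of it differs.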
To address these challenges, the thesis proposes introducing a semantic layer into the
GFPedia framework. By leveraging ontological and contextual information from resources
such as Wikidata, this layer can select appropriate words, word order, sentence types,
and other linguistic features based on the semantic content of the information being
expressed. This approach aims to reduce the dependence on deep knowledge of the
Resource Grammar Library (RGL) and of language-specific grammar, making content
development more efficient and scalable. Additionally, the thesis suggests using
Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to assist in
generating lexical resources.
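One way to picture the proposed semantic layer is as a set of rules that map a Wikidata statement to sentence-level choices before any GF tree is built. The Python sketch below is hypothetical throughout: the `Fact` and `SentencePlan` types, the rule that treats `P19` (place of birth) and `P569` (date of birth) as events, the placeholder item IDs, and the choice between a verb-first verbal sentence and a verbless nominal sentence are illustrative assumptions, not the design specified in the thesis.

```python
from dataclasses import dataclass

# Hypothetical rule: these Wikidata properties describe events and are
# verbalised with a verb. P19 = "place of birth", P569 = "date of birth".
EVENT_PROPERTIES = {"P19", "P569"}


@dataclass(frozen=True)
class Fact:
    """A single Wikidata-style statement: subject, property, value."""
    subject: str
    prop: str
    value: str


@dataclass(frozen=True)
class SentencePlan:
    sentence_type: str          # "verbal" or "nominal"
    word_order: str             # "VSO" or "subject-predicate"
    needs_subject_gender: bool  # True if a verb must agree with the subject


def plan_sentence(fact: Fact) -> SentencePlan:
    """Pick sentence-level features from the meaning of the statement."""
    if fact.prop in EVENT_PROPERTIES:
        # Arabic verbal sentences are typically verb-first (VSO), and the verb
        # inflects for the subject's gender (e.g. وُلِدَ vs. وُلِدَتْ "he/she was born").
        return SentencePlan("verbal", "VSO", needs_subject_gender=True)
    # Other statements, e.g. P31 ("instance of"), become verbless nominal
    # sentences ("X is a Y"); this sketch ignores agreement inside them.
    return SentencePlan("nominal", "subject-predicate", needs_subject_gender=False)


# Hypothetical item IDs; only the property IDs matter to the rules above.
print(plan_sentence(Fact("Q_person", "P19", "Q_city")))
print(plan_sentence(Fact("Q_city", "P31", "Q_class")))
```

In a design along these lines, content authors would write rules about meaning, while the mapping from a `SentencePlan` to concrete RGL constructors would live in one place, which is one way to reduce the RGL knowledge each author needs.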
Degree
Student essay
Date
2024-11-28
Author
Zarzoura, Mohamed
Keywords
Language technology, Natural language generation, NLG
Language
eng