Logical properties of Natural Language Inference - Experiments with Synthetic Data to Study Consequence Relations in LSTMs
Abstract
Natural language inference (NLI) datasets are valuable resources for training and benchmarking models that infer entailment relations. However, these datasets are known to suffer from issues such as lexical biases that affect the behaviour of the models trained on them. In this thesis, we approach this problem experimentally: we study consequence relations and how data augmentation affects the performance of NLI models. We begin by defining a simple LSTM-based model with an embedding layer, and we define three scenarios under which we synthesize entailment pairs in a controlled manner from the SNLI corpus. We train several models and compare their performance using the F1-score for entailment and overall accuracy, showing that adding synthetic data provides a middle ground with balanced performance across different consequence relations. We find that, under the scenarios we define, self-entailment decreases the F1-score only marginally compared to the original data when tested on the baseline model; it is followed by the conjunction scenario, in which the premise is augmented with its hypothesis, and finally by the scenario in which the hypothesis is augmented with the premise. We conclude by recommending proportions of synthetic data to add to these models to make them better at inferring different logical consequence relations.
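The three augmentation scenarios can be illustrated with a minimal sketch. This is an assumed reading of the abstract's description (the thesis's exact construction, wording, and labels may differ): given an entailing premise–hypothesis pair from SNLI, each scenario produces one new labelled pair.

```python
def synthesize(premise: str, hypothesis: str) -> dict:
    """Sketch of the three synthetic-entailment scenarios described in
    the abstract, assuming (premise, hypothesis) is an entailing pair.
    The concatenation with "and" is a hypothetical construction."""
    return {
        # Scenario 1: self-entailment — a sentence entails itself.
        "self_entailment": (premise, premise, "entailment"),
        # Scenario 2: the premise is augmented with its hypothesis;
        # "P and H" still entails H.
        "premise_augmented": (f"{premise} and {hypothesis}",
                              hypothesis, "entailment"),
        # Scenario 3: the hypothesis is augmented with the premise;
        # if P entails H, then P entails "H and P".
        "hypothesis_augmented": (premise,
                                 f"{hypothesis} and {premise}", "entailment"),
    }

pairs = synthesize("A man is riding a horse", "A person is outdoors")
```

Under this reading, all three scenarios preserve the entailment label by construction, which is what allows the synthetic pairs to be mixed into the training data in controlled proportions.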
Degree
Student essay
Date
2024-06-17
Author
Monteiro, Hélder
Keywords
Language Technology
Language
eng