AuTopEx: Automated Topic Extraction Techniques Applied in the Software Engineering Domain
Automatically extracting topics from scientific papers can be very beneficial when a researcher needs to classify a large number of them. In this thesis we develop and evaluate an approach for automatic topic extraction, AuTopEx. The approach consists of four parts: 1) text pre-processing; 2) training a Latent Dirichlet Allocation (LDA) model on part of a corpus; 3) manually identifying relevant topics in the model; and 4) querying the model with the rest of the corpus. We show that it is possible to extract topics automatically by applying AuTopEx to a corpus of scientific papers on autonomous vehicles. According to our evaluation, AuTopEx works better on full-text articles than on texts consisting only of title, abstract, and keywords. Finally, we show that the approach is much faster than human annotators, although less accurate.
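The four parts can be illustrated with a minimal, self-contained Python sketch. It is not the AuTopEx implementation. The stop-word list, the hyperparameters (`alpha`, `beta`, iteration counts), the hypothetical `labels` mapping that stands in for the manual topic identification of part 3, and the threshold are all assumptions. LDA is fitted here with collapsed Gibbs sampling, which may differ from the inference method used in the thesis.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is", "with", "we"}

def preprocess(text):
    """Part 1: lowercase, keep alphabetic tokens, drop stop words and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

def train_lda(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Part 2: fit LDA on the training documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    nkw = [Counter() for _ in range(k)]   # topic-word counts
    nk = [0] * k                          # total tokens per topic
    ndk = [[0] * k for _ in docs]         # document-topic counts
    z = []                                # topic assignment of every token
    for d, doc in enumerate(docs):        # random initial assignments
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            nkw[t][w] += 1; nk[t] += 1; ndk[d][t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]               # remove the token's current assignment
                nkw[t][w] -= 1; nk[t] -= 1; ndk[d][t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t               # resample and add it back
                nkw[t][w] += 1; nk[t] += 1; ndk[d][t] += 1
    return {"nkw": nkw, "nk": nk, "V": V, "k": k, "alpha": alpha, "beta": beta}

def top_words(model, t, n=5):
    """Part 3 helper: a topic's most frequent words, for a human to inspect and label."""
    return [w for w, c in model["nkw"][t].most_common() if c > 0][:n]

def query(model, doc, iters=50, seed=0):
    """Part 4: infer topic proportions for an unseen document, topic-word counts fixed."""
    rng = random.Random(seed)
    k, V, a, b = model["k"], model["V"], model["alpha"], model["beta"]
    nkw, nk = model["nkw"], model["nk"]
    z = [rng.randrange(k) for _ in doc]
    ndk = [z.count(j) for j in range(k)]
    for _ in range(iters):
        for i, w in enumerate(doc):
            ndk[z[i]] -= 1
            weights = [(ndk[j] + a) * (nkw[j][w] + b) / (nk[j] + V * b) for j in range(k)]
            z[i] = rng.choices(range(k), weights=weights)[0]
            ndk[z[i]] += 1
    total = len(doc) + k * a
    return [(ndk[j] + a) / total for j in range(k)]

def assign_labels(proportions, labels, threshold=0.2):
    """Map inferred proportions to the manually chosen topic labels above a threshold."""
    return [labels[j] for j, p in enumerate(proportions) if j in labels and p >= threshold]
```

In use, `train_lda` is run on the training part of the corpus, a person reads `top_words` for each topic and records the relevant ones in a dictionary such as `{0: "perception"}`, and each remaining paper is passed through `preprocess`, `query`, and `assign_labels`. Keeping the manual labelling outside the model mirrors part 3: the model only proposes word clusters, and a human decides which of them are meaningful topics.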