Evaluation of Synthetic Ground Truth for Semantic Brain Image Segmentation - Developing a Database of Synthetic Images with Ground-Truth Segmentations for Practical Application Assessment
Abstract
Introduction: Automated brain image segmentation plays an important role in large-scale neuroimaging research and shows growing potential for clinical applications. Traditional segmentation pipelines rely on manually annotated MR images as atlases, but such data are time-consuming to acquire and may not be available in sufficient quantity or variety. This thesis documents a research project that explored the use of synthetically generated MR images with known ground truth to support the evaluation and benchmarking of automated segmentation methods.

Aim & Purpose: The aim of this thesis is to evaluate the performance and robustness of the multi-atlas propagation with enhanced registration (MAPER) algorithm using leave-one-out cross-validation (LOOCV), facilitated by the generation of synthetic MR images with controlled structural and intensity alterations. Additionally, the study assesses whether synthetic images can serve as effective atlases compared to conventional MR images, and seeks a suitable benchmark for evaluating two MAPER versions: onepad (master) and no-onepad.

Method: Five types of synthetic MR images were generated from the IXI and Hammers datasets, each with a different mode of spatial and intensity modification. Experiments were designed to (1) compare segmentation performance between synthetic atlases and unmodified conventional atlases, (2) validate segmentation performance and robustness using LOOCV on synthetic targets, (3) quantify the convergence rate of the different ground-truth benchmarks based on a parametric model, using bootstrapping as a benchmarking measure to find the most sensitive benchmark, and (4) apply this benchmark to compare the two MAPER configurations.

Results and conclusion: Segmentation performance was highest with conventional MR image atlases, followed by the statistical and scrambled synthetic variants. LOOCV confirmed similar trends. In benchmarking, the statistical smoothed image type (statsmooth) exhibited the fastest convergence rate, making it the most sensitive benchmark. Applying this benchmark revealed that the no-onepad version of MAPER outperformed the standard master configuration.

Discussion: The results show that although synthetic images underperform conventional data as atlases, they provide meaningful insight into algorithm robustness and segmentation behaviour. The ground-truth benchmark shows potential as a controlled and reproducible framework for comparing segmentation methods and identifying subtle differences in performance.
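The evaluation steps named in the abstract (scoring against ground truth, leave-one-out splitting, and bootstrapped comparison of two configurations) can be illustrated with a minimal sketch. It is not the thesis code and does not call MAPER. The abstract does not state which overlap measure was used, so the Dice coefficient here is an assumption, and the function names `dice`, `leave_one_out`, and `paired_bootstrap_ci` are hypothetical.

```python
import random
from statistics import mean


def dice(seg_a, seg_b, label):
    """Dice overlap for one label between two equally sized flat label sequences.

    Assumption: Dice is used only as a familiar overlap measure; the abstract
    does not name the metric used in the thesis.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must have the same number of voxels")
    in_a = sum(1 for v in seg_a if v == label)
    in_b = sum(1 for v in seg_b if v == label)
    both = sum(1 for a, b in zip(seg_a, seg_b) if a == label and b == label)
    if in_a + in_b == 0:
        return 1.0  # convention chosen here: label absent in both counts as agreement
    return 2.0 * both / (in_a + in_b)


def leave_one_out(subjects):
    """Yield (target, atlases) pairs so that each subject is the target exactly once."""
    for i, target in enumerate(subjects):
        yield target, subjects[:i] + subjects[i + 1:]


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference (a minus b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Resample the per-subject differences with replacement and record each mean.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper
```

In use, each fold from `leave_one_out` would be segmented by the method under test, scored against the known synthetic ground truth with `dice`, and the per-subject scores of two configurations passed to `paired_bootstrap_ci`. An interval that excludes zero would suggest a consistent difference between the configurations.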