Modeling The Impact Of AI On Software
Development: An Automotive Case Study
Master’s Thesis in Computer Science and Engineering
Adam Magnus
Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2025

© Adam Magnus, 2025.
Supervisor: Yinan Yu, Department of Computer Science and Engineering
Industrial Supervisor: Dhasarathy Parthasarathy, Volvo Trucks
Examiner: Hans-Martin Heyn, Department of Computer Science and Engineering
Master’s Thesis 2025
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000
Typeset in LaTeX
Gothenburg, Sweden 2025
Abstract
As artificial intelligence (AI) is increasingly integrated into software development,
assessing its impact is crucial, especially in complex, safety-critical domains such
as the automotive industry. This study investigates the impact of AI on software
development processes through a case study of two real-world AI-powered solutions
at Volvo Trucks: the CS-testing and API-testing tools.
This study proposes a structured five-phase framework to model the impact of AI
from stakeholder-defined perspectives, employing a mixed-methods approach that
combines quantitative and qualitative methods, including interviews, surveys, and
various process analyses. The framework evaluates factors involving quality,
efficiency, automation, and stakeholder alignment in order to categorize and
prioritize metrics and methods.
The evaluation shows that AI-driven tools significantly improve two software testing
processes by increasing efficiency and quality, addressing prioritized stakeholder pain
points, and preserving existing strengths. Moreover, the solutions deliver measurable
value and achieve a high automation level (Level 4), supported by a practical
decision tree that helps developers choose suitable automation methods and
highlights their direct impact.
Furthermore, this study highlights how AI-driven solutions can facilitate testing
workflows and support stakeholders’ decision-making processes. The results showcase
a practical methodology for assessing AI’s impact and value in software development,
guiding organizations in determining whether to integrate AI solutions into
their processes.
Keywords: AI, Impact Assessment, Software Development, Software Engineering,
LLM, Software Testing, Process Improvement, Decision-making, Automation

Acknowledgements
I would like to express my heartfelt gratitude to all those who have supported me
during the course of this thesis.
First and foremost, I am sincerely grateful to my industrial supervisor at Volvo
Trucks, Dhasarathy Parthasarathy, for his guidance, practical insights, and
continuous support throughout the project. His real-world perspective played a
crucial role in shaping the direction of my work.
I would also like to extend my sincere appreciation to my academic supervisor,
Yinan Yu, for her expert advice, thoughtful feedback, and unwavering
encouragement. Her mentorship was instrumental in helping me navigate both the
challenges and milestones of this research.
I would like to thank my examiner, Hans-Martin Heyn, for his time, thoughtful
evaluation, and constructive feedback, which helped enhance the quality and clarity
of this thesis.
To my colleagues and friends, thank you for your support, stimulating discussions,
and the occasional well-needed distractions. Your presence made this journey far
more enjoyable and manageable.
And most importantly, I am deeply indebted to my family for their unconditional
love, patience, and belief in me. Their support has been my greatest source of
strength throughout this endeavor.
Adam Magnus, Gothenburg, June 2025

Contents
List of Figures
List of Tables
1 Introduction
1.1 Problem Description
1.2 Purpose of the Study
1.3 Research Questions
1.4 Significance of the Study
2 Background
2.1 Generative AI
2.2 AI in Software Engineering
2.3 Automotive Software Engineering
2.4 Case Study
2.4.1 Control System Testing
2.4.2 API-Testing
2.4.3 Polymer
2.5 Impact Dimensions: Business and Technical
3 Related Work
3.1 Risk evaluation and automation taxonomy
3.2 Impact on productivity and research gap
3.3 Economics of Software Engineering
3.3.1 Return On Software Quality (ROSQ)
3.4 Software Process Improvement (SPI) and Return on Investment (ROI)
3.4.1 Capability Maturity Model (CMM) and Capability Maturity Model Integration (CMMI)
3.4.2 Pareto Analysis
3.4.3 Quality Improvement Initiatives
3.5 Project Size Estimation
3.5.1 Lines of Code
3.5.2 Function Point Analysis (FPA)
3.5.3 COSMIC Function Point
3.6 Value Stream Mapping
4 Methods
4.1 Research Design and Approach
4.2 Theory
4.2.1 Intertwined quality and Efficiency
4.2.2 Surveys and Interviews
4.2.3 Factors and Issues Prioritization
4.2.4 Software Process Improvement Initiatives (SPII)
4.2.5 Automation Levels and risk
4.3 Framework
4.4 Data Collection
5 Results
5.1 Interview and Surveys
5.1.1 Round 1
5.1.2 Round 2
5.1.3 Round 3, 4, and 5
5.2 Pareto Analysis
5.2.1 CS-Testing
5.2.2 API-Testing
5.3 Software Process Improvement Initiatives (SPII)
5.3.1 CS-Testing Tool
5.3.2 API-Testing Tool
5.3.3 Gap Analysis
5.4 Automation Factors
5.5 Automation Levels and Risk
6 Discussion
6.1 Addressing Research Questions
6.1.1 Research Question 1
6.1.2 Research Question 2
6.1.2.1 Significance Weighting Metrics
6.1.2.2 Factor-Specific Metrics
6.1.2.3 Component-Specific Metrics
6.1.2.4 Classification Metrics
6.1.3 Research Question 3
6.1.3.1 Overview
6.1.3.2 Tying It All Together
6.1.4 Research Question 4
6.1.4.1 Pareto Analysis and Diagrams
6.1.4.2 SPII
6.1.4.3 Automation Decision Tree
6.1.4.4 Tying It All Together
6.2 Future Works
6.2.1 Result Compilation and Pre and Post Comparison
6.2.2 Constraints, Limitations and Risk
6.2.3 Automatability Levels of the framework
6.2.4 Longitudinal Studies and Feedback Loops
6.2.5 Expanding to other domains and focuses
7 Validity Threats and Limitations
7.1 Internal Validity
7.1.1 Sampling Bias
7.1.2 Research Bias
7.1.3 Social Desirability
7.2 External Validity
7.3 Limitations
8 Conclusion
Bibliography
A Appendix
A.1 Non-utilized Methods
A.2 Utilized Methods
A.3 Round 2 interview/survey questions
List of Figures
2.1 Software Development Lifecycle [15][3]
2.2 Domain-centralized E/E system example [19]
2.3 General overview of the CS-Testing Process
2.4 General overview of the API-Testing process
2.5 Manual Process vs AI solution Process [30]
3.1 Levels of risk on the AI-SEAL taxonomy with respect to Points of Application and Levels of Automation [10]
3.2 Levels of automation - DAnTE scale [18]
3.3 Function Points Measurement Model [1]
4.1 Research plan overview
4.2 Framework workflow overview
5.1 Pareto Chart - CS-Testing quality and efficiency factors
5.2 Pareto Chart - API-Testing quality and efficiency factors
5.3 Main AI components of the CS-Testing tool and the API-Testing tool
5.4 SPII vs Quality and efficiency factors
5.5 Decision Tree Visualization
List of Tables
3.1 Function points measurement model
3.2 Function point weights as proposed by Albrecht (1983) [1]
3.3 Example of a function point following the Albrecht 83 version [1]
5.1 Manual CS-Testing Teams and Roles
5.2 Manual API-Testing Teams and Roles
5.3 CS-Testing Teams, Roles and Experience Levels
5.4 Number of factors identified per category - CS-Testing (Raw)
5.5 Number of factors identified per category - API-Testing (Raw)
5.6 Legend for CS-Testing - High Efficiency Factors
5.7 Legend for CS-Testing - High Quality Factors
5.8 Legend for CS-Testing - Low Efficiency Factors
5.9 Legend for CS-Testing - Low Quality Factors
5.10 Legend for API-Testing - High Efficiency Factors
5.11 Legend for API-Testing - High Quality Factors
5.12 Legend for API-Testing - Low Efficiency Factors
5.13 Legend for API-Testing - Low Quality Factors
5.14 Factors considered when automating
A.1 Full non-utilized methods classification
A.2 Full utilized methods classification
1 Introduction
1.1 Problem Description
With the widespread adoption of Artificial Intelligence (AI), its ability to enhance
and optimize processes is increasingly sought after, especially in the software
development industry [27]. However, given the constant evolution of AI and the
growing popularity of integrating it, it is crucial to evaluate its impact critically
and determine whether AI offers the best solution for a given need. There are
numerous forms of AI, with generative AI being one of the most prominent and
among those with the greatest potential. This is due to its capability to generate
content such as text and code in an automated fashion, offering solutions in various
industries. Integrating generative AI in the software development industry shows
great potential to bring value to companies by driving the automation and
optimization of multiple processes that traditionally require immense manual effort,
time, and money [26]. However, it is critical to understand the actual value and
risks of employing AI-driven solutions, especially in highly complex and
safety-critical sectors such as the automotive industry. Current research and
industry practice focus primarily on the capabilities of generative AI and on ways
to incorporate it into different processes. What is still lacking is a system for
evaluating and modeling the impact of AI within software development processes,
as well as critical analysis and evaluation of its necessity, efficiency, and value.
This gap is especially significant in the automotive industry, where unique and
critical standards must be met.
At Volvo Trucks, multiple AI-driven solutions are being developed across many
different processes. One of those is the software development process, specifically
the testing process. With testing being arguably one of the most important aspects
of safety-critical industries, it is of immense importance to ensure high reliability
and accuracy. In this study, two projects with Large Language Models (LLMs) at
their core were investigated. Although AI might have a clear and apparent direct
impact on the testing process, it remains unclear what the impact is on the broader
software development process, as there will be financial and technical impact on
many different aspects such as costs, time, employees, and quality. This raises many
questions, such as whether AI is the best solution in specific scenarios, how AI fits
into the current workflows employed at the company, and whether the gains
outweigh the risks. The focus of this study is on AI in software engineering, which
is explored and validated through an in-depth examination of the two software
testing processes at Volvo Trucks.
1.2 Purpose of the Study
The main purpose of this study is to develop and validate a structured framework for
modeling the impact of generative AI-driven solutions within software development,
with a specific focus on manual testing processes in the automotive domain,
particularly control system and API testing at Volvo Trucks. This includes identifying
what constitutes “impact” in this context and exploring how it can be meaningfully
assessed and aligned with stakeholder needs.
This study aims to yield value-based results, that is, outcomes or findings that are
evaluated and interpreted in terms of the value they deliver to stakeholders, rather
than just technical performance or isolated metrics. These results and their
implications can support practitioners and stakeholders in their decision-making
processes by clarifying the conditions under which AI brings positive net benefits.
In this study, practitioners are those involved in hands-on work, such as software
developers and AI specialists, who are engaged in the development and
implementation of AI tools. Stakeholders, on the other hand, include individuals or
groups that have an interest or investment in these tools, such as end users,
decision-makers, or others impacted by their adoption and use, even if they are not
directly involved in the development process.
By grounding the research in the testing workflows at Volvo Trucks, the study
provides actionable insights for the broader software development industry. This
is achieved through methods, frameworks, and roadmaps that support planning,
forecasting, and evaluating the value of developing or integrating AI-based solutions
into software processes.
1.3 Research Questions
To achieve its purpose, the study focuses on the following research questions:
RQ1: How is "impact" defined in a software development context? How
can we model the impact of AI without knowing how the term "impact" is defined?
Before we can model it, we need to agree on what we are actually targeting.
RQ2: What impact metrics are applicable and how can these metrics be
categorized and prioritized? Many different metrics exist, and not every metric is
worth pursuing, so the aim is to focus on importance and relevance.
RQ3: What methodologies can be employed to measure the prioritized
impact metrics both quantitatively and qualitatively, and how can these
methodologies be applied to practical cases involving generative AI-driven
solutions within software development? This question focuses on connecting the
theoretical with the tangible, asking not just what works on paper, but what holds
up in practice.
RQ4: How can these impact metrics be modeled to address the needs
of target stakeholders and support their decision-making process when
it comes to the integration of generative AI in software development
processes? Metrics alone provide little unless they are matched to real needs.
This question explores how to shape them into models tailored to specific target
stakeholders and how those models can support their decision-making on
integrating AI and AI-based solutions into their processes.
1.4 Significance of the Study
This thesis lays a foundation for researchers to contribute to and further customize
models of the impact of AI in other contexts and scenarios, both within and outside
of software development, by bridging the gap between theoretical frameworks and
practical applications. The study benefits practitioners and software developers by
providing methodologies and a roadmap to evaluate and understand the impact of
AI. Furthermore, its findings could allow organizations and decision-makers to make
informed decisions about adopting AI solutions and to optimize their integration
into workflows and processes.
2 Background
2.1 Generative AI
Generative AI is a form of AI designed to create new content, data, or solutions
that mimic human-made content. Generative AI models are trained to produce
and generate coherent outputs such as text, images, audio, videos, or code, unlike
traditional AI models that mainly perform classification and recognition-based tasks
[12]. To understand generative AI further, Feuerriegel et al. proposed a three-tier
conceptual framework [11]:
1. Model Level
There are different model architectures related to generative modeling, such as
Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and
Transformer-based models like GPT. For these models to recognize and replicate
complex patterns in data and to generate relevant and realistic outputs, they are
typically trained on massive datasets [11].
2. System Level
Models alone cannot function well independently, as their functionality needs to be
embedded "to provide an interface for interaction" [11]. An example of this is seen
in Codex’s case [5], where the deep learning model would need to be integrated "into
a more interactive and comprehensive system, like GitHub Copilot" [11], allowing a
more efficient way for users to code.
3. Application Level
Generative AI is used in many real-world domains, such as software development
(e.g., code generation), marketing (e.g., automated content production), customer
service (e.g., chatbots), and healthcare (e.g., synthetic data for research). Most of
these applications can be regarded as "human-task-technology systems" that use
generative AI to "augment human capacities" and whose success usually depends on
how well they are integrated into the different workflows and processes, and on
whether they are trusted and understood by their users [11].
Similarly, Nah et al. highlight the importance of AI-human collaboration, which
allows for a higher level of automation by having the AI focus on generating the
relevant content while retaining human oversight, as it is still critical to evaluate
the quality, contextual relevance, and ethics of the generated output [12].
2.2 AI in Software Engineering
Barenkamp et al. [3] conducted systematic reviews to assess the state of AI, its
potential developments, and its application within the software development life
cycle. Their analysis identified multiple uses and advantages of AI, including the
acceleration of development processes and workflows and the reduction of costs
through the automation of time-consuming routine tasks [3].
Below are some of the identified uses of AI within the software development life
cycle:
Figure 2.1: Software Development Lifecycle [15][3]
1. Project Planning
In their literature reviews, Barenkamp et al. identified that AI can help with effort
and cost estimation by leveraging historical project data and Bayesian models. AI
can also perform risk prediction and management by identifying patterns such as cost
overruns, resource issues, and delays in historical project data through predictive
modeling and analysis. Furthermore, AI can help allocate human resources and
assign tasks based on developers’ skills, availability, and performance. More
advanced models can also provide dynamic planning that reallocates resources in
response to unavailability or unforeseen circumstances [3]. Pothukuchi et al. also
highlight AI’s ability to assist with requirement gathering, stating that AI can
accelerate proposal writing as well as assist with defining the project’s scope.
Furthermore, some AI platforms can increase the efficiency and reduce the effort
of analysts writing "user stories in translating business requirements" [23].
2. Problem Analysis
As in the planning phase, AI can help with failure point prediction by predicting
potential financial, technical, and operational risks within software projects. This
is done through machine learning algorithms that predict probable failure points by
finding correlations and patterns in large datasets. Furthermore, discriminant
analysis can classify and weigh the different risk factors based on historical project
data. AI can also help identify common issues and needs, providing a better
understanding of the factors and issues the software should address [3].
3. Software Design
AI can help accelerate design decision-making by using machine learning to provide
design templates and recommend architectural patterns for the given project context.
Furthermore, AI can help generate content such as UI designs or provide an idea of
expected behavior. AI can also play a role in validating choices in different areas
of software design, such as database design and component interaction, by providing
design and architecture alternatives and simulating their performance [3].
Additionally, AI contributes to solidifying the software’s design by analyzing
weaknesses, providing alternative design suggestions, generating user experience
mockups, and writing system design documentation [23].
4. Software Implementation
One of the most prominent uses of AI in the software implementation phase is code
generation, where AI generates executable code from natural-language descriptions.
Furthermore, AI can help with debugging and error detection and assist with code
completion and suggestions, which can reduce effort and increase efficiency.
Additionally, it can translate code between programming languages and refactor
code [3]. Pothukuchi et al. expand on these insights, stating that AI can help
optimize code by organizing code structures and using memory efficiently, which in
turn results in better resource utilization and performance [23].
5. Software Testing and Integration
AI can help generate test cases automatically, which saves time and potentially
results in higher test coverage. By recognizing patterns and using deep learning, it
can also detect bugs and common issues in the code. Although fuzz testing, which
injects random or incorrect inputs into the software, is not itself based on AI, it
can be combined with AI elements to find vulnerabilities [3], allowing for edge case
identification [23].
6. Software Maintenance
Some AI tools can provide live anomaly detection and generate code fixes. They can
also refactor and clean up code by identifying unused, redundant, or duplicate code.
AI can also help automate updates and release patches in response to changes
affecting the system, such as security threats and dependency updates.
Furthermore, it can strengthen security measures by simulating attacks and
continuously monitoring the code and configuration [3]. AI can also assist with
reporting and monitoring, enabling the analysis of user behavior, providing
application owners with custom reports, and ensuring compliance with policies and
standards [23].
To contextualize the research, the focus will be on stage 5, "Testing and Integration,"
which closely relates to the projects that will be explored for this case study.
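To make the fuzz-testing idea mentioned under Software Testing and Integration more
concrete, the following is a minimal sketch of plain random fuzzing in Python. It is
not taken from the cited works or from the tools studied here: the function under
test (parse_speed_limit), its valid range, and the input alphabet are hypothetical
and chosen only for illustration. An AI-assisted fuzzer would typically replace the
random input generator with a learned or guided one.

```python
import random


def parse_speed_limit(raw: str) -> int:
    """Hypothetical function under test: parse a speed limit in km/h (0 to 250)."""
    value = int(raw)  # raises ValueError for non-numeric input
    if not 0 <= value <= 250:
        raise ValueError(f"speed limit out of range: {value}")
    return value


def random_input(rng: random.Random) -> str:
    """Build a short random string that is often malformed on purpose."""
    alphabet = "0123456789-+ .eExX\t"
    length = rng.randint(0, 8)
    return "".join(rng.choice(alphabet) for _ in range(length))


def fuzz(target, runs: int = 1000, seed: int = 0):
    """Feed random inputs to target and collect every unexpected behavior.

    A ValueError counts as a documented rejection. Any other exception, or an
    accepted value outside 0 to 250, is recorded as a finding.
    """
    rng = random.Random(seed)  # fixed seed keeps findings reproducible
    findings = []
    for _ in range(runs):
        data = random_input(rng)
        try:
            result = target(data)
        except ValueError:
            continue
        except Exception as exc:  # any other exception is a potential bug
            findings.append((data, repr(exc)))
        else:
            if not 0 <= result <= 250:
                findings.append((data, f"accepted out-of-range value {result}"))
    return findings
```

Calling fuzz(parse_speed_limit) returns the list of inputs that triggered unexpected
behavior; an empty list means no finding was produced in that run, not that the
function is free of defects.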
2.3 Automotive Software Engineering
The automotive Electrical/Electronic (E/E) system acts as an essential platform that
connects and integrates the vehicle’s hardware and software. It connects sensors and
actuators with microcontrollers in order to translate "any physical domain into the
digital domain" [19]. In most cases, the microcontrollers are the vehicle’s ECUs
(Electronic Control Units), and by combining sensors, actuators, and ECUs, one can
achieve automated functionality. This is done by forming a distributed, networked
system that monitors and controls the vehicle’s behavior through communication
networks such as CAN (Controller Area Network) and Ethernet; an example can be
seen in Figure 2.2. Sensors measure certain physical properties, while actuators
execute commands based on software decisions. The main purpose of ECUs is to
process data in the embedded software system to provide real-time decisions.
Typically, each ECU is focused on a specific domain, such as the powertrain domain,
and controls specific vehicle functionality within it, such as engine management.
This ties directly to the cases examined in this study. An example of a few
vehicle ECUs can also be seen in Figure 2.2 [19].
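The interplay of sensor, ECU, and actuator described above can be illustrated with a
small in-memory sketch. Everything in it is an assumption made for illustration: the
Frame and Bus classes are simplified stand-ins for CAN frames and a vehicle network
(a real CAN bus is a broadcast medium with arbitration, not a queue), and the
CoolingEcu class, the signal names, and the threshold are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Simplified stand-in for a network frame: a signal name and a decoded value."""
    signal: str
    value: float


class Bus:
    """In-memory FIFO queue standing in for a vehicle network such as CAN."""

    def __init__(self):
        self._frames = deque()

    def send(self, frame):
        self._frames.append(frame)

    def receive(self):
        """Return the oldest pending frame, or None if the bus is idle."""
        return self._frames.popleft() if self._frames else None


class CoolingEcu:
    """Hypothetical ECU: switches a fan on above a coolant temperature threshold."""

    def __init__(self, bus, threshold_c=95.0):
        self.bus = bus
        self.threshold_c = threshold_c

    def step(self):
        """Read one sensor frame and publish the resulting actuator command."""
        frame = self.bus.receive()
        if frame is None or frame.signal != "CoolantTemp":
            return  # nothing to process, or a frame this ECU does not handle
        fan_on = frame.value > self.threshold_c
        self.bus.send(Frame("FanCommand", 1.0 if fan_on else 0.0))


# Usage: a sensor publishes a reading, the ECU decides, an actuator reads the command.
bus = Bus()
ecu = CoolingEcu(bus)
bus.send(Frame("CoolantTemp", 101.0))  # sensor measurement
ecu.step()                             # software decision inside the ECU
command = bus.receive()                # command an actuator would execute
```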
2.4 Case Study
This section discusses the two cases this research used to evaluate the impact model.
It examines the current processes employed by the company, outlines the intended
functionality of the proposed AI solution, and provides background information on
the model and methodology. Both projects follow the Polymer framework, which
is explained later in Section 2.4.3, and are ongoing with varying maturity levels.
Furthermore, all the interviews and surveys were conducted with the developers
and stakeholders of these projects.
Figure 2.2: Domain-centralized E/E system example [19]
2.4.1 Control System Testing
The first case concerns a software testing process based on control system testing,
henceforth referred to as the Control System Testing process or CS-Testing process.
This process is typically carried out by different teams, two of which the solution
developers worked closely with. Each team is typically responsible for a specific
ECU in the vehicle. The process starts with the team receiving the requirements and,
based on those requirements, tracing the required signals and their respective
attributes. The next phase involves writing the positive, negative, and boundary
cases for the given requirement. The tester then writes the tests and ensures they
cover all the required scenarios. Depending on the requirement, the tests require
either hardware testing, which uses physical components, or software testing, which
can be done virtually. Both are usually executed on a rig, an isolated,
self-contained execution environment that can be used to simulate vehicle behavior
and data. The tests are then executed, and a report showing the results is
generated. This process tends to require constant inter-team collaboration and
communication. As many teams are involved before, during, and after the process,
there is high variability in the quality, consistency, and structure of the signals,
signal attributes, and even the test cases. This led to the proposal of this
project, an AI solution that aims to automate this process, referred to as the
CS-Testing tool. During the period of this study, three main developers, with 1-2
years of experience in AI engineering, worked on this project alongside a few
interns. However, only the solution developers were considered for this study. The
main purpose of this tool is to alleviate problems such as difficulty tracing signals
due to differing naming conventions, lack of coverage, and lack of consistency,
among others. Figure 2.3 shows a general overview of the CS-Testing process. Due to
confidentiality restrictions, the process diagram is simplified to encompass the
general tasks and flow.
[Figure 2.3: simplified process diagram. Recoverable labels: roles (Function
Developer, Function Owner, Tester); steps (Understanding Specifications, Signal
Tracing and navigation, Create algorithm: positive, negative and boundary cases,
Write Test Cases, Review Test Cases, Rig Scheduling and Setup, Run Tests, Report);
branches (Hardware or Software, CI/CD, Rewrite Req if Req is not testable, If Fail).]
Figure 2.3: General overview of the CS-Testing Process
2.4.2 API-Testing
Similarly, this process is a software testing process focused on API testing and will
therefore be referred to as the API-Testing process. In this case, the solution
developers worked closely with one team. The process starts with receiving requests
from another team in the form of APIs and YAML specifications. A planning phase
then takes place to understand what is required, which protocols are needed, and
whether the CAN signals exist in the databases. Once that is done, the signals are
mapped from Virtual Vehicle (VV) [30] to the API properties from the databases.
Then, the tests are written, which requires setting CAN signals on the rig and
performing some signal tracing. The tests are then executed using API requests, and
the results are validated against the expected values. If all tests pass, they are
passed on to another team; if they fail, the testers must identify the cause and
then either fix it or communicate with the API developer team (if the fault is in
the API). Figure 2.4 shows the general overview of the manual API-testing process.
Similarly to the CS-Testing process, two main developers, with an experience level
of 1-2 years in AI engineering, worked on this project with the support of a few
interns. Again, only the solution developers were considered for this study. Due to
confidentiality restrictions, the process diagram is also simplified, showing only
the general tasks and flow.
[Figure 2.4: process diagram not reproduced. Labels recoverable from the diagram:
phases (Setup, Writing Tests, Running Tests); steps (Receive Specifications (New or
updated), Plan and Understand Specifications, Mapping of CAN signals and API
parameters through VV, Construct Data objects, Set CAN signals, Manually map API
endpoints to corresponding CAN signals, Write test cases, Perform API requests,
Validate results against set CAN signals, Nightly Runs, Must trace fails and fix).]
Figure 2.4: General overview of the API-Testing process
A case study by Wang et al. discusses the proposed AI solution, the SPAPI-Tester,
which was developed to automate in-vehicle API testing. Since this project is based
on the SPAPI-Tester, it is referred to as the API-testing tool. It is intended to
mitigate several issues faced when testing vehicle APIs: the numerous systems
involved (e.g., CAN signals and VV simulators) are managed by many different teams,
which makes the process complex and tends to result in inconsistencies in
documentation and in extensive effort and time needed for testing.
Figure 2.5 shows the general manual and automatic steps of both the current
process and the proposed AI solution [30].
Figure 2.5: Manual Process vs AI solution Process [30]
2.4.3 Polymer
With the increase in complexity of software development processes, especially on a
large scale, there is a great need for an automated approach to managing the different
workflows. Parthasarathy et al. introduced Polymer [20], a methodology that
reimagines software development workflows as programmable entities. By leveraging
LLMs, it becomes possible to automate workflows and processes in the earlier phases
of the software development lifecycle that were previously impossible or extremely
difficult to automate. Polymer thereby targets all the stages of the lifecycle
mentioned previously and seen in Figure 2.1, rather than only the last stages that
are usually automated, which ties in with Barenkamp et al.'s research on AI in
software engineering. Polymer showed practical performance through real-world
efforts at Volvo Trucks, where the LLM-powered processes resulted in a significant
reduction in manual effort. The Spapi-tester, described as a "SW-defined
workflow that automates the test process", automated 2-3 full-time equivalents
(FTEs) at the expense of two months of development and deployment. The pilot of
the Spapi-coder, an implementation workflow, is estimated to save and automate
15-20 FTEs' worth of development time. This research bridges the gap between the
software development life cycle and AI, using LLMs as "Skeleton Keys" to overcome
the technical and economic challenges of automation within software development [20].
2.5 Impact Dimensions: Business and Technical
Biffl et al. argue that there is no single definition of the term "value" and that it
refers to the benefit derived from software, services, or processes. Stakeholders may
be driven by goals as both individuals and collectives with the "hope to derive some
benefit" [2]. These benefits may come in many forms and may fall under different
categories. Biffl et al. highlight some categories and forms: "tangible or intangible,
economic or social, monetary or utilitarian, or even aesthetic or ethical". They
further solidify the definition of the term value, stating that the term value refers to
the "ultimate benefit, which is often in the eye of the beholder and admits multiple
characterizations", meaning that value is stakeholder-defined [2]. The framework and
methodology of this research will follow the same approach of stakeholder-defined
impact.
3
Related Work
3.1 Risk evaluation and automation taxonomy
A paper by Feldt et al. introduces a taxonomy that categorizes the different
applications of AI in software engineering, highlighting the importance of risk
assessment [10]. They discuss three main facets: point of application, type of AI,
and level of automation. The level of risk correlates with the point of application
and the automation level: the higher the point of application and the higher the
level of automation, the higher the risk. This can be seen in Figure 3.1. The points
of application fall into three main categories: process level, product level, and
runtime. The automation levels range from 1 to 10, with level 1 being a completely
human decision and level 10 being completely autonomous [10]. They also
highlight the importance of using such a taxonomy in companies’ decision-making
processes regarding the integration of AI. This taxonomy offers a structured frame-
work for researchers to understand the different ways AI can be used in software
engineering and defines different terms and levels in order to facilitate communica-
tion [10].
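The relationship between the two facets and risk can be illustrated with a small ordinal sketch. The scoring rule below (rank of the point of application multiplied by the automation level) is purely an assumption made for illustration and is not a measure defined in [10]; the only property it is meant to show is that the indicator grows with both facets. The ordering process, product, runtime is likewise assumed from the order in which the categories are listed.

```python
# Assumed ordering: risk grows from process to product to runtime.
POINTS_OF_APPLICATION = ("process", "product", "runtime")


def relative_risk(point_of_application: str, automation_level: int) -> int:
    """Ordinal risk indicator that grows with both facets (illustrative only)."""
    if point_of_application not in POINTS_OF_APPLICATION:
        raise ValueError(f"unknown point of application: {point_of_application}")
    if not 1 <= automation_level <= 10:
        raise ValueError("automation level must be between 1 and 10")
    rank = POINTS_OF_APPLICATION.index(point_of_application) + 1
    return rank * automation_level


# Hypothetical comparison of two AI applications
low = relative_risk("process", 2)   # human-led decision support in a process
high = relative_risk("runtime", 9)  # near-autonomous behavior at runtime
print(low, high)
```

Such an indicator can only rank applications relative to each other; it says nothing about absolute risk.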
Figure 3.1: Levels of risk on the AI-SEAL taxonomy with respect to Points of
Application and Levels of Automation [10]
As software engineering has been pushing to reduce the time and effort needed for
development and to increase productivity, Melegati and Guerra have proposed the
DAnTE taxonomy, which defines six degrees of automation for software development
tasks. It provides a means to categorize and understand the role of automation,
ranging from manual processes to fully autonomous generation [18]. The degrees of
automation can be seen in Figure 3.2.
Figure 3.2: Levels of automation - DAnTE scale [18]
3.2 Impact on productivity and research gap
It is evident that AI solutions have an impact on software development. Peng et al.
conducted a controlled trial of GitHub Copilot to investigate its effects on
developers’ productivity. They asked 95 professional programmers to create an HTTP
server in JavaScript and ran a test suite of 12 checks on the participants' GitHub
repositories; all 12 checks had to pass for the task to be considered complete [22].
Participants were then assessed on task success and task completion time. The group
using Copilot completed the task 55.8% faster than the control group, and the
authors report that “less experienced developers, developers with heavy coding load,
and older developers benefit more from Copilot” [22]. This suggests that AI can
improve productivity considerably for this kind of task. However, the authors noted
that the results may vary with other tasks, that more research is required in order
to generalize their findings, and that “further investigations into the productivity
impacts of AI-powered tools in software development are warranted” [22].
3.3 Economics of Software Engineering
3.3.1 Return On Software Quality (ROSQ)
When looking at the cost of software quality, many different types and categories
arise. A report evaluating the cost of software quality highlighted two major
types of quality costs: conformance and nonconformance. Slaughter et al. defined
conformance costs as "the amount spent to achieve quality products" [28] and
nonconformance costs as "all expenses that are incurred when things go
wrong". For each major type of quality cost, they list the following
subtypes [28]:
Conformance
• Prevention Costs:
This type of conformance cost relates to the costs incurred in order to "prevent
defects before they happen". The examples provided are "costs of training staff
in design methodologies, quality improvement meetings, and software design
reviews" [28].
• Appraisal Costs:
Slaughter et al. state that these costs "include measuring, evaluating, or au-
diting products to assure conformance to quality standards and performance".
Examples of this include "code inspections, testing and software measurement
activities" [28].
Nonconformance
• Internal Failure:
This type of nonconformance cost relates to all expenses that take place before
the "product is shipped to the customer". Examples of this include "costs of
rework, re-inspection and retesting" [28].
• External Failure:
Slaughter et al. define external failure costs as "costs that arise from prod-
uct failure at the customer site". The examples stated are "field service and
support, maintenance, liability, damages, and litigation expenses" [28].
Furthermore, their approach to defining and measuring software quality is through
the costs of software failure. The goal is to maximize the profit that can be attained
by fixing defects as early as possible within the software’s life cycle, because
the later a defect is found in the life cycle, the higher the cost of correcting
it. Hence, the cost of software quality is treated as a metric and is used
when calculating the Return on Software Quality [28].
Return on Software Quality is a way to measure the financial benefits of investing
in software quality. The main idea behind the authors’ explanation is that "software
quality expenditures must be financially justified" and that "software quality is an
investment that should provide a financial return". They evaluate the expenditure
through software quality improvement initiatives, where the examples of such ini-
tiatives are "design reviews, testing, debugging tools, code walkthrough, and quality
audits". They should result in software quality revenue (SQR), which is "derived
from the projected increases in sales or estimated cost savings due to the software
quality improvement". There are two main forms of investments in this situation:
Software Quality Investments (SQI), which are the initial costs of "training, tools,
efforts, and materials," and Software Quality Maintenance (SQM), which are all
ongoing costs used to maintain quality [28].
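As a rough illustration of how SQR, SQI, and SQM could be combined, the sketch below discounts the yearly net flows (SQR minus SQM) and relates them to the initial investment. The ratio used here, the discount rate, and all figures are assumptions made for illustration only; the exact ROSQ formulation should be taken from [28].

```python
def npv(flows, rate):
    """Net present value of yearly flows, the first flow occurring one year out."""
    return sum(flow / (1 + rate) ** year for year, flow in enumerate(flows, start=1))


def rosq(sqr, sqm, sqi, rate):
    """Assumed reading of ROSQ: discounted net quality return per unit of SQI."""
    if sqi <= 0:
        raise ValueError("the initial investment (SQI) must be positive")
    net_flows = [revenue - maintenance for revenue, maintenance in zip(sqr, sqm)]
    return npv(net_flows, rate) / sqi


# Hypothetical three-year initiative
sqr = [60_000, 80_000, 90_000]  # yearly software quality revenue
sqm = [10_000, 10_000, 10_000]  # yearly software quality maintenance
ratio = rosq(sqr, sqm, sqi=100_000, rate=0.10)
print(f"ROSQ = {ratio:.2f}")
```

Under this reading, a ratio above 1 would mean that the discounted returns exceed the initial investment.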
3.4 Software Process Improvement (SPI) and Return on Investment (ROI)
The main aim of Software Process Improvement (SPI) is to provide structure and
optimization for processes, creating "more effective and efficient software
development and maintenance". Van Solingen argues that an organization is likely to
produce timely and budget-compliant products if it is well managed and has
well-defined processes, specifically engineering processes [29]. When following SPI
methods, SPI investments need to be justified, typically in the form of Return on
Investment (ROI), similar to the claims of Slaughter et al. [28]. This also allows or-
ganizations and managers to prioritize process improvements and allocate resources,
maximizing the benefits [29].
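To illustrate how ROI can support this kind of prioritization, the following sketch ranks a few SPI initiatives using the standard definition ROI = (benefit - cost) / cost. The initiative names and figures are hypothetical and are not taken from [29].

```python
def roi(benefit: float, cost: float) -> float:
    """Return on investment as (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


# Hypothetical SPI initiatives: (name, estimated benefit, estimated cost)
initiatives = [
    ("code review checklist", 30_000, 10_000),
    ("test automation", 90_000, 60_000),
    ("requirements template", 12_000, 10_000),
]

# Rank initiatives from highest to lowest ROI
ranked = sorted(initiatives, key=lambda item: roi(item[1], item[2]), reverse=True)
for name, benefit, cost in ranked:
    print(f"{name}: ROI = {roi(benefit, cost):.2f}")
```

In practice, the benefit estimates are the hard part; the ranking is only as reliable as those estimates.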
3.4.1 Capability Maturity Model (CMM) and Capability Maturity Model Integration (CMMI)
Building on the previously established Capability Maturity Model (CMM), Paulk et al.
played a crucial role in the development of version 1.1. They outline CMM as a
structured framework used to assess and improve the software development process
following a five-level maturity categorization: Initial, Repeatable, Defined,
Managed, and Optimizing [21]. Each level represents a progression in the
organization’s process control and standardization. Processes are initially
unpredictable and reactive; at the Optimizing level, they are improved continuously
based on quantitative feedback and new ideas. The model is typically used to guide
organizations in identifying process deficiencies and implementing improvements in
order to enhance software quality and management [21]. Van Solingen also discusses
the CMM, stating that process improvements need to be tied to measurable outcomes,
specifically in relation to business value or ROI, rather than aiming for a higher
maturity level [29].
Similarly, Gallagher discusses the Capability Maturity Model Integration (CMMI),
which builds on CMM, describing it as a way to integrate different maturity models
into one framework that extends beyond software engineering into other areas,
including systems engineering, product and process development, and supplier
sourcing. Like CMM, it defines five maturity levels: Initial, Managed, Defined,
Quantitatively Managed, and Optimizing. This enables organizations to implement
process improvement initiatives across various disciplines [13].
3.4.2 Pareto Analysis
Pareto Analysis emerged from the observation of uneven distributions in economic
wealth and operational results. It is also known as the 80/20 rule: the idea that
80% of the benefits come from 20% of the efforts. Essentially, it means that the
majority of the results can be traced back to a minority of inputs. In a business
context, this would mean that the company should focus on the set of efforts, such
as products or customers, that lead to the majority of the results, such as the
company’s revenues and profits. Using this method can help organizations identify
internal strengths and weaknesses. Powell and Sammut-Bonnici state that "in many
businesses there is a strong tendency to add new products and customers while
failing to eliminate those which are obsolete or unprofitable" which can also relate
to the possibility that those obsolete or unprofitable products and/or customers may
very well account for the majority of the costs [24].
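A minimal sketch of such an analysis is shown below. It sorts the inputs by contribution and returns the smallest leading group whose cumulative share reaches a threshold (80% by default). The product names and revenue figures are hypothetical.

```python
def pareto_vital_few(contributions, threshold=0.8):
    """Return the top contributors whose cumulative share reaches the threshold."""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    selected, cumulative = [], 0.0
    # Walk through the items from the largest contribution to the smallest
    for name, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        cumulative += value
        if cumulative / total >= threshold:
            break
    return selected, cumulative / total


# Hypothetical revenue per product
revenue = {"A": 500, "B": 300, "C": 80, "D": 50, "E": 40, "F": 30}
vital, share = pareto_vital_few(revenue)
print(vital, f"{share:.0%}")
```

The same function can be applied to costs or defect counts to find the items that account for most of the problems.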
3.4.3 Quality Improvement Initiatives
In the BDM International example, Slaughter et al. show how Pareto Analysis was
first used to identify the main defects causing most of the problems, and how a
fishbone diagram, a cause-and-effect analysis method, was then used to find the
root cause of the main problem, in this case the JCL errors [28]. With the results
of the Pareto analysis and the cause-and-effect analysis, they were able to improve
upon the different process improvement initiatives, with the main focus on
reducing defects and attempting to eliminate failure costs. They were thus able to
trace the efforts and results directly to each of those initiatives while focusing
on the most impactful problems. With each process improvement initiative clearly
defined, it is possible to visualize its results and impact, for example by
evaluating the process improvement against defect density, ROSQ, cost of quality,
etc. [28].
3.5 Project Size Estimation
3.5.1 Lines of Code
One of the easiest and most widely known methods of software project effort estimation
is lines of code (LOC), usually expressed in thousands of lines of code (KLOC).
This is usually used in different cost estimation models alongside other constants
based on different factors such as complexity, different environments, or practices.
Some of those models are the Walston-Felix model, the Bailey-Basili model, the
Boehm models (COCOMO), and the Doty model. One of the main issues with
such an approach is the lack of definition of LOC or KLOC, as there is no universal
definition of what falls under LOC. For example, some consider comment lines as
lines of code [17]. It is also important to note that each line of code can vary greatly,
especially when considering different programming languages. Another downside of
this approach is that it is quite difficult to estimate the LOC of a project in the
early stages of a project’s life cycle. Furthermore, it mainly considers the coding
aspect of a project, which, according to Emrick, makes up only 10-15% of the total
effort [9][17].
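As an illustration of how such models turn a size estimate into effort, the sketch below implements the Basic COCOMO form, effort = a * KLOC^b in person-months. The coefficients are the commonly cited Basic COCOMO values and are not taken from this thesis; the function name and the 10 KLOC input are assumptions for illustration.

```python
# Commonly cited Basic COCOMO coefficients (a, b) per project mode
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc: float, mode: str = "organic") -> float:
    """Estimated effort in person-months for a project of the given size."""
    if kloc <= 0:
        raise ValueError("size in KLOC must be positive")
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


# Hypothetical 10 KLOC project in each mode
for mode in BASIC_COCOMO:
    print(f"{mode}: {basic_cocomo_effort(10, mode):.1f} person-months")
```

The sketch also makes the limitations above visible: the result depends entirely on a KLOC figure that is hard to define and hard to estimate early in a project.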
3.5.2 Function Point Analysis (FPA)
Similar to LOC, FPA is a method to measure the size of a project. However, it
also measures the complexity of the software based on the software’s functionality
from the user’s perspective. This not only allows the measurement to be language
independent but also allows for estimating the project’s effort early on in its life
cycle. There have been many different officially released versions of FPA, including
the Albrecht 79, Albrecht 83, and many International Function Point Users Group
versions (IFPUG), which were established from 1984 onwards to standardize the
approach and set specific rules. FPA is measured using five factors, each of which
is assigned a weight derived from its perceived complexity [1].
An empirical study on Function Point Analysis conducted by Abran et al. states
that the five factors that make up the unadjusted function points (UFP) use two dif-
ferent measurement processes [1]. The five factors and their respective measurement
processes are shown in Table 3.1.
Table 3.1: Function points measurement model
Measurement process               Function type
Data measurement process          Internal Logical Files (ILF)
                                  External Interface Files (EIF)
Transaction measurement process   External Inputs (EI)
                                  External Outputs (EO)
                                  External Inquiries (EQ)
Even though this may be useful, UFP alone is not enough to get a good estimation.
To address this, they highlight the importance of the Value Adjustment Factor
(VAF), which is used to "assess the environment and processing complexity of the
software application as a whole" [1]. The general model to calculate the FPA is
shown in Figure 3.3.
The calculation of VAF is based on 14 predefined general system characteristics
(GSC), where each characteristic is assigned a weight according to its predefined
definition. With TotalGSC denoting the sum of the 14 GSC weights, expressed in
hundredths as in the example of Table 3.3, VAF is calculated using this equation:
\[ \mathrm{VAF} = 0.65 + \mathrm{TotalGSC} \]
Figure 3.3: Function Points Measurement Model [1]
Table 3.2 shows an example of how the weights of each function type are assigned using
the Albrecht 83 version, where each function type is given three complexity weights:
Low, Average, and High [1].
Table 3.2: Function point weights as proposed by Albrecht (1983) [1]
Albrecht 83
# Function Types Low Average High
1 Internal logical files 7 10 15
2 External interface files 5 7 10
3 External inputs 3 4 6
4 External outputs 4 5 7
5 External inquiries 3 4 6
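The weights in Table 3.2 and the VAF equation can be combined into a small calculation, sketched below. The input counts and GSC weights are those of the worked example in Table 3.3; the function and variable names are illustrative assumptions, and the GSC weights are assumed to be expressed in hundredths, as in that example.

```python
# Albrecht 83 complexity weights per function type (Table 3.2)
ALBRECHT_83_WEIGHTS = {
    "ILF": {"low": 7, "average": 10, "high": 15},
    "EIF": {"low": 5, "average": 7, "high": 10},
    "EI": {"low": 3, "average": 4, "high": 6},
    "EO": {"low": 4, "average": 5, "high": 7},
    "EQ": {"low": 3, "average": 4, "high": 6},
}


def unadjusted_function_points(counts):
    """counts maps (function type, complexity) to the number of functions."""
    return sum(n * ALBRECHT_83_WEIGHTS[ftype][level] for (ftype, level), n in counts.items())


def value_adjustment_factor(gsc_weights):
    """VAF = 0.65 + sum of the 14 GSC weights (each given in hundredths)."""
    if len(gsc_weights) != 14:
        raise ValueError("exactly 14 GSC weights are expected")
    return 0.65 + sum(gsc_weights)


# Counts from the worked example: all functions of average complexity
counts = {
    ("ILF", "average"): 3,
    ("EIF", "average"): 0,
    ("EI", "average"): 2,
    ("EO", "average"): 2,
    ("EQ", "average"): 5,
}
gsc = [0.00] * 11 + [0.04] * 3  # GSC 1 to 11 = .00, GSC 12 to 14 = .04 each

ufp = unadjusted_function_points(counts)
vaf = value_adjustment_factor(gsc)
afp = ufp * vaf
print(ufp, round(vaf, 2), round(afp))
```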
In Table 3.3 an example is shown using the measurement model in Figure 3.3 and
the example values shown in Table 3.2.
Table 3.3: Example of a function point following the Albrecht 83 version [1]
Example of a Function Point Count: Albrecht 83 Version
Function Types            No. Functions * Weights = UFP   Complexity Adjustment Factor
Internal logical files    3 * 10 = 30                     GSC 1 to 11 = .00
External interface files  0 * 7 = 0
External inputs           2 * 4 = 8                       GSC 12 to 14 = .04 each
External outputs          2 * 5 = 10
External inquiries        5 * 4 = 20                      Total = .12
Total UFP = 68                                            VAF = (.65 + .12) = .77
Adjusted Function Points = UFP * VAF = 68 * (0.77) = 52 AFP
3.5.3 COSMIC Function Point
COSMIC function point (CFP) is derived from the traditional function point analysis,
and it is said to be a less complex approach to estimating software project effort.
CFPs are more suitable for service-oriented, real-time, and embedded software
systems. The main difference between the traditional version and the COSMIC version
is that COSMIC does not use complexity weights or value adjustment factors and
focuses more on data movement rather than functionality from an end-user
perspective. It is calculated based on data entry, exit, read, and write, where each
data movement is counted as one CFP [6][7]. Furthermore, it can be used within the
management and organizational side of software development, where it can function
as a metric for scope management, resourcing, productivity, and quality. It also
allows tracking the costs per CFP, which can help companies set and reach the goal
of better "value for money". Moreover, this can allow teams to track their effort per
sprint using CFP, where they can set CFP per sprint goals [7].
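The counting rule described above, one CFP per data movement, can be sketched as follows. The functional processes, their data movements, and the cost figure are hypothetical and only serve to show the size and cost-per-CFP calculations.

```python
MOVEMENT_TYPES = {"entry", "exit", "read", "write"}


def cosmic_size(functional_processes):
    """Return the size in CFP of each functional process (one CFP per data movement)."""
    sizes = {}
    for name, movements in functional_processes.items():
        unknown = set(movements) - MOVEMENT_TYPES
        if unknown:
            raise ValueError(f"unknown data movement(s) in '{name}': {sorted(unknown)}")
        sizes[name] = len(movements)
    return sizes


# Hypothetical functional processes and their data movements
processes = {
    "read vehicle speed": ["entry", "read", "exit"],
    "update setting": ["entry", "read", "write", "exit"],
}
sizes = cosmic_size(processes)
total_cfp = sum(sizes.values())

# Hypothetical development cost, used to track cost per CFP
cost_per_cfp = 14_000 / total_cfp
print(sizes, total_cfp, cost_per_cfp)
```

The same total could be compared against a CFP-per-sprint goal to track a team's effort over time.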
3.6 Value Stream Mapping
Value Stream Mapping (VSM) is a lean management tool that is used to anal-
yse and optimise the flow of material and information related to production and
service processes. According to Rother and Shook, VSM enables identification of
"value-added" and "non-value-added" steps visually in order to help target waste and
improve process efficiency [25]. To help with this process, Langstrand proposes some
steps as guidance for creating the Value Stream Map and analysis. The first phase
includes "creating the current state map," which involves drawing up the process,
then analysing and highlighting the flow of information and materials within that
process, adding the relevant data to the timeline with the relevant calculations. The
second phase focuses on "analyzing the current state map", which involves identi-
fying the bottlenecks within the process, and comparing the capacity and demand,
as well as exploring the flexibility of the process, with the main focus on waste.
Finally, the third phase focuses on "creating the future state map" [16]. Tying this
to the automotive industry, Bhamu et al. conducted a case study in the Indian
automotive industry on how VSM can enhance production performance
through key lean metrics such as lead time and quality defects.
Through mapping and future state planning, they were able to align the demand,
follow lean principles, and allow continuous improvement. Their efforts resulted in
major improvements such as reducing lead time by ~20.97%, increasing value by
27%, and improving first time throughput in almost all processes [4].
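The calculations behind a current state and a future state map can be sketched as below. The step names and times are hypothetical, and treating all process time as value-added and all waiting time as non-value-added is a simplifying assumption made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    process_time: float  # hours of work on the item (treated as value-added here)
    wait_time: float     # hours the item waits before the step (non-value-added)


def lead_time(steps):
    """Total time an item spends in the process, including waiting."""
    return sum(s.process_time + s.wait_time for s in steps)


def value_added_ratio(steps):
    """Share of the lead time that is spent on value-added work."""
    return sum(s.process_time for s in steps) / lead_time(steps)


def longest_step(steps):
    """Step with the largest combined process and wait time (a bottleneck candidate)."""
    return max(steps, key=lambda s: s.process_time + s.wait_time)


# Hypothetical current and future state maps of a testing process
current = [Step("write tests", 8, 4), Step("review tests", 2, 16), Step("run on rig", 3, 24)]
future = [Step("write tests", 8, 2), Step("review tests", 2, 4), Step("run on rig", 3, 8)]

reduction = 1 - lead_time(future) / lead_time(current)
print(f"bottleneck candidate: {longest_step(current).name}")
print(f"value-added ratio: {value_added_ratio(current):.0%} -> {value_added_ratio(future):.0%}")
print(f"lead time reduction: {reduction:.1%}")
```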
4
Methods
4.1 Research Design and Approach
The main research design for this study is a case study, focusing on the two main
projects mentioned earlier. This design approach was taken as it allowed a better
in-depth exploration of the real-world context in which AI is being implemented [14].
Moreover, these two projects were highly context-based, with direct connections to
specific workflows, processes, and organizational structures. An iterative approach
was used to build the model and framework for this study. This is consistent with
Hartley’s work in "Essential Guide to Qualitative Methods in Organizational Research",
which highlights the value of case studies in exploring complex organizational
processes within their real-life contexts and underscores the flexibility of case
study research in allowing multiple data collection methods, including interviews,
observations, and document analysis [14].
This iterative approach was carried out by conducting literature reviews to discover
what possible methodologies and metrics exist, breaking them down, and attempting
to use them. If existing methods did not fit the specific context of this study, they
were either adapted by reusing or modifying relevant components or omitted entirely
when they proved incompatible. This approach was taken to evaluate the impact of
AI within a software development context, leading to the proposal of a structured yet
flexible five-phase methodology to guide the modeling of the impact by quantitative
and qualitative means. This methodology also considers an automatability factor,
which indicates how far the modeling process itself could be automated. Creswell
highlighted the importance of integrating both qualitative and quantitative methods
to enrich the research findings [8]. Furthermore, he discusses iterative approaches
in which data collection and analysis are conducted cyclically in order
to increase a study’s validity and reliability, which aligns with the approach of
this study [8].
Figure 4.1 shows an overview of the research design. During the design phase,
the research questions were formed based on the research gap and aligned with the
company’s goals. To answer RQ1, literature reviews were conducted, and the
findings were shaped by informal conversations with some of the stakeholders.
RQ2-4 were answered by first conducting literature reviews and forming the base of
the framework from the literature review findings, as well as from interviews and
conversations with the different stakeholders and the developers. Once the base was
formed, an iterative process was conducted. It started with literature reviews,
followed by testing whether the methods could be applied by exploring the available
data and by comparing them with the stakeholders’ and developers’ interests, based
on the previous interviews and their expressed support. The method was then
considered established, and the data collection process started, with
interview/survey rounds conducted where possible. Finally, the results were
analyzed, and the process was repeated.
[Figure 4.1: diagram not reproduced. Labels recoverable from the diagram: Design
Phase (Research Gap, Company Interest and Goals, Forming Research Questions); RQ1
(Literature Reviews, Stakeholder Engagement); RQ2-4 (Literature Reviews,
Stakeholders and Developer Interviews, Form base of framework and roadmap);
Iterative (Literature Review, Data Accessibility, Stakeholder Support, Test
possibility of method, Establish Method, Surveys (Optional), Analyse Results).
The grouping of labels is approximate.]
Figure 4.1: Research plan overview.
4.2 Theory
4.2.1 Intertwined quality and Efficiency
When considering the idea of quality and efficiency, it can be quite challenging to
focus on one and not the other, and that is due to the interconnected relationship be-
tween them. In most cases, quality is considered the foundation of an organization’s
survival. Xu stated that "Quality is the guarantee of efficiency" and "Efficiency is
the benefit of quality", highlighting the fact that efficiency is useless without quality,
essentially meaning that they are complementary and not mutually exclusive [31].
In this context, efficiency refers to the effective use of time, effort, and resources in
software development processes, such as minimizing rework, reducing testing time,
or optimizing workflows, while still maintaining or improving output quality. For
this reason, every efficiency-related or quality-related factor within the different
software development processes was placed under the quality category, and all
factors were weighted equally.
4.2.2 Surveys and Interviews
As part of communicating with stakeholders and developers and highlighting the
key factors, the focus of the interviews and surveys covered a few main aspects:
Understanding The Process
The main purpose was to understand how the manual processes work and what
the stakeholders’ teams do. Furthermore, it aimed to gain an understanding of the
solution developers’ efforts, perspectives, and what they do or plan to do. This
information was then used to break down and categorise the different workflows in
the processes. This was mainly done by conducting semi-structured interviews.
Quality and Efficiency Factors
The main focus here was on identifying the key factors and sources contributing to
both high and low quality, as well as high and low efficiency, within the workflows
and processes used by the stakeholder team. This was done by conducting surveys
and/or interviews, with the addition of informal conversations to clarify certain
factors if needed.
Prioritization using $100 method
The focus here was on prioritizing and ranking the different factors for each area
(high/low efficiency and high/low quality) using the $100 method, where each person
got to split $100 over the factors from each quality category. This was achieved
purely through conducting surveys.
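The aggregation of $100 allocations into a ranking can be sketched as follows. Averaging the allocations across respondents is one possible aggregation and is assumed here for illustration; the factor names and responses are hypothetical.

```python
def aggregate_hundred_dollar(responses):
    """Average each factor's allocation over all respondents and rank the factors."""
    totals = {}
    for response in responses:
        if sum(response.values()) != 100:
            raise ValueError("each respondent must allocate exactly $100")
        for factor, dollars in response.items():
            totals[factor] = totals.get(factor, 0) + dollars
    n = len(responses)
    averages = ((factor, total / n) for factor, total in totals.items())
    return sorted(averages, key=lambda item: item[1], reverse=True)


# Hypothetical survey responses for the "low quality" category
responses = [
    {"unclear requirements": 50, "signal naming": 30, "rig availability": 20},
    {"unclear requirements": 40, "signal naming": 40, "rig availability": 20},
]
ranking = aggregate_hundred_dollar(responses)
for factor, average in ranking:
    print(f"{factor}: ${average:.0f}")
```

The resulting ranking can then feed directly into the prioritization of factors.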
Mapping Factors and Gap Analysis
The focus of this round of interviews was on mapping the identified factors to the
main components of the AI solution. It also aimed to identify which factors were
not targeted and to understand why.
Automation Factors
The main focus was on the solution developers, with the goal of understanding
what factors were considered when developing their solution to decide whether they
should use traditional scripting or LLMs to automate specific steps in the process.
4.2.3 Factors and Issues Prioritization
Prioritization followed the Pareto method, i.e., the 80/20 rule [24], where in this
case 80% of the problems can be traced to 20% of the causes. This was the main
way of prioritizing which quality issues most needed to be mitigated and which
quality factors were the most important to keep or improve. This helped provide
simple but clear goals and targets for both designing and improving the AI solution,
as well as a way to evaluate the efforts to mitigate or improve those factors.
4.2.4 Software Process Improvement Initiatives (SPII)
By extracting the different project components and using the process improvement
initiative approach, following a similar path to Slaughter et al. [28], it is possible
to directly evaluate each initiative, feature, or functionality against
different tangible outcomes, such as time, cost, quality factors, or process-specific
metrics.
4.2.5 Automation Levels and risk
By exploring and understanding the projects and how they fit into the current
manual processes, it was possible to use the DAnTE taxonomy to estimate the
automation level of the projects [18]. Moreover, by looking at Feldt et al.’s risk
taxonomy, it was possible to understand and highlight the general risk level of
applying such automation levels at the specific points of application [10]. The
degree of automation from the DAnTE taxonomy was used here instead of Feldt et al.’s
taxonomy because, based on the descriptions, it was more suitable for the projects
and is more focused on LLMs, which relates strongly to the AI technology
used in those projects [18].
4.3 Framework
This section provides a general overview of the five phases, a description of each of
their sub-steps, and a subjective three-level automatability rating (Low, Medium,
or High) based on how automatable each step is, including data collection and
analysis:
1. Understanding the current process (Medium)
This phase sets the baseline and enables understanding of how the current process
operates without any external intervention.
a. Stakeholder engagement (Medium)
Engage with developers, managers, and product owners to gain a better
understanding of the current processes and workflows and how everything fits
together, and to collect insights about pain points, team structure, and
communication patterns.
b. Solution developer insights [if under development] (Medium)
If there is a solution currently under development, gather feedback from those
developing the AI solution to gain a different perspective on the processes.
c. Workflow breakdown (Low)
Based on the gathered information, deconstruct and break down the processes
and workflows into distinct steps to identify each step separately.
2. Identifying and prioritizing the problem (High)
This phase focuses on discovering pain points and factors that relate to the quality
of the workflow.
a. Stakeholder insights (Medium)
Engage with developers, managers, and product owners to gain a better under-
standing of the problems and pain points in the current processes and work-
flows. Collect qualitative insights to understand how current issues affect de-
veloper productivity and software delivery.
b. Identifying key factors (High)
Extract the sources or factors that lead to high and low quality and efficiency
in the current workflows and processes.
c. Data collection (High)
Collect data related to the factors from the previous step and use analytical
tools or prioritization methods to prioritize the most prominent factors.
3. Breaking down the AI solution (Medium)
This phase breaks down the AI solution in a structured manner in order to under-
stand and/or predict the effects on the current workflow.
a. Solution developer insights (Medium)
Engage with the developers of the AI solution to gain a better understanding
of the solution’s features and development plan.
b. Solution breakdown (main components) (Medium)
Identify the core components of the AI solution.
c. Assessing constraints and limitations / risk (Medium)
Highlight the constraints and limitations of the data accessibility and the AI
solution, as well as understand the risks tied to the different automation levels
and usage of AI.
4. Aligning the solution to the problem (Low)
Map the features of the AI solution to the problems identified previously to evaluate
the direct impact of the AI solution on the newly adjusted process.
a. Mapping of prioritized factors and solution components (Medium)
Establish connections between the prioritized factors and the different identified
components of the AI solutions.
b. Gap analysis (Low)
Compare the proposed and the current process and identify whether or not the
AI solution has mitigated the identified issues. Furthermore, understand the
reasoning behind any factors not being targeted.
5. Analyzing the impact (Medium)
Evaluate the impact of implementing and integrating the AI solution with the cur-
rent process to form the new process.
a. Assessing outcomes (High)
Collect and analyze the quantitative and qualitative data regarding the quality
and efficiency factors from surveys and interviews.
b. Result compilation [if developed or mature enough] (High)
Assess the results of the solution regarding its expected outcome.
c. Pre and Post comparison [if applicable] (Low)
Compare relevant metrics and/or KPIs (Key Performance Indicators) before
and after AI integration to assess causal impact.
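To make the structure of the framework easier to reuse, the phases, steps, and automatability ratings listed above can be written down as plain data. The following is only a minimal sketch: the class names `Step` and `Phase`, the constant `FRAMEWORK`, and the helper `steps_with_level` are illustrative assumptions and not part of the thesis tooling; only the names and ratings are taken from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    automatability: str  # "Low", "Medium", or "High"


@dataclass
class Phase:
    number: int
    name: str
    automatability: str
    steps: list = field(default_factory=list)


# Phase and step names with their subjective ratings, copied from the list above.
FRAMEWORK = [
    Phase(1, "Understanding the current process", "Medium", [
        Step("Stakeholder engagement", "Medium"),
        Step("Solution developer insights", "Medium"),
        Step("Workflow breakdown", "Low"),
    ]),
    Phase(2, "Identifying and prioritizing the problem", "High", [
        Step("Stakeholder insights", "Medium"),
        Step("Identifying key factors", "High"),
        Step("Data collection", "High"),
    ]),
    Phase(3, "Breaking down the AI solution", "Medium", [
        Step("Solution developer insights", "Medium"),
        Step("Solution breakdown (main components)", "Medium"),
        Step("Assessing constraints and limitations / risk", "Medium"),
    ]),
    Phase(4, "Aligning the solution to the problem", "Low", [
        Step("Mapping of prioritized factors and solution components", "Medium"),
        Step("Gap analysis", "Low"),
    ]),
    Phase(5, "Analyzing the impact", "Medium", [
        Step("Assessing outcomes", "High"),
        Step("Result compilation", "High"),
        Step("Pre and Post comparison", "Low"),
    ]),
]


def steps_with_level(level):
    """Return (phase number, step name) pairs that carry the given rating."""
    return [
        (phase.number, step.name)
        for phase in FRAMEWORK
        for step in phase.steps
        if step.automatability == level
    ]


# The steps rated Low are the ones that are hardest to automate.
print(steps_with_level("Low"))
```

A listing like this makes it straightforward to see, for example, which steps would still need manual work if the framework were applied to a new project.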
Figure 4.2 shows the general overview of the different phases, the steps within them,
as well as the general flow.
[Figure 4.2 flowchart. Phase 1, Understanding the current process: Stakeholder
Engagement, Solution Developer Insights, Workflow Breakdown. Phase 2, Identifying and
prioritizing the problem: Stakeholder Insights, Identify Key Factors, Data Collection,
Prioritization and Analysis. Phase 3, Breaking down the AI solution: Solution
Developer Insights, Solution Breakdown (Components), Assessing Constraints and
Limitations / Risk. Phase 4, Aligning the solution to the problem: Mapping of
Prioritized Factors and Solution Components, Gap Analysis. Phase 5, Analyzing the
impact: Assess Outcomes, Result Compilation [if applicable], Pre-Post Comparison [if
applicable].]
Figure 4.2: Framework workflow overview
4.4 Data Collection
To understand the processes and utilize the proposed methods and metrics, five main
rounds of surveys and/or interviews were conducted. These five rounds involved dif-
ferent stakeholders and solution developers. These stakeholders’ roles ranged across
managers, product owners, and software testers (stakeholder teams), while the solu-
tion developers were the developers currently developing or previously contributing
to the development efforts of the AI solution. The approach of using interviews
as one of the main data collection methods enabled collecting deep and qualitative
insights about the different stakeholder perceptions regarding the current processes
and goals of the solution. In contrast, surveys were used to collect quantitative data
in a simple manner from a broader group, typically complementing the interviews and
helping to uncover patterns and themes. To support this, every survey question also
allowed free-text input. To reach
the stakeholders for interviews and surveys, I was initially referred to five stakehold-
ers involved with the CS testing tool and two stakeholders associated with the API
testing tool by the project developers. Since these individuals were selected based on
their relevance and involvement in the respective tools, this is considered purposive
sampling. These initial stakeholders then either nominated additional participants
for the other rounds of interviews and surveys or provided contact lists from within
their teams, following a snowball sampling method. As for the developers, all in-
dividuals involved in the development of both tools were contacted directly, which
follows a census sampling approach, as the intent was to include the entire relevant
developer population.
Round 1
To understand the processes and the stakeholders’ goals, semi-structured interviews
were conducted with stakeholders from the different teams involved in CS-Testing and
API-Testing, as well as with solution developers involved in the development of the
CS-Testing tool and the API-Testing tool. Informal conversations also took place when
needed, specifically with the solution developers, to better understand the different
processes and solutions, as they tended to have a detailed overview and understanding
of the processes currently employed and the projects being developed.
Round 2
In order to gather the different sources and factors related to high/low quality and
efficiency within the different teams’ workflows and processes, structured interviews
and surveys were conducted, with the addition of follow-up conversations for clar-
ification purposes. The main questions asked in the interviews and surveys can be
seen in Appendix A.3.
Round 3
The factors deduced from the previous round of surveys and interviews were analyzed,
cleaned up, and summarized into a maximum of 10 bullet points per question for the
first four questions. As this round focused on using the $100 method to prioritize
the efficiency and quality factors, surveys were the most suitable instrument. The
surveys targeted members of the stakeholder teams as well as the solution developers,
since the solution developers also had extensive knowledge and understanding of the
processes used by the stakeholder teams. The $100 method asks participants to
distribute a hypothetical $100 across a set of items to reflect their relative
importance, which highlights the main factors. Respondents weighted the items within
the following four main categories:
• High Efficiency
• Low Efficiency
• High Quality
• Low Quality
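The $100 method described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the function name `aggregate_allocations` and the three example responses are hypothetical and are not the survey data collected in this study; the only rule taken from the text is that each respondent distributes exactly $100 across the items of one category.

```python
from collections import defaultdict


def aggregate_allocations(responses, budget=100):
    """Sum the points each factor received across all respondents.

    Each response maps a factor label to the amount allocated to it.
    A response that does not spend exactly the budget is rejected.
    """
    totals = defaultdict(int)
    for index, allocation in enumerate(responses):
        spent = sum(allocation.values())
        if spent != budget:
            raise ValueError(
                f"respondent {index} allocated {spent}, expected {budget}"
            )
        for factor, amount in allocation.items():
            totals[factor] += amount
    return dict(totals)


# Hypothetical allocations for one category (e.g. Low Efficiency).
responses = [
    {"I": 50, "II": 30, "III": 20},
    {"I": 40, "II": 40, "III": 20},
    {"I": 10, "II": 60, "III": 30},
]

totals = aggregate_allocations(responses)
print(totals)
```

The per-factor totals produced this way are the counts that a prioritization method can then rank.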
Round 4
Only interviews were conducted in this round, as its main focus was mapping the
factors to their respective components of the AI solutions. The interviews targeted
the solution developers of each tool and followed an open-ended format to give
participants the flexibility to express their thoughts freely.
Round 5
Since this round’s focus was on understanding what factors the solution developers
consider when choosing the automation technique, either surveys or interviews could
have been used. However, semi-structured interviews were the most suitable as
they allowed for capturing depth and exploring the larger context of those factors.
Furthermore, this allowed for a better understanding of how solution developers
think.
5
Results
5.1 Interviews and Surveys
5.1.1 Round 1
CS-Testing
During this round, interviews were conducted with five members of the two stakeholder
teams involved in CS-Testing. Their roles comprised managers, specialists, and central
members within the processes, which made it possible to capture the crucial steps and
workflows of those processes. Furthermore, the discussions of their goals and needs
stemmed from recurring issues and major annoyances, highlighting the importance of
quality and efficiency within their processes. Table 5.1 shows the interviewees’ roles
and which team they are a part of.
Table 5.1: Manual CS-Testing Teams and Roles
Team Role
1 Group Manager, Manage Electrical Engineering
1 Senior ESW Application Engineer
2 Specialist ESW Application Engineer
2 Experienced ESW Application Engineer
2 Specialist System Verification Engineer
ESW: Embedded Software
API-Testing
For this process, interviews were conducted with two members of the stakeholder team:
a manager and a crucial member within the testing workflow. This also allowed the
essentials of the process to be captured. As with CS-Testing, their focus was on
quality and efficiency. Table 5.2 shows the interviewees’ roles.
Table 5.2: Manual API-Testing Teams and Roles
Role
Senior ESW Application Engineer
Experienced ESW Application Engineer
ESW: Embedded Software
5.1.2 Round 2
For this round, data was collected through surveys for one project and through
interviews for the other. Although the data collection methods differed between the
projects, the results are still comparable because both the interviews and the
surveys focused on the same five questions, which are presented in Section 4.4.
CS-Testing
For CS-Testing, the stakeholders preferred to participate through surveys instead of
interviews. The survey was sent to thirty members, of whom seven responded; their
roles comprised managers, testers, product owners, and engineers. Table 5.3 shows the
roles and experience levels of the respondents as well as their teams. The number of
raw factors identified per category can be seen in Table 5.4, while Tables 5.6–5.9
show the cleaned and finalized factors.
Table 5.3: CS-Testing Teams, Roles and Experience Levels
Team Role Experience (years)
1 Product Owner 4+
1 Senior ESW Application Engineer 3–4
1 Component Level Testing 4+
1 Associate Engineer 2–3
2 System Verification Engineer 2–3
2 ESW Application Engineer 1–2
2 Experienced SW Verification Engineer 3–4
ESW: Embedded Software
SW: Software
Table 5.4: Number of factors identified per category - CS-Testing (Raw)
Category Number of Factors
High Efficiency 7
Low Efficiency 17
High Quality 13
Low Quality 15
API-Testing
For the API-testing case, the stakeholders preferred to participate in interviews
rather than surveys. The interviews were conducted with two out of fifteen mem-
bers from the stakeholder team, holding Experienced Embedded Software (ESW)
Application Engineer and testing roles with 1 to 2 years of experience in component-
level testing. The number of extracted factors per category can be seen in Table 5.5
while the finalized and cleaned factors can be seen in Tables 5.10–5.13.
Table 5.5: Number of factors identified per category - API-Testing (Raw)
Category Number of Factors
High Efficiency 8
Low Efficiency 16
High Quality 5
Low Quality 8
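The participation figures reported for Round 2 can be put on a common scale as percentages. The short sketch below uses only the counts stated in this section (7 responses out of 30 survey recipients for CS-Testing, and 2 interviewees out of a team of 15 for API-Testing); the function name `participation_rate` is an illustrative assumption.

```python
def participation_rate(participants, population):
    """Share of the contacted or eligible group that took part, in percent."""
    return 100 * participants / population


# Counts as reported for Round 2 in this section.
rates = {
    "CS-Testing (survey)": participation_rate(7, 30),
    "API-Testing (interviews)": participation_rate(2, 15),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

Both rates are low, which is worth keeping in mind when the prioritization results are interpreted.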
5.1.3 Rounds 3, 4, and 5
Round 3 - CS-Testing
A total of 3 survey responses were collected: 2 from testers in the stakeholder teams
with 2 years of testing experience and 1 from a solution developer.
Round 3 - API-Testing
A total of 4 survey responses were collected: 2 from testers in the stakeholder team
with 1 and 2 years of experience and 2 from solution developers.
Round 4
A solution developer from each of the two projects was interviewed.
Round 5
A total of six solution developers were interviewed, and several common factors were
identified across them.
5.2 Pareto Analysis
The results of the round 3 surveys were collected and analyzed using the Pareto
analysis method.
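As a rough illustration of how a Pareto analysis of such survey results works, the sketch below ranks factors by their total points, computes the cumulative percentage, and reports the share held by the top 20% of factors. The function names `pareto_table` and `top_share`, the choice to round the number of top factors up, and the example totals are all assumptions made for illustration; they are not the data or the tooling used in this study.

```python
import math


def pareto_table(totals):
    """Return (factor, points, cumulative %) rows sorted by points, descending."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())
    rows = []
    running = 0
    for factor, points in ranked:
        running += points
        rows.append((factor, points, 100 * running / grand_total))
    return rows


def top_share(rows, fraction=0.2):
    """Cumulative % held by the top `fraction` of factors (count rounded up)."""
    k = max(1, math.ceil(fraction * len(rows)))
    return rows[k - 1][2]


# Hypothetical point totals for ten factors from three respondents ($300 in total).
example_totals = {
    "I": 90, "II": 60, "III": 45, "IV": 30, "V": 25,
    "VI": 20, "VII": 10, "VIII": 10, "IX": 5, "X": 5,
}

rows = pareto_table(example_totals)
print(f"Top 20% of factors hold {top_share(rows):.0f}% of the points")
```

In this made-up example the top two of ten factors hold 50% of the points, which falls short of the 80% that a strict reading of the Pareto principle would predict.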
5.2.1 CS-Testing
[Figure 5.1 panels: Pareto charts for the CS-Testing tool. Axes: Factors, Count, and
Cumulative %; markings: Top 20% Factors and 80% Threshold. (a) CS-Testing High
Efficiency, factors I–VII. (b) CS-Testing High Quality, factors I–X. (c) CS-Testing
Low Efficiency, factors I–X. (d) CS-Testing Low Quality, factors I–X.]
Figure 5.1: Pareto Chart - CS-Testing quality and efficiency factors
Table 5.6: Legend for CS-Testing - High Efficiency Factors
I Ability to write better test cases faster.
II Prioritizing tasks and using time-blocking techniques.
III Utilization of [Internal Documentation Tools] for functional understanding,
workflow analysis, and dependency mapping.
IV Workflow design and mapping to identify bottlenecks and create standard
operating procedures (SOPs).
V [Internal Documentation Tool] collaboration view and verification tools aid
in test case creation and automation.
VI Availability of test rigs.
VII Clear scope definition for [Specific ECU] component level SW Release and
Regression.
Table 5.7: Legend for CS-Testing - High Quality Factors
I Attaching test cases to requirements in [Internal Documentation Tool] for
coverage tracking and documentation.
II Ensuring the final product meets stakeholder needs.
III Maintaining work and reporting areas in [Internal Documentation Tool] for
tracking verification reports and traceability.
IV Benchmarking SW Release dates allows sufficient verification time.
V Comprehensive testing addressing all project aspects.
VI Performing component-level regression with all available test cases for val-
idation.
VII Comparing results with previous releases for analysis.
VIII Maintaining clear and detailed records of processes and changes.
IX Robustness and ability to handle unexpected issues.
X Regularly checking outputs against requirements and standards.
Table 5.8: Legend for CS-Testing - Low Efficiency Factors
I Excessive time spent writing test cases.
II Difficulty maintaining test cases with updated requirements.
III Rewriting the same test cases for different vehicle modes.
IV Time spent correcting mistakes or redoing work due to lack of clarity/errors.
V Writing repetitive test cases for similar requirements with minor variations.
VI Difficulty retrieving necessary information.
VII Need for improvement in Requirements Traceability.
VIII Dependency on tools for generating test case execution files (e.g., XML).
IX Frequent unavailability or issues with validation rigs (e.g., rigs, digital twin
errors (VV)).
X Slow progress due to limited resources or capacity bottlenecks.
Table 5.9: Legend for CS-Testing - Low Quality Factors
I Poor traceability hindering the tracking of requirements and changes.
II Team errors due to fatigue, misunderstanding, or lack of training.
III Neglecting to review processes or outputs.
IV Inconsistent reports from regression due to inconsistent test cases or tool
timing issues.
V Failure to account for unusual scenarios.
VI Inadequate quality of edge case test cases.
VII Missing edge case testing leading to post-release issues.
VIII Missing corner case test cases leading to incomplete test coverage.
IX Inadequate testing coverage leaving critical areas untested.
X Variations in process quality reducing reliability.
The charts in Figure 5.1 show the Pareto analysis conducted on the factors deduced
from Round 2, using the results from Round 3. The legend for each chart can be found
in Tables 5.6–5.9. In this analysis, the top 20% of factors, which in most cases are
factors I and II, usually make the biggest contribution to the problems experienced
in the processes. However, the results do not strictly follow the 80/20 Pareto
principle, which can be due to two main reasons. The first is that the surveys had a
very low participation rate. The second could be that the factors and issues faced
are more systematic and widespread, without a clear concentration on specific
factors, especially considering that the perspectives of different stakeholders do
not align. In Figure 5.1a, the first two factors account for ~60% of the total, while
the top two factors in Figure 5.1b and Figure 5.1c account for ~50% of the total
within their respective areas. In Figure 5.1d, the top two factors account for ~45%.
Based on the results, the top factors contributing to high efficiency (Figure 5.1a)
are "Ability to write better test cases faster" (I) and "Prioritizing tasks and using
time-blocking techniques" (II), while the top factors contributing to high quality
(Figure 5.1b) are "Attaching test cases to requirements in [Internal Documentation
Tool] for coverage tracking and documentation" (I) and "Ensuring the final product
meets stakeholder needs" (II). These are the factors that the AI solution must
maintain. The top factors contributing to low efficiency (Figure 5.1c) are "Excessive
time spent writing test cases" (I) and "Difficulty maintaining test cases with updated
requirements" (II). The top factors contributing to low quality (Figure 5.1d) are
"Poor traceability hindering the tracking of requirements and changes" (I) and "Team
errors due to fatigue, misunderstanding, or lack of training" (II). These are the
factors that the AI solution must mitigate or fix.
5.2.2 API-Testing
[Figure 5.2 panels: Pareto charts for the API-Testing tool. Axes: Factors, Count, and
Cumulative %; markings: Top 20% Factors and 80% Threshold. (a) API-Testing High
Efficiency, factors I–VII. (b) API-Testing High Quality, factors I–V. (c) API-Testing
Low Efficiency, factors I–X. (d) API-Testing Low Quality, factors I–VII.]
Figure 5.2: Pareto Chart - API-Testing quality and efficiency factors
Table 5.10: Legend for API-Testing - High Efficiency Factors
I Clear acceptance criteria
II Speed of bug detection
III Developer-tester feedback loop
IV CI scheduling (Nightly regression)
V Helper functions for test case writing
VI Planning phase for tests
VII Optimized tests
Table 5.11: Legend for API-Testing - High Quality Factors
I Stable runs on regression (non-flaky tests)
II Robustness of test cases
III Fail test pointers – probable causes
IV Uniformity of structure and framework
V Report generation for validation
Table 5.12: Legend for API-Testing - Low Efficiency Factors
I Dependencies on specific roles or information from other teams, causing
bottlenecks.
II Manual data mapping and integration for test case writing.
III Difficulty and time wasted due to missing documentation and inconsistent
naming conventions.
IV Challenges in backtracking and identifying the root cause of failures in test
cases involving multiple API objects.
V Impact of varying tester experience and understanding on efficiency.
VI Time-consuming process of writing comprehensive edge and invalid test
cases.
VII Steep learning curve and complexity in understanding and setting up en-
vironments (e.g., Rig setup, Virtual Vehicles).
VIII Inefficiencies and delays due to manual rig scheduling and low rig availabil-
ity.
IX Time spent rewriting/modifying and rerunning test cases after API up-
dates.
X Time spent fixing and redoing tasks due to human errors.
Table 5.13: Legend for API-Testing - Low Quality Factors
I Missing documentation impacting quality
II Incomplete coverage of edge cases and invalid test cases
III Low test coverage
IV Mapping according to requirements
V Lack of a dedicated debug function (tester dependent debugging)
VI Not testing error codes (e.g., 404)
VII Testers’ time management impacting quality
As with CS-Testing, Figure 5.2 shows the Pareto analysis charts for API-Testing, with
the corresponding legends in Tables 5.10–5.13. These charts show a similar trend of
not adhering to the Pareto principle. Again, misaligned stakeholder perspectives and
the very small number of survey participants are possible reasons. In Figure 5.2a and
Figure 5.2d, the first two factors account for ~60%, while the top 20% of factors in
Figure 5.2b and Figure 5.2c account for ~50% of the total within their respective
areas.
Based on the results, the top factors contributing to high efficiency (Figure 5.2a)
are "Clear acceptance criteria" (I) and "Speed of bug detection" (II), while the top
factors contributing to high quality (Figure 5.2b) are "Stable runs on regression
(non-flaky tests)" (I) and "Robustness of test cases" (II). These are the factors that
the AI solution must maintain. The top factors contributing to low efficiency
(Figure 5.2c) are "Dependencies on specific roles or information from other teams,
causing bottlenecks" (I) and "Manual data mapping and integration for test case
writing" (II). The top factors contributing to low quality (Figure 5.2d) are "Missing
documentation impacting quality" (I) and "Incomplete coverage of edge cases and
invalid test cases" (II). These are the factors that the AI solution must mitigate or
fix.
5.3 Software Process Improvement Initiatives (SPII)
Due to confidentiality, the actual components are not displayed. Instead, they are
shown under substitute aliases of the form "component #". As an example of what a
component is, consider a system in which a major part generates code using AI; that
part would be a component labeled "Code generator".
Figure 5.3 shows a basic idea of the major components of both projects without
stating the actual components:
[Figure 5.3 diagram: blocks labeled Internal Documentation tools and DB, Component 1,
Component 2, Component 3, Component 4, Component 5, and Internal Tool.]
Figure 5.3: Main AI components of the CS-Testing tool and the API-Testing tool.
The primary purpose of breaking the system into components is to enable mapping and
visualizing the efforts involved in developing the components and their impact on the
quality and efficiency factors identified through the surveys. Furthermore, it allows
the factors to be categorised into those that the components fix and those that the
components keep or improve. This was done during Round 4 of the interviews, in which
the solution developers of each tool were asked to map each factor to its respective
components. The mappings and the contributions of the components to the factors are
displayed in Figure 5.4.
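The kind of factor-to-component mapping described above can be represented as a simple list of links and then counted per component and category, which is the quantity a bar chart of this type plots. The sketch below is illustrative only: the links in `MAPPING` are hypothetical, the component aliases follow the "Component #" convention used here, and `count_by_component` is an assumed helper name rather than part of the study's tooling.

```python
from collections import Counter

# Hypothetical (factor, category, component) links, as a solution developer
# might state them in a mapping interview. A factor may map to several components.
MAPPING = [
    ("Low Efficiency I", "Efficiency", "Component 1"),
    ("Low Efficiency II", "Efficiency", "Component 1"),
    ("Low Efficiency II", "Efficiency", "Component 3"),
    ("Low Quality I", "Quality", "Component 3"),
    ("Low Quality II", "Quality", "Component 3"),
    ("Low Quality II", "Quality", "Component 5"),
]


def count_by_component(mapping):
    """Count how many factors of each category every component addresses."""
    counts = Counter()
    for _factor, category, component in mapping:
        counts[(component, category)] += 1
    return counts


counts = count_by_component(MAPPING)
for (component, category), n in sorted(counts.items()):
    print(f"{component} / {category}: {n}")
```

Because one factor can be linked to more than one component, the per-component counts can add up to more than the number of distinct factors.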
5.3.1 CS-Testing Tool
Factors Fixed
Based on Figure 5.4a, the most impactful component for the Efficiency factors is
Component 1, which fixes a total of six issues faced by the stakeholders within the
manual CS-Testing process. For the Quality factors, Component 3 fixed the majority of
the issues, targeting a total of five factors. Overall, this tool targeted all 20 of
the 20 factors that needed fixing.
Factors Kept/Improved
Based on Figure 5.4b, for the Efficiency factors, Component 1 targeted the most
factors, with a total of 4 factors. For the Quality factors, Component 5 targeted the
most factors, with a total of 6 factors. However, overall, the CS-Testing tool missed
one of the Efficiency factors, meaning that the tool only targeted 16 out of 17
factors that needed to be kept.
[Figure 5.4 panels: bar charts of Cumulative Factor Count (0 to 10) per component
(Component 1 to Component 5), split into Efficiency and Quality. (a) CS-Testing Fixed
Factors. (b) CS-Testing Kept Factors. (c) API-Testing Fixed Factors. (d) API-Testing
Kept Factors.]
Figure 5.4: SPII vs Quality and efficiency factors.
5.3.2 API-Testing Tool
Factors Fixed
Figure 5.4c shows that Component 2 contributed the most to fixing the Efficiency
factors, fixing a total of 4 factors, while Component 1 contributed the most for the
Quality factors, fixing 6 of them. The API-Testing tool was able to target all the
factors that needed to be fixed, totaling 17/17 factors.
Factors Kept/Improved
Figure 5.4d shows that for the Efficiency factors, Component 1 contributed the most
by keeping or improving three factors. For the Quality factors, Component 2 and
Component 4 contributed equally, with 2 factors each. However, similar to the
CS-Testing tool, it did not target every factor, missing two factors, both related to
Efficiency. Overall, the API-Testing tool was able to target 10 out of 12 factors in
this category.
5.3.3 Gap Analysis
Although the factors may be traced directly to their sources (the components), SPII
does not explain why some factors remain untargeted, so further exploration in the
form of a gap analysis may be needed. In this case, it was done within Round 4 of the
interviews, where the solution developers explained why certain factors were not
targeted by the tools: those factors were either restricted by company policies or
controlled by another internal third party. Moreover, the factors that were valued
the highest in the Pareto analysis have all been targeted by their respective AI
solutions.
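The coverage figures reported in Sections 5.3.1 and 5.3.2 can be summarized as percentages together with the number of factors left untargeted, which are the factors a gap analysis then has to explain. The sketch below uses only the counts stated in those sections; the names `REPORTED` and `coverage_summary` are illustrative assumptions.

```python
# (targeted, total) factor counts as reported in Sections 5.3.1 and 5.3.2.
REPORTED = {
    ("CS-Testing", "fixed"): (20, 20),
    ("CS-Testing", "kept/improved"): (16, 17),
    ("API-Testing", "fixed"): (17, 17),
    ("API-Testing", "kept/improved"): (10, 12),
}


def coverage_summary(reported):
    """Compute the coverage percentage and the number of untargeted factors."""
    summary = {}
    for key, (targeted, total) in reported.items():
        summary[key] = {
            "coverage_pct": round(100 * targeted / total, 1),
            "untargeted": total - targeted,
        }
    return summary


summary = coverage_summary(REPORTED)
for (tool, group), row in summary.items():
    print(f"{tool} {group}: {row['coverage_pct']}% covered, "
          f"{row['untargeted']} untargeted")
```

The three untargeted factors in total (one for CS-Testing, two for API-Testing) are the ones whose reasons were collected in Round 4.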
5.4 Automation Factors
Interviews conducted with the solution developers showed near-unanimous agreement on
which factors are considered when deciding whether to use programmable LLM modules or
traditional scripting as the main method of automation for the different steps of the
process. Furthermore, some of the factors were backed by previous research based on
Polymer, as explained by Parthasarathy et al. [20]. The factors were then categorised
and defined based on the interview responses. They can be seen in Table 5.14
alongside the possible options and guiding questions.
Table 5.14: Factors considered when automating
Category | Factor | Description | Options
Formatting | Context understanding | Does the step require interpreting meaning beyond explicit inputs? | Yes, No
Formatting | Semi-structured language support | Does this step involve inputs that mix both structured elements (e.g., code, configuration) and unstructured components (e.g., natural language descriptions)? | Unstructured, Structured, Semi-structured, Human
Formatting | Formalization Required | Does this step require converting these specifications into machine-executable formats? | Yes, No
Formatting | Structured data input | Is the data input for this step consistently structured and predictable? | Yes, No
Formatting | Replicable patterns/boilerplates | Can this step be standardized using predefined templates or reusable patterns? | Yes, No
Formatting | Dynamic interaction required | Does this step require adaptive exchanges based on changing inputs or interactions with constantly updating sources? | Yes, No
Discriminative Activities | Context understanding | Does this step depend on understanding broader context for accurate classification? | Yes, No
Discriminative Activities | Judgement | Does this step involve subjective assessments that might require human judgement? | Yes, No, Human
Discriminative Activities | Reasoning/Decision-making | Does this step require logical analysis or context-based decision making? | Yes, No, Human
Generative Activities | Context understanding | Does this step rely on capturing and maintaining context for coherent outputs? | Yes, No
Generative Activities | Deterministic outcomes | Is the output of this step expected to be predictable and identical for the same set of inputs every time? | Yes, No
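[Figure 5.5 decision tree. Starting from "Automation Requirement", the factors are
checked in order. At each node one answer continues to the next node, while the other
answers exit to "LLM Automation" or, where "Human" is an option, to "Manual".
Context Understanding: Yes exits, No continues. Judgement Required: Human or Yes
exits, No continues. Reasoning / Decision Making Required: Human or Yes exits, No
continues. Semi-Structured Language Support: Human, Semi-Structured, or Unstructured
exits, Structured continues. Formalization Required: Yes exits, No continues.
Structured Data Input: No exits, Yes continues. Replicable Patterns / Boilerplates:
No exits, Yes continues. Dynamic Interaction Required: Yes exits, No continues.
Deterministic Outcomes: No exits, Yes leads to "Traditional Scripting Automation".]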
Figure 5.5: Decision Tree Visualization
Figure 5.5 is based on the interviews with the solution developers and captures part
of their thought process when deciding on the automation method to use. Besides
helping developers decide between traditional scripting, AI, or keeping a human in
the loop, the decision tree can help highlight the direct value of using AI at a
lower level, where feasibility and effort are affected by the specific step.
Furthermore, it highlights the direct impact of using AI instead of traditional
automation, differentiating between the two. A decision tree was used because it
generalizes well and can be reused for different cases and scenarios. The tree is
also flexible: the order of the nodes can be rearranged when a developer prioritizes
the factors differently based on preference, requirements, and capability. The tree
is traversed starting from the first node, context understanding. This node relates
to all three main categories, Formatting (Purple), Discriminative Activities (Blue),
and Generative Activities (Teal), each with a different description, and is therefore
marked with all three colours. The traversal then follows the relevant path: if
context understanding is needed in any of the three categories, the path follows
"yes," meaning that LLM automation is needed and highlighting that context
understanding is a benefit of using AI for that specific step.
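The traversal described above can be sketched as an ordered list of checks, where the first answer that matches an exit decides the automation method. This is only a sketch under assumptions: the node order and the exits are my reading of Figure 5.5 and Table 5.14, the identifiers (`DECISION_NODES`, `choose_automation`, and the factor keys) are hypothetical, and context understanding is treated as a single node even though it appears under all three categories.

```python
LLM = "LLM automation"
MANUAL = "Manual (human)"
SCRIPT = "Traditional scripting automation"

# (factor, {answer: outcome}) in the order read from Figure 5.5 (an assumption).
# Any answer that is not listed for a node falls through to the next node.
DECISION_NODES = [
    ("context_understanding", {"Yes": LLM}),
    ("judgement", {"Human": MANUAL, "Yes": LLM}),
    ("reasoning_decision_making", {"Human": MANUAL, "Yes": LLM}),
    ("semi_structured_language_support",
     {"Human": MANUAL, "Semi-structured": LLM, "Unstructured": LLM}),
    ("formalization_required", {"Yes": LLM}),
    ("structured_data_input", {"No": LLM}),
    ("replicable_patterns", {"No": LLM}),
    ("dynamic_interaction_required", {"Yes": LLM}),
    ("deterministic_outcomes", {"No": LLM}),
]


def choose_automation(answers, nodes=DECISION_NODES):
    """Walk the nodes in order and return the first outcome that is triggered."""
    for factor, exits in nodes:
        answer = answers[factor]
        if answer in exits:
            return exits[answer]
    # No node exited, so the step is fully structured and deterministic.
    return SCRIPT


# A hypothetical step with structured, predictable, template-friendly input.
scriptable_step = {
    "context_understanding": "No",
    "judgement": "No",
    "reasoning_decision_making": "No",
    "semi_structured_language_support": "Structured",
    "formalization_required": "No",
    "structured_data_input": "Yes",
    "replicable_patterns": "Yes",
    "dynamic_interaction_required": "No",
    "deterministic_outcomes": "Yes",
}

print(choose_automation(scriptable_step))
print(choose_automation({**scriptable_step, "context_understanding": "Yes"}))
```

Because the nodes are kept in a plain list, reordering them to reflect a different prioritization only requires rearranging the entries, which mirrors the flexibility noted above.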
5.5 Automation Levels and Risk
Having considered the different levels of automation from the DAnTE taxonomy, it is
clear that both projects are at automation level 4, "Global Generator" [18]. This is
because they automate workflows, including the generation of code, test suites, and
documentation, while including humans who review, approve, and may correct or refine
the generated output. The perceived benefits of such automation, according to that
research, are increased productivity, reduced manual effort, and fewer errors [18],
which relates to the factors and issues mentioned by the different stakeholders in
the interviews and surveys. Following Feldt et al.’s taxonomy [10], the point of
application varies depending on the scale of what is considered the "application".
Viewed from the perspective of the whole process, the point of application is the
process level, which signifies lower risk and less negative impact. However, when
only the projects themselves are considered, the point of application of AI is the
product level, which signifies higher risk levels. Additionally, considering the
automation level of 4 on the DAnTE scale [18], even higher risk levels and negative
impacts may be present according to Feldt et al.’s research [10]. Hence, safeguards
may need to be considered.
6
Discussion
6.1 Addressing Research Questions
6.1.1 Research Question 1
RQ1: How is "impact" defined in a software development context?
Based on literature reviews and interviews with multiple stakeholders, the term
“impact” is defined as:
"The qualitative and quantitative changes and/or consequences across processes and
outcomes within the software development lifecycle. It constitutes the alteration or
creation of value in its different stakeholder-defined forms, as a direct or indirect
result of the integration and use of AI as a software development solution."
This definition highlights that impact does not only encompass measurable data but
also descriptive data. Furthermore, it highlights that impact can come in different
forms, such as direct and indirect. Moreover, having conducted interviews with the
stakeholders, a specific technical and business "impact" focus is possible based on
the available data and restrictions that are in place.
6.1.2 Research Question 2
RQ2: What impact metrics are applicable and how can these metrics be
categorized and prioritized?
In order to answer this question, conducting literature reviews is essential, as
they make it possible to explore different metrics by investigating different
methodologies. The interviews, conducted to understand what the stakeholders
want, make it possible to find the methodologies and metrics that align with
their needs. Taking an iterative approach of exploring the methods, testing
them, and adjusting accordingly makes it possible to categorize the different
metrics, rule out those that are not feasible, and identify the aspects most
relevant to the stakeholders. The studied cases are significantly limited in terms
of product maturity and company data accessibility. Hence, a structured approach
that considers the constraints of the study and the dimensions of the software
processes is required.
The metrics are categorized into four main categories: Significance Weighting Met-
rics, Factor-Specific Metrics, Component-Specific Metrics, and Classification Met-
rics. Each plays a major role within their respective methods, and more importantly,
showcases how the AI solutions impact the software development processes. The or-
der in which these metrics are stated follows their prioritization. Starting with
the Significance Weighting Metrics, this approach enables the prioritization of dif-
ferent factors based on stakeholder-perceived significance, further highlighting the
strengths and weaknesses of the processes and facilitating the use of Factor-Specific
Metrics. Factor-Specific Metrics offer a way to quantify the breadth of the factors
related to the different quality categories (high/low quality and efficiency), as well
as to quantify those that are addressed by the AI solution. Similarly, Component-
Specific Metrics allow traceability to the specific components of the AI solution,
showcasing the contribution of the individual components and expanding the evalu-
ation of the AI solution’s effectiveness. Finally, the Classification Metrics provide a
quantitative approach to the qualitative attributes and information regarding how
automatic the AI solution is.
6.1.2.1 Significance Weighting Metrics
$100 Weights
Represents the relative importance of each factor as assigned by participants using
the $100 prioritization method. It is used to identify which factors are perceived as
most critical to quality and efficiency. This is done by the participants distributing
$100 across a set of factors within each category to reflect their perceived significance.
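As an illustration only, the $100 allocations could be combined into one weight per factor as sketched below. The participant names, factor names, and dollar values are hypothetical, and averaging across participants is an assumption made for this sketch rather than a procedure stated in this study.

```python
from collections import defaultdict


def aggregate_weights(allocations):
    """Average each factor's $100 allocation across participants.

    `allocations` maps a participant to a {factor: dollars} dict for ONE
    category. Averaging is an assumption of this sketch.
    """
    for participant, alloc in allocations.items():
        spent = sum(alloc.values())
        if spent != 100:
            raise ValueError(f"{participant} allocated ${spent}, expected $100")

    totals = defaultdict(float)
    for alloc in allocations.values():
        for factor, dollars in alloc.items():
            totals[factor] += dollars

    n = len(allocations)
    return {factor: total / n for factor, total in totals.items()}


# Hypothetical allocations for a single category (e.g. Low Efficiency factors).
allocations = {
    "participant_a": {"manual test writing": 50, "unclear requirements": 30, "tool setup": 20},
    "participant_b": {"manual test writing": 40, "unclear requirements": 40, "tool setup": 20},
}
weights = aggregate_weights(allocations)
```

Because every participant distributes exactly $100, the averaged weights also sum to 100 and can be read directly as percentages of perceived significance.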
6.1.2.2 Factor-Specific Metrics
Total Number of Factors
Highlights the number of factors around high/low quality and efficiency, which allows
the user to gauge the maximum number of factors that can be targeted.
Number of Factors Per Quality Category
Highlights the number of factors within a specific quality category (i.e., High Quality,
Low Quality, High Efficiency, Low Efficiency), which allows the user to gauge the
maximum number of factors that can be targeted within their quality category.
Total Number of Targeted Factors
Highlights the number of factors the AI solution targets, essentially eliminating the
negative factors, or keeping and/or improving the positive factors around high/low
quality and efficiency.
Number of Targeted Factors Per Quality Category
Highlights the number of factors the AI solution targets within a specific quality cat-
egory (i.e., High Quality, Low Quality, High Efficiency, Low Efficiency), essentially
eliminating the negative factors, or keeping and/or improving the positive factors
within their quality category.
Total Factor Coverage
Using the above-mentioned factor metrics, it is possible to calculate the percentage
of factors targeted by the AI solution. This is denoted as:
\[ \text{Factor Coverage} = \frac{\text{Total Number of Targeted Factors}}{\text{Total Number of Factors}} \times 100\% \]
Factor Coverage Per Quality Category
Using the above-mentioned factor metrics, it is possible to calculate the percentage
of factors targeted by the AI solution for each quality category. This is denoted as:
\[ \text{Factor Coverage Per Quality Category} = \frac{\text{Number of Targeted Factors Per Quality Category}}{\text{Number of Factors Per Quality Category}} \times 100\% \]
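The two coverage metrics can be computed with one helper, as in the minimal sketch below. The factor counts are hypothetical and are not results from the studied cases; only the formulas follow the definitions above.

```python
def factor_coverage(targeted, total):
    """Percentage of factors targeted by the AI solution."""
    if total <= 0:
        raise ValueError("total number of factors must be positive")
    if not 0 <= targeted <= total:
        raise ValueError("targeted factors must be between 0 and total")
    return 100 * targeted / total


# Hypothetical counts per quality category (not data from this study).
factors_per_category = {
    "High Quality": 4, "Low Quality": 5, "High Efficiency": 4, "Low Efficiency": 7,
}
targeted_per_category = {
    "High Quality": 2, "Low Quality": 4, "High Efficiency": 1, "Low Efficiency": 5,
}

# Factor Coverage Per Quality Category
coverage_per_category = {
    category: factor_coverage(targeted_per_category[category], count)
    for category, count in factors_per_category.items()
}

# Total Factor Coverage, using the sums over all categories
total_coverage = factor_coverage(
    sum(targeted_per_category.values()), sum(factors_per_category.values())
)
```

With these hypothetical counts, 12 of 20 factors are targeted, so the total coverage is 60%, while the per-category values range from 25% to 80%.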
6.1.2.3 Component-Specific Metrics
Component Contribution
Having each targeted factor mapped to its respective component, it is possible to
calculate the percentage contribution of each component to the total number of
targeted factors. This is defined as:
\[ \text{Component Contribution} = \frac{\text{Number of Factors Targeted by Component}_i}{\text{Total Number of Targeted Factors}} \times 100\% \]
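A minimal sketch of this calculation is given below. It assumes that each targeted factor is mapped to exactly one component; the factor and component names are hypothetical placeholders, not the components of the studied tools.

```python
from collections import Counter


def component_contribution(factor_to_component):
    """Percentage of targeted factors attributed to each component.

    `factor_to_component` maps each targeted factor to the single component
    that addresses it (an assumption of this sketch).
    """
    total_targeted = len(factor_to_component)
    if total_targeted == 0:
        raise ValueError("no targeted factors to attribute")
    counts = Counter(factor_to_component.values())
    return {component: 100 * n / total_targeted for component, n in counts.items()}


# Hypothetical mapping of targeted factors to solution components.
factor_to_component = {
    "manual test writing": "test generator",
    "repetitive scripting": "test generator",
    "inconsistent formatting": "test generator",
    "unclear requirements": "requirement parser",
    "missing traceability": "requirement parser",
}
contribution = component_contribution(factor_to_component)
```

If a factor were addressed by several components, the mapping would have to hold a list of components per factor and the contributions would no longer sum to 100%.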
6.1.2.4 Classification Metrics
Automation level
By using predefined taxonomies and classification frameworks such as DAnTE [18],
automation-level classification emerges. The main purpose of this metric is to pro-
vide quantifiable value to qualitative descriptions of software automation. This is
done by either understanding the capability of the tools or discussing the different
automation level descriptions with the solution developers to reach a consensus.
This directly connects with the understanding of direct impact, where automation
is seen as a positive outcome. Moreover, this automation level value clarifies the
role of both the human and the AI within the software development context.
6.1.3 Research Question 3
RQ3: What methodologies can be employed to measure the prioritized
impact metrics both quantitatively and qualitatively, and how can these
methodologies be applied to practical cases involving generative AI-driven
solutions within software development?
In some cases, certain methods related to costs, effort, and time, or even results
pertaining to the study cases, would be employed here to answer the research
question. However, this was not possible in this study due to company restrictions
and the stakeholder focus. A general overview covering the context, description,
strengths, and weaknesses of each method can be seen in Appendix A.1 for the
unutilized methods and in Appendix A.2 for the utilized methods. To answer this
research question, a combined qualitative and quantitative approach is taken,
focusing on stakeholder-defined priorities and process analysis.
6.1.3.1 Overview
Interviews and Surveys
Interviews and surveys serve as the main tools for gathering data, allowing the
capture of both quantitative and qualitative data. This helped design the structure
of the research, as well as understand the different processes and direct the aim
towards quality and efficiency, and the different factors related to them.
Pareto Analysis and $100 Method
Having stakeholders allocate a fictional $100 across the different factors
quantifies the importance of each factor and makes it possible
to conduct a Pareto analysis. Here, the $100 weights are used to show the individual
perceived significance. Pareto Analysis highlights the important factors causing
most of the process issues as well as the ones that actually help the process, from
both the quality and efficiency perspectives. It also helps draw conclusions from the
survey and interview data, providing both quantitative and qualitative results by
offering a prioritized list of factors where each carries its own qualitative value while
also contributing quantitative data by showing their relative significance in terms of
frequency and impact. Conducting a Pareto Analysis allows the usage of the factor-
specific metrics to highlight which areas should be prioritized for AI intervention
and which components of the solution align most closely with the critical issues or
strengths identified.
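The ranking step behind a Pareto chart can be sketched as follows. The factor names and weights are hypothetical, and the 80% cut-off is the usual Pareto rule of thumb, used here as an assumed default rather than a threshold prescribed by this study.

```python
def pareto_table(weights):
    """Rank factors by weight and attach the cumulative percentage."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for factor, weight in ranked:
        running += weight
        rows.append((factor, weight, 100 * running / total))
    return rows


def vital_few(rows, threshold=80.0):
    """Return the top-ranked factors needed to reach the cumulative threshold."""
    selected = []
    for factor, _weight, cumulative in rows:
        selected.append(factor)
        if cumulative >= threshold:
            break
    return selected


# Hypothetical $100 weights for one category of one process.
weights = {
    "manual test writing": 45,
    "unclear requirements": 35,
    "tool setup": 12,
    "long review cycles": 8,
}
rows = pareto_table(weights)
priority_factors = vital_few(rows)
```

The `rows` list holds what a Pareto chart plots (bars from the weights, a line from the cumulative percentages), and `priority_factors` lists the factors that a solution would be expected to target first.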
SPII
Software Process Improvement Initiatives (SPII) provide a mix of quantitative and
qualitative value. They allow the major parts of the AI solutions to be decomposed
into specific components, so that each component can be traced and mapped as a
source of impact. In this case, the impact is the specific factors the solution
highlights or the issues it fixes or mitigates. In other cases, mapping the
components to cost, time, effort, and other feasible metrics would be possible.
Furthermore, this enables the use of the component-specific metrics, which
highlight the relative contribution of each component to the overall effectiveness
of the AI solution. This gives a more detailed assessment of which parts deliver
the most value in addressing specific process inefficiencies or quality
improvements, and can help justify development and investments.
Automation Taxonomy and Risk
The automation taxonomy classifies the level of autonomy, where automation in
itself is considered an impact, as well as the point of application. Together,
these give developers and stakeholders a high-level view of the perceived risk
that comes with automation. By following the DAnTE framework [18], it is possible
to use the automation level classification metric, which provides a quantitative
value for how automatic the software is and qualitative information clarifying the
human and AI roles. Combined with the AI-SEAL [10] risk diagram shown in
Figure 3.1, it can provide qualitative information related to the perceived risk of
automation.
6.1.3.2 Tying It All Together
Collectively, these methodologies answer RQ3 by combining prioritization,
structured process analysis, component-level traceability, and a range of metrics
to achieve both quantitative and qualitative evaluation of the impact of AI in
real-world software development processes. Together, they show which
methodologies can be employed to measure the impact metrics both quantitatively
and qualitatively, and how they can be applied to practical cases involving
generative AI solutions within software development contexts.
6.1.4 Research Question 4
RQ4: How can these impact metrics be modeled to address the needs of
target stakeholders and support their decision-making process when it
comes to the integration of generative AI in software development pro-
cesses?
To support stakeholder decision-making regarding the integration of AI in software
development, this study introduces a structured, yet flexible, approach to using and
exploring impact metrics and methods rather than following abstract and generic
metrics and methods. The model is built from the bottom up using literature
reviews, process-specific considerations, and practitioner insights, in both a quanti-
tative and qualitative way. The model follows the idea that impact does not have
a universal meaning but is defined by stakeholders, allowing the usage of context-
dependent factors instead of abstract terms like "effort saved".
The proposed five-phase modelling framework is a structured model that guides
practitioners with the analysis and interpretation of impact, as well as measuring
and modelling it. This framework can be seen in Section 4.3. The first phase allows
one to gain the base knowledge of how the processes work and the needs of the
stakeholders, highlighting the focus point of modeling the impact of AI. The second
phase focuses on understanding the issues that the stakeholders face within their
current processes, in this case, leading to the gathering and analysis of the high and
low quality and efficiency factors. Then the third phase focuses on breaking down the
proposed AI solution by communicating with the solution developers to understand
how the solution works and identifying the main components of the solution. Once
that is done, the fourth phase focuses on aligning the solution and problem by
mapping the identified factors and the main components, as well as extracting the
untargeted factors and understanding the reason behind them. With that done,
the fifth phase focuses on analyzing the impact by assessing the outcomes as well as
evaluating the results of the solution in relation to its expected outcome, considering
whether the solution is developed or mature enough to achieve it. Moreover, during
this phase, the old and the new processes (with the solution) are compared through
relevant metrics and KPIs. By going through with this framework, in this case, the
main outcomes involve the Pareto Analysis, SPII, and Automation Tree.
6.1.4.1 Pareto Analysis and Diagrams
Conducting a Pareto Analysis shows which issues and factors are most prominent
from the perspective of the stakeholders. This allows the stakeholders and
solution developers to direct their focus towards the factors that are prioritized
higher. The analysis also follows the idea that the top 20% of the causes relate
to 80% of the problems; while the share may not be exactly 80%, it is still a
significant percentage of the problems. In this study, conducting a Pareto
Analysis results in four Pareto Charts per process, one each for High Efficiency,
High Quality, Low Efficiency, and Low Quality factors, which simplifies and
clarifies the results.
6.1.4.2 SPII
Breaking down the solution into major components allows for the use of SPII,
where each component can be considered a process improvement, as it may work
as a standalone solution that has a positive impact on the process. With the
Pareto Analysis already done, the factors can be mapped to the different solution
components, resulting in diagrams like the ones shown in Figure 5.4. Furthermore,
it is possible to model other metrics with SPII, such as cost, effort, or
quality-specific metrics like defect density, as shown in the research conducted
by Slaughter et al. [28]. This can further guide stakeholders who are considering
an AI-based solution, particularly for stakeholder focus areas that were not
captured in this study.
6.1.4.3 Automation Decision Tree
The automation decision tree captures the direct impact and intent of using AI at
a lower level by considering developers’ reasoning and decision-making when
choosing between traditional scripting and AI, and it conveys the qualitative
value of AI. It can also help developers and stakeholders decide when it is
appropriate to use AI or traditional scripting, or even to do the process
manually. This is done through a flexible decision tree, shown in Figure 5.5,
together with the supporting questions in Table 5.14.
6.1.4.4 Tying It All Together
This study demonstrates how different metrics and methods can be combined to
form a model that can be contextualized and operationalized to support stakeholder
decision-making by applying the five-phase modeling framework with support from
Pareto Analysis, SPII, and the Automation Decision Tree. This approach avoids
reliance on abstract or generic measures and instead tailors the evaluation towards
process-specific issues and the AI solution components, providing traceability and
ensuring relevance while navigating around organizational restrictions. It also
allows stakeholders to identify the situations in which AI brings value and a net
benefit, justify investments, and make structured, evidence-based decisions. Linking this study
to the broader software engineering lifecycle, the results from the case studies can be
generalized by focusing on the adaptability of the five-phase modeling framework,
which is not limited to specific tools or domains. Its emphasis on process-specific
metrics, stakeholder alignment, and decision support makes it applicable across var-
ious software development contexts where generative AI is being considered. While
the specific findings are context-dependent, the underlying approach offers a flexi-
ble structure that can be tailored to different teams, workflows, and organizational
settings within software engineering. Therefore, RQ4 is addressed by providing a
structured yet flexible model for aligning generative AI solutions with real-world
software development context and needs.
6.2 Future Works
6.2.1 Result Compilation and Pre and Post Comparison
Within the final phase of the proposed framework, the result compilation and
pre-post comparisons are labeled as "if applicable." This is because certain
requirements regarding the maturity and development of the tool must be fulfilled
in order to conduct them, which was not the case in the studied cases. Further
research is needed to investigate the causal impact of the AI tools in practice.
This would involve systematic collection of pre- and post-implementation data on
KPIs and the actual results of the tool, such as time, software quality
improvements, and, in this case, testing-specific metrics. Data collection could
take place prior to the integration of the tool, shortly after its implementation,
and again once stakeholders have become more familiar and comfortable with the
new process.
6.2.2 Constraints, Limitations and Risk
The framework contains a step in the third phase regarding assessing the constraints,
limitations, and risks of implementing AI, which requires a more detailed assessment.
Constraints related to data availability, model explainability, organizational readi-
ness, and compliance (especially in safety-critical domains) need to be more formally
focused on and evaluated.
6.2.3 Automatability Levels of the framework
The current use of low, medium, and high automatability levels to classify the
framework steps and phases is subjective and provides only a qualitative
estimate. Future research in this case should focus on operationalizing the
categories using established or well-defined, measurable criteria. This
would standardize the assessment across different contexts and make it
possible to model automation in a feasible manner.
6.2.4 Longitudinal Studies and Feedback Loops
To expand on this research, a longitudinal study would enable capturing the long-
term impact of AI solutions. This can be done through tracking the adoption,
performance, and stakeholders’ feedback over a period of time. This will not only
help refine the framework and the modeling process but may also identify other
emerging values, risks, and consequences not captured in this study.
6.2.5 Expanding to other domains and focuses
As this study mainly focuses on, and is validated through testing processes within
an automotive context, specifically control system testing and API testing at Volvo
Trucks, future research can explore other domains, aspects, and organizations of
varying sizes to capture different perspectives and further validate or adapt the
framework, enabling the discovery of new findings.
7
Validity Threats and Limitations
7.1 Internal Validity
7.1.1 Sampling Bias
Low Sample Size
One of the biggest internal validity threats is the low sample size, where in some
cases, there were very low response rates. An example of this can be seen in Round
3 of interviews and surveys, where only 2 people responded out of 30. This may
introduce nonresponse bias, where the results may have been completely different if
more people had responded. This may affect the reliability and representativeness of
the data. The low participation may have been influenced by resistance to change,
skepticism toward AI integration, or a lack of perceived value in the study among
potential respondents. It may also have resulted from practical constraints, such as
limited availability or a lack of motivation to engage with interviews or surveys.
Low Sample Diversity
The intention was to interview and sample different roles within the stakeholder
team in each round. However, this was not achieved: in Round 3, only testers
responded to the surveys, although the surveys and interview requests were sent to
roles ranging from product owners to managers. This may skew the results in favor
of the testers’ perspective, which may be more technical and operationally
oriented than business or managerially oriented, and may affect how well the
different roles are represented.
7.1.2 Research Bias
There is a risk of research bias because some of the solution developers
participated in the surveys and interviews while also helping define the
framework through their responses, which led to the creation of the
automatability decision tree. This may skew the outcome of the research, as these
participants may unconsciously have had an interest in, or alignment with, the
success of the AI solutions. It also blurs the line between data collection and
design.
7.1.3 Social Desirability
There is also a possibility of social desirability bias in how stakeholders responded
to questions about inefficiencies in manual processes. In some cases, participants
may have underreported challenges or inefficiencies to avoid reflecting negatively
on colleagues or team practices. Additionally, given the study’s focus on AI-driven
solutions, there may have been a fear of job displacement, leading some stakeholders
to consciously or unconsciously withhold or downplay information that could sup-
port automation. This can limit the depth and honesty of feedback, particularly in
areas where AI could be perceived as a threat to existing roles.
7.2 External Validity
Since this study mainly focuses on one company and a specific domain of automotive
software testing, this may limit the generalizability of the findings. Even though the
study attempts to create a flexible and generalizable framework, the foundations of
the framework originate from internal processes and tools at said company, possibly
making the findings context-specific and requiring additional steps to make it more
adaptable. Moreover, the AI solutions are still under development, and with a lack
of end-user feedback, certain conclusions on impact are partially speculative.
7.3 Limitations
Due to the period during which the research was conducted in relation to the
maturity and state of the tools, it was not possible to perform the longitudinal
measures related to the pre/post-comparison and the result compilation from the
last steps in phase 5 of the framework. It would have been possible to collect
some project-specific metrics, but they would not have provided much valuable
information because the tools were constantly being improved and undergoing
architectural changes, and because there were many restrictions and limitations
on collecting data from the stakeholders that could be used for comparison.
Furthermore, due to constraints regarding company data, certain planned metrics
collection activities, such as those related to cost, were not feasible; even
without those restrictions, collecting them would still have been a complicated
and non-trivial process. Hence, methods such as the return on software quality
were not applied.
8
Conclusion
With a focus on software development processes, specifically software testing, in
the automotive industry at Volvo Trucks, this study aimed to propose and explore a
structured, stakeholder-centric framework for evaluating the impact of AI-driven so-
lutions. By incorporating qualitative and quantitative methods and techniques such
as Pareto analysis, Software Process Improvement Initiatives, and Automation de-
cision trees alongside Interviews and Surveys, this research provides a methodology
to model the impact of AI on software development processes.
The development of this five-phase framework enables practitioners to understand
existing processes and workflows, identify and prioritize problems, comprehend the
proposed AI-based solution, and map the solution to the problem, allowing for the
analysis and measurement of the impact of AI within these processes. Furthermore,
this study defines the term "impact", emphasizing that it has no universal
definition and is instead stakeholder-centric. This research ensures not only
theoretical relevance but also practical usefulness through the application and
investigation of real-world industrial cases.
With a focus on identifying and prioritizing quality and efficiency factors, aligning
them with the main components of the AI tool, and assessing the external limita-
tions, the results show that having a structured AI solution aligning with stake-
holder needs can deliver value through measurable improvements, alleviating the
major pain points, while preserving the strengths of the current processes.
In conclusion, this study lays the foundation and opens the doors for more data-
driven and context-related evaluation of AI impact in software development and
invites future work to refine, extend, and validate the proposed framework in differ-
ent industries.
Bibliography
[1] Alain Abran and Pierre N. Robillard. Function points analysis: an empirical
study of its measurement processes. IEEE Transactions on Software Engineer-
ing, 22(12):895–910, 1996.
[2] Aybüke Aurum, S Biffl, B Boehm, H Erdogmus, and P Grünbacher. Value-based
software engineering. Springer, 2005.
[3] Marco Barenkamp, Jonas Rebstadt, and Oliver Thomas. Applications of ai in
classical software engineering. AI Perspectives, 2(1):1, 2020.
[4] Jaiprakash Bhamu, JV Shailendra Kumar, and Kuldip Singh Sangwan. Produc-
tivity and quality improvement through value stream mapping: a case study of
indian automotive industry. International Journal of Productivity and Quality
Management, 10(3):288–306, 2012.
[5] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde
De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph,
Greg Brockman, et al. Evaluating large language models trained on code. arXiv
preprint arXiv:2107.03374, 2021.
[6] Christophe Commeyne, Alain Abran, and Rachida Djouab. Effort estimation
with story points and cosmic function points-an industry case study. Software
Measurement News, 21(1):25–36, 2016.
[7] COSMIC - Common Software Measurement International Consortium. Cosmic
sizing methodology website. Accessed April 10, 2025.
[8] John W Creswell and J David Creswell. Research design: Qualitative, quanti-
tative, and mixed methods approaches. Sage publications, 2017.
[9] RD Emrick. In search of a better metric for measuring productivity of applica-
tion development. In Proceedings of Function Point Users Group Conference,
1987.
[10] Robert Feldt, Francisco G de Oliveira Neto, and Richard Torkar. Ways of ap-
plying artificial intelligence in software engineering. In Proceedings of the 6th
International Workshop on Realizing Artificial Intelligence Synergies in Soft-
ware Engineering, pages 35–41, 2018.
[11] Stefan Feuerriegel, Jochen Hartmann, Christian Janiesch, and Patrick Zschech.
Generative ai. Business & Information Systems Engineering, 66(1):111–126,
2024.
[12] Fiona Fui-Hoon Nah, Ruilin Zheng, Jingyuan Cai, Keng Siau, and Langtao
Chen. Generative ai and chatgpt: Applications, challenges, and ai-human col-
laboration, 2023.
[13] Brian P Gallagher. Interpreting capability maturity model integration (cmmi)
for operational organizations. 2002.
[14] Jean Hartley. What is a case study. Essential guide to qualitative methods in
organizational research, 323, 2004.
[15] Alexej Kisselev. Alexej kisselev, jan 2023. Alexej Kisselev - Homepage.
[16] Jostein Langstrand. An introduction to value stream mapping and analysis.
2016.
[17] Jack E Matson, Bruce E Barrett, and Joseph M Mellichamp. Software devel-
opment cost estimation using function points. IEEE Transactions on Software
Engineering, 20(4):275–287, 1994.
[18] Jorge Melegati and Eduardo Guerra. Dante: a taxonomy for the automation
degree of software engineering tasks. In Generative AI for Effective Software
Development, pages 53–70. Springer, 2024.
[19] Dhasarathy Parthasarathy. Journeys in vector space: Using deep neural net-
work representations to aid automotive software engineering. Doctoral thesis,
Chalmers University of Technology and University of Gothenburg, 2023.
[20] Dhasarathy Parthasarathy, Yinan Yu, and Earl T Barr. Polymer: Development
workflows as software. arXiv preprint arXiv:2503.17679, 2025.
[21] Mark C Paulk, Bill Curtis, Mary Beth Chrissis, and Charles V Weber. Capa-
bility maturity model, version 1.1. IEEE software, 10(4):18–27, 1993.
[22] Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. The impact
of ai on developer productivity: Evidence from github copilot. arXiv preprint
arXiv:2302.06590, 2023.
[23] Ameya Shastri Pothukuchi, Lakshmi Vasuda Kota, and Vinay Mallikarjunarad-
hya. Impact of generative ai on the software development lifecycle (sdlc). In-
ternational Journal of Creative Research Thoughts, 11(8), 2023.
[24] Taman Powell and Tanya Sammut-Bonnici. Pareto analysis. 2014.
[25] Mike Rother and John Shook. Learning to see: value stream mapping to add
value and eliminate muda. Lean enterprise institute, 2003.
[26] Jaakko Sauvola, Sasu Tarkoma, Mika Klemettinen, Jukka Riekki, and David
Doermann. Future of software development with generative ai. Automated
Software Engineering, 31(1):26, 2024.
[27] Alex Singla, Alexander Sukharevsky, Lareina Yee, Michael Chui, and Bryce
Hall. The state of ai in early 2024: Gen ai adoption spikes and starts to generate
value. Technical report, McKinsey & Company, 2024. Accessed March 10, 2025.
[28] Sandra A Slaughter, Donald E Harter, and Mayuram S Krishnan. Evaluating
the cost of software quality. Communications of the ACM, 41(8):67–73, 1998.
[29] Rini Van Solingen. Measuring the roi of software process improvement. IEEE
software, 21(3):32–38, 2004.
[30] Shuai Wang, Yinan Yu, Robert Feldt, and Dhasarathy Parthasarathy. Au-
tomating a complete software test process using llms: An automotive case
study. arXiv preprint arXiv:2502.04008, 2025.
[31] Run Xu. The relationship between quality and efficiency in business manage-
ment. Macro Management & Public Policies, 2(3), 2020.
A
Appendix
A.1 Non-utilized Methods
I
A. Appendix
II
Table A.1: Full non-utilized methods classification

| Name of Tech | Context / Purpose | Description | Strengths | Weaknesses |
|---|---|---|---|---|
| FPA (Function Point Analysis) | Estimate project size for planning efforts, cost, and time pre-development. | Measures software size by evaluating the functional requirements from a user perspective regardless of technology or language. | Can be used early in project lifecycle; Well-established method; Structured effort estimation | Complex and time-consuming; Subjective weighting; Requires training |
| COSMIC Function Points | Estimate project size for planning efforts, cost, and time pre-development. | Focuses on software’s data movements (Entry, Exit, Read, Write) to calculate size, ideal for embedded/real-time systems like automotive platforms. | Better suited for modern AI and embedded systems; Precise measurement of functional user requirements | Less common than FPA method; Functional user requirements needed |
| ROSQ (Return on Software Quality) | Calculate return on investment and quantifying quality. | Quantifying the business value gained from investing in software quality improvements. | Helps justify investment of quality; Enables ROI-based decision-making | Difficult to isolate quality impact and other factors; Requires internal company information; Requires mature systems |
| CMMI / CMM (Capability Maturity Model Integration / Model) | Evaluate software Process Improvement evaluation. | Frameworks for assessing and improving organizational process maturity, especially in software and systems engineering. | Systematic SPI evaluation; Recognized globally | Resource-intensive; Bureaucratic and time-consuming; High initial adoption barrier |
| VSM (Value Stream Mapping) | Identify waste, bottlenecks, and improvement opportunities. Applied to assess AI integration’s value vs. effort, cost, and investment. | A lean tool that visually maps the steps in a process to identify inefficiencies and waste-creating activities. Used to understand and optimize end-to-end workflows. | Effectively highlights inefficiencies and delays; Visualizes value flow and handoffs; Supports continuous improvement initiatives | Best suited for linear or sequential processes; Less effective for complex, parallel workflows like those in CS/API Testing; Time-intensive to create and maintain |
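
As a rough illustration of the COSMIC row in Table A.1: in the COSMIC method each identified data movement (Entry, Exit, Read, Write) contributes one COSMIC Function Point (CFP), so sizing reduces to counting movements per functional process. The sketch below assumes the movements have already been identified by hand; the functional process names are hypothetical and are not taken from the case study.

```python
from collections import Counter
from enum import Enum


class Movement(Enum):
    """The four COSMIC data movement types."""
    ENTRY = "Entry"
    EXIT = "Exit"
    READ = "Read"
    WRITE = "Write"


def cosmic_size(functional_processes):
    """Return (total CFP, per-type counts); each data movement counts as 1 CFP."""
    per_type = Counter()
    for movements in functional_processes.values():
        per_type.update(movements)
    return sum(per_type.values()), per_type


# Hypothetical functional processes, for illustration only.
example = {
    "report_signal_value": [Movement.ENTRY, Movement.READ, Movement.EXIT],
    "store_calibration": [Movement.ENTRY, Movement.WRITE, Movement.EXIT],
}

total_cfp, breakdown = cosmic_size(example)
print(total_cfp, dict(breakdown))
```

The weakness listed in the table still applies: the count is only as good as the functional user requirements from which the movements are derived.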
A.2 Utilized Methods
Table A.2: Full utilized methods classification

| Name of Tech | Context / Purpose | Description | Strengths | Weaknesses |
|---|---|---|---|---|
| Pareto Analysis | Identify and prioritize key quality and efficiency issues in software processes. Determines where AI should focus its efforts and whether it delivers impact. | Uses the 80/20 rule to highlight the small number of factors causing the majority of problems. Ranks stakeholder-reported issues. | Simple, visual, highlights high-impact areas; Helps focus efforts; Effective in early-stage analysis | May oversimplify; Doesn’t suggest solutions; Quality depends on input; Can miss rare but critical issues |
| Software Process Improvement Initiative (SPII) | Evaluate whether AI components result in measurable improvements in time, cost, or quality. Links technical changes to business outcomes. | Structured approach for assessing and guiding improvements to software processes based on changes introduced (e.g., AI). | Business-aligned; Flexible; Supports continuous improvement; Ties actions to measurable outcomes | Requires baseline data; Hard to isolate effects; Can become complex; Depends on consistent follow-up |
| Automation Taxonomy and Risk | Classify AI automation levels and associated risk to determine suitability in a given software process. | Organizes tasks from manual to autonomous across application points (process/product/runtime), mapping them to risk levels. | Structured risk awareness; Quantifies automation levels; Fosters cross-functional discussion | Generic; Lacks detailed risk types; Needs interpretation; Limited predictive depth |
| Automation Decision Tree | Guide decision-making between AI, scripting, or human effort for specific automation steps. | Developer-informed logic tree that outlines criteria (e.g. structure, repeatability) for selecting an automation method. | Simple, practical and consistent; Captures expert knowledge; Team-adaptable | May oversimplify; Doesn’t handle all edge cases; Ignores broader constraints; Not always predictive |
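
The Pareto Analysis row in Table A.2 can be illustrated with a minimal sketch that ranks reported issue categories by frequency and marks the "vital few" that together account for roughly 80% of all reports. The issue labels, the counts, and the exact cut-off rule (a category is marked as long as the cumulative share before it is below the threshold) are assumptions made for this example, not data or definitions from the study.

```python
from collections import Counter


def pareto_ranking(issues, threshold=0.8):
    """Rank issue categories by frequency and flag the 'vital few'.

    Returns a list of (category, count, cumulative_share, is_vital) tuples,
    sorted from most to least frequent.
    """
    counts = Counter(issues)
    total = sum(counts.values())
    ranked = []
    cumulative = 0
    for category, count in counts.most_common():
        # Vital if the threshold had not been reached before this category.
        is_vital = cumulative / total < threshold
        cumulative += count
        ranked.append((category, count, cumulative / total, is_vital))
    return ranked


# Hypothetical stakeholder-reported issues, for illustration only.
reported = (
    ["manual test setup"] * 5
    + ["unclear requirements"] * 3
    + ["unstable test environment"]
    + ["tool licensing"]
)

for category, count, share, is_vital in pareto_ranking(reported):
    marker = "*" if is_vital else " "
    print(f"{marker} {category:<26} {count:>2}  {share:.0%}")
```

A ranking of this kind only reflects how often an issue is reported, which is why the table lists "Can miss rare but critical issues" as a weakness.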
A.3 Round 2 interview/survey questions
• Briefly describe and list the tasks and/or main sources of high efficiency in
your/your team’s workflow?
• Briefly describe and list the tasks and/or main sources of low efficiency in
your/your team’s workflow?
• Briefly describe and list the tasks and/or main sources of high quality in
your/your team’s workflow?
• Briefly describe and list the tasks and/or main sources of low quality in
your/your team’s workflow?
• What is your role and what is your experience working with component-level
testing in years?