Development of a Lagrangean Dual Method for Solving Sequentially Connected 2D Assignment Problems Modified deflected subgradient optimization and ergodic sequences Master’s project report in Applied Mathematics Louise Chastine Christensen DEPARTMENT OF MATHEMATICAL SCIENCES UNIVERSITY OF GOTHENBURG Gothenburg, Sweden 2025 www.gu.se Master project report 2025 Development of a Lagrangean Dual Method for Solving Sequentially Connected 2D Assignment Problems Modified deflected subgradient optimization and ergodic sequences Louise Chastine Christensen Department of Mathematical Sciences University of Gothenburg Gothenburg, Sweden 2025 Development of a Lagrangean Dual Method for Solving Sequentially Connected 2D Assignment Problems Modified deflected subgradient optimization and ergodic sequences Louise Chastine Christensen © Louise Chastine Christensen, 2025. Supervisor: Prof. Ann-Brith Strömberg, Department of Mathematical Sciences Examiner: Ass. Prof. Axel Ringh, Department of Mathematical Sciences Master project report 2025 Department of Mathematical Sciences University of Gothenburg SE-412 96 Gothenburg Sweden Telephone +46 31 772 1000 Typeset in LATEX, template by Kyriaki Antoniadou-Plytaria Gothenburg, Sweden 2025 iv Development of Lagrangian Dual Methods for Solving Sequentially Connected 2D Assignment Problems Modified deflected subgradient optimization and ergodic sequences Louise Chastine Christensen Department of Mathematical Sciences University of Gothenburg Abstract This thesis addresses multi-object tracking (MOT) problems and studies efficient methods for evaluating the Trajectory Generalized Optimal Sub-Pattern Assign- ment (TGOSPA) metric. The primary objective is to develop a Lagrangian dual optimization method, that incorporates modified deflected sub-gradients and an en- hanced ergodic sequence of subproblem solutions, and lastly constructing a feasible heuristic solution to the MOT problem. The proposed algorithm is evaluated on se- quentially connected assignment problems of varying dimensions and tested against existing solution algorithms. While the algorithm demonstrates some potential, it exhibits sensitivity to parameter choices and suffers from slow convergence. Further refinement and development is therefore encouraged. Keywords: Multi-object tracking. Generalized optimal sub-pattern assignment met- ric. Subgradient optimization method. Modified deflected subgradient search direc- tion. Lagrangean relaxation. Enhanced ergodic sequences. v Acknowledgements I would like to thank my supervisor Ann-Brith Strömberg for all the help, discussions and great inputs throughout this process. Thank you to Johan and Severin for endless support and motivation. Louise Chastine Christensen, Gothenburg, June 2025 vii List of Acronyms Below is the list of acronyms that have been used throughout this thesis listed in alphabetical order: ADMM Alternating direction method of multipliers [2] ALD Alternative Lagrangean Dual method [2] GOSPA Generalized optimal sub-pattern assignment metric ILP Integer Linear programming JVC Jonker-Volgenant-Castanon LP Linear programming MDS Modified deflected subgradient method MILP Mixed-Integer Linear Programming MOT Multi-object tracking REG1 Regularization algorithm 1 for comparison [13] REG2 Regularization algorithm 2 for comparison [13] ix Nomenclature Below is the nomenclature of indices, sets, parameters, and variables that have been used throughout this thesis. Indices i Indices for ground truths j Indices for estimates k Indices for time steps t Current iteration of subgradient method Sets V Feasible solution set for relaxation of primal problem Parameters cp Cut-off parameter for localization error n Number of ground truths m Number of estimates p Penalty coefficient for distance function Terg Initialization iteration of the ergodic sequence γ Penalty parameter for track switches τt Step length at iteration t αkij Lagrangean multiplier for constraint (1.3d) βkij Lagrangean multiplier for constraint (1.3e) σkj Lagrangean multiplier for constraint (1.3b) λkij Subgradient connected to Lagrangean multiplier αkij xi ϕkj Subgradient connected to Lagrangean multiplier σkj θ Step-length parameter in the Polyak step length rule ΨMDS Deflection parameter in modified deflected subgradient method dt Modified deflected search direction at iteration t d̃t Projected modified deflected search direction at iteration t ϵ̃ Strict maximum allowed optimality gap Variables wkij Assignment element of ground truth i and estimate j in assignment matrix wk at time step k hkij Variables in penalty term in primal problem for ground truth i, estimate j at time step k {x̃t} Ergodic sequence of weighted subproblem solutions x̄ Feasible heuristic solution from ergodic sequence xii Contents List of Acronyms ix Nomenclature xi 1 Introduction 1 1.1 Project presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Multi-object tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.3 Generalized optimal sub-pattern assignment metric . . . . . . . . . . 2 1.3.1 Problem formulation under the GOSPA metric over the set of trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3.2 Existing solution algorithms . . . . . . . . . . . . . . . . . . . 4 1.3.3 Data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3.4 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Theory 7 2.1 Integer programming . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Lagrangean relaxation and duality . . . . . . . . . . . . . . . . . . . . 8 2.3 Subgradient optimization method . . . . . . . . . . . . . . . . . . . . 8 2.3.1 Modified deflected subgradient method . . . . . . . . . . . . . 9 2.4 Enhanced ergodic sequence . . . . . . . . . . . . . . . . . . . . . . . . 11 3 Development of a Lagrangean dual algorithm 13 3.1 Lagrangean duality and relaxation . . . . . . . . . . . . . . . . . . . . 13 3.1.1 Subgradient and sub-differential of the Lagrangean dual function 15 3.2 Implementation of modified deflected subgradient method . . . . . . . 16 3.2.1 Initial heuristic . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.3 Implementation of enhanced ergodic sequence . . . . . . . . . . . . . 17 3.3.1 Heuristic solution from the enhanced ergodic sequence . . . . 18 3.3.2 Algorithm scheme . . . . . . . . . . . . . . . . . . . . . . . . . 18 4 Implementation and Results 21 4.1 Initial choice of parameters for the 18 data sets . . . . . . . . . . . . 21 4.2 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 4.2.1 Choice of parameter values . . . . . . . . . . . . . . . . . . . . 25 4.3 Relaxation strategy and choice of step length parameter θ . . . . . . 27 4.3.1 Comparison with an alternative Lagrangean Dual Method and Linear programming relaxation . . . . . . . . . . . . . . . . . 29 xiii Contents 4.3.2 Comparison with regularization algorithms . . . . . . . . . . . 30 4.3.2.1 ADMM . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.3.2.2 REG1 and REG2 . . . . . . . . . . . . . . . . . . . . 31 5 Discussion and Conclusion 35 5.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 5.2 Heuristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 5.2.1 Vogel’s approximation method . . . . . . . . . . . . . . . . . . 36 5.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 5.4 In conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 Bibliography 39 A Appendix I A.1 Vogel’s approximation method . . . . . . . . . . . . . . . . . . . . . . I A.2 Python code existing algorithms . . . . . . . . . . . . . . . . . . . . . II xiv 1 Introduction 1.1 Project presentation The purpose of this project is to formulate and implement a new algorithm for eval- uating multi-object tracking problems, using the Trajectory Generalized Optimal Subpattern Assignment (TGOSPA) metric. The objective is to develop an opti- mization method that combines Lagrangian duality theory with enhanced ergodic sequences. Previous master’s theses have proposed different methods for solving multi-object tracking problems efficiently under the TGOSPA metric, including ap- proaches based on Lagrangean relaxation and different regularization methods. In this paper we develop a method that extends the relaxation scheme, and ideally leads to Lagrangean semi-assignment subproblems that are individually computationally tractable to solve. Since our approach involves extensive relaxations, the resulting solution will be infeasible. To address this, we will initiate enhanced ergodic se- quence and a heuristic to obtain a good feasible solution. The proposed method will be tested on data used in [2] and on data generated using code developed in [13]. We evaluate our method through a comparison with previously proposed methods presented in the two master’s theses [2] and [13]. 1.2 Multi-object tracking The task of tracking multiple objects over time is known as Multi-Object Tracking (MOT). The objective of MOT is to locate and track multiple objects over a series of input frames while maintaining their identities [12]. This type of tracking can be viewed as an estimation problem, where the goal is to track and identify multiple objects over a series of times steps. Throughout these time steps objects can be born, move, and die. Since MOT problems are often very complex and computationally hard to solve, the development of algorithms which are capable of solving these problems efficiently is of interest. Performance measures in the development of new methods for solving MOT issues are important as we wish not only to allocate and identify objects correctly, but also to do this within reasonable time. For this purpose [19] presents the TGOSPA metric between sets of trajectories which can be used for evaluation of performance, as the metric provides a distance function that quantifies missed and false target errors and penalizes localization errors. 1 1. Introduction 1.3 Generalized optimal sub-pattern assignment metric The GOSPA metric is a metric introduced by [15] on the space of finite targets, with the objective to penalize localization errors and missed/false targets. The GOSPA metric is extended in [19] to the set of trajectories, by considering the target states of the trajectories over time steps k. This TGOSPA metric can be used for evaluating multi object tracking algorithms. It enables evaluation through a comparison of trajectory estimates with ground truth trajectories, where the ground truth represents the true trajectory of an object, using the distance and penalties for optimal assignment. 1.3.1 Problem formulation under the GOSPA metric over the set of trajectories In this section we will present the GOSPA metric between two sets of trajectories X = {xki |i ∈ {1, . . . , n}, k ∈ {1, . . . , K}} and Y = {ykj |j ∈ {1, . . . ,m}, k ∈ {1, . . . , K}}, where X contains the positions xki of ground truths i ∈ {1, . . . , n} at time k. More- over, the position for each estimate j ∈ {1, . . . ,m} is represented by ykj for time step k. The objective is to assign the correct ground truth xk to the correct estimate yki j and vice versa for all time steps k ∈ {1, . . . , K}. If we fail to do so, an objective function is penalized depending on the assignment error. There are three types of penalties: missed targets, false targets and track switches. If ground truth xki exists at time k, and it is left unassigned, it is interpreted as a missed target. Equiva- lently, if estimate ykj exists at time k, and it is left unassigned, it is interpreted as a false target. Track switches refers to the change in assignment of ground truth i or estimate j between time steps k and k + 1. This implies in example that object i is identified as two different objects from time step k and k + 1. Missed and false targets are evaluated in a distance matrix. The distance matrix is constructed from the set of ground truths X , and the set of estimate Y . If both positions xki and ykj exist at time k, the corresponding element in the distance matrix, dkij is defined by the euclidean norm as the localization error if ∥xk − yk∥p < cp. If ∥xk − yk pi j p i j ∥p ≥ cp, dk pij = c , which is the maximum allow localization error. Here p denotes the expo- nent. Later in the thesis, we restrict our attention to the case p = 1. If a position xk k ki and yj does not exist at time step k then dij = 0. This means that the cost of assigning a non-existing position to another empty position is 0. If one position is empty, and the other exists, the distance dkij is given by the penalty c p 2 . Such an assignment is interpreted as either a missed target if xki ̸= ∅ and ykj = ∅, or as false target is xki = ∅ and ykj ̸= ∅. This will be assigned in what we will call the dummy labels i = n + 1 and j = m + 1. Hence the distance matrix will have dimension (n+ 1)× (m+ 1)×K. A distance matrix element is thus constructed as 2 1. Introduction  min{cp, ∥xk − yk pi j ∥p} if xki ̸= ∅ , ykj ≠ ∅, dkij =  0 if xki = ∅ , ykj = ∅, (1.1)cp 2 otherwise. In order to formulate an integer linear program we will use assignment matrices. The binary assignment matrix wk consist of element wkij ∈ {0, 1}. We set wkij = 1 if (i, j) is paired at time point k, and wkij = 0 otherwise. We only allow exactly one assignment for each ground truth, and vice versa, but we allow more than one assignment to the dummy labels n+ 1 and m+ 1, in which case we say that either the ground truth or the estimate is unassigned. The dummy variables cannot be assigned to one another, thus wkn+1,m+1 = 0 must hold. Changes in the assignment of ground truths and estimates, such that wk ̸= wk+1 between time steps k and k + 1 are called track switches. These are penalized directly in the objective function by a switching penalty γp2 . In the ideal setting we will obtain the same solution wk at all times k ∈ {1, . . . , K }. The objective is to minimize the sum of all distances and track switches between the assigned pairs (i, j). We formulate the problem as ∑K n∑+1m∑+1∗ γp K∑−1∑n ∑m ∣:= min k k + ∣∣ ∣z d w wk k+1∣ wk ∀ ij ij 2 ij − wij ∣ (1.2a), k k=1 i=1 j=∑1 k=1 i=1 j=1n+1 subject to wkij = 1, j = 1, . . . ,m, k = 1, . . . , K, (1.2b) m∑i=1+1 wkij = 1, i = 1, . . . , n, k = 1, . . . , K, (1.2c) j=1 wkn+1,m+1 = 0, k = 1, . . . , K, (1.2d) wkij ∈ {0, 1}, i = 1, . . . , n+1, j = 1, . . . ,m+1, (1.2e) k = 1, . . . , K, with (1.2b) and (1.2c) being the assignment constraints on wk. As the problem (1.2a) is not linear due to the absolute value function, we introduce auxiliary vari- ables hkij. The absolute value is expressed through the two inequality constraints (1.3d) and (1.3e) on hkij. We rewrite our problem as 3 1. Introduction K n+1m+1 p K−1 ∗ ∑∑ ∑ γ ∑∑n ∑mz = min dkijwk + hk , (1.3a)wk,hk∀ ij ijk k=1∑i=1 j=1 2 k=1 i=1 j=1 n+1 s.t. wkij = 1, j = 1, . . . ,m, k = 1, . . . , K, (1.3b) i=1 m∑+1 wkij = 1, i = 1, . . . , n, k = 1, . . . , K, (1.3c) j=1 hk kij ≥ wij − wk+1ij , i = 1, . . . , n, j = 1, . . . ,m, k = 1, . . . , K−1, (1.3d) hkij ≥ wk+1 kij − wij, i = 1, . . . , n, j = 1, . . . ,m, k = 1, . . . , K−1, (1.3e) wkn+1,m+1 = 0, k = 1, . . . , K, (1.3f) wkij ∈ {0, 1}, i = 1, . . . , n+1, j = 1, . . . ,m+1, (1.3g) k = 1, . . . , K. 1.3.2 Existing solution algorithms For comparison we consider four different existing methods: An alternative La- grangean dual method, (ALD), the alternating direction method of multipliers, (ADMM), both presented in [2], and two algorithms REG1. and REG2. presented in [13] using entropic optimal transport. See [2, 13] for description and formulation of the algorithms. 1.3.3 Data sets After development of the Lagrangean dual method, it is tested on different types of data in order to investigate the performance. We test the performance on 18 data sets provided to [2]. These data sets are interesting as each data set reflects real data over a varying number of trajectories, both ground truths i and estimates j. These data sets will be used in the comparison with the ALD and ADMM methods from [2]. For a comparison with REG1 and REG2 we use generated data constructed by [13]. We will use structured data. The structured data generates ground truths which are manipulated by track-switches, false trajectories, and added noise in order to generate dummy estimates. For a thorough description of the construction of structured data, see [13]. Lastly we will test the method on data sets with a large difference in dimensions n, m and K. 1.3.4 Code The implementation is done in Python on a computer with a processor Intel(R) Core(TM) i5-10210U CPU @ 1.60 GHz. The code can be extracted by the follow- ing link: https://github.com/LouiseChristensen/MMA920-Lagr.method- MOT.git. The times displayed therefor differ from the times presented in [13] 4 1. Introduction and [2]. All times displayed are clocked using the time module in python with time.monotonic(). In our implementation we will use the initial heuristic suggested by [2]. We have made a minor modification to this code for performance, but besides this it is identical to the original code from [2]. The code for data generation and the LP-solver is provided by [13]. For links to Github see Appendix A.2. 5 1. Introduction 6 2 Theory In this chapter we will introduce the theory needed for the development of our algo- rithm. We will present theory regarding integer programming, integrality property and Lagrangean relaxation along with a presentation of the modified deflected sub- gradient method. Lastly the chapter will also introduce enhanced ergodic sequences. 2.1 Integer programming A linear program, LP, is called an integer linear program, ILP, if the variables are required to be integer or binary. If this requirement is only valid for some variables, the problem is called a mixed integer linear program, MILP, or a mixed binary linear problem, MBLP. Consider the general ILP problem to z∗ := minimum cTx, subject to Ax ≥ b, (2.1) x ∈ S, where S ⊆ Zn+, c ∈ Rn, b ∈ Rm, and A ∈ Rm×n. ILP problems are non-convex and do not have optimality conditions as regular LP problems [11]. As a result ILP’s can be computationally hard to solve. One way to get around this is to relax one or more constraints in (2.1), for instance a relaxation of the integer requirement. A relaxation of the ILP will result in a lower bound on the optimal value z∗LP ≤ z∗. Integrality property The integrality property is a condition which ensures that the solution to an LP of a relaxed ILP is guaranteed to be integral. If the constraint matrix A is totally uni-modular, that is all square sub-matrices have determinant 0 or ±1, and the column vector b is integer, then the problem is said to have the integrality property [4]. If an IP has the integrality property, it can be solved as a linear program, while the solution remains integer. The feasible region of the LP problem is hence a polyhedron with integer extreme points. As the optimal solution will be found in one of these extreme points, the variables in the LP solution will take integer values. This implies that z∗ = z∗LP [4]. This property is relevant in this project, as the problem to a large extent possesses this property, but due to the constraints (1.3e) and (1.3d) fractional solutions sometimes exists. 7 2. Theory 2.2 Lagrangean relaxation and duality Lagrangean relaxation as described in [6] is a widely used relaxation method. Through a relaxation of one or more constraints in the primal problem, Lagrangean duality divides the problem into smaller separable sub-problems, which are computation- ally easier to solve. Lagrangean relaxation of the ILP problem (2.1) results in the Lagrangian function L : Rn × Rm →7 R L(x, u) = cTx+ uT (b− Ax) (2.2) where the vector u ∈ Rm is called a Lagrangean multiplier. The minimization of the Lagrangian function with respect to x ∈ S in (2.1), determines the Lagrangean dual function h : Rm 7→ R, h(u) := min{L(x, u) } (2.3)x∈S = min cTx+{uT (b− Ax). } (2.4)x∈S = bTu+min (c− ATu)Tx . (2.5) x∈S The dual function is to be maximized over u ≥ 0 to obtain the dual problem max h(u). (2.6) u≥0 The dual function is a piecewise affine and concave function on Rm [6]. Each feasible solution of the dual problem provides an optimistic bound on the optimal primal objective value z∗ by weak duality. That is, if x and u are optimal in (2.1) and (2.6) then it holds that h∗ ≤ z∗. The Lagrangean relaxation might therefore result in a nonzero duality gap, i.e. z∗ − h∗ > 0, between the primal and dual optimal value if the integrality property is not present. 2.3 Subgradient optimization method The use of iterative subgradient algorithms for solving optimization problems is well known. The subgradient algorithm utilize, as the name implies, the subgradient of the given function in an iterative scheme. The subgradient direction of each new iteration will point into the half-space containing the optimal set. Subgradient methods are often used for solving Lagrangian dual problems. The sub-differential of any concave function{h at a p∣oint u ∈ R m is defined as } ∂h( ∣u) = g ∈ Rm∣h(v) ≤ h(u) + gT (v − u), v ∈ Rm , (2.7) which is a closed and convex set, containing all sub-gradients of h at u ∈ Rm. Subgradient g ∈ ∂h(u) provides a search direction for the update of the Lagrangean multipliers u within each subgradient iteration. This, along with an appropriate choice of step length ensures convergence towards optimum. The way towards the 8 2. Theory optimum might be more or less zigzagging. To avoid this we will apply a conditional (projected) subgradient method [9] along with the modified deflected subgradient method presented by [1]. 2.3.1 Modified deflected subgradient method We will apply the modified deflected subgradient method (MDS) [1]. The MDS method prevents the extent of zigzagging between iterations as a pure subgradient procedure might cause, making the path towards optimum smoother. The method combines the Modified Gradient technique (MGT) [3] with the Average Direction strategy (ADS) [17]. The modified deflected search direction dt at iteration t, is a convex combination of the two search directions dt and dtMGT ADS dt = (1− ρ )dt tt MGT + ρtdADS (2.8) where ρt ∈ (0, 1). The search directions dt tMGT and dADS is constructed from the pre- vious search direction dt−1, a deflection parameter Ψt, and a subgradient gt ∈ ∂h(ut) at iterate t. The deflection parameter and the corresponding deflected subgradient direction o{f the MGT method is determined as − gtη dt−1MGT t ∥ −1∥2 if g tdt−1 < 0, dt = gt +ΨMGTdt−1Ψ dt ,t = MGT t0 otherwise, and for the the Average Direction Strategy (ADS) as ∥gtΨADS = ∥ tt ∥ t−1∥ dADS = g t +ΨADS t−1 d t d . The deflection parameter ΨMDSt for the MDS method is given by the convex combi- nation ΨMDS = (1− ρt)ΨMGT + ρtΨADS. Explicitly { −ηk(1−ρ )gtdt−1+ρ ∥gt∥∥dt−1t t ∥ t t−1 ΨMDS = ∥ if g d < 0,dt−1∥2t (2.9)0 otherwise. The current deflected subgradient direction is forced to form an acute angle with the previous step direction when choosing the deflection parameter ΨMDSt according to (2.9) and 0 < ηt ≤ 2. That is, if the angle between the previous search direction dt−1 and the current subgradient gt is obtuse, i.e. if gtdt−1 < 0, we deflect the search direction according to the above. The search direction is updated as dt = gt +ΨMDSdt−1t . (2.10) −1 (gt)T dt−1Following the suggestion in [1] we set ρt = − cos(gtdt ) = −∥ t∥∥ t−1∥ ∈ [−1, 1] andg d let η 1t = 2− . Substituting for ρt and ηt we get the expressionρt Ψ ρt(3− 2ρ t t)∥g ∥ MDS = (2 .− ρt)∥dt−1∥ We now obtain our deflected subgradient search direction through 9 2. Theory { gt if ρt ≤ 0, dt := gt + ρt(3−2ρt) ∥g t∥ dt−1 (2.11) (2− ) ∥ t−1∥ if ρρ d t > 0.t As the modified deflected subgradient method is a convex combination of the two methods MGT and AD{S, ρt ∈ (0, 1),}and thus ρt cannot take negative value. We could define ρt = max 0 − g tdt−1, ∥ t∥∥ t−1∥ , but this is redundant due to definition ing d (2.11). The search direction dt will be projected back onto a feasible direction in the case that ut is currently on the boundary of the feasible set Ω and the search direction dt is pointing out of the set. Projecting the search direction reduces its norm, which is used in the construction of the step length according to (3.14), resulting in a larger step than if the unprojected direction were used. The projected search direction is defined d̃t = dt − ν(ut) (2.12) where ν(ut) is the element in the normal cone of the dual feasible set corresponding to the projection onto a feasible direction. Proposition 1 in [1] ensures that if we choose the deflection parameter according to (2.9) the angle between the current deflected subgradient direction and the previous step direction will be acute, and hence it will eliminate a substantial portion of the zigzagging of the pure subgra- dient procedure. We have not been able to prove that the proposition above will hold for the projected deflected search direction d̃t, but expect it to hold and hence formulate Conjecture 1. Conjecture 1 Let gt ∈ ∂h(ut), d̃t be the new projected deflected subgradient direc- tion given by (2.12) and (2.9) and let {ut} be the sequence of iterations generated by the deflected subgradient scheme. If we take 0 < ηt ≤ 2 and step size τt to satisfy h∗0 − h(u t) < τt < , ∀t = 0, 1, 2, . . .∥d̃t∥2 then 1. d̃t−1(u∗ − ut) > 0 2. ∥ut+1 − u∗∥ < ∥ut − u∗∥, 3. d̃td̃t−1 ≥ 0, for all t where ut are non-optimal iterates and u∗ is an optimal solution to (2.6). After construction d̃t the iterate is updated according to ut+1 = Proj {utΩ + τ ttd̃ }, (2.13) where τt is the chosen step length. The new iterate ut+1 is projected onto the feasible set Ω in case it is not feasible. 10 2. Theory 2.4 Enhanced ergodic sequence Our choice of Lagrangean relaxation provides a lower bound on the optimal value z∗. We wish to find a way to obtain a feasible solution, and thus an upper bound on the optimal value. We introduce the enhanced ergodic sequence of solutions to the Lagrangean subproblem in (2.5), from which we can construct a heuristic. This heuristic will be presented in section 3.3.1. The use of ergodic sequences can speed up convergence towards the optimal value when introduced in subgradient iterations. The enhanced ergodic sequence ∑t−1 ∑t−1 x̃t := µtx(us), µt = 1, µts s s ≥ 0, s = 0, . . . , t− 1, (2.14) s=0 s=0 creates a sequence {x̃t} which by the use of convexity weights µts, weighs the sub- problem solutions x(ut) over iterations t. The sequence {x̃t} consists of convex combinations of previous subproblem solutions. The weights µts should be chosen according to a set of assumptions and criteria presented in [7], regarding µts and its relation to the step length τt. This in order to ensure convergence. One of these in- cludes the step length τt being a harmonic series. Meeting these assumptions would imply convergence of x̃t to the set of optimal solutions X∗ in the limit: ut → u∞ ∈ U∗ and dist(x̃t, X∗)→ 0. The following so-called sk̃−rule is presented and proved to meet the criteria in [7]. Letting k̃ ≥ 0 be an integer, the sk̃−rule defines an ergodic sequence by choosing convexity weights as k̃ µts = ∑(s+ 1)−1 , s = 0, . . . , t− 1, t = 1, 2, . . . . (2.15)t k̃ l=0(l + 1) This rule assigns higher weight to later iterations. The choice of k̃ effects how the subproblem solutions are weighted. A larger value of k̃ will yield a higher weight of later subproblems solutions. The choice of k̃ is thus relevant as it will have an impact on the rate of convergence to the optimum [7]. The ergodic iterates can be computed by only storing the previous iterate as ∑t−2 k̃ k̃ x̃t = ∑s=0(s+ 1) x̃t−1 + ∑ t xt−1−1 .t (s+ 1)k̃ t−1 k̃s=0 s=0(s+ 1) The use of the sk̃−rules ensures primal convergence when the step length used to solve the Lagrangean dual problem is a harmonic series. In Section 3.3 we will show that our chosen step length can be bounded by two harmonic series, and thereby guarantees convergence. 11 2. Theory 12 3 Development of a Lagrangean dual algorithm In this chapter we will present the Lagrangean dual problem and its sub-differential. We also present the enhanced ergodic sequence and lastly discuss the implementation of the subgradient method including suggestions for initial choices of parameters. 3.1 Lagrangean duality and relaxation In this section we {formulate the Lagrangean function. First we defin}e the followingset V := wki · ∈ Rm+1 | for fixed k and i, (1.2c)–(1.2e) hold . This set corresponds to a semi-assignment feasible set and the corresponding opti- mization problem can be solved by a greedy algorithm. We will use this set in the Lagrangean relaxation. First consider problem (1.3). We relax constraints (1.3b)– (1.3e). By doing so we relax the constraint that ensures assignment of one ground truth to exactly one estimate. Hence we allow more than one ground truth to be assigned to the same estimate. Besides this we also relax the inequality constraints on hkij. The problem therefore becomes a semi-assignment problem. Through the relaxation of the three sets of constraints we get three corresponding groups of mul- tipliers, σkj , αkij, and βkij. The equality constraint (1.3b) gives rise to the unbounded multiplier σkj ∈ R, while the relaxation of inequality constraints (1.3d) and (1.3e) initializes bounded multipliers αkij ≥ 0 and βkij ≥ 0 respectively. In doing so, we arrive at the Lagrangean function ∑ ( )K n∑+1m∑+1 γp K∑−1∑n ∑m ∑K ∑m n∑+1 L̃(w,h,σ,α,β) := dk wk + k k kij ij k=1 i=1 j=1 2 hij + σj wij − 1 k=1 i=1 j=1 k=1 j=1 i=1 K∑ (3.1)−1∑n ∑m [ ( ) ( )] + αk k k+1 k k k+1 k kij wij − wij − hij + βij wij − wij − hij . k=1 i=1 j=1 The third term comes from relaxation of constraint (1.3b), and the last term from the relaxation of (1.3d) and (1.3e). By defining α0 0ij = βij = αK = βKij ij = 1 γ p 2 2 we can rearrange the expression as 13 3. Development of a Lagrangean dual algorithm ∑K ∑n ∑m ( ) L̃(w,h,σ,α,β) :=  dkij + σkj + αkij − βk k−1 k−1 kij − αij + βij wij (3.2) ∑k=1 i=1 j=1m ∑n ∑m ( ) + dkn+1,jwk + dk kn+1,j i,m+1wi,m+1 + σk wk j n+1,j − 1 j=1 i=1 ∑ j=1K−1∑n ∑m ( p ) + dk k  γ k k kn+1,m+1wn+1,m+1 + 2 − αij − βij hij.k=1 i=1 j=1 Notice that dn+1,m+1wn+1,m+1 = 0 due to wkij ∈ V . The variable hkij is unbounded, so in order to reach optimality in the equality pαk + βk γij ij = 2 must hold ∀i, j, k. Otherwise hkij might take values ±∞. We will use this relation to project βkij onto α−space according to p βkij = γ 2 − α k ij. Rearranging the sums in (3.2) and projecting βkij onto α-space we obtain our La- grangean function with pαk γij ∈ [0, 2 ] for k = 1, . . . , K − 1 ∑K ∑n ∑m ( )−1 ∑mL(w,σ,α) := dk k k k k k k kij + σj + 2αij − 2αij wij + (dn+1,j + σj )wn+1,j k=1 i=1 j=1 j=1 ∑  (3.3) n  ∑K ∑m+ dk wk − σk (3.4) ∑i i,m+1 i,m+1 j =∑1 (∑ k=1 j=1K  n m )= (dkij + σkj + 2αkij − 2αk−1 k kij ) + di,m+1wi,m+1 (3.5) k=1∑i=1 j=1m ∑ m + (dk + σk)wk − σkn+1,j j n+1,j j . (3.6) j=1 j=1 The Lagrangean dual function can now be defined as a minimization over wk ∈ V , with subproblems being defined by sums of semi-assignment problems, as ∑K ∑n {∑m }v(σ,α) := min (dk + σk + 2αk − 2αk−1 kij j ij ij )wij + dki,m+1wki,m+1 (3.7) k=1 w k∈V ∑i=1 i·{( j=1 ) ∣ } ∑ m m + min dk k k ∣ k kn+1,j + σj wn+1,j∣wn+1,j ∈ {0, 1} − σj . (3.8) j=1 j=1 We will denote the subproblem solutions wk(αk,αk−1,σk) for k = 1, . . . , K. As we still have the binary constraints on wkij the dual function can also be formulated as 14 3. Development of a Lagrangean dual algorithm the minimization problem,∑K ∑n { { } }v(σ,α) := min min dk + σk + 2αk − 2αk−1 ; dk (3.9) =1 =(1 { j=1,...,m ij j ij ij i,m+1k ∑im } ) + min dk + σkn+1,j j ; 0 − σk j , (3.10) j=1 where we simply have to choose the j-element which minimizes the function over each row i = {1, . . . , n}. The dual problem is then formulated as the maximization problem of the Lagrangean dual function v∗ = max v(α,σ). (3.11) 0≤αk p i,j≤ γ 2 σk∈R j 3.1.1 Subgradient and sub-differential of the Lagrangean dual function Recall that in the subgradient method, we need to know the subgradient at our cur- rent position, in each iteration. The sub-differential to the Lagrangean subproblem in (α,σ)-space is given by { ∣∣ } ∂v(α,σ) = ( ∣λ,ϕ) ∣∣ λk k k k−1 k k+1 k k−1 kij = ∑wij(α ,α ,σ )− wij (α ,α ,σ ), ∀i, j, k ,ϕk n+1ij = i wk (αk,αk−1 kij ,σ )− 1 where (λ,ϕ) is a subgradient to v at (α,σ). The following relation hold for the subproblem solutions  1, ⇔ λkij = 1, wk (αk,αk−1,σk)− wk+1(αk,αk−1,σk) =  0, ⇔ λkij ij ij = 0, (3.12)−1, ⇔ λkij = −1. The step direction is projected onto the tangent cone of the feasible set Ω = {(α,σ)|αkij ∈ [0, γ p k 2 ], σj ∈ R, ∀i, j, k} at the current point α k ij, in order to en- sure feasibility. λkij is determined according to (3.12). If λkij = 1, the direction is feasible when αk γ2 k γ2 kij < 2 . However, if αij = 2 , the projection yields λij = 0, as moving left would imply leaving the feasible region. Conversely, if λkij = −1, the direction is allowed when αkij > 0, and projected to 0 if αkij = 0, to prevent moving out of the feasible set. A projected conditional subgradient λkij ∈ ∂v(α,σ) is defined for each i ∈ {1, . . . , n}, j ∈ {1, . . . ,m}, and k ∈ {1, . . . , K − 1}, as  1, if wk k k−1 k k+1 k+1 pij(α ,α ,σ ) = 1, wij (α ,αk,σk) = 0, and αkij < γ2 , λkij =  −1, if wk k k−1 kij(α ,α ,σ ) = 0, wk+1(αk+1,αk,σk) = 1, and αkij ij > 0,0, otherwise (3.13) 15 3. Development of a Lagrangean dual algorithm From σ we have subgradient ϕ ∈ ∂v(α,σ) defined as n∑+1 ϕk kj = wij − 1 ∈ {−1, . . . , n}, j = 1, . . . ,m. i=1 3.2 Implementation of modified deflected subgra- dient method We start the subgradient iterations by initializing the Lagrangean multipliers. For each iteration we solve the Lagrangean subproblems and determine the correspond- ing sub-gradient. We deflect and project the search direction according to (2.11) and (2.13) in order to obtain the new search direction. A new step length is com- puted every iteration. The choice of step length is important, as this along with the choice of step direction and relaxation influence the rate of convergence. We will implement the Polyak step length, as first presented in [14], which have shown efficient convergence result in subgradient optimization. It guarantees converge for step length parameter 0 < θ < 2. The Polyak step length = θ(v ∗ − v(αt,σt)) τt , (3.14)∥d̃t∥2 is constructed from the optimal value v∗, the current solution v(αt,σt) of the La- grangean dual problem and the search direction d̃t at (αt,σt) derived according to (2.11) and (2.12). As v∗ is unknown we will use a heuristic v̄ instead. This heuristic will be described in the next section. In order to enhance the convergence rate, we will implement an update scheme on θ, which will update θt every ℓ iterations. As a result θt may exceed the interval that guarantees convergence. The updating scheme is shown in Algorithm 1. The choice and impact of the initial value on θ0 will be discussed in Section 4.3. Algorithm 1 Update of θt every ℓ iterations 1: Input θt 2: if t mod ℓ = 0 then 3: if max q q q qq∈{t−ℓ+1,...,t} v(α ,σ )−minq∈{t−ℓ+1,...,t} v(α ,σ ) > 0.005 then 4: θt = 12θt−1 5: else if max q q q qq∈{t−ℓ+1,...,t} v(α ,σ )−minq∈{t−ℓ+1,...,t} v(α ,σ ) < 0.0005 then 6: θt = 32θt−1 7: end if 8: end if After computation of the step length we update the iterate according to (2.13). The ergodic sequence x̃0 will be initialized at an iteration Terg > t. First we wish to introduce the initial heuristic that we use as an upper bound in the computation of the Polyak step length. 16 3. Development of a Lagrangean dual algorithm 3.2.1 Initial heuristic We will use the initial heuristic developed by [2] which provides us with an upper bound on z∗ and hence also on v∗. It solves all K assignment problems iteratively for each iteration s. For each new iteration s we evaluate the previous solution (wkij)s−1 in function f , defined as  1− (wk+1 s−1ij ) − δ, if k = 1, i = 1, . . . , n, j = 1, . . . ,m 1− (wk−1 s−1 k+1 s−1(( k )s−1) =  ij ) − (wij ) − δ, if k = 2, . . . , K − 1f wij  1− (wk−1)s−1ij − δ, if k = K0, otherwise The small constant δ = 10−3 ensures that we do leave to many pairs (i, j) unassigned. We then solve the optimization problem ∑ ( )K n+1m+1 p w( ) ∑ ∑ s := argmin dkij + γ 2 f((w k s−1 ij) ) (3.15a) wk,∀k k n∑=1 i=1 j=1+1 s.t. wkij = 1, j = 1, . . . ,m, k = 1, . . . , K, (3.15b) i m∑=1+1 wkij = 1, i = 1, . . . , n, k = 1, . . . , K, (3.15c) j=1 wkn+1,m+1 = 0, k = 1, . . . , K, (3.15d) wkij ∈ {0, 1}, i = 1, . . . , n+1, j = 1, . . . ,m+1, (3.15e) k = 1, . . . , K. The new cost function will ensure that if (wk −1)s−1 = 1 and (wk +1)s−1ij ij = 1 in the previous iteration, it is cheaper to assign (wkij)s = 1 in the current iteration, than to 0. The algorithm is terminated when two consecutive solutions (wk)s−1 = (wk)s are equal. If this does not occur, we terminate after K iterations. 3.3 Implementation of enhanced ergodic sequence The Lagrangean relaxation provides an infeasible solution. We therefore wish to find a way to obtain a feasible solution and an upper bound on the optimal value. We will therefore introduce the enhanced ergodic sequence, from which we are able to obtain a such with the use of a heuristic. The sequence is constructed as described in Section 2.4. As mentioned above, we update our subgradient iterations using the Polyak step length rule. It can be shown that the Polyak step length can be bounded by two harmonic series, according to a ≤ θt(v ∗ − v(αt,σt)) ≤ a t+ 1 ∥d̃t∥2 t for all t ∈ Z+, where 0 < a ≤ A < ∞, and is thus a divergent series [10]. The ergodic sequence converge towards a feasible solution under subgradient iterations with our choice of step length. 17 3. Development of a Lagrangean dual algorithm 3.3.1 Heuristic solution from the enhanced ergodic sequence The ergodic sequence solutions {x̃t} are not guaranteed to be feasible solutions. Nor will z(x̃) provide any bound on z∗. We therefore introduce a heuristic that will pro- vide a feasible solution x̄ from an element in the ergodic sequence {x̃t}. The feasible solution will fulfill all constraints in (1.3). The heuristic uses the ergodic sequence weights as a cost function and solves the assignment problem as a maximization problem over all time steps k individually. From the Scipy package in Python we use the function linear_sum_assignment. The function is based on a modified Jonker- Volgenant-Castanon (JVC) algorithm, which solves the linear assignment problem by efficiently finding augmenting paths using a shortest-path approach. For further details on the algorithm used by the function see [5]. The heuristic will provide us with a feasible solution x̄. The solution will then be evaluated in the objective of the primal problem as z(x̄). Any feasible solution x̄ will provide an upper bound on the optimal value, z∗ ≤ z(x̄). The optimality gap between the best lower and upper bound, v(α,σ), and z(x̄) respectively, is implemented in the termination criterion, z(x̄)− v(α,σ) ( ) < ϵ̃, (3.16)v α,σ where ϵ̃ is some small number representing the maximum allowed percentage gap if multiplied with 100. 3.3.2 Algorithm scheme The final algorithm scheme can be seen in Algorithm 2. 18 3. Development of a Lagrangean dual algorithm Algorithm 2 Subgradient optimization method with ergodic sequence 1: Initialize α0, σ0, θ0, Terg, ℓ, m 2: Compute initial heuristic according to (3.15) 3: while termination criteria (3.16) not met do 4: Solve Lagrangean subproblems ⇒ v(αt,σt) 5: Calculate sub-gradient (λt, ϕt) 6: Determine ΨMDS according to (2.3.1) 7: Update search direction d̃t according to (2.12) 8: Update step length τt according to (3.14) { } 9: Update dual variables (αt+1,σt+1) = Proj (αt,σt) + Ψ d̃tΩ MDS 10: if t mod ℓ = 0 then 11: Update θ according to Algorithm 1 12: end if 13: if t > Terg then 14: Compute convexity weights µts by sk-rule according to (2.15) 15: Update x̃t 16: if t mod m = 0 then 17: Compute heuristic solution x̄ as described in section 3.3.1 18: end if 19: end if 20: Check termination criteria 21: end while 19 3. Development of a Lagrangean dual algorithm 20 4 Implementation and Results In the following chapter we discuss the implementation and the obtained results. We present our findings through graphs and tables. Initially we introduce a set of parameters used for the implementation. The method includes a series of parameters with an interdependent influence on the performance of the method. A discussion on how to choose values for these parameters is conducted Sections 4.2.1 and 4.3. Further we compare the performance of our method with four other algorithms. All times displayed are significantly slower than the computational times presented in the compared reports. This is a result of the computational power available. As everything is run on the same computer, it is possible to use and compare the performance of the different methods, despite the relatively long computational time. 4.1 Initial choice of parameters for the 18 data sets The 18 data sets are all presented on the form n_m_K, i.e. data set 14_18_100 contains 14 ground truths, 18 estimates and 100 time steps. All ground truths and estimates are not necessarily existing for all time step K. This varies over data set. Initial choice of dual variables α0 and σ0 is based on performance during testing. We suggest that the initial value of multiplier α0 should be chosen as γp4 . The best choice of initial value on σ0 varies over data sets. The initial value σkj = 0,∀j, k is implemented for all data sets. For all data sets we relax the assignment constraints corresponding to the larger dimension of ground truths and estimates. This is not best choice for all cases. The choice to always relax the larger dimension is done for consistency reasons. We divide the 18 data sets into two different groups when choosing the initial step length parameter θ0. This division is solely based on testing the performance of different initial values of θ0. We found that data sets with K = 100, except data set 34_38_100, and the one data set with K = 150 performed relatively well with initial value θ0 = 5. We will refer to this group of data sets as the smaller group. For data sets with K = 200 and data set 34_38_100, the initial value is chosen as θ0 = 10 as testing found that this was seem as the best choice when grouping the remaining data set, despite that this choice is not ideal for any of the data sets. These data sets are referred to as the group of larger data sets. We continuously update θt every 300 iterations according to Algorithm 1. The ergodic sequence is 21 4. Implementation and Results initialized after 1000 iterations with k = 4. We will use the optimality gap (3.16) as termination criterion. We set ϵ̃ = 0.02, which corresponds to a optimality gap of 2% of the Lagrangean dual. We will allow a maximum of 5000 iterations. If the allowed optimality gap has not been reached after 5000 iterations, we terminate. In general the above described parameter package is not tuned for any of the data sets. The method has shown sensitivity to the changes in parameters, and the perfor- mance is therefore affected by the combination of relaxation, choice of parameters and the initialization of the ergodic sequence. The different parameters, and the initialization of the ergodic sequence is discussed in Sections 4.2.1 and 4.3. 4.2 Performance In order to compare the performances between our algorithm, ADMM and ALM, we set the localization error to c = 20, p = 1, and γ = 2 as suggested in [2], when the distance matrix is computed for all data sets. We start by presenting the per- formance of our algorithm. Table 4.1 display the results for the eight data sets where we have chosen initial parameter value θ0 = 5. Table 4.2 shows the remaining data set for θ0 = 10. Table 4.1 shows that our algorithm terminates within reasonable time and shortly after we have initiated the ergodic sequence, for all data sets except for data set 15_16_100. This indicates that by the time we start weighing subproblem solutions, the dual function have converged to near optimum, which makes it easy for the heuristic to find a good solution fast. See Figure 4.1a of data set 14_18_100 for visualization. Considering the group of data sets with θ0 = 10 in Table 4.2, we see that the method demands a higher number of iterations before it terminates. We also see that the computational time increases significantly. For three data sets the method does not terminate before reaching the maximum allowed number of iterations, in particularly data set 27_28_200 is interesting, as we see a large optimality gap, indicating that the method has not converged. 22 4. Implementation and Results Ergodic sequence method Data set z∗ z(x̄) v(α,σ) Time (sec) Iterations Optimality gap 7_8_100 377.416 384.416 377.098 1.60 1300 0.0194 11_10_100 716.306 728.306 715.69 1.57 1100 0.0176 11_12_100 607.404 618.404 606.643 1.68 1100 0.0194 14_12_100 885.590 892.590 884.833 1.78 1100 0.0088 14_13_100 664.185 669.185 660.168 2.15 1200 0.0137 14_14_100 858.799 872.799 857.136 2.50 1300 0.0183 14_18_100 1170.796 1182.771 1167.776 2.87 1400 0.0128 15_16_150 988.633 993.633 974.646 9.79 2600 0.0194 Table 4.1: Obtained results for the ergodic sequence method evaluated in the objective function (1.2a) including computational time, number of iterations and optimality gap as defined in (3.16) with ϵ̃ = 0.02, the ergodic sequence initialized after 1000 iterations and θ0 = 5, along with the optimal solution z∗. Ergodic sequence method Data set z∗ z(x̄) v(α,σ) Time (sec) Iterations Optimality gap 15_16_200 1083.459 1096.459 1074.968 23.85 4445 0.0200 15_17_200 1249.315 1269.316 1248.355 26.45 4100 0.0168 18_17_200 1082.224 1095.224 1063.874 40.21 5000 0.0295 20_18_200 1266.169 1280.169 1257.378 32.43 3100 0.0181 22_21_200 1637.545 1649.545 1617.226 63.32 4847 0.0200 23_24_200 1708.786 1715.786 1675.930 65.36 5000 0.0238 25_22_200 1445.225 1457.225 1429.553 76.67 4900 0.0194 27_28_200 2055.313 2226.099 1962.237 113.34 5000 0.1345 30_26_200 1723.132 1737.132 1686.066 136.79 5000 0.0303 34_38_100 2187.668 2201.668 2160.962 28.06 1800 0.0188 Table 4.2: Obtained results for the ergodic sequence method evaluated in the objective function (1.2a) including computational time, number of iterations and optimality gap as defined in (3.16) with ϵ̃ = 0.02, the ergodic sequence initialized after 1000 iterations and θ0 = 10, along with the optimal solution z∗. For visualization of Table 4.1, Figure 4.1 shows the convergence of v(α,σ) and z(x̄) along with z∗. We see that the heuristic fluctuates a bit, and despite reaching a lowest upper bound it increases again. This might be due to small changes in the Lagrangean objective which causes the convexity weights to change, and in turn affects x̄. Figure 4.1 shows the step lengths over iterations. We see that after 800 iterations the step length hits a plateau and only decreases slowly towards 0. Termination occurs as iteration 1400; see Table 4.1. 23 4. Implementation and Results (a) Objective values of the La- (b) The resulting sequence of step grangean dual sequence, the ergodic lengths. sequence of primal subproblem so- lutions, feasible solutions from the heuristic, and the optimal value. Figure 4.1: Resulting sequences of objective function and step length measures for the data set 14_18_100 and initial step length parameter θ0 = 5. Evaluating the optimality gap is interesting as it reveals how the different parts of the algorithm performs. We are interested in the gap between the heuristic solution z(x̄) and z∗. We compute the relative optimality gap according to z(x̄)− z∗ ∗ × 100%. (4.1)z Table 4.3 presents the percentage by which the best upper bound z(x̄) overestimates the optimal value z∗. Table 4.3 is divided into the two separate data set groups, where the data sets above the middle line is the group with θ0 = 5, and below the line is the group of larger data sets with θ0 = 10. We see that except for data set 27_28_200 all upper bounds deviates less than 2% from the optimal value z∗. Interestingly, the first three data sets returns the greatest relative optimality gaps, despite early termination. Data sets 18_17_200, 23_24_200, and 30_26_200, which failed to terminate under the termination criteria, returns a relative gap of 1.201%, 0.410%, and 0.812% respectively. So in spite of late or lack of termination in the group of larger data sets, the best upper bound for of the data sets, except 27_28_200 are acceptable, considering that it returns a feasible solutions. For data sets 22_21_200, 23_24_200, 25_22_200, 30_26_200, and 34_38_200 the relative optimality gap is less than 1%, which might be an acceptable gap for feasibility. If one was to accept a larger gap between the feasible solution and optimum, this might also decrease the computational time needed. 24 4. Implementation and Results Data set Relative optimality gap 7_8_100 1.855 % 11_10_100 1.675 % 11_12_100 1.811 % 14_12_100 0.790 % 14_13_100 0.753 % 14_14_100 1.630 % 14_18_100 1.023 % 15_16_150 0.506 % 15_16_200 1.200 % 15_17_200 1.601 % 18_17_200 1.201 % 20_18_200 1.106 % 22_21_200 0.733 % 23_24_200 0.410 % 25_22_200 0.830 % 27_28_200 8.309 % 30_26_200 0.812 % 34_38_100 0.640 % Table 4.3: Relative optimality gap calculated according to (4.1). The gap indicates the percentage by which z(x̄) overestimates z∗. The data sets above the centerline represents the data sets with initial step length parameter θ0 = 5 and the ones below the data sets with initial value θ0 = 10. 4.2.1 Choice of parameter values First we discuss the choice of starting iteration Terg for the ergodic sequence. Fig- ure 4.2 shows the objective value of the ergodic sequences and their corresponding heuristic solutions for different starting iterations for data set 14_18_100. In Fig- ure 4.2a the ergodic sequence converges faster for later initialization iterations, but the corresponding heuristic solutions in Figure 4.2b does not. The heuristic is not particularly sensitive to the choice of Terg, as starting after 500 and 1000 iterations provides roughly the same objective values. Starting the ergodic sequence after 100 iterations implies that we weigh subproblem solutions before the dual objective has started to converge, and as a result there of, the blue curve starts provides bad initial objective values. Starting after 2000 iterations, the red curve finds its mini- mum almost directly. So despite the ergodic sequence converges fast, the heuristic does not. This might be due to the choice of heuristic. The plot does not give any motivation to initiate the ergodic sequence before 1500 iterations. 25 4. Implementation and Results (a) Objective values of ergodic (b) Objective values of heuristic sequences with different initialization solutions corresponding to the iterations. ergodic sequences in Figure 4.2a. Figure 4.2: Objective values of the ergodic sequences and corresponding heuristic solutions for data set 14_18_100 with varying initialization iteration Terg of the ergodic sequence. The same pattern is seen in Figure 4.3 for data set 25_22_200. Here we see that initializing the ergodic sequence after 1000 subgradient iterations, will lead to a very fast convergence. There is no advantage in weighing the early subproblems: starting after 200 or 500 iterations does not lead to faster convergence. (a) Objective values of ergodic (b) Objective values of heuristic sequences with different starting solutions corresponding to the iteration. ergodic sequences in Figure 4.3a. Figure 4.3: Different initialization iteration for ergodic sequences with their cor- responding heuristic solutions for data set 25_22_200. In both cases, the Lagrangean objective value has almost reached optimum before any of the ergodic sequences are initialized. We also see that we do not gain any- thing in convergence speed by initializing the ergodic sequences too early. It might even be beneficial to wait until the Lagrangean stabilized, as we see in Figure 4.3b; the heuristic finds its best upper bound rather quickly when starting the ergodic sequence after 2000 iterations. The heuristic is very stable to changes in the ini- tialization of the ergodic sequence, as we see in both cases that initialization after 26 4. Implementation and Results 200, 500, and 1000 iterations all find the best value almost simultaneously. The computation of the ergodic sequence is not very demanding, but should be taken into account when choosing the starting iteration Terg. 4.3 Relaxation strategy and choice of step length parameter θ The rate of convergence is highly dependent on the initial choice of parameter value θ0 and relaxation strategy. As the relaxation of the assignment constraint practically implies that we are allowed to assign more than one ground truth to an estimate in the same row or column, or vice versa, we had the assumption that a relaxation of the assignment constraint of the larger dimension of the ground truth or estimate would lead to faster convergence. Consider the graphs in Figure 4.4. The figure displays three data sets of which each two sub-figures illustrating the convergence for a relaxation over the columns on the left, and rows on the right for different initial values on θ0. 27 4. Implementation and Results (a) Data set: 14_18_100. (b) Data set: 14_18_100. Relaxation of the columns. Relaxation of the rows. (c) Data set: 14_12_100. (d) Data set: 14_12_100. Relaxation of the columns. Relaxation of the rows. (e) Data set: 34_38_100. (f) Data set: 34_38_100. Relaxation of the columns. Relaxation of the rows. Figure 4.4: Lower bounds’ development over the iterations, for relaxation of assign- ment constraints on ground truths (columns) and on estimates (rows), respectively, for data sets 14_18_100, 14_12_100, and 34_38_100 and varying initial values on θ0 displayed in the legends. θt is updated according to Algorithm 1. Considering data set 14_18_100 in Figure 4.4a and Figure 4.4b, we see that a relax- ation over the columns leads to faster convergence for all displayed values of initial value on θ0. It might not be easy to see, but Figure 4.4a reveals that θ0 = 5 is the best choice of the tested values. The same conclusion on the relaxation can be drawn for the other two data sets. For data set 34_38_100 in Figure 4.4e and 28 4. Implementation and Results Figure 4.4f, the columns lead to faster convergence along with the choice of a large initial value on θ0, despite Figure 4.4e shows fast convergence for all of the tested values on θ0. For data set 14_12_100 in Figure 4.4c and Figure 4.4d a relaxation of the rows leads to faster convergence, opposed to the columns. Figure 4.4c clearly shows that the wrong choice of relaxation will lead to very slow convergence. In con- clusion, it is clear for all three cases that a relaxation of the larger dimension leads to faster convergence for an appropriate choice of θ0. From all plots, independent of the relaxation, the larger value of θ0 will cause the Lagrangean objective value to decrease before increasing. In some cases with near equal dimensions on rows and columns, some values of θ0 for the relaxation of the smaller dimension is the better choice. The initial choice of σkj also has an impact on the convergence. In Figure 4.5 the impact of the initial starting values for σkj , ∀j, k on the convergence for the Lagrangean dual function is plotted. Among the tested values, σkj = −5 yields the fastest convergence. Further σkj = 0 is the only value on σkj for which the initial objective value of the dual function is non-negative. It is clear that the initial choice of multipliers σ has a slight impact on the convergence. Figure 4.5: Convergence for different initial values of σ with the initial value of θ0 = 5 for data set 14_18_100. Recognizing that our method is sensitive to parameter choices, we proceed with a performance comparison against alternative algorithms. 4.3.1 Comparison with an alternative Lagrangean Dual Method and Linear programming relaxation The performance of solving the problem for the same data sets with an alternative Lagrangean dual method (ALD) and an LP solver for solving a linear programming relaxation of the problem is presented in Table 4.4. Notice in Table 4.4 that z∗ ∗LP = z for all data sets; see values on z∗ in Table 4.1 and Table 4.2. The ALD uses the initial heuristic presented in section 3.2.1 as an upper bound and as heuristic in the Polyak step length rule. The optimality gap displayed in Table 4.4 is constructed according to 29 4. Implementation and Results z∗ − zALD ∗ × 100%, (4.2)z where zALD is the objective value, hence best lower bound on z∗, found by the ALD algorithm. This displays the percentage by which the ALD method underestimates z∗. Compared to our method, ALD manages to find an solution in fewer iterations. For all data sets where ALD manages to find an exact solution, the method ter- minates after one iteration. We see that the optimality gap is smaller for all data sets compared to optimality gap displayed in Table 4.1 and Table 4.2. Compared to computational times of the ergodic method displayed in Table 4.1 and Table 4.2, ALD is faster for almost all instances. For the data sets 14_18_100 and 34_38_100 our method terminates faster. We have not been able to verify that our method per- forms better when there is a greater difference between the number of ground truths and estimates, but this result might be a small indication that it does. Algorithm Data sets ALD LP-solver Obj. value Time (sec) Iterations Optimality gap z∗LP Time (sec) 7_8_100 377.416 0.74 1 0 377.416 1.93 11_10_100 716.306 0.98 1 0 716.306 4.18 11_12_100 607.404 0.86 1 0 607.404 4.89 14_12_100 885.590 0.89 1 0 885.590 7.20 14_13_100 664.185 0.88 1 0 664.185 7.49 14_14_100 858.307 2.36 144 0.057 858.799 8.11 14_18_100 1170.299 15.25 821 0.042 1170.796 10.64 15_16_150 988.134 5.29 115 0.050 988.633 16.50 15_16_200 1082.834 15.49 306 0.058 1083.459 22.63 15_17_200 1249.315 4.44 1 0 1249.315 28.55 18_17_200 1082.224 4.10 1 0 1082.224 31.33 20_18_200 1266.169 5.28 1 0 1266.169 68.90 22_21_200 1635.049 41.29 894 0.152 1637.545 55.33 23_24_200 1706.786 13.99 202 0.117 1708.786 68.93 25_22_200 1443.225 14.65 202 0.138 1445.225 80.045 27_28_200 2051.186 53.28 781 0.201 2055.313 129.28 30_26_200 1721.777 18.81 209 0.079 1723.132 156.51 34_38_100 2187.287 38.48 1029 0.017 2187.668 101.76 Table 4.4: Objective value with corresponding average computational time for all 18 data sets with parameters c = 20 and γ = 2, with the optimality gap according to (4.2). 4.3.2 Comparison with regularization algorithms We compare our results with three different regularization methods; REG1 and REG2, presented in [13], and ADMM in [2]. All three methods use a of regularization function added to the objective, which will help smoothen the objective function and make convergence faster. We compare the three methods in terms of computational time, result and error. 30 4. Implementation and Results 4.3.2.1 ADMM The results for the regularization method ADMM with initialization is presented in Table 4.5. ADMM provides a feasible solution, but the objective value evaluated in the ADMM objective function might differ slightly from the one the same solution would provide in the original primal problem. This is due to the regularization term. In order to compare the two methods, we therefore evaluate the solution wADMM found by ADMM in the primal objective function (1.2a). The initial heuristic de- scribed section 3.2.1 is used as initialization of the ADMM. The heuristic manages to find an optimal or near optimal solution for most data set, which give reason for the few iterations needed for the ADMM method to terminate. As it appears in Table 4.5, ADMM manages to find optimum for all data sets, except for 34_38_100. Comparing the result to our suggested method, ADMM performs significantly better on all aspects. ADMM method Data set z(wADMM) Time(sec) Iterations Optimality gap 7_8_100 377.416 0.74 1 0 11_10_100 716.306 0.86 1 0 11_12_100 607.404 0.88 1 0 14_12_100 885.590 0.93 1 0 14_13_100 664.185 1.05 1 0 14_14_100 858.800 1.40 10 0 14_18_100 1170.796 1.43 10 0 15_16_150 988.633 3.59 1 0 15_16_200 1083.459 5.01 6 0 15_17_200 1249.316 4.52 1 0 18_17_200 1082.224 3.91 1 0 20_18_200 1266.169 4.11 1 0 22_21_200 1637.545 5.90 10 0 23_24_200 1708.786 5.60 6 0 25_22_200 1445.225 6.27 6 0 27_28_200 2055.313 7.58 10 0 30_26_200 1723.132 6.89 6 0 34_38_100 2191.668 2.73 10 0.184 Table 4.5: Obtained solutions for methods ADMM and ergodic sequence method evaluated in objective function (1.2a) including computational time and with the optimality gap according to (4.1). 4.3.2.2 REG1 and REG2 We compare our method with the regularization methods REG1 and REG2 on gen- erated data. We generate data from parameter values as described in [13]. We only compare results obtained running on the CPU. For our algorithm we set θ = 2 with continuous update, ϵ̃ = 0.001, and with the same relaxation strategy as described in Section 4.1. 31 4. Implementation and Results Initially, we construct to two small structured data sets. Table 4.6, our algorithm manages to find optimum much faster. The gap displayed in Table 4.6 is the actual gap between z∗LP and the best objective value found for each method. The regu- larization methods does not return an exact solution, but an approximation. Our method returns an exact feasible solution. This is to be taken into consideration when choosing method. Structured data Method Data set: 15_16_20 Data set: 15_16_40 Obj. value Time (sec) Gap Obj. value Time (sec) Gap z∗LP 24.8743 2.55 - 33.7232 4.69 - Erg. seq. 24.8743 1.59 - 33.7232 1.70 - REG. 1 25.6812 26.89 0.8069 35.5656 49.04 1.8424 REG. 2 24.8779 19.20 0.0036 33.7262 34.12 0.0030 Table 4.6: Obtained objective value, elapsed computing time and difference be- tween z∗LP and the obtained objective value for different solution algorithms, for structured data generated with c = 0.25 and γ = 1. For the small dataset, both our method and the LP solver demonstrate faster compu- tational performance. We now extend the comparison to larger instances, comparing REG2, the LP solver, and our algorithm. Over 50 instances we generate structured data for varying numbers of time steps K ∈ {5, 10, 15, 20, 25, 30, 35, 40, 45, 50}, with dimension n = m = 30. See [13] for a thorough description of parameters and details on the data generation. Figure 4.6 displays the time used by the three methods for two different values on ϵ̃. For both cases, our algorithm and the LP-solver is faster for K = 5. In Figure 4.6a we see that for all other values of K the regularization method is superior. Figure 4.6b our method is competitive to REG2 in terms of computation time up until K = 20, but slower for the rest of the instances. (a) Termination criteria for ergodic (b) Termination criteria for ergodic sequence ϵ̃ = 0.001 sequence ϵ̃ = 0.02 Figure 4.6: Computational time for the LP-solver, ergodic sequence method and REG2 over instances with varying number of time steps K. 32 4. Implementation and Results Considering the corresponding relative error, our algorithms performs well for small values of K. In Figure 4.7b we see that our algorithm manages to returns a feasible and optimal solution for each of the 50 instances and forK ∈ {5, 10, 15, 20, 30, 40, 50}. For the values of K, where our algorithm does not terminate, there is a risk of large relative error compared to REG2. For K = 45 the greatest relative error is above 20% with a mean of roughly 2.5%. So despite the slow convergence and long com- putation, we have a method which, for particular types of data, manages to return optimal feasible solutions. Next, consider the relative error for our method run with termination criteria ϵ̃ = 0.02 shown in Figure 4.7c. For time step K = 5 and K = 10, the mean relative error is almost 0, indicating that the algorithm man- ages to find an optimal or near optimal solution for the majority of instances. For K ∈ {15, 20, 25, 30, 35} the relative error is roughly of the same size as the relative error for REG2, see Figure 4.7a. For K = 50 we obtain is slightly lower relative error than REG2, but for K = 40 and K = 45 the mean is notably larger, which might be due to the large outliers. (a) Relative error for REG2. (b) Relative error for Ergodic sequence algorithm with ϵ̃ = 0.001. (c) Relative error for Ergodic sequence algorithm with ϵ̃=0.02. Figure 4.7: The relative error showing the maximum and the mean relative error. The standard deviation is indicated by the shaded area. Next we consider the case where n ≠ m with the LP-solver and our algorithm. We run for 50 instances over varying K with n = 15 and m = 23. The result is shown in Figure 4.8. For this instance we are outperformed by the LP-solver, indicating 33 4. Implementation and Results that the performance of our method varies depending on the dimensions of the data set. Figure 4.8: Computational time for the LP-solver and our algorithm over varying numbers of time steps K for data set 16_24_K. 34 5 Discussion and Conclusion 5.1 Performance Our method has shown high sensitivity to changes and combinations in parameters such as the initial values of the multipliers σkj , the step length parameter initial value θ0, the value k̃ of the exponent in the sk-rule in Section 2.4 and the relaxation strategy. Our method is less sensitive to the choice of Terg, as the heuristic provides roughly the same solutions independently of the start iteration for computing the enhanced ergodic sequence. Comparing with ALD, our algorithm is outperformed regarding running time, but as it always provides a feasible solution, it is an alternative choice to consider for some data sets, as it is competitive for small data dimensions. Interestingly for the two data sets 14_18_100 and 34_38_100, our method terminates faster under the allowed optimality gap. This may indicate that our method is capable of handling large discrepancy between the number of ground truths and estimates better. Un- fortunately we have not be able to verify this assumption. The regularization method ADMM provides both a feasible solution and fast com- putations, compared to our method. For the one instance in Table 4.5 where ADMM does not find an optimum, our method still returns a poorer solution. REG1 and REG2 returns an approximation of the objective value. We have not been able to extract the solution matrix from the code; thus we have not been able to evaluate the exact objective value in (1.2a). Our method showed that it was competitive for the data sets 15_16_20 and 15_16_40 in Table 4.6. When testing larger values of K, the algorithm again suffers from long computational time and slow convergence. We found that the computational time is linearly increasing with the number of K. 5.2 Heuristic The heuristic we use to obtain a feasible solution performs well, but it would be interesting to develop and investigate a heuristic, which would be more stable to changes in the Lagrangean dual function. In many cases the heuristic manages to find a reasonable, although not optimal, solution. For the smaller data sets the Lagrangean dual function has almost converged before the heuristic is computed. That is, the heuristic manages to find a good solution almost directly, and hence a smaller value on ϵ̃ would most likely lead to a better solution, without being too 35 5. Discussion and Conclusion computationally expensive. For the larger data sets, we can not allow a higher value on ϵ̃, as the subgradient method has a hard time converging. The time constructing the feasible solution is also worth considering. A feasible solution is time consuming to construct and therefore we only compute a feasible solution each 100 iterations. A new heuristic that can be computed faster is of high interest. The heuristic returns an assignment for time step k independently of neighboring solutions for k − 1 and k + 1. An extension of the heuristic to, considering the neighboring assignment problems would be attractive. As an alternative, it would be interesting to investigate an extension of Vogel’s approximation method [16]. 5.2.1 Vogel’s approximation method Starting the project we had a hope that using Vogel’s approximation method after a number of subgradient iterations would be able to provide us with a good and potentially optimal solution. Vogel’s approximation method managed to solve each separate assignment problem, for time step k, to optimality, but failed to connected the individual time steps, solutions to one another, causing the track switching penalty error to be very large. Despite the negative results, the combination of Lagrangean relaxation and the approximation method showed great computational potential. Further development of the approximation method would therefore be necessary, but it would be interesting to investigate the potential of if this combi- nation. For description of Vogel’s approximation method, see Appendix A.1 5.3 Future work The proposed method cannot compete with the regularization method ADMM, but a combination of a regularization method and the ergodic sequences would be in- teresting to investigate. For this method to meet its full potential, a more thorough investigation of the parameters is needed, along with potentially looking over how to optimize the computation through, for instance, parallelization. We also encourage an attempt to prove Conjecture 1 used in the projected modified deflected subgradient scheme. 5.4 In conclusion We have successfully developed and implemented a new Lagrangean dual method for solving the TGOSPA metric. The potential for further developments and improve- ments on the algorithm is great. Specifically the initial heuristic and the heuristic used to obtain a feasible solution has room for improvement. Unlike the other meth- ods, the Lagrangean dual algorithm provides a feasible and exact solution, and does not return an approximation. As conclusion on the performance, we have found that the Lagrangean dual algorithm is an attractive choice for some specific data sets. In these cases it is computationally fast and provides a feasible solution to the primal. The method suffers from sensitivity to parameters and slow iterations, and 36 5. Discussion and Conclusion was outperformed by the other methods tested on most aspects. Using our method would require a thorough investigation and tuning of parameters, and one has to consider how important a feasible solution is at the expense of speed. 37 5. Discussion and Conclusion 38 Bibliography [1] Belgacem, R. Amir, A. (2018). A new modified deflected subgradient method. Journal of King Saud University - Science vol 30, issue 4, pp. 561–567 (2017) https://doi.org/10.1016/j.jksus.2017.08.004 [2] Carlsson, C. and Ekelund, H.: Sequentially connected 2D assignment problems solved using Lagrangian dual methods. Master’s thesis, Chalmers University of Technology (2022) https://hdl.handle.net/20.500.12380/305151 [3] Camerini, P.M., Fratta, L., Maffioli, F. : On improving relaxation methods by modified gradient techniques. In: Balinski, M.L., Wolfe, P. (eds) Nondiffer- entiable Optimization. Mathematical Programming Studies, vol 3. pp. 26–34, Springer, Berlin, Heidelberg (1975) https://doi.org/10.1007/BFb0120697 [4] Conforti, .M., Cornuéjols, G., Zambelli, G. : Integer programming. Graduate texts in Mathematics Springer, Berlin, Heidelberg (2014) [5] Crouse D. F. : On implementing 2D rectangular assignment algorithms. IEEE Transactions on Aerospace and Electronic Systems, vol. 52, no. 4, pp. 1679– 1696, (2016) https://doi: 10.1109/TAES.2016.140952 [6] Geoffrion, A.M : Lagrangean relaxation for integer programming. In: Balin- ski, M.L. (eds) Approaches to Integer Programming. Mathematical Pro- gramming Studies, vol 2. pp. 82–114, Springer, Berlin, Heidelberg (1974) https://doi.org/10.1007/BFb0120690 [7] Gustavsson, E., Patriksson, M., Strömberg, A. : Primal convergence from dual subgradient methods for convex optimization. Math. Program. 150, pp. 365–390 (2015) https://doi.org/10.1007/s10107-014-0772-2 [8] Larsson T., Migdalas S., Strömberg, A. : A Preprocessing Scheme for Vogel’s Approximation Method, Linköping Institute of Technology (1988) [9] Larsson T., Patriksson M., Strömberg A. : Conditional subgradient optimization - Theory and applications. European Journal of Operational Research, vol. 88, pp. 382–403 (1996) https://doi.org/10.1016/0377-2217(94)00200-2 [10] Larsson T., Patriksson M., Strömberg A. : Ergodic, primal convergence in dual subgradient schemes for convex programming. Math. Program. 86, pp. 283–312 (1999) https://doi.org/10.1007/s101070050090 [11] Lundgren J., Rönnqvist M., Värbrand P.: Optimization. Studentlitteratur AB, Lund (2010) [12] Luo W., Xing J., Milan A., Zhang X., Liu W., Kim T.: Multiple ob- ject tracking: A literature review. Artificial Intelligence, vol. 293 (2020) https://doi.org/10.1016/j.artint.2020.103448 [13] Nevelius Wernhom, V. and Wärnsäter A.: Efficient Evalu- ation of Target Tracking Using Entropic Optimal Transport. 39 Bibliography Master’s thesis, Chalmers University of Technology (2024) https://odr.chalmers.se/server/api/core/bitstreams/3885a6e0-f649-4715- b3bf-80abc0be53fa/content [14] Polyak, B. T. : Gradient methods for the minimization of functionals. USSR Computational Mathematics and Mathematical Physics, vol. 3, issue 4, pp. 864–878 (1963) https://doi.org/10.1016/0041-5553(63)90382-3 [15] Rahmathullah, A. S., García-Fernández, Á. F., and Svensson, L.: Generalized optimal sub-pattern assignment metric. Proceedings of the 20th International Conference on Information Fusion (2017) https://doi.org/10.23919/ICIF.2017.8009645 [16] Rao S.S.: Optimization theory and applications. Wiley Easter Limited. Second edition (1984) [17] Sherali H.D., Ulular O. : A primal-dual conjugate subgradient algorithm for specially structured linear and convex programming problems. Appl Math Optim 20(1), pp. 193-–221 (1989) https://doi.org/10.1007/BF01447654 [18] Strömberg, A.: Lagrangean duals and subgradient methods for linear and mixed-binary linear optimization. Chalmers University of Technology (version 0.2, October 29, 2022) [19] García-Fernández, Á. F., Rahmathullah, A. S., and Svensson, L.: A metric on the space of finite sets of trajectories for evaluation of multi-target track- ing algorithms. IEEE Transactions on Signal Processing, 68:3917–3928 (2020) https://ieeexplore.ieee.org/document/9127194 40 A Appendix A.1 Vogel’s approximation method Vogel Approximation method, VAM, is an approximation method that usually yields a starting basic feasible solution very close to the optimal solution [16]. VAM iter- atively allocates supply and demands through the following steps: • Step 1: Identify the two lowest costs in each row and column of the given cost matrix and then write the absolute row and column difference. • Step 2: Identify the row and column with the maximum penalty and assign the corresponding cell’s min (supply, demand). If two or more columns or rows have the same max penalty, then we just choose either one. • Step 3: If the assignment in the previous satisfies the supply or demand respectively at the origin, delete the corresponding row or column. • Step 4: Stop the procedure if supply at each origin is 0. If not repeat steps above. I A. Appendix A.2 Python code existing algorithms We have used the following functions from [2]: • def Get_upper_bound(...) • def Update_W(...) These functions and the code for ADMM and ALD can be downloaded from the following link: https://github.com/Camilla-C/sequentially-connected-2D-assignment-problems We have used the following functions from [13]: • def linear_programming_solve(...) • def generate_structured_data(...) These functions and the code for REG1 and REG2 can be downloaded from the following link: https://github.com/alfredwarnsater/efficient-mtt-eval-ot-MVEX03/tree/main II DEPARTMENT OF MATHEMATICAL SCIENCES UNIVERSITY OF GOTHENBURG Gothenburg, Sweden www.gu.se