← All Posts
CP · Optimization· Part 9 of 14

Hungarian Algorithm Walkthrough

The Hungarian algorithm (also known as the Kuhn–Munkres algorithm) is the gold-standard method for solving the assignment problem in O(n³) time. Given n workers and n jobs with a cost matrix, it finds the minimum-cost one-to-one assignment. This post walks through every detail: the theory behind potential functions and reduced costs, a complete hand-traced 4×4 example, a contest-ready C++ implementation, and the analysis that proves both correctness and the cubic time bound.

The Assignment Problem

The assignment problem is one of the most fundamental problems in combinatorial optimization. It asks: given n workers and n jobs, with a cost c[i][j] for assigning worker i to job j, find a one-to-one assignment of workers to jobs that minimises the total cost.

Formal Definition

We are given an n × n cost matrix C where entry c[i][j] is the cost of assigning worker i to job j. We want a permutation π of {0, 1, …, n−1} that minimises:

minimise  Σ c[i][π(i)]   for i = 0 … n−1
subject to  π is a permutation (each job assigned exactly once)

Equivalently, in bipartite graph terms: build a complete bipartite graph Kn,n with left vertices (workers), right vertices (jobs), and edge weights c[i][j]. We want a minimum-weight perfect matching.

Why Not Brute Force?

There are n! possible assignments. For n = 20, that is already over 2 × 1018. Even for n = 12, brute force is impractical in a contest. We need something far better.

Real-World and Contest Motivation

Convention: Throughout this post we solve the minimisation version. To solve maximisation, negate all costs and negate the final answer.

Hungarian Algorithm Overview

The Hungarian algorithm is built on the theory of linear programming duality applied to the assignment problem. The key ideas are potential functions, reduced costs, and augmenting paths.

Dual Variables (Potentials)

We introduce dual variables: a value u[i] for each worker i and v[j] for each job j. The dual feasibility condition requires:

u[i] + v[j] <= c[i][j]   for all i, j

The reduced cost of edge (i, j) is defined as:

c'[i][j] = c[i][j] − u[i] − v[j]  >= 0

An edge is tight (or admissible) when its reduced cost is exactly zero. The dual objective is to maximise Σ u[i] + Σ v[j], which gives a lower bound on the optimal primal cost.

Complementary Slackness

By LP duality, a primal assignment and dual potentials are both optimal if and only if every assigned edge is tight:

For each assigned pair (i, π(i)):  c[i][π(i)] = u[i] + v[π(i)]

So the algorithm's strategy is: maintain dual-feasible potentials, and grow a matching using only tight edges. When stuck, adjust potentials to create new tight edges.

High-Level Algorithm

The algorithm processes workers one at a time. For each worker i:

  1. Run a shortest-path search (Dijkstra-like) on the reduced cost graph to find an augmenting path from worker i to an unmatched job.
  2. Use the shortest-path distances to update potentials, maintaining dual feasibility while making more edges tight.
  3. Augment the matching along the found path, increasing the matching size by one.

After processing all n workers, we have a perfect matching on tight edges — which by complementary slackness is optimal.

Key insight: The algorithm never backtracks on matched workers. Each worker is matched exactly once, and the potentials are adjusted so that prior matches remain on tight edges.

Step-by-Step Walkthrough

Let us trace through a complete 4×4 example. Our cost matrix is:

        Job 0   Job 1   Job 2   Job 3
Worker 0:   9       2       7       8
Worker 1:   6       4       3       7
Worker 2:   5       8       1       8
Worker 3:   7       6       9       4

Initialisation

Set u[i] = 0 for all workers. Set v[j] = min over all i of c[i][j] for each job:

u = [0, 0, 0, 0]
v = [min(9,6,5,7), min(2,4,8,6), min(7,3,1,9), min(8,7,8,4)]
v = [5, 2, 1, 4]

The reduced cost matrix c'[i][j] = c[i][j] − u[i] − v[j]:

        Job 0   Job 1   Job 2   Job 3
Worker 0:   4       0       6       4
Worker 1:   1       2       2       3
Worker 2:   0       6       0       4
Worker 3:   2       4       8       0

Tight edges (reduced cost = 0): (0,1), (2,0), (2,2), (3,3).

Iteration 1 — Worker 0

We need to find an augmenting path from Worker 0 to an unmatched job using shortest paths on reduced costs. The shortest path from Worker 0 leads to Job 1 with reduced cost 0 (it is already tight). Job 1 is unmatched, so we directly match:

Match: Worker 0 → Job 1
Matching: {0→1}

Iteration 2 — Worker 1

Run shortest-path search from Worker 1. The minimum reduced costs from Worker 1 are: Job 0 = 1, Job 1 = 2, Job 2 = 2, Job 3 = 3. The nearest unmatched job is Job 0 (cost 1) or Job 2 (cost 2). We reach Job 0 first with distance 1.

Update potentials by the shortest path distances, then augment:

Match: Worker 1 → Job 0
Matching: {0→1, 1→0}

Iteration 3 — Worker 2

Shortest-path search from Worker 2. Direct reduced costs: Job 0 = 0, Job 2 = 0, Job 1 = 6, Job 3 = 4. Job 0 has distance 0, but Job 0 is matched to Worker 1. So we explore through Worker 1: from Worker 1, we can reach Job 2 with additional reduced cost 2 (after potential updates). The alternating path is:

Worker 2 → Job 0 (tight) → Worker 1 (matched) → Job 2 (reduced cost 2)

But Job 2 with total distance 2 competes with Worker 2 → Job 2 directly at distance 0. Since Job 2 is unmatched, we take the direct route:

Match: Worker 2 → Job 2
Matching: {0→1, 1→0, 2→2}

Iteration 4 — Worker 3

Direct reduced costs from Worker 3: Job 0 = 2, Job 1 = 4, Job 2 = 8, Job 3 = 0. Job 3 is tight and unmatched.

Match: Worker 3 → Job 3
Matching: {0→1, 1→0, 2→2, 3→3}

Final Result

Worker 0 → Job 1:  cost 2
Worker 1 → Job 0:  cost 6
Worker 2 → Job 2:  cost 1
Worker 3 → Job 3:  cost 4
───────────────────────
Total cost:        13

You can verify that no other assignment yields a lower total. The dual objective Σ u[i] + Σ v[j] = 0+0+0+0 + 5+2+1+4 = 12 after initialisation, and after potential updates it reaches 13, certifying optimality.

O(n³) Implementation

Below is a complete, contest-ready C++ implementation. It uses the Jonker–Volgenant variant of the Hungarian algorithm, which is clean and runs in O(n³) without explicitly building a graph. The algorithm processes one worker per outer iteration, running a shortest-path (Dijkstra-like) relaxation over jobs.

Implementation note: We use 1-indexed workers and jobs internally. The cost matrix is 0-indexed on input. The array p stores which worker is matched to each job (0 = unmatched), and ans maps each worker to its assigned job.
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll INF = 1e18;

// Solves min-cost assignment on an n x n cost matrix.
// Returns {min_cost, assignment} where assignment[i] = job assigned to worker i (0-indexed).
pair<ll, vector<int>> hungarian(const vector<vector<ll>>& cost) {
    int n = (int)cost.size();
    // u[i]: potential for worker i (1-indexed, u[0] unused)
    // v[j]: potential for job j (1-indexed, v[0] unused)
    vector<ll> u(n + 1, 0), v(n + 1, 0);
    // p[j] = worker matched to job j (1-indexed, 0 = unmatched)
    vector<int> p(n + 1, 0);
    // ans[i] = job assigned to worker i (0-indexed result)
    vector<int> ans(n + 1, 0);

    for (int i = 1; i <= n; i++) {
        // We are assigning worker i.
        // dist[j] = shortest reduced-cost distance to reach job j.
        vector<ll> dist(n + 1, INF);
        vector<bool> used(n + 1, false);
        vector<int> prev(n + 1, 0); // prev[j] = previous job on augmenting path

        // Dummy job 0 is matched to worker i to start the search.
        p[0] = i;
        int j0 = 0; // current job in the search
        dist[0] = 0;

        // Dijkstra-like shortest path over jobs.
        do {
            used[j0] = true;
            int w = p[j0]; // worker matched to current job
            ll delta = INF;
            int j1 = -1;   // next job to add to the tree

            for (int j = 1; j <= n; j++) {
                if (used[j]) continue;
                // Reduced cost of assigning worker w to job j.
                ll rd = cost[w - 1][j - 1] - u[w] - v[j];
                if (dist[j0] + rd < dist[j]) {
                    dist[j] = dist[j0] + rd;
                    prev[j] = j0;
                }
                if (dist[j] < delta) {
                    delta = dist[j];
                    j1 = j;
                }
            }

            // Update potentials.
            for (int j = 0; j <= n; j++) {
                if (used[j]) {
                    u[p[j]] += delta;
                    v[j] -= delta;
                } else {
                    dist[j] -= delta;
                }
            }

            j0 = j1;
        } while (p[j0] != 0);
        // j0 is now an unmatched job. Trace back and augment.

        // Augment along the path.
        while (j0 != 0) {
            int j1 = prev[j0];
            p[j0] = p[j1];
            j0 = j1;
        }
    }

    // Extract the answer.
    ll total = 0;
    for (int j = 1; j <= n; j++) {
        if (p[j] != 0) {
            ans[p[j]] = j;
        }
    }
    for (int i = 1; i <= n; i++) {
        total += cost[i - 1][ans[i] - 1];
    }

    // Convert to 0-indexed.
    vector<int> result(n);
    for (int i = 1; i <= n; i++) {
        result[i - 1] = ans[i] - 1;
    }
    return {total, result};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n;
    cin >> n;
    vector<vector<ll>> cost(n, vector<ll>(n));
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            cin >> cost[i][j];

    auto [total, assignment] = hungarian(cost);
    cout << total << "\n";
    for (int i = 0; i < n; i++)
        cout << i << " -> " << assignment[i] << "\n";
    return 0;
}

Code Walkthrough

Let us break down the key parts of the implementation:

Outer loop (worker i): For each worker, we find the shortest augmenting path from that worker to an unmatched job. The "dummy job 0" trick avoids special-casing the start of the search — we temporarily match worker i to job 0.

Inner Dijkstra loop: In each step, we pick the unvisited job j1 with the smallest distance. For every unvisited job j, we try to relax its distance through the worker currently matched to j0. This is the core O(n²) per worker.

Potential update: After finding the minimum delta, we shift potentials so that: (a) visited edges remain tight, and (b) unvisited distances decrease by delta. This is what makes the algorithm work without an explicit priority queue.

Augmentation: Once we reach an unmatched job (p[j0] == 0), we trace back through the prev array and flip the matching along the alternating path.

Contest tip: This implementation handles negative costs correctly. If you need maximisation, negate all entries and negate the returned total cost.

Correctness and Complexity

Why It Works

The correctness rests on two invariants maintained throughout the algorithm:

  1. Dual feasibility: After every potential update, we still have c[i][j] − u[i] − v[j] ≥ 0 for all i, j. The potential update step is carefully designed so that reduced costs of unvisited edges do not become negative.
  2. Complementary slackness: Every matched edge (i, p[i]) satisfies c[i][p[i]] = u[i] + v[p[i]], i.e., it is tight. This holds because we only augment along tight edges (after potentials are updated, the augmenting path consists entirely of zero reduced-cost edges).

By LP duality, when we have a feasible dual solution and a primal matching satisfying complementary slackness, both are optimal. After all n workers are matched, we have a perfect matching on tight edges, certifying optimality.

O(n³) Time Analysis

The analysis is straightforward once you see the structure:

This gives n × n × n = O(n³) total operations. The constant factor is small — typically around 2–3 operations per inner-loop step — making it very fast in practice.

Space Complexity

We use O(n²) space for the cost matrix and O(n) space for the auxiliary arrays (u, v, p, dist, used, prev). Total: O(n²).

Comparison with Alternatives

MethodTimePractical n
Brute force (all permutations)O(n! · n)n ≤ 10
Bitmask DPO(n² · 2ⁿ)n ≤ 20
Hungarian algorithmO(n³)n ≤ 1000–3000
Min-cost max-flow (MCMF)O(n³) or worsen ≤ 500–1000

The Hungarian algorithm is almost always faster than MCMF for dense bipartite matching due to lower constant factors and no graph construction overhead.

Rectangular Assignments

In many problems the matrix is not square: you might have r workers and c jobs where r ≠ c. The standard Hungarian algorithm requires a square matrix, but handling this is straightforward.

Padding to a Square Matrix

If r < c (fewer workers than jobs), add c − r dummy workers with cost 0 to every job. If r > c (more workers than jobs), add dummy jobs. After solving, ignore assignments involving dummy rows or columns.

// Solve rectangular assignment: r workers, c jobs.
// Returns {min_cost, assignment} where assignment[i] = job for worker i,
// or -1 if worker i is unassigned.
pair<ll, vector<int>> rectangular_hungarian(
    const vector<vector<ll>>& cost, int r, int c
) {
    int n = max(r, c);
    vector<vector<ll>> padded(n, vector<ll>(n, 0));
    for (int i = 0; i < r; i++)
        for (int j = 0; j < c; j++)
            padded[i][j] = cost[i][j];

    auto [total, assignment] = hungarian(padded);

    // Filter out dummy assignments.
    vector<int> result(r, -1);
    ll real_cost = 0;
    for (int i = 0; i < r; i++) {
        if (assignment[i] < c) {
            result[i] = assignment[i];
            real_cost += cost[i][assignment[i]];
        }
    }
    return {real_cost, result};
}

Cost of Padding

Padding increases the matrix from r × c to n × n where n = max(r, c). The time becomes O(n³) with n = max(r, c). For highly rectangular matrices (e.g., 10 × 1000), this is wasteful. In such cases, consider a min-cost max-flow approach which can be more efficient when the number of edges is sparse.

Minimisation vs. maximisation with dummies: When padding for minimisation, use cost 0 for dummy entries (they won't affect the optimal real cost). For maximisation, also use 0 for dummy entries (after negation, they become 0 and won't interfere).

Partial Assignments

Sometimes you need to assign at most k workers (not all n). Add n − k dummy jobs with cost 0. Workers assigned to dummy jobs are effectively "unassigned." This transforms a partial assignment into a full one on a larger square matrix.

Practice Problems

Here are classic competitive programming problems that require or benefit from the Hungarian algorithm:

Direct Applications

Geometric Variants

Reductions to Assignment

Tips for Contest Use

Checklist before coding Hungarian in a contest:
  • Is n ≤ ~1000? If n ≤ 20, bitmask DP is simpler and fast enough.
  • Is it a minimisation or maximisation problem? Negate costs for max.
  • Is the matrix rectangular? Pad with zeros to make it square.
  • Are costs potentially negative? This implementation handles negatives correctly.
  • Do you need the assignment itself or just the cost? Both are returned.