Hungarian Algorithm Walkthrough
The Hungarian algorithm (also known as the Kuhn–Munkres algorithm) is the gold-standard method for solving the assignment problem in O(n³) time. Given n workers and n jobs with a cost matrix, it finds the minimum-cost one-to-one assignment. This post walks through every detail: the theory behind potential functions and reduced costs, a complete hand-traced 4×4 example, a contest-ready C++ implementation, and the analysis that proves both correctness and the cubic time bound.
The Assignment Problem
The assignment problem is one of the most fundamental problems in
combinatorial optimization. It asks: given n workers and n jobs, with
a cost c[i][j] for assigning worker i to job j, find a
one-to-one assignment of workers to jobs that minimises the total cost.
Formal Definition
We are given an n × n cost matrix C where entry c[i][j] is the cost of assigning worker i to job j. We want a permutation π of {0, 1, …, n−1} that minimises:
minimise Σ c[i][π(i)] for i = 0 … n−1
subject to π is a permutation (each job assigned exactly once)
Equivalently, in bipartite graph terms: build a complete bipartite graph Kn,n with left vertices (workers), right vertices (jobs), and edge weights c[i][j]. We want a minimum-weight perfect matching.
Why Not Brute Force?
There are n! possible assignments. For n = 20, that is already over 2 × 1018. Even for n = 12, brute force is impractical in a contest. We need something far better.
Real-World and Contest Motivation
- Task scheduling: Assign n machines to n tasks, minimising total processing time.
- Bitmask DP baseline: The O(n² · 2ⁿ) bitmask DP solves assignment for small n. Hungarian extends to n up to ~1000.
- Minimum-cost bipartite matching: Many graph problems reduce to this.
- Geometry: Minimum-cost point matching in 2D (e.g., ICPC problems).
Hungarian Algorithm Overview
The Hungarian algorithm is built on the theory of linear programming duality applied to the assignment problem. The key ideas are potential functions, reduced costs, and augmenting paths.
Dual Variables (Potentials)
We introduce dual variables: a value u[i] for each worker i and
v[j] for each job j. The dual feasibility condition
requires:
u[i] + v[j] <= c[i][j] for all i, j
The reduced cost of edge (i, j) is defined as:
c'[i][j] = c[i][j] − u[i] − v[j] >= 0
An edge is tight (or admissible) when its reduced cost is exactly zero. The dual objective is to maximise Σ u[i] + Σ v[j], which gives a lower bound on the optimal primal cost.
Complementary Slackness
By LP duality, a primal assignment and dual potentials are both optimal if and only if every assigned edge is tight:
For each assigned pair (i, π(i)): c[i][π(i)] = u[i] + v[π(i)]
So the algorithm's strategy is: maintain dual-feasible potentials, and grow a matching using only tight edges. When stuck, adjust potentials to create new tight edges.
High-Level Algorithm
The algorithm processes workers one at a time. For each worker i:
- Run a shortest-path search (Dijkstra-like) on the reduced cost graph to find an augmenting path from worker i to an unmatched job.
- Use the shortest-path distances to update potentials, maintaining dual feasibility while making more edges tight.
- Augment the matching along the found path, increasing the matching size by one.
After processing all n workers, we have a perfect matching on tight edges — which by complementary slackness is optimal.
Step-by-Step Walkthrough
Let us trace through a complete 4×4 example. Our cost matrix is:
Job 0 Job 1 Job 2 Job 3
Worker 0: 9 2 7 8
Worker 1: 6 4 3 7
Worker 2: 5 8 1 8
Worker 3: 7 6 9 4
Initialisation
Set u[i] = 0 for all workers. Set v[j] = min over all i of c[i][j] for each job:
u = [0, 0, 0, 0]
v = [min(9,6,5,7), min(2,4,8,6), min(7,3,1,9), min(8,7,8,4)]
v = [5, 2, 1, 4]
The reduced cost matrix c'[i][j] = c[i][j] − u[i] − v[j]:
Job 0 Job 1 Job 2 Job 3
Worker 0: 4 0 6 4
Worker 1: 1 2 2 3
Worker 2: 0 6 0 4
Worker 3: 2 4 8 0
Tight edges (reduced cost = 0): (0,1), (2,0), (2,2), (3,3).
Iteration 1 — Worker 0
We need to find an augmenting path from Worker 0 to an unmatched job using shortest paths on reduced costs. The shortest path from Worker 0 leads to Job 1 with reduced cost 0 (it is already tight). Job 1 is unmatched, so we directly match:
Match: Worker 0 → Job 1
Matching: {0→1}
Iteration 2 — Worker 1
Run shortest-path search from Worker 1. The minimum reduced costs from Worker 1 are: Job 0 = 1, Job 1 = 2, Job 2 = 2, Job 3 = 3. The nearest unmatched job is Job 0 (cost 1) or Job 2 (cost 2). We reach Job 0 first with distance 1.
Update potentials by the shortest path distances, then augment:
Match: Worker 1 → Job 0
Matching: {0→1, 1→0}
Iteration 3 — Worker 2
Shortest-path search from Worker 2. Direct reduced costs: Job 0 = 0, Job 2 = 0, Job 1 = 6, Job 3 = 4. Job 0 has distance 0, but Job 0 is matched to Worker 1. So we explore through Worker 1: from Worker 1, we can reach Job 2 with additional reduced cost 2 (after potential updates). The alternating path is:
Worker 2 → Job 0 (tight) → Worker 1 (matched) → Job 2 (reduced cost 2)
But Job 2 with total distance 2 competes with Worker 2 → Job 2 directly at distance 0. Since Job 2 is unmatched, we take the direct route:
Match: Worker 2 → Job 2
Matching: {0→1, 1→0, 2→2}
Iteration 4 — Worker 3
Direct reduced costs from Worker 3: Job 0 = 2, Job 1 = 4, Job 2 = 8, Job 3 = 0. Job 3 is tight and unmatched.
Match: Worker 3 → Job 3
Matching: {0→1, 1→0, 2→2, 3→3}
Final Result
Worker 0 → Job 1: cost 2
Worker 1 → Job 0: cost 6
Worker 2 → Job 2: cost 1
Worker 3 → Job 3: cost 4
───────────────────────
Total cost: 13
You can verify that no other assignment yields a lower total. The dual objective Σ u[i] + Σ v[j] = 0+0+0+0 + 5+2+1+4 = 12 after initialisation, and after potential updates it reaches 13, certifying optimality.
O(n³) Implementation
Below is a complete, contest-ready C++ implementation. It uses the Jonker–Volgenant variant of the Hungarian algorithm, which is clean and runs in O(n³) without explicitly building a graph. The algorithm processes one worker per outer iteration, running a shortest-path (Dijkstra-like) relaxation over jobs.
p stores which worker
is matched to each job (0 = unmatched), and ans maps each worker to its
assigned job.
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
const ll INF = 1e18;
// Solves min-cost assignment on an n x n cost matrix.
// Returns {min_cost, assignment} where assignment[i] = job assigned to worker i (0-indexed).
pair<ll, vector<int>> hungarian(const vector<vector<ll>>& cost) {
int n = (int)cost.size();
// u[i]: potential for worker i (1-indexed, u[0] unused)
// v[j]: potential for job j (1-indexed, v[0] unused)
vector<ll> u(n + 1, 0), v(n + 1, 0);
// p[j] = worker matched to job j (1-indexed, 0 = unmatched)
vector<int> p(n + 1, 0);
// ans[i] = job assigned to worker i (0-indexed result)
vector<int> ans(n + 1, 0);
for (int i = 1; i <= n; i++) {
// We are assigning worker i.
// dist[j] = shortest reduced-cost distance to reach job j.
vector<ll> dist(n + 1, INF);
vector<bool> used(n + 1, false);
vector<int> prev(n + 1, 0); // prev[j] = previous job on augmenting path
// Dummy job 0 is matched to worker i to start the search.
p[0] = i;
int j0 = 0; // current job in the search
dist[0] = 0;
// Dijkstra-like shortest path over jobs.
do {
used[j0] = true;
int w = p[j0]; // worker matched to current job
ll delta = INF;
int j1 = -1; // next job to add to the tree
for (int j = 1; j <= n; j++) {
if (used[j]) continue;
// Reduced cost of assigning worker w to job j.
ll rd = cost[w - 1][j - 1] - u[w] - v[j];
if (dist[j0] + rd < dist[j]) {
dist[j] = dist[j0] + rd;
prev[j] = j0;
}
if (dist[j] < delta) {
delta = dist[j];
j1 = j;
}
}
// Update potentials.
for (int j = 0; j <= n; j++) {
if (used[j]) {
u[p[j]] += delta;
v[j] -= delta;
} else {
dist[j] -= delta;
}
}
j0 = j1;
} while (p[j0] != 0);
// j0 is now an unmatched job. Trace back and augment.
// Augment along the path.
while (j0 != 0) {
int j1 = prev[j0];
p[j0] = p[j1];
j0 = j1;
}
}
// Extract the answer.
ll total = 0;
for (int j = 1; j <= n; j++) {
if (p[j] != 0) {
ans[p[j]] = j;
}
}
for (int i = 1; i <= n; i++) {
total += cost[i - 1][ans[i] - 1];
}
// Convert to 0-indexed.
vector<int> result(n);
for (int i = 1; i <= n; i++) {
result[i - 1] = ans[i] - 1;
}
return {total, result};
}
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n;
cin >> n;
vector<vector<ll>> cost(n, vector<ll>(n));
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
cin >> cost[i][j];
auto [total, assignment] = hungarian(cost);
cout << total << "\n";
for (int i = 0; i < n; i++)
cout << i << " -> " << assignment[i] << "\n";
return 0;
}
Code Walkthrough
Let us break down the key parts of the implementation:
Outer loop (worker i): For each worker, we find the shortest augmenting path from that worker to an unmatched job. The "dummy job 0" trick avoids special-casing the start of the search — we temporarily match worker i to job 0.
Inner Dijkstra loop: In each step, we pick the unvisited job j1 with the smallest distance. For every unvisited job j, we try to relax its distance through the worker currently matched to j0. This is the core O(n²) per worker.
Potential update: After finding the minimum delta, we shift potentials so that: (a) visited edges remain tight, and (b) unvisited distances decrease by delta. This is what makes the algorithm work without an explicit priority queue.
Augmentation: Once we reach an unmatched job (p[j0] == 0), we trace
back through the prev array and flip the matching along the alternating
path.
Correctness and Complexity
Why It Works
The correctness rests on two invariants maintained throughout the algorithm:
- Dual feasibility: After every potential update, we still have c[i][j] − u[i] − v[j] ≥ 0 for all i, j. The potential update step is carefully designed so that reduced costs of unvisited edges do not become negative.
- Complementary slackness: Every matched edge (i, p[i]) satisfies c[i][p[i]] = u[i] + v[p[i]], i.e., it is tight. This holds because we only augment along tight edges (after potentials are updated, the augmenting path consists entirely of zero reduced-cost edges).
By LP duality, when we have a feasible dual solution and a primal matching satisfying complementary slackness, both are optimal. After all n workers are matched, we have a perfect matching on tight edges, certifying optimality.
O(n³) Time Analysis
The analysis is straightforward once you see the structure:
- The outer loop runs n times (once per worker).
- For each worker, the inner Dijkstra-like loop runs at most n iterations (each iteration marks one job as "used").
- Each iteration scans all n jobs to find the minimum distance and perform relaxations.
This gives n × n × n = O(n³) total operations. The constant factor is small — typically around 2–3 operations per inner-loop step — making it very fast in practice.
Space Complexity
We use O(n²) space for the cost matrix and O(n) space for the auxiliary arrays (u, v, p, dist, used, prev). Total: O(n²).
Comparison with Alternatives
| Method | Time | Practical n |
|---|---|---|
| Brute force (all permutations) | O(n! · n) | n ≤ 10 |
| Bitmask DP | O(n² · 2ⁿ) | n ≤ 20 |
| Hungarian algorithm | O(n³) | n ≤ 1000–3000 |
| Min-cost max-flow (MCMF) | O(n³) or worse | n ≤ 500–1000 |
The Hungarian algorithm is almost always faster than MCMF for dense bipartite matching due to lower constant factors and no graph construction overhead.
Rectangular Assignments
In many problems the matrix is not square: you might have r workers and c jobs where r ≠ c. The standard Hungarian algorithm requires a square matrix, but handling this is straightforward.
Padding to a Square Matrix
If r < c (fewer workers than jobs), add c − r dummy workers with
cost 0 to every job. If r > c (more workers than jobs), add dummy jobs. After
solving, ignore assignments involving dummy rows or columns.
// Solve rectangular assignment: r workers, c jobs.
// Returns {min_cost, assignment} where assignment[i] = job for worker i,
// or -1 if worker i is unassigned.
pair<ll, vector<int>> rectangular_hungarian(
const vector<vector<ll>>& cost, int r, int c
) {
int n = max(r, c);
vector<vector<ll>> padded(n, vector<ll>(n, 0));
for (int i = 0; i < r; i++)
for (int j = 0; j < c; j++)
padded[i][j] = cost[i][j];
auto [total, assignment] = hungarian(padded);
// Filter out dummy assignments.
vector<int> result(r, -1);
ll real_cost = 0;
for (int i = 0; i < r; i++) {
if (assignment[i] < c) {
result[i] = assignment[i];
real_cost += cost[i][assignment[i]];
}
}
return {real_cost, result};
}
Cost of Padding
Padding increases the matrix from r × c to n × n where n = max(r, c). The time becomes O(n³) with n = max(r, c). For highly rectangular matrices (e.g., 10 × 1000), this is wasteful. In such cases, consider a min-cost max-flow approach which can be more efficient when the number of edges is sparse.
Partial Assignments
Sometimes you need to assign at most k workers (not all n). Add n − k dummy jobs with cost 0. Workers assigned to dummy jobs are effectively "unassigned." This transforms a partial assignment into a full one on a larger square matrix.
Practice Problems
Here are classic competitive programming problems that require or benefit from the Hungarian algorithm:
Direct Applications
- CSES — Task Assignment — The canonical problem: n employees, n tasks, minimise total cost. Direct application of Hungarian. A perfect first problem to test your implementation.
- SPOJ — BABY — Assign babies to cribs with given preference costs. Straightforward assignment problem.
- Library Checker — Assignment Problem — Minimise cost assignment with up to n = 500. Great for benchmarking your implementation.
Geometric Variants
- Codeforces 1307G — Cow and Exercise — Involves minimum cost flow on a bipartite structure. Understanding Hungarian duality helps build intuition for the LP relaxation approach.
- Minimum-cost point matching: Given two sets of n points in 2D, match them one-to-one to minimise total Euclidean (or Manhattan) distance. Build the cost matrix and run Hungarian. Common in ICPC regionals.
Reductions to Assignment
- Minimum-weight edge cover in bipartite graphs: Can be reduced to assignment by careful construction.
- Bottleneck assignment: Minimise the maximum cost edge in the matching. Binary search on the answer + feasibility check, or use the connection to the assignment LP.
- k-best assignments: Find the k lowest-cost assignments. Murty's algorithm partitions the search space and calls Hungarian as a subroutine.
Tips for Contest Use
- Is n ≤ ~1000? If n ≤ 20, bitmask DP is simpler and fast enough.
- Is it a minimisation or maximisation problem? Negate costs for max.
- Is the matrix rectangular? Pad with zeros to make it square.
- Are costs potentially negative? This implementation handles negatives correctly.
- Do you need the assignment itself or just the cost? Both are returned.