Problem Classification — Pattern Recognition in Competitive Programming
Why Classification Matters
In a 5-hour contest with 8–12 problems, the first 60 seconds you spend on each problem are the most valuable. That minute determines whether you spend the next 30 minutes heading in the right direction or chasing a dead end. Classification — the ability to rapidly identify what type of problem you're facing — is the single most important meta-skill in competitive programming.
Consider two contestants reading the same problem:
- Contestant A reads the statement, thinks "this looks hard," and starts experimenting with brute-force ideas, hoping something clicks.
- Contestant B reads the first paragraph, notes that N ≤ 20, sees the phrase "subset of items," and immediately thinks bitmask DP. Within 90 seconds, she's coding a solution.
Both have the same algorithmic knowledge. The difference is purely pattern recognition — the ability to map a problem statement to a known technique before you start thinking about the algorithm in detail.
This skill compounds: the faster you classify a problem, the more time you have for the genuinely hard problems that resist classification. Virtually every problem on Codeforces Div 2 A–D can be classified within two minutes by someone who has internalized the patterns. The editorial-difficulty problems (E, F) are where creative thinking begins — but even those usually decompose into classifiable sub-problems.
The Major Problem Categories
Every competitive programming problem falls into one or more of the following families. For each, I list the key signals that should fire when you read the statement:
1. Greedy
greedy Problems where making the locally optimal choice at each step leads to a globally optimal solution.
- Signals: "maximize/minimize X subject to simple constraints," sortable inputs, exchange argument applies, optimal substructure is obvious without overlapping sub-problems.
- Common patterns: Activity selection, fractional knapsack, Huffman coding, scheduling by deadline, sweep-line greedy.
- Warning: If you can't prove the greedy choice property in your head within 60 seconds, it's probably DP.
2. Dynamic Programming
dp Optimization or counting over overlapping sub-problems with optimal substructure.
- Signals: "count the number of ways," "minimum cost to …," small state space (N ≤ 5000 for O(N²), or N ≤ 20 for bitmask), DAG structure, "choose a subsequence."
- Sub-types: Knapsack, LIS/LCS, interval DP, digit DP, bitmask DP, tree DP, profile DP, DP on broken profile, SOS DP.
3. Graphs
graphs Anything with pairwise relationships, reachability, or flow.
- Signals: "cities and roads," "connected components," "shortest path," "network," "dependencies," bipartite structure, matching.
- Sub-types: BFS/DFS, Dijkstra, Bellman-Ford, Floyd-Warshall, MST (Kruskal/Prim), topological sort, SCC (Tarjan/Kosaraju), bipartite matching, max-flow/min-cut, 2-SAT.
4. Data Structures
data structures Problems where the algorithm is simple but requires efficient querying or updating.
- Signals: "Q queries," "update and query," "range [l, r]," large N with online queries, "kth element."
- Sub-types: Segment trees (with lazy propagation), BIT/Fenwick, sparse table, merge sort tree, treap/splay, DSU, HLD, centroid decomposition, LCT.
5. Math & Number Theory
math Problems requiring mathematical insight or number-theoretic algorithms.
- Signals: "modulo 10⁹+7," "GCD/LCM," "prime," "divisors," "modular inverse," large numbers (up to 10¹⁸).
- Sub-types: Sieve of Eratosthenes, extended Euclidean, CRT, Euler's totient, modular exponentiation, Miller-Rabin, Pollard's rho, FFT/NTT for polynomial multiplication.
6. Strings
strings Pattern matching, substring properties, and text processing.
- Signals: Input is a string, "substring," "palindrome," "lexicographically smallest," "occurrences."
- Sub-types: KMP, Z-function, suffix array + LCP, suffix automaton, Aho-Corasick, Manacher, hashing.
7. Combinatorics
combinatorics Counting arrangements, selections, and configurations.
- Signals: "number of ways," "count distinct," "mod 10⁹+7," "arrangements," "permutations."
- Sub-types: Binomial coefficients, inclusion-exclusion, Burnside/Pólya, Catalan numbers, Stirling numbers, generating functions, PIE.
8. Geometry
geometry Coordinate geometry, convex hull, intersections.
- Signals: Coordinates (x, y), "distance," "area," "convex hull," "closest pair," "line intersection."
- Sub-types: Convex hull (Graham/Andrew), line sweep, rotating calipers, half-plane intersection, Voronoi, Minkowski sum.
9. Game Theory
game theory Two-player games with perfect information.
- Signals: "Alice and Bob take turns," "optimal play," "who wins," "first/second player."
- Sub-types: Sprague-Grundy, Nim variants, minimax, game on graph.
10. Constructive Algorithms
constructive Build an object satisfying given constraints.
- Signals: "construct," "find any," "print a valid," "if impossible print -1," "rearrange."
- Key skill: These often require ad-hoc insight. Look for invariants: parity, degree sequences, prefix sums.
Reading the Constraints
Before you even understand what a problem is asking, look at the constraints. The allowed input size is the strongest signal for the intended time complexity, and therefore the algorithm family.
| Constraint | Target Complexity | Likely Approach |
|---|---|---|
| N ≤ 10 | O(N!) or O(2N · N) | Brute-force permutations, backtracking |
| N ≤ 20 | O(2N) or O(2N · N) | Bitmask DP, meet-in-the-middle |
| N ≤ 40 | O(2N/2) | Meet-in-the-middle (split into halves) |
| N ≤ 500 | O(N³) | Floyd-Warshall, interval DP, matrix operations |
| N ≤ 5,000 | O(N²) | Quadratic DP, pairwise comparisons |
| N ≤ 105 | O(N log N) | Sorting, segment tree, binary search, divide & conquer |
| N ≤ 106 | O(N) or O(N log N) tight | Linear algorithms, two-pointer, sieve, BFS/DFS |
| N ≤ 107 | O(N) | Linear sieve, prefix sums, simple iteration |
| N ≤ 109 | O(√N) or O(log N) | Math formula, binary search on answer, sqrt decomposition on value |
| N ≤ 1018 | O(log N) or O(√N) with tricks | Matrix exponentiation, binary lifting, math formula, Pollard's rho |
Constraints also encode special structure:
- N ≤ 26 or values in
[a-z]→ bitmask over alphabet. - Values ≤ 10⁶ with N large → frequency array, sieve, or harmonic series tricks (O(V log V)).
- "Tree with N nodes" → DFS/Euler tour, HLD, centroid decomposition.
- Q ≤ 10⁵ queries on array of 10⁵ → segment tree or BIT with O(log N) per query.
- Q ≤ 10⁵ queries, offline allowed → Mo's algorithm, offline + sweep, CDQ divide & conquer.
Multiple Constraint Variables
When a problem has multiple parameters (N, M, K, Q), the intended complexity usually involves all of them. For instance:
- N ≤ 10⁵, Q ≤ 10⁵ → O((N + Q) log N) per segment tree
- N ≤ 10³, M ≤ 10³ → O(NM) grid DP or BFS
- N ≤ 10⁵, K ≤ 20 → O(NK) or O(N · 2K) depending on whether K acts as states or bitmask
- N ≤ 2 × 10⁵, sum of N over all test cases ≤ 2 × 10⁵ → O(N) per test case is fine
Keyword Signals in Problem Statements
Problem setters unconsciously (or intentionally) leave linguistic fingerprints that hint at the solution type. Here's a comprehensive mapping from keywords to algorithms:
Optimization Keywords
- "minimum cost" → shortest path, min-cost max-flow, DP
- "maximum profit" → DP, greedy, max-flow
- "minimum number of operations" → BFS on state space, greedy
- "minimize the maximum" → binary search on answer
- "maximize the minimum" → binary search on answer
Counting Keywords
- "number of ways" → DP, combinatorics
- "count pairs / triples" → two-pointer, sorting, prefix sums, FFT
- "modulo 10⁹+7" → combinatorics or DP (result is large)
- "distinct subsequences" → DP with deduplication
- "count subarrays with property" → two-pointer, contribution technique
Graph Keywords
- "cities and roads" → graph (often shortest path or MST)
- "reachable" → BFS/DFS, transitive closure
- "connected components" → DSU, BFS/DFS
- "dependencies" → topological sort, DAG DP
- "bipartite" / "two groups" → 2-coloring, matching
- "network flow" / "capacity" → max-flow, min-cut
Construction Keywords
- "construct" / "find any" → constructive, often greedy-based
- "rearrange" → sorting, greedy construction
- "if impossible, print -1" → look for necessary conditions first
- "lexicographically smallest" → greedy front-to-back
- "for all permutations" → constructive or math
String Keywords
- "palindrome" → Manacher, DP, hashing
- "substring matching" → KMP, Z-function, suffix array
- "lexicographic order" → suffix array, greedy
- "period" / "border" → KMP failure function
- "common substring" → suffix array + LCP, DP
Game Theory Keywords
- "Alice and Bob" → game theory, minimax
- "optimal play" → Sprague-Grundy, DP
- "take turns removing" → Nim, Sprague-Grundy
- "first player wins" → parity, Grundy values
- "stones in piles" → Nim (XOR of pile sizes)
The "Binary Search on Answer" Signal
One of the most powerful classification shortcuts: if a problem asks you to minimize the maximum or maximize the minimum of something, there's an 80%+ chance the intended approach is binary search on the answer value, combined with a greedy check function.
// Classic pattern: "minimize the maximum distance between K facilities"
// Binary search on the answer, check with greedy
bool canAchieve(int maxDist, vector<int>& positions, int K) {
int count = 1, last = positions[0];
for (int i = 1; i < (int)positions.size(); i++) {
if (positions[i] - last >= maxDist) {
count++;
last = positions[i];
}
}
return count >= K;
}
int solve(vector<int>& positions, int K) {
sort(positions.begin(), positions.end());
int lo = 1, hi = positions.back() - positions.front(), ans = hi;
while (lo <= hi) {
int mid = lo + (hi - lo) / 2;
if (canAchieve(mid, positions, K)) {
ans = mid;
lo = mid + 1; // try larger minimum distance
} else {
hi = mid - 1;
}
}
return ans;
}
Cross-Category Problems
The hardest problems in competitive programming rarely belong to a single category. Division 1 C/D problems and above typically combine two or three techniques. The key is decomposition: break the problem into layers, classify each layer independently.
Common Combinations
| Combination | Example Pattern | Decomposition Strategy |
|---|---|---|
| graph + dp | Shortest path with state (e.g., "at most K edges") | Model as graph where nodes encode (vertex, DP state), run Dijkstra/BFS on product graph |
| dp + data structures | DP transitions need range-min/max queries | Write the DP recurrence first, then optimize transitions with segment tree or monotonic queue |
| math + dp | Counting with modular arithmetic over complex states | Derive the combinatorial formula, then compute it via DP (e.g., inclusion-exclusion as DP) |
| graph + data structures | Tree queries with updates | Flatten tree with Euler tour, apply segment tree on the linearized array |
| greedy + data structures | Greedy selection needing efficient access to next-best element | Use priority queue, set, or segment tree to maintain candidates for greedy selection |
| strings + dp | Edit distance, regex matching, distinct subsequences | String structure defines the DP states, transitions use character comparisons |
Decomposition Workflow
When facing a cross-category problem, follow this mental procedure:
- Identify the outer structure: Is it fundamentally a graph problem? An optimization? A counting problem? This determines your top-level approach.
- Identify the bottleneck operation: What's the expensive part of the naive solution? This tells you which data structure or technique you need.
- Check if a standard reduction applies: Many problems reduce to well-known ones. "Minimum vertex cover on a tree" → tree DP. "Maximum independent set on a bipartite graph" → max matching via König's theorem.
- Verify complexity: Multiply the costs of your layers. If Graph × DS gives O(N log² N), check if that fits the constraints.
→ Outer structure: tree path query
→ Core operation: k-th smallest on a set of values
→ Solution: persistent segment tree on Euler tour + LCA for path extraction.
Building Your Classification Instinct
Classification isn't something you learn from reading — it's a trained reflex built through deliberate practice. Here are concrete methods to develop it:
Method 1: Statement Reading Without Solving
Open a Codeforces problemset page. For each problem, read only the statement and constraints. Write down your classification in under 90 seconds. Then check the editorial or tags. Track your accuracy over time.
Target accuracy by rating:
- 800–1200: Should classify correctly >90% of the time
- 1300–1600: Should classify correctly >80% of the time
- 1700–2000: Should classify correctly >70% (these start blending categories)
- 2100+: Classification becomes less binary; you'll identify 2–3 plausible approaches
Method 2: Tag Analysis Drill
Go to Codeforces problemset, filter by a specific tag (e.g., "bitmasks"), and solve 10 problems. Then switch to a different tag. After covering 5–6 tags, do a mixed set and try to classify without seeing tags. The goal is to internalize the feel of each category.
Method 3: Constraint-First Reading
Train yourself to always read constraints before the problem statement. Build a mental model of what complexity is needed, then read the statement looking for confirmation. This inverts the usual approach but dramatically speeds up classification.
A Quick Classification Helper
The following C++ snippet demonstrates a simple heuristic for classifying problems by constraint size. While real classification requires understanding the problem statement, this captures the constraint-reading instinct programmatically:
#include <bits/stdc++.h>
using namespace std;
struct ClassificationHint {
string complexity;
vector<string> likely_approaches;
};
ClassificationHint classifyByConstraint(long long n) {
if (n <= 10)
return {"O(N!)", {"brute force", "permutation enumeration", "backtracking"}};
if (n <= 20)
return {"O(2^N)", {"bitmask DP", "meet-in-the-middle"}};
if (n <= 40)
return {"O(2^(N/2))", {"meet-in-the-middle"}};
if (n <= 500)
return {"O(N^3)", {"Floyd-Warshall", "interval DP", "Gaussian elimination"}};
if (n <= 5000)
return {"O(N^2)", {"quadratic DP", "pairwise comparison", "2D DP"}};
if (n <= 100000)
return {"O(N log N)", {"sorting + binary search", "segment tree", "divide and conquer"}};
if (n <= 1000000)
return {"O(N)", {"two-pointer", "linear DP", "sieve", "BFS/DFS"}};
if (n <= 10000000)
return {"O(N)", {"linear sieve", "prefix sums", "simple iteration"}};
if (n <= 1000000000LL)
return {"O(sqrt(N)) or O(log N)", {"math formula", "binary search on answer"}};
return {"O(log N)", {"matrix exponentiation", "binary lifting", "math"}};
}
// Detect multiple-constraint patterns
void analyzeConstraints(vector<pair<string, long long>>& params) {
cout << "=== Constraint Analysis ===" << endl;
for (auto& [name, val] : params) {
auto hint = classifyByConstraint(val);
cout << name << " = " << val
<< " → target " << hint.complexity << endl;
cout << " Likely: ";
for (int i = 0; i < (int)hint.likely_approaches.size(); i++) {
if (i) cout << ", ";
cout << hint.likely_approaches[i];
}
cout << endl;
}
}
int main() {
// Example: problem with N nodes, Q queries, K special nodes
vector<pair<string, long long>> constraints = {
{"N", 200000},
{"Q", 200000},
{"K", 20}
};
analyzeConstraints(constraints);
// Output:
// N = 200000 → target O(N log N)
// Likely: sorting + binary search, segment tree, divide and conquer
// Q = 200000 → target O(N log N)
// Likely: sorting + binary search, segment tree, divide and conquer
// K = 20 → target O(2^N)
// Likely: bitmask DP, meet-in-the-middle
//
// Interpretation: O(N log N) per query with bitmask DP over K states
// Total: O(Q * 2^K) or O((N+Q) * log N * 2^K)
}
Method 4: Classify by Problem Source
Different contest platforms and rounds have characteristic problem distributions:
- Codeforces Div 2 A/B: Implementation, greedy, simple math, constructive
- Codeforces Div 2 C/D: DP, graphs, binary search, number theory
- Codeforces Div 1 C+: Advanced DS, combinatorics, cross-category
- AtCoder ABC D/E: DP-heavy, clean mathematical formulations
- AtCoder ARC/AGC: Combinatorics, constructive, deep mathematical insight
- USACO Gold/Platinum: Graph theory, DP on trees, sweep-line
Common Traps & Misdirections
Problem setters are skilled at disguising problems. Here are the most common ways a problem can mislead your classification instinct:
Trap 1: The Greedy That's Actually DP
The problem looks greedily solvable — you can sort the input and process items in order. But the greedy choice fails because decisions interact. Test: try to construct a counterexample where the greedy fails. If you find one within 2 minutes, switch to DP. Classic example: 0/1 knapsack looks greedy (sort by value/weight ratio) but isn't.
Trap 2: The Graph Hidden in a Grid
Problems on 2D grids are graph problems in disguise. Each cell is a node, adjacent cells are edges. "Minimum moves to reach (N,M)" is just BFS. "Connected regions of same color" is flood-fill. Don't overthink grid problems — model them as graphs immediately.
Trap 3: The Math Problem Disguised as Simulation
The statement describes a process: "repeat this operation until …" The naive simulation is O(N²) or worse, but there's a closed-form formula. Signal: if N ≤ 10¹⁸ and the process is described step-by-step, you cannot simulate — find the math. Often involves modular arithmetic, cycles, or matrix exponentiation.
Trap 4: The DP That's Actually Greedy
The reverse of Trap 1. You set up an elaborate DP and it works, but the editorial shows a clean O(N log N) greedy. While this doesn't cause WA, it wastes time. Heuristic: if your DP transitions never actually need to "look back" at more than one previous state, a greedy might exist.
Trap 5: The Data Structure Problem That Needs Offline Processing
Queries that seem to require a persistent or advanced online data structure can often be solved more easily offline. If the problem doesn't explicitly require online processing, consider sorting queries (Mo's algorithm, sweep-line, CDQ divide & conquer) before reaching for persistent segment trees.
Trap 6: The Number Theory Red Herring
"Modulo 10⁹+7" doesn't always mean the problem is number theory. It usually just means the answer is large and you should use modular arithmetic in your DP or combinatorics. The actual technique could be anything — the mod is just output formatting.
Trap 7: The "Looks Like Brute Force" but N is Small for a Reason
When N ≤ 20 or N ≤ 15, the problem setter is practically telling you to use exponential algorithms. Don't waste time looking for a polynomial solution. Bitmask DP, backtracking with pruning, or meet-in-the-middle is almost certainly intended. Trust the constraints.
The Meta-Strategy for Uncertain Classification
When you genuinely can't classify a problem after 3–4 minutes:
- Re-read the constraints. You probably missed something. Pay attention to unusual bounds.
- Write the brute force. For small inputs, code the naive O(2N) or O(N!) solution. Run it on examples. The pattern in the output often reveals the algorithm.
- Work backwards from complexity. If N = 10⁵, you need O(N log N). What O(N log N) algorithms do you know? Merge sort, segment tree, binary search... does any of them apply?
- Move on. If classification fails after 5 minutes, skip the problem and return later. Fresh eyes often immediately spot what tired eyes miss.