Problem Classification — Pattern Recognition in Competitive Programming

June 2026 · 19 min read

Why Classification Matters

In a 5-hour contest with 8–12 problems, the first 60 seconds you spend on each problem are the most valuable. That minute determines whether you spend the next 30 minutes heading in the right direction or chasing a dead end. Classification — the ability to rapidly identify what type of problem you're facing — is the single most important meta-skill in competitive programming.

Consider two contestants reading the same problem:

Contestant A reads the statement, thinks "this looks hard," and starts experimenting with brute-force ideas, hoping something clicks.
Contestant B reads the first paragraph, notes that N ≤ 20, sees the phrase "subset of items," and immediately thinks bitmask DP. Within 90 seconds, she's coding a solution.

Both have the same algorithmic knowledge. The difference is purely pattern recognition — the ability to map a problem statement to a known technique before you start thinking about the algorithm in detail.

The Classification Pipeline: Read statement → identify constraints → spot keywords → match to category → select algorithm → code. Steps 2–4 should take under 2 minutes for a practiced competitor.

This skill compounds: the faster you classify a problem, the more time you have for the genuinely hard problems that resist classification. Virtually every problem on Codeforces Div 2 A–D can be classified within two minutes by someone who has internalized the patterns. The editorial-difficulty problems (E, F) are where creative thinking begins — but even those usually decompose into classifiable sub-problems.

The Major Problem Categories

Every competitive programming problem falls into one or more of the following families. For each, I list the key signals that should fire when you read the statement:

1. Greedy

greedy Problems where making the locally optimal choice at each step leads to a globally optimal solution.

Signals: "maximize/minimize X subject to simple constraints," sortable inputs, exchange argument applies, optimal substructure is obvious without overlapping sub-problems.
Common patterns: Activity selection, fractional knapsack, Huffman coding, scheduling by deadline, sweep-line greedy.
Warning: If you can't prove the greedy choice property in your head within 60 seconds, it's probably DP.

2. Dynamic Programming

dp Optimization or counting over overlapping sub-problems with optimal substructure.

Signals: "count the number of ways," "minimum cost to …," small state space (N ≤ 5000 for O(N²), or N ≤ 20 for bitmask), DAG structure, "choose a subsequence."
Sub-types: Knapsack, LIS/LCS, interval DP, digit DP, bitmask DP, tree DP, profile DP, DP on broken profile, SOS DP.

3. Graphs

graphs Anything with pairwise relationships, reachability, or flow.

Signals: "cities and roads," "connected components," "shortest path," "network," "dependencies," bipartite structure, matching.
Sub-types: BFS/DFS, Dijkstra, Bellman-Ford, Floyd-Warshall, MST (Kruskal/Prim), topological sort, SCC (Tarjan/Kosaraju), bipartite matching, max-flow/min-cut, 2-SAT.

4. Data Structures

data structures Problems where the algorithm is simple but requires efficient querying or updating.

Signals: "Q queries," "update and query," "range [l, r]," large N with online queries, "kth element."
Sub-types: Segment trees (with lazy propagation), BIT/Fenwick, sparse table, merge sort tree, treap/splay, DSU, HLD, centroid decomposition, LCT.

5. Math & Number Theory

math Problems requiring mathematical insight or number-theoretic algorithms.

Signals: "modulo 10⁹+7," "GCD/LCM," "prime," "divisors," "modular inverse," large numbers (up to 10¹⁸).
Sub-types: Sieve of Eratosthenes, extended Euclidean, CRT, Euler's totient, modular exponentiation, Miller-Rabin, Pollard's rho, FFT/NTT for polynomial multiplication.

6. Strings

strings Pattern matching, substring properties, and text processing.

Signals: Input is a string, "substring," "palindrome," "lexicographically smallest," "occurrences."
Sub-types: KMP, Z-function, suffix array + LCP, suffix automaton, Aho-Corasick, Manacher, hashing.

7. Combinatorics

combinatorics Counting arrangements, selections, and configurations.

Signals: "number of ways," "count distinct," "mod 10⁹+7," "arrangements," "permutations."
Sub-types: Binomial coefficients, inclusion-exclusion, Burnside/Pólya, Catalan numbers, Stirling numbers, generating functions, PIE.

8. Geometry

geometry Coordinate geometry, convex hull, intersections.

Signals: Coordinates (x, y), "distance," "area," "convex hull," "closest pair," "line intersection."
Sub-types: Convex hull (Graham/Andrew), line sweep, rotating calipers, half-plane intersection, Voronoi, Minkowski sum.

9. Game Theory

game theory Two-player games with perfect information.

Signals: "Alice and Bob take turns," "optimal play," "who wins," "first/second player."
Sub-types: Sprague-Grundy, Nim variants, minimax, game on graph.

10. Constructive Algorithms

constructive Build an object satisfying given constraints.

Signals: "construct," "find any," "print a valid," "if impossible print -1," "rearrange."
Key skill: These often require ad-hoc insight. Look for invariants: parity, degree sequences, prefix sums.

Reading the Constraints

Before you even understand what a problem is asking, look at the constraints. The allowed input size is the strongest signal for the intended time complexity, and therefore the algorithm family.

Constraint	Target Complexity	Likely Approach
N ≤ 10	O(N!) or O(2^N · N)	Brute-force permutations, backtracking
N ≤ 20	O(2^N) or O(2^N · N)	Bitmask DP, meet-in-the-middle
N ≤ 40	O(2^N/2)	Meet-in-the-middle (split into halves)
N ≤ 500	O(N³)	Floyd-Warshall, interval DP, matrix operations
N ≤ 5,000	O(N²)	Quadratic DP, pairwise comparisons
N ≤ 10⁵	O(N log N)	Sorting, segment tree, binary search, divide & conquer
N ≤ 10⁶	O(N) or O(N log N) tight	Linear algorithms, two-pointer, sieve, BFS/DFS
N ≤ 10⁷	O(N)	Linear sieve, prefix sums, simple iteration
N ≤ 10⁹	O(√N) or O(log N)	Math formula, binary search on answer, sqrt decomposition on value
N ≤ 10¹⁸	O(log N) or O(√N) with tricks	Matrix exponentiation, binary lifting, math formula, Pollard's rho

Rule of thumb: A modern judge does about 3–5 × 10⁸ simple operations per second under a 2-second time limit. So if N = 10⁵, an O(N²) = 10¹⁰ solution will TLE. An O(N log N) ≈ 1.7 × 10⁶ solution is comfortable.

Constraints also encode special structure:

N ≤ 26 or values in [a-z] → bitmask over alphabet.
Values ≤ 10⁶ with N large → frequency array, sieve, or harmonic series tricks (O(V log V)).
"Tree with N nodes" → DFS/Euler tour, HLD, centroid decomposition.
Q ≤ 10⁵ queries on array of 10⁵ → segment tree or BIT with O(log N) per query.
Q ≤ 10⁵ queries, offline allowed → Mo's algorithm, offline + sweep, CDQ divide & conquer.

Multiple Constraint Variables

When a problem has multiple parameters (N, M, K, Q), the intended complexity usually involves all of them. For instance:

N ≤ 10⁵, Q ≤ 10⁵ → O((N + Q) log N) per segment tree
N ≤ 10³, M ≤ 10³ → O(NM) grid DP or BFS
N ≤ 10⁵, K ≤ 20 → O(NK) or O(N · 2^K) depending on whether K acts as states or bitmask
N ≤ 2 × 10⁵, sum of N over all test cases ≤ 2 × 10⁵ → O(N) per test case is fine

Keyword Signals in Problem Statements

Problem setters unconsciously (or intentionally) leave linguistic fingerprints that hint at the solution type. Here's a comprehensive mapping from keywords to algorithms:

Optimization Keywords

"minimum cost" → shortest path, min-cost max-flow, DP
"maximum profit" → DP, greedy, max-flow
"minimum number of operations" → BFS on state space, greedy
"minimize the maximum" → binary search on answer
"maximize the minimum" → binary search on answer

Counting Keywords

"number of ways" → DP, combinatorics
"count pairs / triples" → two-pointer, sorting, prefix sums, FFT
"modulo 10⁹+7" → combinatorics or DP (result is large)
"distinct subsequences" → DP with deduplication
"count subarrays with property" → two-pointer, contribution technique

Graph Keywords

"cities and roads" → graph (often shortest path or MST)
"reachable" → BFS/DFS, transitive closure
"connected components" → DSU, BFS/DFS
"dependencies" → topological sort, DAG DP
"bipartite" / "two groups" → 2-coloring, matching
"network flow" / "capacity" → max-flow, min-cut

Construction Keywords

"construct" / "find any" → constructive, often greedy-based
"rearrange" → sorting, greedy construction
"if impossible, print -1" → look for necessary conditions first
"lexicographically smallest" → greedy front-to-back
"for all permutations" → constructive or math

String Keywords

"palindrome" → Manacher, DP, hashing
"substring matching" → KMP, Z-function, suffix array
"lexicographic order" → suffix array, greedy
"period" / "border" → KMP failure function
"common substring" → suffix array + LCP, DP

Game Theory Keywords

"Alice and Bob" → game theory, minimax
"optimal play" → Sprague-Grundy, DP
"take turns removing" → Nim, Sprague-Grundy
"first player wins" → parity, Grundy values
"stones in piles" → Nim (XOR of pile sizes)

The "Binary Search on Answer" Signal

One of the most powerful classification shortcuts: if a problem asks you to minimize the maximum or maximize the minimum of something, there's an 80%+ chance the intended approach is binary search on the answer value, combined with a greedy check function.

// Classic pattern: "minimize the maximum distance between K facilities"
// Binary search on the answer, check with greedy
bool canAchieve(int maxDist, vector<int>& positions, int K) {
    int count = 1, last = positions[0];
    for (int i = 1; i < (int)positions.size(); i++) {
        if (positions[i] - last >= maxDist) {
            count++;
            last = positions[i];
        }
    }
    return count >= K;
}

int solve(vector<int>& positions, int K) {
    sort(positions.begin(), positions.end());
    int lo = 1, hi = positions.back() - positions.front(), ans = hi;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (canAchieve(mid, positions, K)) {
            ans = mid;
            lo = mid + 1;  // try larger minimum distance
        } else {
            hi = mid - 1;
        }
    }
    return ans;
}

Cross-Category Problems

The hardest problems in competitive programming rarely belong to a single category. Division 1 C/D problems and above typically combine two or three techniques. The key is decomposition: break the problem into layers, classify each layer independently.

Common Combinations

Combination	Example Pattern	Decomposition Strategy
graph + dp	Shortest path with state (e.g., "at most K edges")	Model as graph where nodes encode (vertex, DP state), run Dijkstra/BFS on product graph
dp + data structures	DP transitions need range-min/max queries	Write the DP recurrence first, then optimize transitions with segment tree or monotonic queue
math + dp	Counting with modular arithmetic over complex states	Derive the combinatorial formula, then compute it via DP (e.g., inclusion-exclusion as DP)
graph + data structures	Tree queries with updates	Flatten tree with Euler tour, apply segment tree on the linearized array
greedy + data structures	Greedy selection needing efficient access to next-best element	Use priority queue, set, or segment tree to maintain candidates for greedy selection
strings + dp	Edit distance, regex matching, distinct subsequences	String structure defines the DP states, transitions use character comparisons

Decomposition Workflow

When facing a cross-category problem, follow this mental procedure:

Identify the outer structure: Is it fundamentally a graph problem? An optimization? A counting problem? This determines your top-level approach.
Identify the bottleneck operation: What's the expensive part of the naive solution? This tells you which data structure or technique you need.
Check if a standard reduction applies: Many problems reduce to well-known ones. "Minimum vertex cover on a tree" → tree DP. "Maximum independent set on a bipartite graph" → max matching via König's theorem.
Verify complexity: Multiply the costs of your layers. If Graph × DS gives O(N log² N), check if that fits the constraints.

Example decomposition: "Given a tree of N nodes with values, answer Q queries: for each query (u, v, k), find the k-th smallest value on the path from u to v."
→ Outer structure: tree path query
→ Core operation: k-th smallest on a set of values
→ Solution: persistent segment tree on Euler tour + LCA for path extraction.

Building Your Classification Instinct

Classification isn't something you learn from reading — it's a trained reflex built through deliberate practice. Here are concrete methods to develop it:

Method 1: Statement Reading Without Solving

Open a Codeforces problemset page. For each problem, read only the statement and constraints. Write down your classification in under 90 seconds. Then check the editorial or tags. Track your accuracy over time.

Target accuracy by rating:

800–1200: Should classify correctly >90% of the time
1300–1600: Should classify correctly >80% of the time
1700–2000: Should classify correctly >70% (these start blending categories)
2100+: Classification becomes less binary; you'll identify 2–3 plausible approaches

Method 2: Tag Analysis Drill

Go to Codeforces problemset, filter by a specific tag (e.g., "bitmasks"), and solve 10 problems. Then switch to a different tag. After covering 5–6 tags, do a mixed set and try to classify without seeing tags. The goal is to internalize the feel of each category.

Method 3: Constraint-First Reading

Train yourself to always read constraints before the problem statement. Build a mental model of what complexity is needed, then read the statement looking for confirmation. This inverts the usual approach but dramatically speeds up classification.

A Quick Classification Helper

The following C++ snippet demonstrates a simple heuristic for classifying problems by constraint size. While real classification requires understanding the problem statement, this captures the constraint-reading instinct programmatically:

#include <bits/stdc++.h>
using namespace std;

struct ClassificationHint {
    string complexity;
    vector<string> likely_approaches;
};

ClassificationHint classifyByConstraint(long long n) {
    if (n <= 10)
        return {"O(N!)", {"brute force", "permutation enumeration", "backtracking"}};
    if (n <= 20)
        return {"O(2^N)", {"bitmask DP", "meet-in-the-middle"}};
    if (n <= 40)
        return {"O(2^(N/2))", {"meet-in-the-middle"}};
    if (n <= 500)
        return {"O(N^3)", {"Floyd-Warshall", "interval DP", "Gaussian elimination"}};
    if (n <= 5000)
        return {"O(N^2)", {"quadratic DP", "pairwise comparison", "2D DP"}};
    if (n <= 100000)
        return {"O(N log N)", {"sorting + binary search", "segment tree", "divide and conquer"}};
    if (n <= 1000000)
        return {"O(N)", {"two-pointer", "linear DP", "sieve", "BFS/DFS"}};
    if (n <= 10000000)
        return {"O(N)", {"linear sieve", "prefix sums", "simple iteration"}};
    if (n <= 1000000000LL)
        return {"O(sqrt(N)) or O(log N)", {"math formula", "binary search on answer"}};
    return {"O(log N)", {"matrix exponentiation", "binary lifting", "math"}};
}

// Detect multiple-constraint patterns
void analyzeConstraints(vector<pair<string, long long>>& params) {
    cout << "=== Constraint Analysis ===" << endl;
    for (auto& [name, val] : params) {
        auto hint = classifyByConstraint(val);
        cout << name << " = " << val
             << " → target " << hint.complexity << endl;
        cout << "  Likely: ";
        for (int i = 0; i < (int)hint.likely_approaches.size(); i++) {
            if (i) cout << ", ";
            cout << hint.likely_approaches[i];
        }
        cout << endl;
    }
}

int main() {
    // Example: problem with N nodes, Q queries, K special nodes
    vector<pair<string, long long>> constraints = {
        {"N", 200000},
        {"Q", 200000},
        {"K", 20}
    };

    analyzeConstraints(constraints);

    // Output:
    // N = 200000 → target O(N log N)
    //   Likely: sorting + binary search, segment tree, divide and conquer
    // Q = 200000 → target O(N log N)
    //   Likely: sorting + binary search, segment tree, divide and conquer
    // K = 20 → target O(2^N)
    //   Likely: bitmask DP, meet-in-the-middle
    //
    // Interpretation: O(N log N) per query with bitmask DP over K states
    //                 Total: O(Q * 2^K) or O((N+Q) * log N * 2^K)
}

Method 4: Classify by Problem Source

Different contest platforms and rounds have characteristic problem distributions:

Codeforces Div 2 A/B: Implementation, greedy, simple math, constructive
Codeforces Div 2 C/D: DP, graphs, binary search, number theory
Codeforces Div 1 C+: Advanced DS, combinatorics, cross-category
AtCoder ABC D/E: DP-heavy, clean mathematical formulations
AtCoder ARC/AGC: Combinatorics, constructive, deep mathematical insight
USACO Gold/Platinum: Graph theory, DP on trees, sweep-line

Common Traps & Misdirections

Problem setters are skilled at disguising problems. Here are the most common ways a problem can mislead your classification instinct:

Trap 1: The Greedy That's Actually DP

The problem looks greedily solvable — you can sort the input and process items in order. But the greedy choice fails because decisions interact. Test: try to construct a counterexample where the greedy fails. If you find one within 2 minutes, switch to DP. Classic example: 0/1 knapsack looks greedy (sort by value/weight ratio) but isn't.

Trap 2: The Graph Hidden in a Grid

Problems on 2D grids are graph problems in disguise. Each cell is a node, adjacent cells are edges. "Minimum moves to reach (N,M)" is just BFS. "Connected regions of same color" is flood-fill. Don't overthink grid problems — model them as graphs immediately.

Trap 3: The Math Problem Disguised as Simulation

The statement describes a process: "repeat this operation until …" The naive simulation is O(N²) or worse, but there's a closed-form formula. Signal: if N ≤ 10¹⁸ and the process is described step-by-step, you cannot simulate — find the math. Often involves modular arithmetic, cycles, or matrix exponentiation.

Trap 4: The DP That's Actually Greedy

The reverse of Trap 1. You set up an elaborate DP and it works, but the editorial shows a clean O(N log N) greedy. While this doesn't cause WA, it wastes time. Heuristic: if your DP transitions never actually need to "look back" at more than one previous state, a greedy might exist.

Trap 5: The Data Structure Problem That Needs Offline Processing

Queries that seem to require a persistent or advanced online data structure can often be solved more easily offline. If the problem doesn't explicitly require online processing, consider sorting queries (Mo's algorithm, sweep-line, CDQ divide & conquer) before reaching for persistent segment trees.

Trap 6: The Number Theory Red Herring

"Modulo 10⁹+7" doesn't always mean the problem is number theory. It usually just means the answer is large and you should use modular arithmetic in your DP or combinatorics. The actual technique could be anything — the mod is just output formatting.

Trap 7: The "Looks Like Brute Force" but N is Small for a Reason

When N ≤ 20 or N ≤ 15, the problem setter is practically telling you to use exponential algorithms. Don't waste time looking for a polynomial solution. Bitmask DP, backtracking with pruning, or meet-in-the-middle is almost certainly intended. Trust the constraints.

The Meta-Strategy for Uncertain Classification

When you genuinely can't classify a problem after 3–4 minutes:

Re-read the constraints. You probably missed something. Pay attention to unusual bounds.
Write the brute force. For small inputs, code the naive O(2^N) or O(N!) solution. Run it on examples. The pattern in the output often reveals the algorithm.
Work backwards from complexity. If N = 10⁵, you need O(N log N). What O(N log N) algorithms do you know? Merge sort, segment tree, binary search... does any of them apply?
Move on. If classification fails after 5 minutes, skip the problem and return later. Fresh eyes often immediately spot what tired eyes miss.

Final principle: Classification is a probabilistic skill, not a deterministic one. You're building a Bayesian prior: "Given these signals, the problem is 70% likely to be DP, 20% likely to be greedy, 10% something else." The more problems you solve, the better your priors become. There's no shortcut — only thousands of problems read, classified, and solved.