← All Posts
Contest Strategy & Meta· Part 8 of 12

Advanced CP Template Features — Custom Structures, Debugging Macros & Common Utilities

The difference between a 5-minute solve and a 12-minute solve often comes down to infrastructure. A well-crafted template eliminates boilerplate and prevents bugs. In this post we build a production-ready template piece by piece.

Beyond the Basic Template

Every competitive programmer starts with a minimal template — bits/stdc++.h, fast I/O, a few type aliases. That gets you through Div 2 A–B. But once problems demand modular arithmetic, range queries, or debugging, you need more.

The "Always Include" Layer

These cost nothing at compile time and prevent the most common bugs:

#include <bits/stdc++.h>
using namespace std;

using ll  = long long;
using ull = unsigned long long;
using ld  = long double;
using pii = pair<int,int>;
using pll = pair<ll,ll>;
using vi  = vector<int>;
using vll = vector<ll>;
using vvi = vector<vi>;

#define all(x)   (x).begin(), (x).end()
#define rall(x)  (x).rbegin(), (x).rend()
#define sz(x)    (int)(x).size()
#define pb       push_back
#define mp       make_pair
#define fi       first
#define se       second

const int INF  = 1e9 + 7;
const ll  LINF = 1e18 + 7;
const int MOD  = 998244353;

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

void solve() {
    // Problem logic here
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int t = 1;
    cin >> t;
    while (t--) solve();
    return 0;
}

The "Paste on Demand" Layer

These are powerful but add 30–100+ lines. Keep them in separate snippets:

Rule of thumb: If you use something in >30% of contests, include it by default. If you use it <10% but it is complex, keep as a snippet.

What Bloats Your Template

Avoid including full graph algorithms, string algorithms, heavy number theory (FFT, NTT, CRT), or multiple competing implementations of the same structure. These change too much problem-to-problem or are rarely needed — keep them as separate library files.

Debug Macros & Pretty Printing

The single most impactful template addition is a debug macro system that prints variable names and values to stderr locally, but compiles to nothing on the judge.

Core Debug Infrastructure

The complete system below handles all standard containers, pairs, tuples, and nested structures automatically:

#ifdef LOCAL
#define dbg(...) cerr << "[" << #__VA_ARGS__ << "]: ", debug_out(__VA_ARGS__)
#else
#define dbg(...)
#endif

// Base case: single value
template <typename T>
void debug_print(const T& x) { cerr << x; }

// Specialization: string
void debug_print(const string& s) { cerr << '"' << s << '"'; }

// Specialization: char
void debug_print(char c) { cerr << '\'' << c << '\''; }

// Specialization: bool
void debug_print(bool b) { cerr << (b ? "true" : "false"); }

// Pair
template <typename A, typename B>
void debug_print(const pair<A,B>& p) {
    cerr << '(';
    debug_print(p.first);
    cerr << ", ";
    debug_print(p.second);
    cerr << ')';
}

// Iterable containers (vector, set, map, deque, ...)
template <typename T,
          typename = decltype(begin(declval<T>())),
          typename = enable_if_t<!is_same_v<T, string>>>
void debug_print(const T& container) {
    cerr << '{';
    bool first = true;
    for (auto& elem : container) {
        if (!first) cerr << ", ";
        first = false;
        debug_print(elem);
    }
    cerr << '}';
}

// Tuple (C++17 fold expression)
template <typename... Args>
void debug_print(const tuple<Args...>& t) {
    cerr << '(';
    apply([](const auto&... args) {
        int n = 0;
        ((debug_print(args), cerr << (++n != sizeof...(args) ? ", " : "")), ...);
    }, t);
    cerr << ')';
}

// Variadic debug_out: prints all arguments separated by ", "
void debug_out() { cerr << endl; }

template <typename Head, typename... Tail>
void debug_out(const Head& h, const Tail&... t) {
    debug_print(h);
    if constexpr (sizeof...(t) > 0) cerr << ", ";
    debug_out(t...);
}

Usage Examples

int x = 42;
vector<int> v = {1, 2, 3, 4, 5};
map<string, int> scores = {{"alice", 100}, {"bob", 95}};
pair<int,int> p = {3, 7};

dbg(x);           // [x]: 42
dbg(v);           // [v]: {1, 2, 3, 4, 5}
dbg(scores);      // [scores]: {("alice", 100), ("bob", 95)}
dbg(x, p, v);     // [x, p, v]: 42, (3, 7), {1, 2, 3, 4, 5}

Compiling Locally with Debug

# Local (debug enabled):  g++ -std=c++17 -O2 -DLOCAL -Wall -Wextra -o sol sol.cpp
# Submission (zero overhead):  g++ -std=c++17 -O2 -o sol sol.cpp
Pro tip: Use a shell alias like alias cc='g++ -std=c++17 -O2 -DLOCAL -Wall -Wextra' so you never accidentally submit with debug output enabled.

Debug Macros for 2D Arrays

#ifdef LOCAL
#define dbg2d(grid) do { \
    cerr << "[" << #grid << "] (" << (grid).size() << " rows):\n"; \
    for (int i_ = 0; i_ < (int)(grid).size(); i_++) { \
        cerr << "  " << i_ << ": "; \
        for (auto& v_ : (grid)[i_]) cerr << v_ << " "; \
        cerr << "\n"; } } while(0)
#else
#define dbg2d(grid)
#endif

Modular Arithmetic Template

Problems involving counting or combinatorics almost always require answers modulo a prime. A ModInt struct lets you write natural arithmetic without manually calling % MOD everywhere.

Production-Ready ModInt

template <int MOD>
struct ModInt {
    int val;
    ModInt() : val(0) {}
    ModInt(long long v) : val((v % MOD + MOD) % MOD) {}

    ModInt operator+(const ModInt& o) const { return ModInt(val + o.val); }
    ModInt operator-(const ModInt& o) const { return ModInt(val - o.val + MOD); }
    ModInt operator*(const ModInt& o) const { return ModInt(1LL * val * o.val); }
    ModInt operator/(const ModInt& o) const { return *this * o.inv(); }

    ModInt& operator+=(const ModInt& o) { return *this = *this + o; }
    ModInt& operator-=(const ModInt& o) { return *this = *this - o; }
    ModInt& operator*=(const ModInt& o) { return *this = *this * o; }
    ModInt& operator/=(const ModInt& o) { return *this = *this / o; }

    ModInt operator-() const { return ModInt(-val); }
    bool operator==(const ModInt& o) const { return val == o.val; }
    bool operator!=(const ModInt& o) const { return val != o.val; }

    ModInt power(long long exp) const {
        ModInt base = *this, result = 1;
        exp %= (MOD - 1);  // Fermat's little theorem
        if (exp < 0) exp += MOD - 1;
        while (exp > 0) {
            if (exp & 1) result *= base;
            base *= base;
            exp >>= 1;
        }
        return result;
    }

    ModInt inv() const {
        // MOD must be prime for Fermat inverse
        return power(MOD - 2);
    }

    friend ostream& operator<<(ostream& os, const ModInt& m) {
        return os << m.val;
    }
    friend istream& operator>>(istream& is, ModInt& m) {
        long long v; is >> v; m = ModInt(v); return is;
    }
};

using mint = ModInt<998244353>;
// using mint = ModInt<1000000007>;  // switch as needed

Combinatorics Utilities with ModInt

const int MAXN = 2000005;
mint fac[MAXN], inv_fac[MAXN];

void precompute_factorials() {
    fac[0] = 1;
    for (int i = 1; i < MAXN; i++)
        fac[i] = fac[i - 1] * i;
    inv_fac[MAXN - 1] = fac[MAXN - 1].inv();
    for (int i = MAXN - 2; i >= 0; i--)
        inv_fac[i] = inv_fac[i + 1] * (i + 1);
}

mint C(int n, int r) {
    if (r < 0 || r > n) return 0;
    return fac[n] * inv_fac[r] * inv_fac[n - r];
}

mint P(int n, int r) {
    if (r < 0 || r > n) return 0;
    return fac[n] * inv_fac[n - r];
}

mint catalan(int n) {
    return C(2 * n, n) * mint(n + 1).inv();
}
Why precompute inverse factorials backwards? inv_fac[MAXN-1] uses one modular exponentiation, then each previous value is a single multiply — O(N) total instead of O(N log MOD).

Custom Data Structures

Three data structures appear so frequently that having bug-free template versions is essential: Segment Tree, DSU, and Fenwick Tree.

Segment Tree with Lazy Propagation

template <typename T, typename F>
struct LazySegTree {
    int n;
    vector<T> tree;
    vector<F> lazy;
    T identity;
    F lazy_id;
    T combine(T a, T b) { return a + b; }
    T apply_lazy(T val, F lz, int len) { return val + lz * len; }
    F compose_lazy(F a, F b) { return a + b; }

    LazySegTree() {}
    LazySegTree(int _n, T _id, F _lid)
        : n(_n), tree(4*_n, _id), lazy(4*_n, _lid), identity(_id), lazy_id(_lid) {}

    void build(const vector<T>& a, int v, int tl, int tr) {
        if (tl == tr) { tree[v] = a[tl]; return; }
        int tm = (tl + tr) / 2;
        build(a, 2*v, tl, tm); build(a, 2*v+1, tm+1, tr);
        tree[v] = combine(tree[2*v], tree[2*v+1]);
    }
    void build(const vector<T>& a) { build(a, 1, 0, n-1); }

    void apply(int v, int tl, int tr, F lz) {
        tree[v] = apply_lazy(tree[v], lz, tr-tl+1);
        lazy[v] = compose_lazy(lazy[v], lz);
    }
    void push(int v, int tl, int tr) {
        if (lazy[v] != lazy_id) {
            int tm = (tl+tr)/2;
            apply(2*v, tl, tm, lazy[v]); apply(2*v+1, tm+1, tr, lazy[v]);
            lazy[v] = lazy_id;
        }
    }
    void update(int v, int tl, int tr, int l, int r, F val) {
        if (l > tr || r < tl) return;
        if (l <= tl && tr <= r) { apply(v, tl, tr, val); return; }
        push(v, tl, tr); int tm = (tl+tr)/2;
        update(2*v, tl, tm, l, r, val); update(2*v+1, tm+1, tr, l, r, val);
        tree[v] = combine(tree[2*v], tree[2*v+1]);
    }
    void update(int l, int r, F val) { update(1, 0, n-1, l, r, val); }

    T query(int v, int tl, int tr, int l, int r) {
        if (l > tr || r < tl) return identity;
        if (l <= tl && tr <= r) return tree[v];
        push(v, tl, tr); int tm = (tl+tr)/2;
        return combine(query(2*v, tl, tm, l, r), query(2*v+1, tm+1, tr, l, r));
    }
    T query(int l, int r) { return query(1, 0, n-1, l, r); }
};

DSU with Rollback (for Offline Divide & Conquer)

struct DSURollback {
    vector<int> parent, rank_;
    vector<pair<int,int>> history;
    int components;

    DSURollback(int n) : parent(n), rank_(n, 0), components(n) {
        iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) x = parent[x];
        return x;  // no path compression — needed for rollback
    }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) swap(a, b);
        history.push_back({b, parent[b]});
        history.push_back({-1, rank_[a]});
        parent[b] = a;
        if (rank_[a] == rank_[b]) rank_[a]++;
        components--;
        return true;
    }
    int save() { return (int)history.size(); }
    void rollback(int checkpoint) {
        while ((int)history.size() > checkpoint) {
            auto [node, old_val] = history.back(); history.pop_back();
            if (node == -1) {
                auto [n2, old_parent] = history.back(); history.pop_back();
                rank_[find(n2)] = old_val;
                parent[n2] = old_parent;
                components++;
            }
        }
    }
    bool connected(int a, int b) { return find(a) == find(b); }
};

Fenwick Tree (Binary Indexed Tree)

template <typename T>
struct Fenwick {
    int n;
    vector<T> tree;
    Fenwick(int _n) : n(_n), tree(_n + 1, T(0)) {}
    void update(int i, T delta) {
        for (i++; i <= n; i += i & (-i)) tree[i] += delta;
    }
    T prefix(int i) const {
        T s = 0; for (i++; i > 0; i -= i & (-i)) s += tree[i]; return s;
    }
    T query(int l, int r) const { return prefix(r) - (l ? prefix(l-1) : T(0)); }
    int lower_bound(T target) const {
        int pos = 0; T sum = 0;
        for (int pw = 1 << __lg(n); pw > 0; pw >>= 1) {
            if (pos + pw <= n && sum + tree[pos + pw] < target)
                { pos += pw; sum += tree[pos]; }
        }
        return pos;
    }
};
Segment Tree vs. Fenwick: Use Fenwick for simple prefix-sum / point-update problems — it is 2–3× faster and uses half the memory. Use Segment Tree when you need lazy propagation, non-invertible operations (like max), or more complex range updates.

Geometry Primitives

Computational geometry problems are rare in Codeforces but appear in ICPC. A compact geometry template saves enormous time when they do appear.

Point Struct & Basic Operations

using ld = long double;
const ld EPS = 1e-9;

struct Point {
    ld x, y;
    Point() : x(0), y(0) {}
    Point(ld x, ld y) : x(x), y(y) {}
    Point operator+(const Point& o) const { return {x+o.x, y+o.y}; }
    Point operator-(const Point& o) const { return {x-o.x, y-o.y}; }
    Point operator*(ld t) const { return {x*t, y*t}; }
    ld dot(const Point& o) const { return x*o.x + y*o.y; }
    ld cross(const Point& o) const { return x*o.y - y*o.x; }
    ld norm() const { return sqrt(x*x + y*y); }
    ld norm2() const { return x*x + y*y; }
    bool operator<(const Point& o) const {
        return x < o.x - EPS || (fabs(x-o.x) < EPS && y < o.y - EPS);
    }
    friend ostream& operator<<(ostream& os, const Point& p) {
        return os << "(" << p.x << ", " << p.y << ")";
    }
    friend istream& operator>>(istream& is, Point& p) {
        return is >> p.x >> p.y;
    }
};

int orientation(Point a, Point b, Point c) {
    ld v = (b - a).cross(c - a);
    if (v > EPS) return 1;   // CCW
    if (v < -EPS) return -1; // CW
    return 0;                 // collinear
}

Convex Hull (Andrew's Monotone Chain)

vector<Point> convex_hull(vector<Point> pts) {
    int n = (int)pts.size();
    if (n < 3) return pts;
    sort(pts.begin(), pts.end());
    vector<Point> hull;
    for (int i = 0; i < n; i++) {  // lower hull
        while (hull.size() >= 2 &&
               (hull.back()-hull[hull.size()-2]).cross(pts[i]-hull[hull.size()-2]) <= EPS)
            hull.pop_back();
        hull.push_back(pts[i]);
    }
    int lo = (int)hull.size();
    for (int i = n-2; i >= 0; i--) {  // upper hull
        while ((int)hull.size() > lo &&
               (hull.back()-hull[hull.size()-2]).cross(pts[i]-hull[hull.size()-2]) <= EPS)
            hull.pop_back();
        hull.push_back(pts[i]);
    }
    hull.pop_back();
    return hull;
}
Integer geometry: For problems using integer coordinates, replace ld with long long and remove EPS comparisons. Cross products and dot products remain exact. Only use floating-point when the problem explicitly requires it.

I/O Optimization & Pragmas

I/O can be the bottleneck in problems with large input. Understanding which optimizations matter prevents both TLE and wasted effort.

Standard Fast I/O

ios::sync_with_stdio(false);
cin.tie(nullptr);
// These two lines are always the first thing in main().

sync_with_stdio(false) decouples C++ streams from C stdio. cin.tie(nullptr) prevents flushing cout before every cin read. Together they are sufficient for 95% of problems. After calling these, do not mix cin/cout with scanf/printf.

Custom Fast Reader (for Extreme Cases)

namespace FastIO {
    const int BUF = 1 << 22;
    char buf[BUF], *p1 = buf, *p2 = buf;
    inline char gc() {
        if (p1 == p2) { p2 = (p1 = buf) + fread(buf, 1, BUF, stdin); if (p1 == p2) return EOF; }
        return *p1++;
    }
    inline int readint() {
        int x = 0, f = 1; char c = gc();
        while (c < '0' || c > '9') { if (c == '-') f = -1; c = gc(); }
        while (c >= '0' && c <= '9') { x = x * 10 + c - '0'; c = gc(); }
        return x * f;
    }
}
using FastIO::readint;

Pragma Optimizations

#pragma GCC optimize("O2,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt,lzcnt")

// Place at the very top of the file, before any #include

What each pragma does:

Benchmark Comparison

Reading 107 integers on a typical judge:

MethodTime (ms)Notes
cin (no sync)~2800Default — painfully slow
scanf~750C-style, consistent
cin (with sync off)~680Standard fast I/O
fread + manual parse~210Custom reader, rarely needed
mmap + manual parse~180Platform-dependent, fragile
When do pragmas help? They can give 2–3× speedup for tight loops over large arrays (bitset ops, brute-force with SIMD). For most problems, O2 + fast I/O is sufficient.

Output Optimization

Always use '\n' instead of endl — the latter flushes the buffer every time, which can cause TLE on heavy-output problems.

// Good: '\n' is just a character, no flush
for (int i = 0; i < n; i++)
    cout << a[i] << " \n"[i == n - 1];

Organizing Your Template Library

As your collection of tested code grows, organization becomes crucial. You need to find the right snippet in under 30 seconds during a contest.

Recommended Directory Structure

Organize by category: data-structures/, math/, geometry/, graph/, string/, plus a template/ folder with your base template and debug header. Add a tests/ folder with stress-test scripts and unit tests for each template. One file per algorithm, clearly named.

Snippets vs. Full Template

Full Template File

  • Single file with everything included
  • Simple: open & start coding
  • Risk: 300+ lines of boilerplate
  • Hard to maintain and keep tested
  • Good for: ICPC (one machine)

Snippet-Based Approach

  • Minimal base + paste what you need
  • Each snippet is small and tested
  • Requires editor support (VS Code, Vim)
  • More flexible, easier to update
  • Good for: Codeforces, AtCoder

Editor Snippets

Configure your editor with snippet shortcuts — type segtree and expand to the full implementation. Faster than copy-paste from files during a contest.

Version Control for Your Library

Keep your library in a Git repository. Tag stable versions before major contests (git tag icpc-2026) so you always have a known-good snapshot. Commit with descriptive messages so you can trace when a template was last verified.

Never use untested code in a contest. Every snippet should be verified against at least 2–3 problems. Stress-test new templates (see previous post) before relying on them.

Pre-Contest Checklist

Before every contest: verify your base template compiles with -Wall -Wextra -Wshadow, debug macros work locally, editor snippets are up to date, and you can compile and run a test within 10 seconds of starting.

← Previous: Stress Testing Next: Time Management in Contests →