Advanced CP Template Features — Custom Structures, Debugging Macros & Common Utilities
The difference between a 5-minute solve and a 12-minute solve often comes down to infrastructure. A well-crafted template eliminates boilerplate and prevents bugs. In this post we build a production-ready template piece by piece.
Beyond the Basic Template
Every competitive programmer starts with a minimal template — bits/stdc++.h,
fast I/O, a few type aliases. That gets you through Div 2 A–B. But once problems demand
modular arithmetic, range queries, or debugging, you need more.
The "Always Include" Layer
These cost nothing at compile time and prevent the most common bugs:
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
using ull = unsigned long long;
using ld = long double;
using pii = pair<int,int>;
using pll = pair<ll,ll>;
using vi = vector<int>;
using vll = vector<ll>;
using vvi = vector<vi>;
#define all(x) (x).begin(), (x).end()
#define rall(x) (x).rbegin(), (x).rend()
#define sz(x) (int)(x).size()
#define pb push_back
#define mp make_pair
#define fi first
#define se second
const int INF = 1e9 + 7;
const ll LINF = 1e18 + 7;
const int MOD = 998244353;
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
void solve() {
// Problem logic here
}
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int t = 1;
cin >> t;
while (t--) solve();
return 0;
}
The "Paste on Demand" Layer
These are powerful but add 30–100+ lines. Keep them in separate snippets:
- ModInt — modular arithmetic struct
- Segment Tree / Fenwick — range query problems
- DSU — connectivity, Kruskal's MST
- Geometry primitives — computational geometry
What Bloats Your Template
Avoid including full graph algorithms, string algorithms, heavy number theory (FFT, NTT, CRT), or multiple competing implementations of the same structure. These change too much problem-to-problem or are rarely needed — keep them as separate library files.
Debug Macros & Pretty Printing
The single most impactful template addition is a debug macro system that prints variable
names and values to stderr locally, but compiles to nothing on the judge.
Core Debug Infrastructure
The complete system below handles all standard containers, pairs, tuples, and nested structures automatically:
#ifdef LOCAL
#define dbg(...) cerr << "[" << #__VA_ARGS__ << "]: ", debug_out(__VA_ARGS__)
#else
#define dbg(...)
#endif
// Base case: single value
template <typename T>
void debug_print(const T& x) { cerr << x; }
// Specialization: string
void debug_print(const string& s) { cerr << '"' << s << '"'; }
// Specialization: char
void debug_print(char c) { cerr << '\'' << c << '\''; }
// Specialization: bool
void debug_print(bool b) { cerr << (b ? "true" : "false"); }
// Pair
template <typename A, typename B>
void debug_print(const pair<A,B>& p) {
cerr << '(';
debug_print(p.first);
cerr << ", ";
debug_print(p.second);
cerr << ')';
}
// Iterable containers (vector, set, map, deque, ...)
template <typename T,
typename = decltype(begin(declval<T>())),
typename = enable_if_t<!is_same_v<T, string>>>
void debug_print(const T& container) {
cerr << '{';
bool first = true;
for (auto& elem : container) {
if (!first) cerr << ", ";
first = false;
debug_print(elem);
}
cerr << '}';
}
// Tuple (C++17 fold expression)
template <typename... Args>
void debug_print(const tuple<Args...>& t) {
cerr << '(';
apply([](const auto&... args) {
int n = 0;
((debug_print(args), cerr << (++n != sizeof...(args) ? ", " : "")), ...);
}, t);
cerr << ')';
}
// Variadic debug_out: prints all arguments separated by ", "
void debug_out() { cerr << endl; }
template <typename Head, typename... Tail>
void debug_out(const Head& h, const Tail&... t) {
debug_print(h);
if constexpr (sizeof...(t) > 0) cerr << ", ";
debug_out(t...);
}
Usage Examples
int x = 42;
vector<int> v = {1, 2, 3, 4, 5};
map<string, int> scores = {{"alice", 100}, {"bob", 95}};
pair<int,int> p = {3, 7};
dbg(x); // [x]: 42
dbg(v); // [v]: {1, 2, 3, 4, 5}
dbg(scores); // [scores]: {("alice", 100), ("bob", 95)}
dbg(x, p, v); // [x, p, v]: 42, (3, 7), {1, 2, 3, 4, 5}
Compiling Locally with Debug
# Local (debug enabled): g++ -std=c++17 -O2 -DLOCAL -Wall -Wextra -o sol sol.cpp
# Submission (zero overhead): g++ -std=c++17 -O2 -o sol sol.cpp
alias cc='g++ -std=c++17 -O2 -DLOCAL -Wall -Wextra' so you never
accidentally submit with debug output enabled.
Debug Macros for 2D Arrays
#ifdef LOCAL
#define dbg2d(grid) do { \
cerr << "[" << #grid << "] (" << (grid).size() << " rows):\n"; \
for (int i_ = 0; i_ < (int)(grid).size(); i_++) { \
cerr << " " << i_ << ": "; \
for (auto& v_ : (grid)[i_]) cerr << v_ << " "; \
cerr << "\n"; } } while(0)
#else
#define dbg2d(grid)
#endif
Modular Arithmetic Template
Problems involving counting or combinatorics almost always require answers modulo a
prime. A ModInt struct lets you write natural arithmetic without manually
calling % MOD everywhere.
Production-Ready ModInt
template <int MOD>
struct ModInt {
int val;
ModInt() : val(0) {}
ModInt(long long v) : val((v % MOD + MOD) % MOD) {}
ModInt operator+(const ModInt& o) const { return ModInt(val + o.val); }
ModInt operator-(const ModInt& o) const { return ModInt(val - o.val + MOD); }
ModInt operator*(const ModInt& o) const { return ModInt(1LL * val * o.val); }
ModInt operator/(const ModInt& o) const { return *this * o.inv(); }
ModInt& operator+=(const ModInt& o) { return *this = *this + o; }
ModInt& operator-=(const ModInt& o) { return *this = *this - o; }
ModInt& operator*=(const ModInt& o) { return *this = *this * o; }
ModInt& operator/=(const ModInt& o) { return *this = *this / o; }
ModInt operator-() const { return ModInt(-val); }
bool operator==(const ModInt& o) const { return val == o.val; }
bool operator!=(const ModInt& o) const { return val != o.val; }
ModInt power(long long exp) const {
ModInt base = *this, result = 1;
exp %= (MOD - 1); // Fermat's little theorem
if (exp < 0) exp += MOD - 1;
while (exp > 0) {
if (exp & 1) result *= base;
base *= base;
exp >>= 1;
}
return result;
}
ModInt inv() const {
// MOD must be prime for Fermat inverse
return power(MOD - 2);
}
friend ostream& operator<<(ostream& os, const ModInt& m) {
return os << m.val;
}
friend istream& operator>>(istream& is, ModInt& m) {
long long v; is >> v; m = ModInt(v); return is;
}
};
using mint = ModInt<998244353>;
// using mint = ModInt<1000000007>; // switch as needed
Combinatorics Utilities with ModInt
const int MAXN = 2000005;
mint fac[MAXN], inv_fac[MAXN];
void precompute_factorials() {
fac[0] = 1;
for (int i = 1; i < MAXN; i++)
fac[i] = fac[i - 1] * i;
inv_fac[MAXN - 1] = fac[MAXN - 1].inv();
for (int i = MAXN - 2; i >= 0; i--)
inv_fac[i] = inv_fac[i + 1] * (i + 1);
}
mint C(int n, int r) {
if (r < 0 || r > n) return 0;
return fac[n] * inv_fac[r] * inv_fac[n - r];
}
mint P(int n, int r) {
if (r < 0 || r > n) return 0;
return fac[n] * inv_fac[n - r];
}
mint catalan(int n) {
return C(2 * n, n) * mint(n + 1).inv();
}
inv_fac[MAXN-1] uses one modular exponentiation, then each previous value
is a single multiply — O(N) total instead of O(N log MOD).
Custom Data Structures
Three data structures appear so frequently that having bug-free template versions is essential: Segment Tree, DSU, and Fenwick Tree.
Segment Tree with Lazy Propagation
template <typename T, typename F>
struct LazySegTree {
int n;
vector<T> tree;
vector<F> lazy;
T identity;
F lazy_id;
T combine(T a, T b) { return a + b; }
T apply_lazy(T val, F lz, int len) { return val + lz * len; }
F compose_lazy(F a, F b) { return a + b; }
LazySegTree() {}
LazySegTree(int _n, T _id, F _lid)
: n(_n), tree(4*_n, _id), lazy(4*_n, _lid), identity(_id), lazy_id(_lid) {}
void build(const vector<T>& a, int v, int tl, int tr) {
if (tl == tr) { tree[v] = a[tl]; return; }
int tm = (tl + tr) / 2;
build(a, 2*v, tl, tm); build(a, 2*v+1, tm+1, tr);
tree[v] = combine(tree[2*v], tree[2*v+1]);
}
void build(const vector<T>& a) { build(a, 1, 0, n-1); }
void apply(int v, int tl, int tr, F lz) {
tree[v] = apply_lazy(tree[v], lz, tr-tl+1);
lazy[v] = compose_lazy(lazy[v], lz);
}
void push(int v, int tl, int tr) {
if (lazy[v] != lazy_id) {
int tm = (tl+tr)/2;
apply(2*v, tl, tm, lazy[v]); apply(2*v+1, tm+1, tr, lazy[v]);
lazy[v] = lazy_id;
}
}
void update(int v, int tl, int tr, int l, int r, F val) {
if (l > tr || r < tl) return;
if (l <= tl && tr <= r) { apply(v, tl, tr, val); return; }
push(v, tl, tr); int tm = (tl+tr)/2;
update(2*v, tl, tm, l, r, val); update(2*v+1, tm+1, tr, l, r, val);
tree[v] = combine(tree[2*v], tree[2*v+1]);
}
void update(int l, int r, F val) { update(1, 0, n-1, l, r, val); }
T query(int v, int tl, int tr, int l, int r) {
if (l > tr || r < tl) return identity;
if (l <= tl && tr <= r) return tree[v];
push(v, tl, tr); int tm = (tl+tr)/2;
return combine(query(2*v, tl, tm, l, r), query(2*v+1, tm+1, tr, l, r));
}
T query(int l, int r) { return query(1, 0, n-1, l, r); }
};
DSU with Rollback (for Offline Divide & Conquer)
struct DSURollback {
vector<int> parent, rank_;
vector<pair<int,int>> history;
int components;
DSURollback(int n) : parent(n), rank_(n, 0), components(n) {
iota(parent.begin(), parent.end(), 0);
}
int find(int x) {
while (parent[x] != x) x = parent[x];
return x; // no path compression — needed for rollback
}
bool unite(int a, int b) {
a = find(a); b = find(b);
if (a == b) return false;
if (rank_[a] < rank_[b]) swap(a, b);
history.push_back({b, parent[b]});
history.push_back({-1, rank_[a]});
parent[b] = a;
if (rank_[a] == rank_[b]) rank_[a]++;
components--;
return true;
}
int save() { return (int)history.size(); }
void rollback(int checkpoint) {
while ((int)history.size() > checkpoint) {
auto [node, old_val] = history.back(); history.pop_back();
if (node == -1) {
auto [n2, old_parent] = history.back(); history.pop_back();
rank_[find(n2)] = old_val;
parent[n2] = old_parent;
components++;
}
}
}
bool connected(int a, int b) { return find(a) == find(b); }
};
Fenwick Tree (Binary Indexed Tree)
template <typename T>
struct Fenwick {
int n;
vector<T> tree;
Fenwick(int _n) : n(_n), tree(_n + 1, T(0)) {}
void update(int i, T delta) {
for (i++; i <= n; i += i & (-i)) tree[i] += delta;
}
T prefix(int i) const {
T s = 0; for (i++; i > 0; i -= i & (-i)) s += tree[i]; return s;
}
T query(int l, int r) const { return prefix(r) - (l ? prefix(l-1) : T(0)); }
int lower_bound(T target) const {
int pos = 0; T sum = 0;
for (int pw = 1 << __lg(n); pw > 0; pw >>= 1) {
if (pos + pw <= n && sum + tree[pos + pw] < target)
{ pos += pw; sum += tree[pos]; }
}
return pos;
}
};
Geometry Primitives
Computational geometry problems are rare in Codeforces but appear in ICPC. A compact geometry template saves enormous time when they do appear.
Point Struct & Basic Operations
using ld = long double;
const ld EPS = 1e-9;
struct Point {
ld x, y;
Point() : x(0), y(0) {}
Point(ld x, ld y) : x(x), y(y) {}
Point operator+(const Point& o) const { return {x+o.x, y+o.y}; }
Point operator-(const Point& o) const { return {x-o.x, y-o.y}; }
Point operator*(ld t) const { return {x*t, y*t}; }
ld dot(const Point& o) const { return x*o.x + y*o.y; }
ld cross(const Point& o) const { return x*o.y - y*o.x; }
ld norm() const { return sqrt(x*x + y*y); }
ld norm2() const { return x*x + y*y; }
bool operator<(const Point& o) const {
return x < o.x - EPS || (fabs(x-o.x) < EPS && y < o.y - EPS);
}
friend ostream& operator<<(ostream& os, const Point& p) {
return os << "(" << p.x << ", " << p.y << ")";
}
friend istream& operator>>(istream& is, Point& p) {
return is >> p.x >> p.y;
}
};
int orientation(Point a, Point b, Point c) {
ld v = (b - a).cross(c - a);
if (v > EPS) return 1; // CCW
if (v < -EPS) return -1; // CW
return 0; // collinear
}
Convex Hull (Andrew's Monotone Chain)
vector<Point> convex_hull(vector<Point> pts) {
int n = (int)pts.size();
if (n < 3) return pts;
sort(pts.begin(), pts.end());
vector<Point> hull;
for (int i = 0; i < n; i++) { // lower hull
while (hull.size() >= 2 &&
(hull.back()-hull[hull.size()-2]).cross(pts[i]-hull[hull.size()-2]) <= EPS)
hull.pop_back();
hull.push_back(pts[i]);
}
int lo = (int)hull.size();
for (int i = n-2; i >= 0; i--) { // upper hull
while ((int)hull.size() > lo &&
(hull.back()-hull[hull.size()-2]).cross(pts[i]-hull[hull.size()-2]) <= EPS)
hull.pop_back();
hull.push_back(pts[i]);
}
hull.pop_back();
return hull;
}
ld with long long and remove EPS comparisons. Cross products
and dot products remain exact. Only use floating-point when the problem explicitly
requires it.
I/O Optimization & Pragmas
I/O can be the bottleneck in problems with large input. Understanding which optimizations matter prevents both TLE and wasted effort.
Standard Fast I/O
ios::sync_with_stdio(false);
cin.tie(nullptr);
// These two lines are always the first thing in main().
sync_with_stdio(false) decouples C++ streams from C stdio.
cin.tie(nullptr) prevents flushing cout before every
cin read. Together they are sufficient for 95% of problems. After
calling these, do not mix cin/cout with
scanf/printf.
Custom Fast Reader (for Extreme Cases)
namespace FastIO {
const int BUF = 1 << 22;
char buf[BUF], *p1 = buf, *p2 = buf;
inline char gc() {
if (p1 == p2) { p2 = (p1 = buf) + fread(buf, 1, BUF, stdin); if (p1 == p2) return EOF; }
return *p1++;
}
inline int readint() {
int x = 0, f = 1; char c = gc();
while (c < '0' || c > '9') { if (c == '-') f = -1; c = gc(); }
while (c >= '0' && c <= '9') { x = x * 10 + c - '0'; c = gc(); }
return x * f;
}
}
using FastIO::readint;
Pragma Optimizations
#pragma GCC optimize("O2,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,popcnt,lzcnt")
// Place at the very top of the file, before any #include
What each pragma does:
O2/unroll-loops— standard optimization plus loop unrolling.avx2— 256-bit SIMD vectorization for tight array loops.popcnt, lzcnt— hardware bit-manipulation instructions.
Benchmark Comparison
Reading 107 integers on a typical judge:
| Method | Time (ms) | Notes |
|---|---|---|
cin (no sync) | ~2800 | Default — painfully slow |
scanf | ~750 | C-style, consistent |
cin (with sync off) | ~680 | Standard fast I/O |
fread + manual parse | ~210 | Custom reader, rarely needed |
mmap + manual parse | ~180 | Platform-dependent, fragile |
O2 +
fast I/O is sufficient.
Output Optimization
Always use '\n' instead of endl — the latter flushes the
buffer every time, which can cause TLE on heavy-output problems.
// Good: '\n' is just a character, no flush
for (int i = 0; i < n; i++)
cout << a[i] << " \n"[i == n - 1];
Organizing Your Template Library
As your collection of tested code grows, organization becomes crucial. You need to find the right snippet in under 30 seconds during a contest.
Recommended Directory Structure
Organize by category: data-structures/, math/,
geometry/, graph/, string/, plus a
template/ folder with your base template and debug header. Add a
tests/ folder with stress-test scripts and unit tests for each
template. One file per algorithm, clearly named.
Snippets vs. Full Template
Full Template File
- Single file with everything included
- Simple: open & start coding
- Risk: 300+ lines of boilerplate
- Hard to maintain and keep tested
- Good for: ICPC (one machine)
Snippet-Based Approach
- Minimal base + paste what you need
- Each snippet is small and tested
- Requires editor support (VS Code, Vim)
- More flexible, easier to update
- Good for: Codeforces, AtCoder
Editor Snippets
Configure your editor with snippet shortcuts — type segtree and expand
to the full implementation. Faster than copy-paste from files during a contest.
Version Control for Your Library
Keep your library in a Git repository. Tag stable versions before major contests
(git tag icpc-2026) so you always have a known-good snapshot. Commit with
descriptive messages so you can trace when a template was last verified.
Pre-Contest Checklist
Before every contest: verify your base template compiles with
-Wall -Wextra -Wshadow, debug macros work locally, editor snippets
are up to date, and you can compile and run a test within 10 seconds of starting.