Design: Unique ID Generator
The Problem
Every distributed system needs a way to uniquely identify entities — users, orders, messages, transactions, events. In a single-server world you'd just use an auto-incrementing integer. But the moment you add a second server, that approach breaks: two servers could issue the same ID at the same time.
The unique ID generator is one of the most deceptively simple system design problems. The naive solutions are easy to spot, but the optimal solution — Twitter's Snowflake ID — is a masterclass in using bit-level encoding to satisfy multiple requirements simultaneously with zero coordination between servers.
Requirements
Functional Requirements
- IDs must be globally unique — no two IDs should ever collide, even across data centers
- IDs must be 64-bit numeric — they must fit in a standard `long`/`int64` type
- IDs must be roughly sortable by time — newer IDs should be larger than older IDs
- IDs must contain only numerical values — no UUIDs or strings
Non-Functional Requirements
- High throughput: Each server must generate at least 10,000 IDs per second
- Low latency: ID generation must be a local operation — no network calls, no locks, no coordination
- High availability: The ID generator must never be a single point of failure
- Scalability: Adding more servers should not require reconfiguring existing ones
Back-of-the-Envelope Estimation
With 10,000 IDs/sec per server and, say, 100 servers:
- Total throughput: 10,000 × 100 = 1,000,000 IDs/sec
- Per day: 1M × 86,400 = 86.4 billion IDs/day
- Per year: ~31.5 trillion IDs/year
- In 69 years (Snowflake's timestamp range): ~2.17 quadrillion IDs — well within 64-bit range (2^63 ≈ 9.2 quintillion)
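These figures are small enough to sanity-check directly (a few lines of Python, purely for illustration):

```python
per_server = 10_000                    # IDs/sec/server
servers = 100
total_per_sec = per_server * servers   # 1,000,000 IDs/sec
per_day = total_per_sec * 86_400       # 86.4 billion IDs/day
per_year = per_day * 365               # ~31.5 trillion IDs/year
in_69_years = per_year * 69            # ~2.17 quadrillion IDs

# 64-bit headroom: 2^63 positive values
print(f"{in_69_years:,} of {2**63:,}")
```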
Approach 1: UUID
The most obvious answer: use a Universally Unique Identifier (UUID). A UUID is a 128-bit number, typically represented as a 36-character hex string:
550e8400-e29b-41d4-a716-446655440000
   └──────┘ └──┘ └──┘ └──┘ └────────┘
   time_low  mid  hi   clk    node     (for v1)
   random  random random random random (for v4)
UUIDv4 (the most common variant) is generated from 122 random bits. Even after generating a billion (10^9) UUIDs, the probability of any collision is on the order of 10^-19 — essentially zero.
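For illustration, Python's standard `uuid` module makes both properties easy to see (128 bits under the hood, and no time ordering whatsoever):

```python
import uuid

a = uuid.uuid4()
print(a)                   # a random 36-character string like 550e8400-e29b-...
print(a.version)           # 4
print(a.int.bit_length())  # at most 128: the UUID is a 128-bit integer

# Generating two v4 UUIDs in order says nothing about their numeric order,
# which is exactly why they can't serve as time-sortable keys.
b = uuid.uuid4()
```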
Why UUID Doesn't Meet Our Requirements
| Requirement | UUID |
|---|---|
| Globally unique | ✅ Yes (practically zero collision probability) |
| 64-bit numeric | ❌ 128 bits, not 64 |
| Sortable by time | ❌ UUIDv4 is completely random |
| Numeric only | ❌ Contains hex characters and dashes |
| No coordination | ✅ Each server generates independently |
UUIDv4 is a fine general-purpose identifier, but it fails our requirements on three counts. Also, 128 bits wastes storage and makes indexing slower — B-tree index nodes hold fewer keys, leading to more disk I/O for range scans.
Approach 2: Database Auto-Increment
The simplest distributed approach: use a single database server with an auto-incrementing primary key.
CREATE TABLE id_generator (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
stub CHAR(1) NOT NULL DEFAULT 'x',
UNIQUE KEY stub (stub)
);
-- Generate a new ID:
REPLACE INTO id_generator (stub) VALUES ('x');
SELECT LAST_INSERT_ID();
This is essentially what Flickr used in their early days (the "ticket server" approach). The REPLACE INTO trick avoids ever growing the table beyond one row while still incrementing the counter.
Problems
- Single point of failure: If the database goes down, the entire system stops generating IDs
- Performance bottleneck: Every ID generation requires a network round-trip to the database (~1-5ms). At 10K IDs/sec, that's 10K DB writes per second per server
- Not horizontally scalable: You can't simply "add more databases" — they'd generate conflicting IDs
- Availability: Even with a replica, failover takes time and risks duplicate IDs during split-brain scenarios
| Requirement | DB Auto-Increment |
|---|---|
| Globally unique | ✅ Yes (serial counter) |
| 64-bit numeric | ✅ Yes (BIGINT) |
| Sortable by time | ✅ Perfectly sortable |
| No coordination | ❌ All servers must coordinate through one DB |
| High availability | ❌ Single point of failure |
Approach 3: Multi-Master Replication
To address the single-point-of-failure problem, use N database servers, each incrementing by N instead of by 1.
-- Server 1 (of 3): generates 1, 4, 7, 10, 13, ...
SET @@auto_increment_increment = 3;
SET @@auto_increment_offset = 1;
-- Server 2 (of 3): generates 2, 5, 8, 11, 14, ...
SET @@auto_increment_increment = 3;
SET @@auto_increment_offset = 2;
-- Server 3 (of 3): generates 3, 6, 9, 12, 15, ...
SET @@auto_increment_increment = 3;
SET @@auto_increment_offset = 3;
With N=3 servers, IDs from server k are: k, k+N, k+2N, k+3N, …
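The partitioning is easy to model (a Python sketch of the arithmetic, not of MySQL itself):

```python
def server_ids(offset, increment, count):
    """IDs issued by one server configured with the given offset/increment."""
    return [offset + increment * i for i in range(count)]

s1 = server_ids(1, 3, 5)  # [1, 4, 7, 10, 13]
s2 = server_ids(2, 3, 5)  # [2, 5, 8, 11, 14]
s3 = server_ids(3, 3, 5)  # [3, 6, 9, 12, 15]

# The streams are disjoint by construction: each ID's value mod 3
# identifies exactly one server, so no two servers can collide.
assert set(s1).isdisjoint(s2) and set(s1).isdisjoint(s3) and set(s2).isdisjoint(s3)
```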
Problems
- Not time-sortable across servers: Server 1 might generate ID 7 after Server 2 generates ID 8, breaking time ordering
- Hard to scale: Adding a 4th server requires changing the increment on all existing servers from 3 to 4 — a risky, coordinated operation on running systems
- IDs are not uniformly distributed: If server 2 is heavily loaded, there will be "gaps" in the sequence where server 2's IDs cluster
- Still requires network call: Each ID generation still needs a DB round-trip
Approach 4: Twitter Snowflake (The Solution)
In 2010, Twitter engineers faced this exact problem: they needed to generate hundreds of thousands of unique IDs per second across a distributed fleet, with IDs that were 64-bit, time-sortable, and generated with zero coordination. Their solution — Snowflake — became the gold standard for distributed ID generation.
The genius of Snowflake is its bit-packing strategy: it divides a 64-bit integer into four fields, each encoding a different piece of information:
┌─────┬───────────────────────────────────────────┬──────────┬──────────┬────────────────┐
│ 0 │ 41 bits: timestamp │ 5 bits │ 5 bits │ 12 bits │
│sign │ (ms since custom epoch) │datacenter│ machine │ sequence │
└─────┴───────────────────────────────────────────┴──────────┴──────────┴────────────────┘
bit 63 bit 0
Bit-by-Bit Breakdown
Bit 63: Sign Bit (1 bit)
Always 0. This ensures the ID is always a positive signed 64-bit integer. Languages like Java use signed longs, so this avoids negative-number confusion.
Bits 62–22: Timestamp (41 bits)
Milliseconds since a custom epoch. Twitter's epoch is November 4, 2010 01:42:54 UTC (Unix timestamp 1288834974657).
Maximum value: 2^41 - 1 = 2,199,023,255,551 ms
Convert to years: 2,199,023,255,551 / (1000 × 60 × 60 × 24 × 365.25) ≈ 69.73 years
Twitter's epoch: Nov 4, 2010
Expiry: ~July 2080
If you use Unix epoch (Jan 1, 1970):
Expiry: ~September 2039 ← much sooner!
Bits 21–17: Datacenter ID (5 bits)
Supports 2^5 = 32 data centers. This is assigned at deployment time and never changes.
Bits 16–12: Machine ID (5 bits)
Supports 2^5 = 32 machines per data center. Combined with datacenter ID: 32 × 32 = 1,024 unique worker instances.
Bits 11–0: Sequence Number (12 bits)
A per-machine counter that increments for each ID generated within the same millisecond. Supports 2^12 = 4,096 IDs per millisecond per machine.
4,096 IDs/ms × 1,000 ms/s = 4,096,000 IDs per second per machine
With 1,024 machines: 4,096,000 × 1,024 ≈ 4.19 billion IDs per second (total)
Generation Algorithm
import time

def current_time_ms():
    return int(time.time() * 1000)

class ClockMovedBackwardsError(Exception):
    pass

class SnowflakeGenerator:
    EPOCH = 1288834974657  # Twitter epoch: Nov 4, 2010
    DATACENTER_BITS = 5
    MACHINE_BITS = 5
    SEQUENCE_BITS = 12
    MAX_DATACENTER = (1 << DATACENTER_BITS) - 1  # 31
    MAX_MACHINE = (1 << MACHINE_BITS) - 1        # 31
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1      # 4095

    def __init__(self, datacenter_id, machine_id):
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.sequence = 0
        self.last_timestamp = -1

    def generate(self):
        timestamp = current_time_ms() - self.EPOCH

        if timestamp < self.last_timestamp:
            # Clock went backwards: refuse rather than risk duplicates
            raise ClockMovedBackwardsError(
                f"Clock moved backwards by "
                f"{self.last_timestamp - timestamp}ms"
            )

        if timestamp == self.last_timestamp:
            # Same millisecond: increment sequence
            self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
            if self.sequence == 0:
                # Sequence exhausted — wait for next millisecond
                # (wait_next_ms is defined under "Sequence Exhaustion" below)
                timestamp = wait_next_ms(self.last_timestamp)
        else:
            # New millisecond: reset sequence
            self.sequence = 0

        self.last_timestamp = timestamp

        # Bit-pack the ID
        return (timestamp << 22) | \
               (self.datacenter_id << 17) | \
               (self.machine_id << 12) | \
               self.sequence
Concrete Example
Let's trace through a real ID generation:
Given:
current_time = 1713456000000 ms (April 18, 2024 16:00:00 UTC)
custom_epoch = 1288834974657 ms (Twitter epoch)
datacenter_id = 7 (binary: 00111)
machine_id = 13 (binary: 01101)
sequence = 42 (binary: 000000101010)
Step 1: Compute timestamp offset
timestamp = 1713456000000 - 1288834974657 = 424621025343 ms
binary: 00110001011011101011000101010100000111111 (41 bits)
Step 2: Bit-pack
[0][00110001011011101011000101010100000111111][00111][01101][000000101010]
 ↑                     ↑                         ↑      ↑         ↑
sign             timestamp (41)                DC(5)  Mach(5)  Seq(12)
Step 3: As decimal
id = (424621025343 << 22)
   | (7 << 17)
   | (13 << 12)
   | 42
   = 1,780,989,665,081,217,066
Verify sortability: any ID generated in the NEXT millisecond will have
timestamp = 424621025344, making it strictly larger regardless of
datacenter, machine, or sequence values.
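The whole trace can be verified in a few lines of Python:

```python
EPOCH = 1288834974657
current_time = 1713456000000
datacenter_id, machine_id, sequence = 7, 13, 42

timestamp = current_time - EPOCH
assert timestamp == 424621025343

snowflake = (timestamp << 22) | (datacenter_id << 17) | (machine_id << 12) | sequence
print(snowflake)  # 1780989665081217066

# Sortability: the smallest possible ID from the NEXT millisecond still
# beats the largest possible ID from this one.
next_ms_min = (timestamp + 1) << 22
this_ms_max = (timestamp << 22) | (31 << 17) | (31 << 12) | 4095
assert next_ms_min > this_ms_max
```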
▶ Snowflake ID Bit Layout
Step through each segment of a 64-bit Snowflake ID. Watch how timestamp, datacenter, machine, and sequence bits are packed, then see how IDs from different machines remain time-sortable.
Clock Synchronization & NTP
Snowflake's correctness depends on monotonically increasing timestamps. But clocks in distributed systems are notoriously unreliable. This is where NTP (Network Time Protocol) comes in.
How NTP Works
NTP synchronizes machine clocks with authoritative time servers in a hierarchy (strata):
- Stratum 0: Atomic clocks, GPS receivers — the ultimate time source
- Stratum 1: Servers directly connected to Stratum 0 devices
- Stratum 2: Servers synchronized to Stratum 1 (most production servers)
- Stratum 3–15: Progressively less accurate
NTP achieves accuracy of 1-10ms in typical LAN environments and 10-100ms over the internet. For Snowflake, this is usually sufficient since the timestamp granularity is 1ms.
Clock Skew: The Danger
The nightmare scenario: NTP adjusts your clock backwards. If the current timestamp is suddenly less than last_timestamp, Snowflake would either:
- Generate duplicate IDs (if it reuses a past timestamp + sequence combination)
- Throw an error and stop (what Twitter's Snowflake actually does)
// From Twitter's Snowflake source (Scala):
if (timestamp < lastTimestamp) {
log.error("clock is moving backwards. Rejecting requests "
+ "until %d.", lastTimestamp)
throw new InvalidSystemClock(
"Clock moved backwards. Refusing to generate id for "
+ "%d milliseconds".format(lastTimestamp - timestamp))
}
Mitigating Clock Skew
- Use NTP with the -x flag: This makes ntpd slew the clock (gradually adjust) rather than step it (jump); without it, ntpd steps for any offset over 128ms. Slewing avoids backwards jumps
- Monitor NTP offset: Alert if the offset exceeds 10ms — something is wrong
- Tolerate small backwards jumps: If the jump is < 5ms, you can spin-wait. If larger, refuse and alert
- Use hardware timestamps: Google's TrueTime API uses GPS + atomic clocks to provide a bounded time interval [earliest, latest]. Spanner waits out the uncertainty window before committing — guaranteeing external consistency. This is overkill for ID generation but fascinating for distributed transactions
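The "tolerate small backwards jumps" policy from the list above can be sketched as follows. The 5ms threshold and the function name are illustrative, not from any particular implementation:

```python
import time

MAX_BACKWARD_MS = 5  # illustrative threshold; tune to your observed NTP behavior

def safe_timestamp(last_timestamp):
    """Return a millisecond timestamp that never moves backwards.

    Small regressions (<= MAX_BACKWARD_MS) are absorbed by spin-waiting;
    larger ones raise so no duplicate IDs can be issued.
    """
    now = int(time.time() * 1000)
    if now >= last_timestamp:
        return now
    if last_timestamp - now <= MAX_BACKWARD_MS:
        # Spin until the clock catches up with the last issued timestamp
        while now < last_timestamp:
            now = int(time.time() * 1000)
        return now
    raise RuntimeError(
        f"Clock moved backwards by {last_timestamp - now}ms; refusing to generate IDs"
    )
```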
Instagram's Approach
Instagram needed time-sortable 64-bit IDs but wanted to use PostgreSQL rather than a separate Snowflake service. Their solution is a clever stored function that runs inside the database:
-- Instagram's ID generation (simplified):
-- 41 bits: timestamp (ms since Jan 1, 2011)
-- 13 bits: shard ID (logical shard, from 0 to 8191)
-- 10 bits: auto-incrementing sequence per shard per ms
CREATE OR REPLACE FUNCTION next_id(
shard_id INT DEFAULT 0,
OUT result BIGINT
) AS $$
DECLARE
epoch BIGINT := 1293840000000; -- Jan 1, 2011 UTC
seq_id BIGINT;
now_ms BIGINT;
BEGIN
SELECT nextval('table_id_seq') % 1024 INTO seq_id;
now_ms := (EXTRACT(EPOCH FROM clock_timestamp()) * 1000)::BIGINT - epoch;
result := (now_ms << 23)
| (shard_id << 10)
| seq_id;
END;
$$ LANGUAGE plpgsql;
Key Differences from Snowflake
| Aspect | Twitter Snowflake | Instagram ID |
|---|---|---|
| Where it runs | Standalone service (Thrift RPC) | Inside PostgreSQL (PL/pgSQL) |
| Timestamp bits | 41 | 41 |
| Location encoding | 5 DC + 5 machine = 1,024 workers | 13 shard bits = 8,192 shards |
| Sequence bits | 12 (4,096/ms/machine) | 10 (1,024/ms/shard) |
| Dependency | Separate Thrift service | PostgreSQL (already there) |
| Network call | Yes (to Snowflake service) | No (runs inside the DB) |
Instagram's approach is elegant for PostgreSQL-heavy architectures: you don't need a separate service, and the ID is generated atomically with the row insertion. The trade-off is that it's tightly coupled to PostgreSQL.
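Decoding such an ID mirrors the packing: shift and mask along the 41/13/10 split. A Python sketch, with an illustrative helper name:

```python
EPOCH = 1293840000000  # Jan 1, 2011 UTC, as in the PL/pgSQL function above

def decode_instagram_id(id_):
    """Split a 64-bit Instagram-style ID into (unix_ms, shard_id, seq)."""
    seq = id_ & 0x3FF             # low 10 bits: per-shard sequence
    shard = (id_ >> 10) & 0x1FFF  # next 13 bits: logical shard
    unix_ms = (id_ >> 23) + EPOCH # top 41 bits: ms since the custom epoch
    return unix_ms, shard, seq

# Round-trip a hand-packed ID
now_ms, shard_id, seq_id = 1700000000000 - EPOCH, 1341, 7
packed = (now_ms << 23) | (shard_id << 10) | seq_id
assert decode_instagram_id(packed) == (1700000000000, 1341, 7)
```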
Sonyflake
Sony's Sonyflake is an alternative inspired by Snowflake, optimized for longer lifetimes at the cost of fewer machines and lower per-machine throughput:
┌───────────────────────────────────────────┬──────────────────┬────────────┐
│ 39 bits: timestamp │ 8 bits │ 16 bits │
│ (10ms units since custom epoch) │ sequence │ machine ID │
└───────────────────────────────────────────┴──────────────────┴────────────┘
Key Design Decisions
- Timestamp in 10ms units (not 1ms): 2^39 × 10ms = 5,497,558,138,880ms ≈ 174 years (vs. Snowflake's 69 years)
- 8-bit sequence: Only 256 IDs per 10ms window = 25,600 IDs/sec per machine
- 16-bit machine ID: Supports 2^16 = 65,536 machines (vs. Snowflake's 1,024)
- Machine ID from private IP: The lower 16 bits of the machine's private IPv4 address — no configuration needed
Sonyflake trades per-machine throughput for longer lifetime and more machines. It's ideal for IoT or microservice architectures with many small instances.
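The IP-derived machine ID can be sketched in a few lines (assuming, as Sonyflake does, that the low 16 bits of each private IPv4 address are unique within your network):

```python
import ipaddress

def machine_id_from_ip(private_ip):
    """Lower 16 bits of a private IPv4 address, in the style of Sonyflake."""
    octets = ipaddress.IPv4Address(private_ip).packed  # 4 raw bytes
    return (octets[2] << 8) | octets[3]

print(machine_id_from_ip("10.0.1.37"))      # (1 << 8) | 37  = 293
print(machine_id_from_ip("192.168.200.5"))  # (200 << 8) | 5 = 51205
```

No registry or configuration is needed, but if your infrastructure recycles IPs across hosts, two live machines could end up with the same ID.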
ULID (Universally Unique Lexicographically Sortable Identifier)
ULID is a 128-bit identifier (like UUID) that's time-sortable and string-sortable. While it doesn't meet our 64-bit requirement, it's widely used and worth understanding:
01ARZ3NDEKTSV4RRFFQ69G5FAV
│ │
10 chars: timestamp 16 chars: randomness
(48 bits, ms precision) (80 bits)
Structure: 128 bits total
┌────────────────────────────┬──────────────────────────────────────────┐
│ 48 bits: Unix timestamp │ 80 bits: randomness │
│ (ms, ~8,919 years) │ (cryptographically secure) │
└────────────────────────────┴──────────────────────────────────────────┘
ULID Advantages
- Lexicographically sortable: The Crockford Base32 encoding preserves time ordering even as a string
- Case insensitive: No lowercase letters in Base32
- No special characters: URL-safe, no dashes
- Monotonic within millisecond: If multiple ULIDs are generated in the same millisecond, the random portion is incremented, ensuring monotonicity
ULID vs UUID
| Feature | UUIDv4 | UUIDv7 | ULID |
|---|---|---|---|
| Size | 128 bits | 128 bits | 128 bits |
| Time-sortable | ❌ | ✅ | ✅ |
| String representation | 36 chars (hex + dash) | 36 chars (hex + dash) | 26 chars (Base32) |
| String-sortable | ❌ | ✅ | ✅ |
| Collision resistance | 122 random bits | 62 random bits | 80 random bits |
Comprehensive Comparison
Here is the definitive comparison of all approaches discussed:
| Approach | Bits | Time-Sortable | Coordination | Throughput/Machine | Lifetime |
|---|---|---|---|---|---|
| UUIDv4 | 128 | ❌ Random | None | Unlimited | Infinite |
| DB Auto-Inc | 64 | ✅ Perfect | Central DB | ~10K/sec (DB limited) | 292 years |
| Multi-Master | 64 | ⚠️ Per-server only | Increment config | ~10K/sec (DB limited) | 292 years / N |
| Snowflake | 64 | ✅ ms-precision | None | 4.096M/sec | ~69 years |
| Instagram | 64 | ✅ ms-precision | None (inside PG) | 1,024/ms/shard | ~69 years |
| Sonyflake | 63 | ✅ 10ms-precision | None | 25.6K/sec | ~174 years |
| ULID | 128 | ✅ ms-precision | None | Unlimited (random) | ~8,919 years |
▶ ID Generation Flow: Multiple Servers
Watch three servers in different data centers independently generate unique Snowflake IDs with no coordination. Notice how IDs remain time-sortable across all machines.
Bit Manipulation Deep Dive
Let's walk through the exact bitwise operations used to construct and deconstruct a Snowflake ID.
Constructing an ID
// Given values:
timestamp = 424621025343 // ms since epoch
datacenter_id = 7 // 0b00111
machine_id = 13 // 0b01101
sequence = 42 // 0b000000101010
// Step 1: Shift timestamp left by 22 bits
// (to make room for 5+5+12 = 22 bits of DC+machine+seq)
timestamp << 22
= 424621025343 << 22
= 424621025343 × 4194304
= 1780989665080246272
In binary:
0|00110001011011101011000101010100000111111|0000000000000000000000
            41-bit timestamp                 22 zero-bits (room for rest)
// Step 2: Shift datacenter left by 17 bits (room for 5+12)
datacenter_id << 17
= 7 << 17
= 917504
In binary:
0|00000000000000000000000000000000000000000|00111|00000000000000000
                                              DC     17 zero-bits
// Step 3: Shift machine left by 12 bits (room for 12)
machine_id << 12
= 13 << 12
= 53248
In binary:
0|00000000000000000000000000000000000000000|00000|01101|000000000000
                                                   Mach  12 zero-bits
// Step 4: OR everything together
id = (timestamp << 22) | (datacenter_id << 17) | (machine_id << 12) | sequence
= 1780989665080246272 | 917504 | 53248 | 42
= 1780989665081217066
Binary result:
0|00110001011011101011000101010100000111111|00111|01101|000000101010
S                 timestamp                   DC   Mach   Sequence
Deconstructing an ID
// Given: id = 1780989665081217066
// Extract sequence (lowest 12 bits)
sequence = id & 0xFFF // = id & 4095 = 42
// Extract machine ID (bits 12-16)
machine = (id >> 12) & 0x1F // = (id >> 12) & 31 = 13
// Extract datacenter ID (bits 17-21)
datacenter = (id >> 17) & 0x1F // = (id >> 17) & 31 = 7
// Extract timestamp (bits 22-62)
timestamp = (id >> 22) & 0x1FFFFFFFFFF // = 424621025343
// Convert back to absolute time
absolute_time = timestamp + EPOCH
= 424621025343 + 1288834974657
= 1713456000000
// = April 18, 2024 16:00:00 UTC ✓
This ability to extract metadata from the ID itself is incredibly powerful. Given any Snowflake ID, you can determine:
- When it was created (to the millisecond)
- Where it was created (datacenter + machine)
- What order it was created in (within that millisecond on that machine)
Bitmask Reference
Component    Bits    Mask (hex)            Mask (decimal)
──────────────────────────────────────────────────────────────
Sequence     0–11    0x0000000000000FFF    4,095
Machine ID   12–16   0x000000000001F000    126,976    (after >>12: 0x1F = 31)
Datacenter   17–21   0x00000000003E0000    4,063,232  (after >>17: 0x1F = 31)
Timestamp    22–62   0x7FFFFFFFFFC00000    (after >>22: 0x1FFFFFFFFFF)
Sign bit     63      0x8000000000000000
Production Considerations
Machine ID Assignment
How do you assign unique machine IDs to each server? Several strategies:
- ZooKeeper / etcd: Each server registers at startup, gets a unique sequential ID. Most reliable approach
- Configuration management: Assign machine IDs in deployment configs (Kubernetes labels, environment variables)
- IP-based (Sonyflake): Use lower bits of private IP. Works without any coordination but risks collisions if IPs are recycled
- AWS Instance ID: Hash the EC2 instance ID to 10 bits. Unique within an account
Sequence Exhaustion
When 4,096 IDs are generated in the same millisecond (sequence wraps to 0), Snowflake blocks until the next millisecond. This is a spin-wait:
def wait_next_ms(last_ts):
    ts = current_time_ms() - EPOCH
    while ts <= last_ts:
        ts = current_time_ms() - EPOCH
    return ts

# Worst case: wait up to 1ms (typically microseconds)
# At 4,096 IDs/ms, this means sustained throughput
# of 4,096,000 IDs/sec per machine
Epoch Selection
Choose your custom epoch carefully:
- Set it to your system's launch date (or slightly before, for testing)
- Document it prominently — it's a critical configuration constant
- Never change it after deployment (all existing IDs would become undecodable)
- Consider using a round number (e.g., midnight Jan 1 of your launch year)
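A quick way to check when a candidate epoch exhausts its 41-bit budget (Python sketch):

```python
from datetime import datetime, timezone

def epoch_expiry(epoch_ms, timestamp_bits=41):
    """UTC datetime at which a `timestamp_bits`-bit millisecond counter overflows."""
    max_ms = (1 << timestamp_bits) - 1
    return datetime.fromtimestamp((epoch_ms + max_ms) / 1000, tz=timezone.utc)

print(epoch_expiry(1288834974657))  # Twitter epoch -> mid-2080
print(epoch_expiry(0))              # Unix epoch    -> late 2039
```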
ID as a Debugging Tool
Because Snowflake IDs encode metadata, they're invaluable for debugging:
# "When was this order created?"
order_id = 1781167050236055594
created_at = extract_timestamp(order_id) + EPOCH
# → April 18, 2026 16:00:00 UTC
# "Which server processed it?"
dc = extract_datacenter(order_id) # → 7
machine = extract_machine(order_id) # → 13
# → Datacenter 7, Machine 13
# "Was it generated under high load?"
seq = extract_sequence(order_id) # → 42
# → 43rd ID in that millisecond (0-indexed). If seq > 3000,
# this machine was under heavy load.
Customizing the Bit Layout
Snowflake's 1+41+5+5+12 layout is not sacred. You can reallocate bits based on your needs:
| Scenario | Layout | Trade-off |
|---|---|---|
| More machines | 1+41+13+9 | 8,192 machines but only 512 IDs/ms/machine |
| Longer lifetime | 1+45+8+10 | ~1,114 years but only 256 machines, 1,024/ms |
| Higher throughput | 1+41+6+16 | 65,536 IDs/ms/machine but only 64 machines |
| Single datacenter | 1+41+10+12 | 1,024 machines, 4,096/ms — no DC bits needed |
The constraint is always: 1 + timestamp + location + sequence = 64 bits. Every bit you give to one field is taken from another.
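Treating the layout as data makes the trade-off explicit. This sketch (the function name is illustrative) derives lifetime, worker count, and throughput from any bit split:

```python
def layout_capacity(timestamp_bits, location_bits, sequence_bits):
    """Derive lifetime, worker count, and per-worker throughput from a bit split."""
    # Sign bit plus the three fields must fill exactly 64 bits
    assert 1 + timestamp_bits + location_bits + sequence_bits == 64
    years = (1 << timestamp_bits) / (1000 * 60 * 60 * 24 * 365.25)
    workers = 1 << location_bits
    ids_per_sec = (1 << sequence_bits) * 1000
    return years, workers, ids_per_sec

years, workers, rate = layout_capacity(41, 10, 12)  # single-datacenter layout
print(f"{years:.0f} years, {workers} workers, {rate:,} IDs/sec/worker")
```

Every candidate layout from the table above can be fed through the same function to compare its capacity envelope before you commit to it.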
When to Use What
Use Snowflake when:
- You need 64-bit, time-sortable, numeric IDs
- You have a known, bounded set of machines/data centers
- Per-machine throughput of 4M IDs/sec is sufficient
- You can tolerate NTP-level clock accuracy
Use UUIDv4 when:
- You don't need time-sorting
- 128 bits is acceptable
- You need zero configuration — every process generates independently
- You want maximum collision resistance
Use ULID when:
- You want time-sortable identifiers in string form
- You need compact, URL-safe string representations
- You want a UUID replacement that sorts lexicographically
Use Instagram's approach when:
- Your stack is already PostgreSQL-centric
- You want ID generation co-located with row insertion
- You need many shards (up to 8,192)
Use Sonyflake when:
- You have many machines (up to 65K) but lower throughput per machine
- You want a longer system lifetime (174 years)
- IP-based machine ID assignment is acceptable
Interview Walkthrough
Here's the optimal structure for answering this problem in a 35-minute system design interview:
Minutes 0–5: Clarify Requirements
- Ask: "Does the ID need to be numeric? What size?" → 64-bit numeric
- Ask: "Does it need to be time-sortable?" → Yes, roughly
- Ask: "What's the expected throughput?" → 10K IDs/sec/server
- Ask: "How many data centers / servers?" → Multiple DCs, thousands of servers
Minutes 5–10: Enumerate Approaches
- Mention UUID (too big, not sortable), DB auto-increment (SPOF), multi-master (scaling issues)
- Propose Snowflake as the solution — show you know it exists
Minutes 10–25: Deep Dive into Snowflake
- Draw the bit layout diagram. Explain each field
- Walk through the generation algorithm with concrete numbers
- Discuss clock synchronization and NTP
- Explain sequence exhaustion handling
- Discuss machine ID assignment (ZooKeeper)
Minutes 25–35: Production Considerations
- Clock skew mitigation strategies
- Customizing the bit layout for specific needs
- Mention Instagram, Sonyflake, ULID as alternatives
- ID as a debugging/introspection tool
Key Takeaways
- Bit-packing eliminates coordination. By encoding time, location, and sequence into the ID itself, each machine can generate IDs independently with zero network calls
- Time in the high-order bits ensures sortability. Since the timestamp occupies the most significant bits, IDs are naturally ordered by time regardless of which machine generated them
- Custom epochs maximize lifetime. Don't waste bits representing years before your system existed — start counting from your launch date
- The bit layout is configurable. Snowflake's 1+41+5+5+12 is a starting point. Adjust based on your number of machines, required throughput, and desired lifetime
- Clock reliability is the Achilles' heel. NTP is "good enough" for most systems, but monitor clock skew aggressively and handle backwards jumps gracefully
- IDs contain metadata. A well-designed ID is also a debugging tool — extract when, where, and in what order it was created