Serverless Architecture
What Is Serverless?
"Serverless" doesn't mean no servers — it means you don't manage them. The cloud provider handles provisioning, scaling, patching, and capacity planning. You write code; they run it. Two broad categories fall under the serverless umbrella:
| Category | What It Means | Examples |
|---|---|---|
| FaaS (Function as a Service) | Deploy individual functions that execute in response to events. No long-running process. | AWS Lambda, Google Cloud Functions, Azure Functions, Cloudflare Workers |
| BaaS (Backend as a Service) | Fully managed backend components — auth, database, file storage, push notifications — exposed via APIs. | Firebase (Firestore, Auth, Storage), Auth0, Supabase, AWS Amplify |
Modern serverless applications usually combine both: FaaS for custom business logic and BaaS for commodity services like authentication and storage.
FaaS — Function as a Service
Major Providers at a Glance
| Feature | AWS Lambda | Google Cloud Functions | Azure Functions |
|---|---|---|---|
| Max timeout | 15 min | 60 min (2nd gen) | Unlimited (Premium plan) |
| Max memory | 10,240 MB | 32,768 MB | 14,336 MB |
| Max package size | 50 MB zipped / 250 MB unzipped (10 GB with container images) | 100 MB source / container images | No hard limit (Consumption: ~1.5 GB) |
| Languages | Python, Node.js, Java, Go, .NET, Ruby, Rust (custom runtime) | Node.js, Python, Go, Java, .NET, Ruby, PHP | C#, JavaScript, Python, Java, PowerShell, TypeScript |
| Concurrency | 1,000 default (can raise to 10K+) | Up to 1,000 per function (2nd gen) | 200 per instance (Premium) |
| Free tier | 1M requests + 400K GB-s/mo | 2M invocations + 400K GB-s/mo | 1M requests + 400K GB-s/mo |
AWS Lambda Configuration Deep Dive
A real-world Lambda function definition using the Serverless Framework (serverless.yml):
service: image-processor

provider:
  name: aws
  runtime: python3.12
  region: us-east-1
  memorySize: 1024       # MB — also determines CPU allocation
  timeout: 30            # seconds (max 900)
  architecture: arm64    # Graviton2 — 20% cheaper, often faster
  environment:
    BUCKET_NAME: ${self:custom.bucketName}
    TABLE_NAME: ${self:custom.tableName}
  iam:
    role:
      statements:
        - Effect: Allow
          Action:
            - s3:GetObject
            - s3:PutObject
          Resource: arn:aws:s3:::${self:custom.bucketName}/*
        - Effect: Allow
          Action:
            - dynamodb:PutItem
            - dynamodb:GetItem
          Resource: arn:aws:dynamodb:us-east-1:*:table/${self:custom.tableName}

functions:
  processImage:
    handler: handler.process_image
    memorySize: 2048            # override provider default
    timeout: 60
    reservedConcurrency: 100    # max concurrent executions
    provisionedConcurrency: 5   # keep 5 warm instances
    events:
      - s3:
          bucket: ${self:custom.bucketName}
          event: s3:ObjectCreated:*
          rules:
            - prefix: uploads/
            - suffix: .jpg
    layers:
      - arn:aws:lambda:us-east-1:770693421928:layer:Klayers-p312-Pillow:1
  getImage:
    handler: handler.get_image
    memorySize: 256
    timeout: 10
    events:
      - httpApi:
          path: /images/{id}
          method: GET

custom:
  bucketName: my-image-bucket-${sls:stage}
  tableName: image-metadata-${sls:stage}
The corresponding Python handler:
import json
import os
from datetime import datetime, timezone
from io import BytesIO

import boto3
from PIL import Image

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])

# This code runs once per container lifecycle (init phase)
print("Cold start: initializing clients and dependencies")

def process_image(event, context):
    """Triggered when a .jpg is uploaded to the uploads/ prefix."""
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    size = record['s3']['object']['size']

    # Download the original image
    response = s3.get_object(Bucket=bucket, Key=key)
    img = Image.open(BytesIO(response['Body'].read()))

    # Generate thumbnail (320x320 max)
    img.thumbnail((320, 320), Image.LANCZOS)
    buffer = BytesIO()
    img.save(buffer, 'JPEG', quality=85)
    buffer.seek(0)

    # Upload thumbnail
    thumb_key = key.replace('uploads/', 'thumbnails/')
    s3.put_object(
        Bucket=bucket, Key=thumb_key,
        Body=buffer.getvalue(),
        ContentType='image/jpeg'
    )

    # Store metadata (datetime.utcnow() is deprecated in 3.12, so use
    # an explicit timezone-aware timestamp)
    table.put_item(Item={
        'image_id': key.split('/')[-1].split('.')[0],
        'original_key': key,
        'thumbnail_key': thumb_key,
        'original_size': size,
        'width': img.width,
        'height': img.height,
        'processed_at': datetime.now(timezone.utc).isoformat(),
        'remaining_ms': context.get_remaining_time_in_millis()
    })

    return {
        'statusCode': 200,
        'body': json.dumps({
            'message': 'Thumbnail created',
            'thumbnail': thumb_key
        })
    }

def get_image(event, context):
    """GET /images/{id} — return metadata from DynamoDB."""
    image_id = event['pathParameters']['id']
    result = table.get_item(Key={'image_id': image_id})
    if 'Item' not in result:
        return {'statusCode': 404, 'body': '{"error":"not found"}'}
    return {
        'statusCode': 200,
        'body': json.dumps(result['Item'], default=str)
    }
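For local experimentation it helps to know the event shape process_image consumes. Below is a minimal sketch of an S3 ObjectCreated payload containing only the fields the handler actually reads (real events carry many more, such as region and timestamps); the bucket and key names are illustrative:

```python
# Minimal S3 ObjectCreated event — only the fields process_image reads.
sample_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "my-image-bucket-dev"},
            "object": {"key": "uploads/cat.jpg", "size": 102400},
        }
    }]
}

record = sample_event["Records"][0]
bucket = record["s3"]["bucket"]["name"]
key = record["s3"]["object"]["key"]

print(bucket)                                  # my-image-bucket-dev
print(key.replace("uploads/", "thumbnails/"))  # thumbnails/cat.jpg
```

Feeding a payload like this to the handler in a unit test exercises the parsing and key-transformation logic without touching AWS.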
BaaS — Backend as a Service
BaaS eliminates entire backend components by providing them as managed APIs:
| Service | What It Replaces | Key Features |
|---|---|---|
| Firebase Firestore | Database + real-time sync | NoSQL document DB, real-time listeners, offline persistence, security rules |
| Firebase Auth | Auth server + session mgmt | Email/password, OAuth (Google, GitHub, Apple), phone auth, anonymous auth |
| Auth0 | Enterprise identity platform | SSO, MFA, RBAC, SAML/OIDC, machine-to-machine tokens, passwordless |
| Supabase | Postgres + REST API + auth | Open-source Firebase alternative, row-level security, real-time subscriptions |
| AWS Amplify | Full backend | GraphQL API (AppSync), auth (Cognito), storage (S3), hosting, CI/CD |
A typical BaaS pattern: a React or mobile app talks directly to Firebase for auth and real-time data. When custom logic is needed (e.g., payment processing, image resize), a Cloud Function handles it. No Express server, no database management, no infrastructure to maintain.
Execution Model
Understanding the serverless execution lifecycle is critical for performance tuning:
The Request Lifecycle
Event Trigger (API Gateway, S3, SQS, etc.)
│
▼
┌─────────────────────────────────────────────────┐
│ Is a warm container available? │
│ YES → Skip to "Invoke Handler" │
│ NO → COLD START │
│ 1. Provision execution environment │ ~100-300ms
│ 2. Download deployment package │ ~50-200ms
│ 3. Start runtime (JVM, Node, Python) │ ~50-500ms
│ 4. Run initialization code (imports, │ ~varies
│ SDK clients, DB connections) │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Invoke Handler │
│ - Receive event + context │
│ - Execute business logic │
│ - Return response │
│ Duration: billed per 1ms (min 1ms) │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Container kept warm (~5-15 minutes) │
│ - Reused for subsequent invocations │
│ - Init code NOT re-run │
│ - Handler variables persist in memory │
│ - /tmp directory (10 GB) persists │
│ No requests → container destroyed │
└─────────────────────────────────────────────────┘
Key Execution Details
- Stateless by design: Each invocation is independent. Store state in DynamoDB, S3, or ElastiCache — never rely on in-memory data persisting between invocations.
- /tmp is ephemeral: Up to 10 GB of scratch space per container, but it's destroyed when the container is recycled. Use it for temporary file processing, not for caching across hours.
- Concurrency model: Each concurrent request gets its own container. 100 simultaneous requests = 100 containers. This is fundamentally different from a Node.js server handling 100 requests on one event loop.
- Init code runs once per container: Place SDK client initialization, database connections, and module imports outside the handler to reuse them across warm invocations.
# GOOD — initialized once per container lifecycle
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')

def handler(event, context):
    # reuses the existing DynamoDB connection
    return table.get_item(Key={'id': event['id']})

# BAD — creates a new client on every invocation
def handler_bad(event, context):
    dynamodb = boto3.resource('dynamodb')  # 50-100ms overhead per call!
    table = dynamodb.Table('my-table')
    return table.get_item(Key={'id': event['id']})
Cold Starts — The Serverless Tax
Cold starts are the most discussed limitation of serverless. They occur when a new execution environment must be created to serve a request. Here are real-world benchmarks:
Cold Start Benchmarks by Runtime
| Runtime | Cold Start (p50) | Cold Start (p99) | Warm Invocation | Notes |
|---|---|---|---|---|
| Python 3.12 | ~180 ms | ~400 ms | ~2-5 ms | Great for most workloads |
| Node.js 20 | ~170 ms | ~350 ms | ~2-4 ms | V8 snapshots help |
| Go 1.x | ~80 ms | ~180 ms | ~1-2 ms | Single static binary — fastest cold starts |
| Rust (custom runtime) | ~12 ms | ~30 ms | <1 ms | Minimal runtime overhead |
| Java 21 (no SnapStart) | ~3,000 ms | ~6,000 ms | ~3-8 ms | JVM startup is brutal |
| Java 21 (SnapStart) | ~200 ms | ~500 ms | ~3-8 ms | CRaC-based snapshot — 10-15× improvement |
| .NET 8 (AOT) | ~250 ms | ~500 ms | ~2-5 ms | Native AOT avoids CLR startup |
Beyond runtime choice, several factors inflate cold start duration:
- Deployment package size: a 50 MB zip adds roughly 150ms. Use layers judiciously and strip unnecessary files.
- VPC attachment: Used to add 8–10 seconds (ENI creation). Now ~1 second with Hyperplane ENIs, but still significant.
- Init code complexity: Heavy imports (pandas, numpy, boto3) or DB connection pooling can add 200-500ms.
- Memory allocation: More memory = more CPU = slightly faster init. The 1,769 MB sweet spot (1 vCPU) is common.
Mitigating Cold Starts
1. Provisioned Concurrency
Pre-warms a specified number of execution environments. They're always ready — zero cold starts for those instances.
# AWS CLI — set provisioned concurrency on an alias or version (not $LATEST)
aws lambda put-provisioned-concurrency-config \
  --function-name my-api-handler \
  --qualifier prod \
  --provisioned-concurrent-executions 50
# Cost: ~$0.015/GB-hour for provisioned concurrency
# 50 instances × 512MB × 24h × 30d = 50 × 0.5 × 720 = 18,000 GB-hours
# Monthly cost: 18,000 × $0.015 = $270/month
# Plus ~$0.0000097222 per GB-second of actual execution (discounted from the on-demand rate)
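The arithmetic above generalizes to a small helper. A sketch (the function name is mine; the default rate matches the pricing table later in this post, so verify it against current AWS pricing):

```python
def provisioned_concurrency_cost(instances, memory_mb, hours,
                                 rate_per_gb_hour=0.015):
    """Monthly cost of keeping `instances` warm, excluding execution charges."""
    gb_hours = instances * (memory_mb / 1024) * hours
    return round(gb_hours * rate_per_gb_hour, 2)

# 50 instances x 512 MB, running 24/7 for a 30-day month:
print(provisioned_concurrency_cost(50, 512, 24 * 30))  # 270.0
```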
2. Keep-Warm (Ping) Strategy
# CloudWatch scheduled event — invoke every 5 minutes
# serverless.yml
functions:
  apiHandler:
    handler: handler.main
    events:
      - httpApi: 'GET /api/{proxy+}'
      - schedule:
          rate: rate(5 minutes)
          input:
            source: 'serverless-warmup'

# In the handler:
def main(event, context):
    if event.get('source') == 'serverless-warmup':
        return {'statusCode': 200, 'body': 'warm'}
    # ... actual logic
Limitation: A keep-warm ping only keeps one container warm. If you need 10 concurrent warm instances, you need to fire 10 concurrent pings — which is fragile. Provisioned concurrency is the robust solution.
3. SnapStart (Java)
# Enable SnapStart for Java Lambda functions
aws lambda update-function-configuration \
  --function-name my-java-function \
  --snap-start ApplyOn=PublishedVersions

# Takes a CRaC snapshot after init, restores from it on cold start
# Reduces Java cold start from ~3-6 seconds to ~200-500ms
4. Minimize Package Size
# Python — exclude unnecessary files
package:
  individually: true
  patterns:
    - '!node_modules/**'
    - '!tests/**'
    - '!.git/**'
    - '!**/*.pyc'
    - '!**/__pycache__/**'
# Use Lambda Layers for large dependencies
# Pillow layer: ~20MB instead of bundling in each function
# boto3 is pre-installed — don't include it in your package!
Event Sources
Serverless functions are event-driven. Understanding the invocation models is crucial:
| Source | Invocation Type | Retry Behavior | Use Case |
|---|---|---|---|
| API Gateway | Synchronous | No retries (caller retries) | REST/HTTP APIs, WebSockets |
| S3 Events | Asynchronous | 2 retries, then DLQ | File upload processing, ETL |
| SQS | Polling (event source mapping) | Visibility timeout, DLQ after N fails | Work queues, decoupled processing |
| DynamoDB Streams | Polling (event source mapping) | Retries until expiry (24h), blocks shard | Change data capture, materialized views |
| Kinesis | Polling (event source mapping) | Retries until data expires (7d default) | Real-time streaming, analytics |
| SNS | Asynchronous | 3 retries (immediate, 1s, 2s) | Fan-out, notifications |
| EventBridge | Asynchronous | Configurable retries + DLQ | Event bus, cross-service events |
| CloudWatch Events/Cron | Asynchronous | 2 retries | Scheduled tasks, cron jobs |
Invocation Model Details
# Synchronous — caller waits for the response
response = lambda_client.invoke(
    FunctionName='my-function',
    InvocationType='RequestResponse',  # synchronous
    Payload=json.dumps({'key': 'value'})
)
result = json.loads(response['Payload'].read())

# Asynchronous — fire and forget, Lambda handles retries
lambda_client.invoke(
    FunctionName='my-function',
    InvocationType='Event',  # async — returns 202 immediately
    Payload=json.dumps({'key': 'value'})
)

# Event Source Mapping (polling) — Lambda polls SQS/Kinesis/DynamoDB
# and invokes your function with batches of records
aws lambda create-event-source-mapping \
  --function-name process-orders \
  --event-source-arn arn:aws:sqs:us-east-1:123:order-queue \
  --batch-size 10 \
  --maximum-batching-window-in-seconds 5 \
  --function-response-types ReportBatchItemFailures
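With ReportBatchItemFailures enabled, the handler returns the IDs of the messages that failed, so only those go back to the queue instead of the whole batch. A sketch under assumed business logic (process_order and the order_id check are placeholders):

```python
import json

def process_order(body):
    """Placeholder business logic; raises on malformed input."""
    order = json.loads(body)
    if "order_id" not in order:
        raise ValueError("missing order_id")
    return order["order_id"]

def handler(event, context=None):
    """SQS batch handler reporting per-item failures."""
    failures = []
    for record in event["Records"]:
        try:
            process_order(record["body"])
        except Exception:
            # Only failed messages return to the queue; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

event = {"Records": [
    {"messageId": "m1", "body": '{"order_id": "A-1"}'},
    {"messageId": "m2", "body": '{"oops": true}'},
]}
print(handler(event))  # {'batchItemFailures': [{'itemIdentifier': 'm2'}]}
```

Without this response shape, one bad message poisons the batch: all ten records are retried until the DLQ threshold, including the nine that succeeded.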
Step Functions & Orchestration
Individual Lambda functions are great for simple tasks, but real workflows involve sequences, branches, retries, and parallel execution. AWS Step Functions provides a state machine abstraction for orchestrating serverless workflows.
State Machine Definition (ASL — Amazon States Language)
{
  "Comment": "Image processing pipeline",
  "StartAt": "ValidateImage",
  "States": {
    "ValidateImage": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:validate-image",
      "Next": "CheckFormat",
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["ValidationError"],
          "Next": "RejectImage"
        }
      ]
    },
    "CheckFormat": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.format",
          "StringEquals": "RAW",
          "Next": "ConvertToJPEG"
        }
      ],
      "Default": "ProcessInParallel"
    },
    "ConvertToJPEG": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:convert-to-jpeg",
      "Next": "ProcessInParallel"
    },
    "ProcessInParallel": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "GenerateThumbnail",
          "States": {
            "GenerateThumbnail": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123:function:thumbnail",
              "End": true
            }
          }
        },
        {
          "StartAt": "ExtractMetadata",
          "States": {
            "ExtractMetadata": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123:function:metadata",
              "End": true
            }
          }
        },
        {
          "StartAt": "RunModeration",
          "States": {
            "RunModeration": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123:function:moderate",
              "End": true
            }
          }
        }
      ],
      "Next": "StoreResults"
    },
    "StoreResults": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:store-results",
      "Next": "NotifyUser"
    },
    "NotifyUser": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123:image-notifications",
        "Message.$": "$.message"
      },
      "End": true
    },
    "RejectImage": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:reject-image",
      "End": true
    }
  }
}
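The Retry block above (IntervalSeconds 2, BackoffRate 2.0, MaxAttempts 3) spaces the attempts exponentially: each wait is the previous one multiplied by the backoff rate. A quick sketch of the resulting delays (the helper function name is mine):

```python
def retry_delays(interval_s, backoff_rate, max_attempts):
    """Wait before each retry attempt, per ASL semantics:
    delay_n = IntervalSeconds * BackoffRate ** (n - 1)."""
    return [interval_s * backoff_rate ** n for n in range(max_attempts)]

print(retry_delays(2, 2.0, 3))  # [2.0, 4.0, 8.0]
```

So a task that fails every time burns 14 seconds of waiting before the Catch block finally routes execution to RejectImage.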
Step Functions: Standard vs Express
| Feature | Standard | Express |
|---|---|---|
| Max duration | 1 year | 5 minutes |
| Execution guarantee | Exactly-once | At-least-once |
| Execution history | Full audit trail, visual debugger | CloudWatch Logs only |
| Pricing | $0.025 per 1,000 state transitions | Based on executions + duration |
| Best for | Long-running workflows, human approval steps | High-volume, short-lived event processing |
Cost Model
Serverless pricing is granular and can be surprisingly cheap at low-to-moderate scale — or shockingly expensive at high throughput.
AWS Lambda Pricing Breakdown
| Component | Price | Free Tier |
|---|---|---|
| Requests | $0.20 per 1M requests | 1M requests/month |
| Duration (x86) | $0.0000166667 per GB-second | 400,000 GB-seconds/month |
| Duration (ARM/Graviton) | $0.0000133334 per GB-second (20% cheaper) | 400,000 GB-seconds/month |
| Provisioned Concurrency | $0.0000041667 per GB-second (provisioned) + $0.0000097222 per GB-second (execution) | None |
Cost Calculation Examples
Scenario 1: Light API (Startup)
Requests: 500,000/month
Memory: 256 MB
Avg time: 100ms
Arch: ARM (Graviton)
Request cost: $0 (under the 1M-request free tier)
Duration: 500,000 × 0.1s × 0.25 GB = 12,500 GB-seconds
          12,500 < 400,000 free-tier GB-s → $0
Total: $0/month ← Genuinely free for light workloads
Scenario 2: Moderate API (Growing Product)
Requests: 10,000,000/month (10M)
Memory: 512 MB
Avg time: 200ms
Arch: ARM
Request cost: (10M - 1M) × $0.20/1M = $1.80
Duration: 10M × 0.2s × 0.5 GB = 1,000,000 GB-s
(1,000,000 - 400,000) × $0.0000133334 = $8.00
Total: ~$9.80/month ← Still incredibly cheap
Scenario 3: High Traffic (At Scale)
Requests: 100,000,000/month (100M)
Memory: 1,024 MB
Avg time: 300ms
Arch: x86
Request cost: (100M - 1M) × $0.20/1M = $19.80
Duration: 100M × 0.3s × 1.0 GB = 30,000,000 GB-s
(30M - 400K) × $0.0000166667 = $493.33
API Gateway: 100M × $1.00/1M = $100.00 ← DON'T FORGET THIS!
Total: ~$613/month ← vs ~$150/month for 2× c6g.xlarge EC2
(EC2 wins at steady high-throughput)
Scenario 4: Where Serverless Gets Expensive
Requests: 1,000,000,000/month (1B)
Memory: 2,048 MB
Avg time: 500ms
Arch: x86
Request cost: (1B - 1M) × $0.20/1M = $199.80
Duration: 1B × 0.5s × 2.0 GB = 1,000,000,000 GB-s
          (1B - 400K) × $0.0000166667 = $16,660.03
API Gateway: 1B × $1.00/1M = $1,000.00
Total: ~$17,860/month ← At this scale, use ECS/EKS/EC2
Equivalent EC2: ~$2,000-3,000/month
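The four scenarios can be reproduced with a small calculator. This is a sketch using the rates from the pricing table above (request and duration charges only; API Gateway is billed separately, and the function name is mine):

```python
def lambda_monthly_cost(requests, memory_mb, avg_seconds, arm=False,
                        free_requests=1_000_000, free_gb_s=400_000):
    """Approximate monthly Lambda bill in USD (requests + duration)."""
    per_gb_s = 0.0000133334 if arm else 0.0000166667
    request_cost = max(requests - free_requests, 0) / 1_000_000 * 0.20
    gb_seconds = requests * avg_seconds * (memory_mb / 1024)
    duration_cost = max(gb_seconds - free_gb_s, 0) * per_gb_s
    return round(request_cost + duration_cost, 2)

print(lambda_monthly_cost(500_000, 256, 0.1, arm=True))     # 0.0  (Scenario 1)
print(lambda_monthly_cost(10_000_000, 512, 0.2, arm=True))  # 9.8  (Scenario 2)
```

Plugging in your own traffic profile before committing to an architecture is cheap insurance against the Scenario 4 surprise.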
Limitations & Challenges
Hard Limits
| Limit | Value | Impact |
|---|---|---|
| Max execution time | 15 minutes (Lambda) | No long-running processes, batch jobs need chunking |
| Max concurrent executions | 1,000 default (account-level) | Shared across ALL functions — can starve other functions |
| Payload size (sync) | 6 MB request/response | Large files must go through S3 |
| Payload size (async) | 256 KB | Pass S3 references, not data |
| /tmp storage | 512 MB default (configurable up to 10 GB) | Ephemeral, shared across warm invocations |
| Environment variables | 4 KB total | Use SSM Parameter Store or Secrets Manager for large configs |
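The 256 KB async payload cap is easy to trip. A common pattern is to inline small payloads and fall back to passing an S3 reference for large ones; below is a sketch of just the decision logic (the bucket, key, and function names are illustrative, and the actual S3 upload is elided):

```python
import json

ASYNC_PAYLOAD_LIMIT = 256 * 1024  # bytes — async invocation cap

def prepare_payload(data, bucket="my-overflow-bucket", key="payloads/123.json"):
    """Inline small payloads; replace large ones with an S3 pointer."""
    raw = json.dumps(data)
    if len(raw.encode("utf-8")) <= ASYNC_PAYLOAD_LIMIT:
        return {"inline": data}
    # In a real system: s3.put_object(Bucket=bucket, Key=key, Body=raw)
    return {"s3_ref": {"bucket": bucket, "key": key}}

print(prepare_payload({"id": 1}))  # {'inline': {'id': 1}}
print(prepare_payload({"blob": "x" * 300_000}))  # falls back to the S3 reference
```

The receiving function then checks for the s3_ref key and fetches the body itself, keeping the event under the limit.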
Operational Challenges
- Debugging complexity: No SSH into a running function. Distributed tracing (X-Ray, Datadog) becomes essential. Reproducing issues locally requires tools like SAM Local or LocalStack.
- Vendor lock-in: Event source mappings, IAM policies, Lambda layers, and Step Functions are deeply AWS-specific. Migrating to GCP or Azure means rewriting significant infrastructure code. The Serverless Framework and Terraform mitigate this somewhat, but the abstractions leak.
- Testing difficulties: Unit testing the handler is easy, but integration testing with real event sources is hard. Local emulation (SAM, Serverless Offline) only approximates the real environment.
- Observability gaps: Traditional APM tools struggle with ephemeral containers. You need Lambda-aware tools: AWS X-Ray for tracing, CloudWatch Lambda Insights for metrics, structured JSON logging with correlation IDs.
- Statelessness: Every invocation starts fresh (ignoring warm container reuse). Workflows requiring state need external storage (DynamoDB, Redis, Step Functions).
- Timeout cliff: If a function approaches its timeout, there's no graceful shutdown. Use context.get_remaining_time_in_millis() to checkpoint before the timeout hits.
# Defensive timeout handling
def handler(event, context):
    items = get_batch_items()
    results = []
    for item in items:
        # Check if we have enough time remaining (leave a 5s buffer)
        remaining_ms = context.get_remaining_time_in_millis()
        if remaining_ms < 5000:
            # Save progress and re-enqueue remaining items
            save_checkpoint(results)
            requeue_remaining(items[len(results):])
            return {
                'statusCode': 202,
                'body': json.dumps({
                    'processed': len(results),
                    'remaining': len(items) - len(results),
                    'status': 'partial — re-queued'
                })
            }
        results.append(process_item(item))
    return {'statusCode': 200, 'body': json.dumps(results)}
When to Use Serverless
✓ Ideal Use Cases
| Use Case | Why Serverless Excels | Example |
|---|---|---|
| Event processing | Natural fit for event-driven model, auto-scales with event volume | S3 upload → resize image → store metadata |
| Webhooks | Sporadic traffic, pay nothing when idle | GitHub/Stripe/Twilio webhook handlers |
| Scheduled tasks | Replaces cron servers — no instance running 24/7 for a 5-minute job | Nightly reports, data cleanup, health checks |
| APIs with variable traffic | Scales from 0 to thousands of concurrent requests, back to 0 | Startup MVP, internal tools, seasonal apps |
| Data transformation | Parallel processing of streaming data | Kinesis → Lambda → Elasticsearch ingestion |
| Chatbots & IoT | Bursty, unpredictable traffic patterns | Alexa skills, IoT rule actions |
| Prototyping & MVPs | Zero infrastructure cost until you have users, rapid iteration | API + DynamoDB + S3 — full stack in serverless.yml |
✗ When NOT to Use Serverless
| Anti-Pattern | Why It Fails | Better Alternative |
|---|---|---|
| Long-running processes | 15-min max execution time. Video transcoding, ML training, and large batch jobs time out. | ECS Fargate tasks, AWS Batch, EC2 |
| Latency-sensitive (<10ms) | Cold starts add 100ms–6s of latency. Even provisioned concurrency adds overhead vs bare metal. | EC2, EKS with pod pre-scaling |
| High-throughput steady workloads | At 100M+ requests/month with consistent load, per-invocation billing is 5-10× more expensive than reserved capacity. | ECS/EKS with auto-scaling, reserved EC2 |
| WebSocket/persistent connections | Stateless execution model doesn't support long-lived connections natively. API Gateway WebSocket exists but is awkward. | ECS with Socket.io, dedicated WebSocket servers |
| Complex stateful workflows | Forcing state management through DynamoDB + Step Functions adds complexity that a simple server avoids. | Temporal/Cadence on ECS, traditional servers |
| Heavy local computation | Max 10 GB RAM, 6 vCPUs. Large-scale data processing, ML inference on large models, and GPU workloads are out. | EC2 with GPUs, SageMaker, EMR |
Decision Framework
Should I use Serverless?

1. Is execution time < 15 minutes?
   NO  → Use containers (ECS/EKS) or EC2
   YES ↓
2. Is traffic variable/spiky/unpredictable?
   YES → Strong serverless candidate ✓
   NO  ↓
3. Do you need sub-10ms latency consistently?
   YES → Use containers or bare metal
   NO  ↓
4. Is monthly request volume < 50M?
   YES → Serverless is almost certainly cheaper ✓
   NO  ↓
5. Is engineering time more valuable than compute cost?
   YES → Serverless (less ops overhead) ✓
   NO  → Containers with reserved pricing
6. Are you locked into AWS already?
   YES → Lambda is a natural extension ✓
   NO  → Consider portability (containers are more portable)
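The checklist encodes naturally as a function. This is a sketch of the heuristic only (the function and parameter names are mine), not a substitute for actually modeling your costs:

```python
def recommend(max_runtime_min, spiky_traffic, needs_sub_10ms,
              monthly_requests, ops_time_scarce, on_aws):
    """Rough serverless-vs-containers heuristic, in checklist order."""
    if max_runtime_min >= 15:
        return "containers"   # exceeds the Lambda execution limit
    if spiky_traffic:
        return "serverless"   # scale-to-zero pays off
    if needs_sub_10ms:
        return "containers"   # cold starts break the latency budget
    if monthly_requests < 50_000_000:
        return "serverless"   # under the typical cost crossover
    if ops_time_scarce or on_aws:
        return "serverless"
    return "containers"

print(recommend(5, True, False, 2_000_000, True, True))        # serverless
print(recommend(30, False, False, 500_000_000, False, False))  # containers
```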
Real-World Serverless Patterns
Pattern 1: API Gateway + Lambda + DynamoDB (REST API)
Client → API Gateway → Lambda → DynamoDB
              ↕                    ↕
        Auth (Cognito)        DAX (cache)
# Characteristics:
# - Scales to millions of requests
# - Costs $0 at zero traffic
# - Sub-second cold starts with Node.js/Python
# - DynamoDB provides single-digit ms latency
Pattern 2: Fan-Out Processing
S3 Upload → Lambda (dispatcher) → SNS Topic
                                    ├→ Lambda: generate thumbnail
                                    ├→ Lambda: extract EXIF metadata
                                    ├→ Lambda: run content moderation
                                    └→ Lambda: update search index
# Each downstream Lambda runs in parallel
# Total processing time = max(individual times)
# Not sum(individual times)
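The latency claim is easy to demonstrate: with a thread pool standing in for parallel Lambda invocations, wall-clock time tracks the slowest branch, not the total. The sleep durations below are purely illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def branch(seconds):
    """Stand-in for one downstream Lambda (thumbnail, EXIF, moderation...)."""
    time.sleep(seconds)
    return seconds

durations = [0.1, 0.3, 0.2]

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    results = list(pool.map(branch, durations))
parallel_wall = time.monotonic() - start

# Wall clock is ~max(durations) = 0.3s, well under sum(durations) = 0.6s
print(f"parallel: {parallel_wall:.2f}s, sequential would be {sum(durations):.1f}s")
```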
Pattern 3: Event Sourcing with DynamoDB Streams
App → DynamoDB (writes) → DynamoDB Stream → Lambda
                                              ├→ Update Elasticsearch
                                              ├→ Invalidate cache
                                              ├→ Send notification
                                              └→ Replicate to analytics DB
# DynamoDB Streams guarantee ordering per partition key
# Lambda processes in batches (configurable 1-10,000 records)
# Exactly-once processing with idempotency keys
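Because stream delivery is at-least-once, the same record can arrive twice; an idempotency key makes the second application a harmless no-op. An in-memory sketch of the idea (production would use something durable, e.g. a DynamoDB conditional write, instead of a Python set):

```python
processed = set()  # production: DynamoDB table + conditional PutItem

def apply_once(record):
    """Apply a change event at most once per idempotency key."""
    key = record["idempotency_key"]
    if key in processed:
        return "skipped"  # duplicate delivery — safe no-op
    processed.add(key)
    # ... side effects (update search index, send notification) go here ...
    return "applied"

event = {"idempotency_key": "order-42:v3"}
print(apply_once(event))  # applied
print(apply_once(event))  # skipped
```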
Pattern 4: CQRS with Serverless
# Write path (commands)
API Gateway → Lambda → DynamoDB (write model)
                          ↓ (Stream)
                        Lambda → Elasticsearch (read model)

# Read path (queries)
API Gateway → Lambda → Elasticsearch
          or
API Gateway → Lambda → DynamoDB (if simple key-value lookups)

# Different scaling, different data models for reads vs writes
Serverless vs Containers vs VMs
| Dimension | Lambda (Serverless) | ECS Fargate (Containers) | EC2 (VMs) |
|---|---|---|---|
| Scaling speed | Milliseconds (per request) | 30-90 seconds | 2-5 minutes |
| Scale to zero | Yes — $0 at idle | Yes (with scale-to-zero config) | No — minimum 1 instance |
| Max execution | 15 min | Unlimited | Unlimited |
| Ops burden | Near zero | Low (still need task definitions, networking) | High (patching, AMIs, capacity) |
| Cost at low traffic | Cheapest (free tier covers most) | Moderate | Most expensive (always on) |
| Cost at high traffic | Most expensive | Moderate | Cheapest (reserved instances) |
| Portability | Lowest (vendor-specific) | High (Docker is portable) | Highest (any cloud or on-prem) |
Key Takeaways
- Serverless = FaaS + BaaS. Functions for custom logic, managed services for everything else. The goal is to write only business logic and let the cloud handle the rest.
- Cold starts are real but manageable. Choose lightweight runtimes (Node.js, Python, Go, Rust), minimize package size, use provisioned concurrency for latency-sensitive paths, and SnapStart for Java.
- Cost is non-linear. Serverless is nearly free at low scale and becomes expensive at high steady throughput. The crossover point is typically 10-50M requests/month — model your costs before committing.
- Event-driven is the natural model. Serverless shines when functions react to events (uploads, queue messages, database changes). If you're fighting to make it fit a synchronous, stateful, long-running workload, you're using the wrong tool.
- Vendor lock-in is the biggest hidden cost. Lambda, Step Functions, EventBridge, and DynamoDB are deeply intertwined. Use infrastructure-as-code (Terraform, CDK) and keep business logic portable in separate modules.
- Orchestrate, don't chain. Use Step Functions for multi-step workflows instead of Lambda-calling-Lambda. Step Functions handle retries, timeouts, branching, and parallel execution with built-in visibility.
In the next post, we explore Data Pipelines — how to build reliable, scalable systems for moving and transforming data at scale, often using serverless components as building blocks.