Object Storage (S3 Architecture)
Object vs File vs Block Storage
Before diving into S3, let’s understand the three fundamental storage paradigms that underpin every modern infrastructure. Each trades off between access patterns, performance, and scalability in fundamentally different ways.
| Dimension | Block Storage | File Storage | Object Storage |
|---|---|---|---|
| Unit | Fixed-size blocks (512B–4KB) | Files in directories (hierarchical) | Objects in flat namespace (bucket + key) |
| Access | Raw block I/O (iSCSI, FC) | POSIX file APIs (NFS, SMB) | HTTP REST API (PUT/GET/DELETE) |
| Metadata | Minimal (block address only) | File system metadata (permissions, timestamps) | Rich, custom key-value metadata per object |
| Mutability | In-place updates (random writes) | In-place updates (seek + write) | Immutable — replace entire object |
| Scalability | Limited to single volume (~16 TB) | Limited by NAS controller (~PBs with clustering) | Virtually unlimited (exabytes+) |
| Performance | Lowest latency (<1ms SSD) | Low latency (1–10ms NFS) | Higher latency (50–200ms first byte) |
| Durability | Depends on RAID/replication | Depends on NAS redundancy | Designed for 99.999999999% (11 nines) |
| Cost (per GB/mo) | $0.08–0.10 (EBS gp3) | $0.03–0.30 (EFS) | $0.023 (S3 Standard) |
| Best For | Databases, boot volumes, VMs | Shared file systems, home directories, CMS | Backups, media, data lakes, archives |
| Examples | AWS EBS, Azure Disk, GCP PD | AWS EFS, Azure Files, GCP Filestore | AWS S3, Azure Blob, GCP Cloud Storage, MinIO |
Why Object Storage Dominates at Scale
The flat namespace is the secret weapon. File systems maintain a hierarchical directory tree — every mkdir, rename, or ls must traverse and lock parts of this tree. As the tree grows to billions of files, metadata operations become the bottleneck (ask anyone who’s run ls on a directory with 10 million files).
Object storage eliminates this by treating each object as an independent entity addressed by a flat key. The “directory structure” you see in the S3 console (photos/2024/vacation/img001.jpg) is a visual illusion — the forward slashes are just characters in a flat string key. This means:
- No directory locking — billions of concurrent writes to different keys with zero contention
- No rename tax — “moving” is just a copy + delete (no metadata tree walk)
- Hash-based distribution — keys are hashed to distribute objects evenly across storage nodes
- Limitless fanout — no directory entry limits, no inode exhaustion
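To make this concrete, here is a minimal sketch in Python (the four-node cluster is hypothetical) of hash-based placement: the slash-containing key is hashed as one flat string, and the hash alone picks the owning node. Real systems layer consistent hashing and an index on top, but the principle is the same:
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical storage nodes

def node_for_key(key: str) -> str:
    """Map a flat object key to its owning node via a hash of the whole key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# The slashes are just bytes in the key; no directory tree is consulted:
print(node_for_key("photos/2024/vacation/img001.jpg"))
print(node_for_key("photos/2024/vacation/img002.jpg"))  # may land on a different node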
S3 Architecture Deep Dive
Amazon S3 (Simple Storage Service), launched in 2006, stores over 350 trillion objects and handles millions of requests per second. Let’s dissect its architecture layer by layer.
Core Concepts: Buckets, Objects, Keys
Buckets are the top-level containers. Each bucket name is globally unique across all of AWS (because bucket names become part of the DNS hostname). You get up to 100 buckets per account (soft limit, can be raised to 1000).
# Bucket naming rules:
# - 3–63 characters, lowercase letters, numbers, hyphens
# - Globally unique across ALL AWS accounts
# - Cannot look like an IP address (e.g. 192.168.1.1)
# S3 URL formats:
# Path-style: https://s3.amazonaws.com/my-bucket/photos/cat.jpg
# Virtual-host: https://my-bucket.s3.amazonaws.com/photos/cat.jpg (preferred)
# Region: https://my-bucket.s3.us-west-2.amazonaws.com/photos/cat.jpg
Objects are the actual data entities. Each object consists of:
- Key — the unique identifier within a bucket (up to 1,024 bytes UTF-8)
- Value — the data itself (0 bytes to 5 TB per object)
- Version ID — automatically assigned when versioning is enabled
- Metadata — system metadata (Content-Type, Last-Modified) + user-defined metadata (up to 2 KB)
- Subresources — ACLs, torrent info, etc.
# PUT an object with custom metadata
aws s3api put-object \
--bucket my-data-lake \
--key "raw/events/2024/03/15/events-001.parquet" \
--body events-001.parquet \
--content-type "application/octet-stream" \
--metadata '{"source":"kafka-cluster-1","partition":"7","offset":"142857"}' \
--storage-class STANDARD \
--server-side-encryption "aws:kms" \
--ssekms-key-id "arn:aws:kms:us-east-1:123456:key/abc-123"
# Response:
# {
# "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
# "VersionId": "3sL4kqtJlcpXroDTDmJ+rmSpXd3dIbrHY+MTRCxf3vjVBH40Nr8X8gdRQBpUMLUo",
# "ServerSideEncryption": "aws:kms"
# }
Internal Architecture
While AWS doesn’t publish all internal details, we know from published papers and conference talks that S3 is built on several internal subsystems:
| Layer | Responsibility |
|---|---|
| Front-End | REST API, authentication, request routing, rate limiting |
| Index Layer | Object key → storage location mapping (distributed key-value store) |
| Placement Layer | Decides which storage nodes & disks receive data chunks |
| Storage Layer | Physical disks organized into storage nodes, erasure-coded chunks |
Request flow for a PUT:
- Front-End authenticates the request (SigV4), validates the bucket & key, checks IAM policies and bucket policies.
- Index Layer reserves a new entry in the metadata store, mapping the key to a set of storage locations.
- Placement Layer selects target storage nodes based on fault domains (spread across racks, power zones, AZs).
- Storage Layer receives the data, splits it into chunks, applies erasure coding, and writes coded fragments to disks.
- Once a quorum of fragments is durably written, the index is committed and the client receives a 200 OK.
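As a toy illustration of steps 3 through 5 (the node class, failure rate, and quorum value are invented for the sketch, not AWS internals), note that the index commits only after enough fragments are durable, so a reader can never be handed a location that was not written:
import random

class Node:
    """Toy storage node: a dict standing in for a disk."""
    def __init__(self, name):
        self.name, self.disk = name, {}

    def write(self, key, fragment):
        if random.random() < 0.05:      # simulate an occasional failed write
            return False
        self.disk[key] = fragment
        return True

index = {}                              # toy index layer: key -> locations

def put(key, fragments, nodes, quorum):
    """Steps 3-5: place fragments, commit the index only on quorum."""
    written = [n.name for n, f in zip(nodes, fragments) if n.write(key, f)]
    if len(written) < quorum:
        raise IOError("quorum not reached; index never committed")
    index[key] = written                # step 5: commit, then 200 OK
    return "200 OK"

nodes = [Node(f"node-{i}") for i in range(16)]
print(put("photos/cat.jpg", [b"frag"] * 16, nodes, quorum=12))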
Consistency Model
S3’s consistency story is one of the most important evolutions in cloud storage history.
The Old Model: Eventual Consistency (2006–2020)
For its first 14 years, S3 had a mixed consistency model:
- Read-after-write consistency for new PUTs — if you created a brand-new object, subsequent GETs would return the latest data.
- Eventual consistency for overwrites (PUT of existing key) and DELETEs — you might read stale data for a short window.
# The classic gotcha (pre-Dec 2020):
PUT s3://bucket/config.json (version 2) # overwrite existing object
GET s3://bucket/config.json # might return version 1!
# ... seconds later ...
GET s3://bucket/config.json # now returns version 2
# Even worse — the "negative cache" bug:
GET s3://bucket/new-file.txt → 404 # object doesn't exist yet
PUT s3://bucket/new-file.txt → 200 # create it
GET s3://bucket/new-file.txt → 404! # cached 404 haunts you
This caused real production bugs: data pipelines reading stale files, CI/CD systems failing intermittently, config deployments appearing to silently fail.
The New Model: Strong Read-After-Write (Dec 2020)
In December 2020, AWS announced that S3 now delivers strong read-after-write consistency for all operations — PUTs, DELETEs, and LIST — at no additional cost and with no performance penalty.
# Post Dec 2020 — guaranteed behavior:
PUT s3://bucket/config.json → 200 # overwrite
GET s3://bucket/config.json → returns new version (guaranteed)
DELETE s3://bucket/old.txt → 204
GET s3://bucket/old.txt → 404 (guaranteed, no stale reads)
PUT s3://bucket/new.txt → 200
LIST s3://bucket/ → includes new.txt (guaranteed)
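In practice this means naive read-after-write code is now correct, with no retry loops. A small boto3 sketch (bucket name illustrative):
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"   # illustrative

# Overwrite an existing key...
s3.put_object(Bucket=bucket, Key="config.json", Body=b'{"version": 2}')

# ...and the very next read is guaranteed to return the new bytes:
body = s3.get_object(Bucket=bucket, Key="config.json")["Body"].read()
assert body == b'{"version": 2}'   # never version 1, never a stale 404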
Erasure Coding & 11-Nines Durability
S3 promises 99.999999999% (11 nines) annual durability. This means if you store 10 million objects, you can statistically expect to lose a single object once every 10,000 years. How?
Erasure Coding Fundamentals
Instead of simple replication (3 copies = 3× storage overhead), S3 uses erasure coding — a mathematical technique from coding theory that achieves the same or better durability with far less storage overhead.
The core idea: take k data chunks and produce m parity chunks, for a total of n = k + m chunks. You can reconstruct the original data from any k of the n chunks. This means you can tolerate the loss of up to m chunks.
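The simplest erasure code uses a single XOR parity chunk, effectively RS(k, 1): any one lost chunk is the XOR of all the survivors. The sketch below demonstrates this; production systems use Reed-Solomon codes, which generalize the idea to m parity chunks:
from functools import reduce

def xor_all(chunks: list[bytes]) -> bytes:
    """XOR a list of equal-length chunks together byte-by-byte."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

data = [b"AAAA", b"BBBB", b"CCCC"]   # k = 3 data chunks
parity = xor_all(data)               # m = 1 parity chunk

# Lose chunk 1, then rebuild it from the remaining n - 1 chunks:
recovered = xor_all([data[0], data[2], parity])
assert recovered == b"BBBB"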
Durability Mathematics
Let’s derive the 11-nines figure. Assume:
- Annual disk failure rate (AFR) = 2% (industry standard for enterprise drives)
- Erasure code scheme: RS(10, 6) — 10 data + 6 parity = 16 total fragments
- Fragments spread across independent fault domains (different racks/AZs)
- Repair time: fragments on a failed disk are rebuilt within hours
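Under these assumptions, a deliberately simplified binomial model shows where the figure comes from: an object is lost only if more than m = 6 of its 16 fragments fail within a single repair window. The sketch below annualizes that probability; it ignores correlated failures, which is exactly what the real engineering margin covers.
from math import comb

AFR = 0.02                    # 2% annual disk failure rate
REPAIR_HOURS = 4              # assumed fragment-rebuild window
K, M = 10, 6                  # RS(10, 6)
N = K + M

p = AFR * REPAIR_HOURS / 8760          # P(a fragment dies within one window)
windows_per_year = 8760 / REPAIR_HOURS

# P(more than M of N fragments fail in the same window): binomial tail
p_window = sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(M + 1, N + 1))

# Small-probability approximation (the exact product underflows float64):
annual_loss = p_window * windows_per_year
print(f"P(object lost per year) ~ {annual_loss:.1e}")   # ~1e-28 in this toy model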
Storage efficiency comparison:
| Scheme | Chunks | Tolerate Failures | Storage Overhead | Approx. Durability |
|---|---|---|---|---|
| Simple replication (3 copies) | 3 data | 2 | 3.0× | 99.9999% (6 nines) |
| RS(4, 2) | 4+2 = 6 | 2 | 1.5× | 99.99999% (7 nines) |
| RS(6, 3) | 6+3 = 9 | 3 | 1.5× | 99.999999% (8 nines) |
| RS(10, 6) | 10+6 = 16 | 6 | 1.6× | 99.999999999%+ (11+ nines) |
| RS(16, 4) | 16+4 = 20 | 4 | 1.25× | 99.9999999% (9 nines) |
[Interactive demo: Object Storage Architecture — Erasure Coding Flow. A client upload is split into data chunks, encoded with parity, and distributed across storage nodes; the system survives node failures.]
Storage Classes
S3 offers a tiered storage model that lets you optimize cost based on access frequency. Each class differs in pricing, retrieval latency, minimum storage duration, and availability SLA.
| Storage Class | Use Case | $/GB/mo | Retrieval | Min Duration | Availability |
|---|---|---|---|---|---|
| S3 Standard | Frequently accessed data | $0.023 | Instant (ms) | None | 99.99% |
| S3 Intelligent-Tiering | Unknown/changing access patterns | $0.004–0.023 (auto-tiered) | Instant (ms) | None | 99.9% |
| S3 Standard-IA | Infrequent access, rapid retrieval | $0.0125 | Instant (ms) | 30 days | 99.9% |
| S3 One Zone-IA | Re-creatable infrequent data | $0.01 | Instant (ms) | 30 days | 99.5% |
| S3 Glacier Instant | Archive with instant access | $0.004 | Instant (ms) | 90 days | 99.9% |
| S3 Glacier Flexible | Archive, minutes-to-hours retrieval | $0.0036 | 1–12 hours | 90 days | 99.9% |
| S3 Glacier Deep Archive | Long-term archive, rare access | $0.00099 | 12–48 hours | 180 days | 99.9% |
Real-World Cost Calculation
Consider a media company storing 500 TB of video assets with different access patterns:
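As a sketch, assume a 50/150/300 TB split across hot, infrequent, and deep-archive tiers (the split is an invented access profile, priced at the table's rates):
GB_PER_TB = 1024

tiers = {                                   # assumed split of the 500 TB library
    'S3 Standard (hot titles)':            (50,  0.023),
    'S3 Standard-IA (back catalog)':       (150, 0.0125),
    'Glacier Deep Archive (raw masters)':  (300, 0.00099),
}

total = 0.0
for name, (tb, rate) in tiers.items():
    cost = tb * GB_PER_TB * rate
    total += cost
    print(f"{name}: ${cost:,.0f}/mo")

print(f"Tiered total: ${total:,.0f}/mo")                       # ~ $3,402/mo
print(f"All S3 Standard: ${500 * GB_PER_TB * 0.023:,.0f}/mo")  # ~ $11,776/mo (~3.5x)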
[Interactive demo: Storage Class Lifecycle — Cost vs Access Trade-Off. An object transitions through storage tiers over time, trading access speed for lower cost at each stage.]
Multipart Upload
For objects larger than 100 MB (and mandatory beyond the 5 GB single-PUT limit), S3 provides multipart upload, a three-phase protocol that dramatically improves reliability and throughput for large objects.
How It Works
- Initiate — create a multipart upload session, receive an UploadId
- Upload Parts — upload each part (5 MB–5 GB each, up to 10,000 parts), in parallel if desired
- Complete — send the list of part numbers and ETags to finalize the object
# Phase 1: Initiate multipart upload
aws s3api create-multipart-upload \
--bucket my-data-lake \
--key "backups/db-snapshot-2024-03-15.tar.gz" \
--storage-class STANDARD_IA \
--server-side-encryption "aws:kms"
# Response: { "UploadId": "abc123...", "Bucket": "my-data-lake", "Key": "..." }
# Phase 2: Upload parts (can be parallel!)
# Split a 10 GB file into 100 MB parts with numeric suffixes (part-000 ... part-099):
split -b 100M -d -a 3 db-snapshot.tar.gz part-
# Upload each part (can run in parallel with GNU parallel or xargs):
for i in $(seq 1 100); do
  aws s3api upload-part \
    --bucket my-data-lake \
    --key "backups/db-snapshot-2024-03-15.tar.gz" \
    --upload-id "abc123..." \
    --part-number $i \
    --body "part-$(printf '%03d' $((i - 1)))"
done
# Each returns: { "ETag": "\"etag-hash-here\"" }
# Phase 3: Complete multipart upload
aws s3api complete-multipart-upload \
--bucket my-data-lake \
--key "backups/db-snapshot-2024-03-15.tar.gz" \
--upload-id "abc123..." \
--multipart-upload '{
"Parts": [
{"PartNumber": 1, "ETag": "\"etag1\""},
{"PartNumber": 2, "ETag": "\"etag2\""},
...
{"PartNumber": 100, "ETag": "\"etag100\""}
]
}'
- Resilience — if one part fails, retry just that part (not the entire 10 GB upload)
- Parallelism — upload 8 parts simultaneously to saturate your bandwidth
- Pause/Resume — upload over hours or days; uploaded parts persist until you complete or abort the upload, so pair this with an AbortIncompleteMultipartUpload lifecycle rule (covered below) to expire orphans
- Throughput — more parts in flight means more TCP connections and higher aggregate bandwidth
Optimal part size selection: a multipart upload is capped at 10,000 parts, so the part size must be at least ceil(object size / 10,000) bytes; in practice, take the larger of that floor and a throughput-friendly baseline, as sketched below.
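A minimal sketch of that heuristic (the 128 MB baseline is an assumption, not an S3 requirement):
MIN_PART = 5 * 1024**2        # S3 minimum part size (5 MB, except the last part)
MAX_PARTS = 10_000            # S3 cap on parts per upload
BASELINE = 128 * 1024**2      # assumed throughput-friendly default

def part_size(object_size: int) -> int:
    floor = -(-object_size // MAX_PARTS)   # ceil division: smallest legal part
    return max(MIN_PART, BASELINE, floor)

print(part_size(10 * 1024**3) // 1024**2)  # 10 GB -> 128 MB parts
print(part_size(5 * 1024**4) // 1024**2)   # 5 TB  -> ~524 MB parts (cap-driven)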
Pre-Signed URLs
Pre-signed URLs allow you to grant temporary, scoped access to private S3 objects without exposing your AWS credentials or making the bucket public. The URL itself contains a cryptographic signature.
# Generate a pre-signed URL for downloading (GET)
aws s3 presign s3://my-bucket/reports/q1-2024.pdf \
--expires-in 3600 # 1 hour
# Output:
# https://my-bucket.s3.amazonaws.com/reports/q1-2024.pdf
# ?X-Amz-Algorithm=AWS4-HMAC-SHA256
# &X-Amz-Credential=AKIA.../20240315/us-east-1/s3/aws4_request
# &X-Amz-Date=20240315T120000Z
# &X-Amz-Expires=3600
# &X-Amz-SignedHeaders=host
# &X-Amz-Signature=abc123...
# Generate a pre-signed URL for uploading (PUT)
import boto3
s3 = boto3.client('s3', region_name='us-east-1')
user_id = "123"  # illustrative user ID
url = s3.generate_presigned_url(
    'put_object',
    Params={
        'Bucket': 'user-uploads',
        'Key': f'avatars/{user_id}.jpg',
        'ContentType': 'image/jpeg',
        'ContentLength': 5242880,  # pins the exact size (5 MB), not a maximum
    },
    ExpiresIn=900  # 15 minutes
)
# Client uploads directly to S3, bypassing your server:
# curl -X PUT -H "Content-Type: image/jpeg" \
# --data-binary @avatar.jpg "$PRESIGNED_URL"
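Signing ContentLength pins the upload to one exact size. To enforce a size range instead, boto3's generate_presigned_post supports a content-length-range condition; a minimal sketch:
post = s3.generate_presigned_post(
    Bucket='user-uploads',
    Key='avatars/123.jpg',
    Fields={'Content-Type': 'image/jpeg'},
    Conditions=[
        {'Content-Type': 'image/jpeg'},
        ['content-length-range', 1, 5 * 1024 * 1024],  # 1 byte to 5 MB
    ],
    ExpiresIn=900,
)
# The browser submits post['fields'] plus the file as multipart/form-data
# to post['url']; S3 rejects any upload outside the signed size range.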
Common patterns for pre-signed URLs:
- Direct browser uploads — client-side JavaScript uploads to S3 via pre-signed PUT URL, avoiding proxying through your API server
- Temporary download links — generate time-limited links for paid content, SaaS exports, or sensitive reports
- Cross-account sharing — share an object with someone in a different AWS account without bucket policy changes
- CDN cache busting — pre-signed URLs with different signatures bust CloudFront cache
Versioning
S3 versioning keeps every version of every object. Once enabled on a bucket, it cannot be disabled — only suspended (new objects won’t get versions, but existing versions persist).
# Enable versioning
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled
# Upload same key twice:
aws s3 cp v1.txt s3://my-bucket/config.txt # VersionId: "aaa111"
aws s3 cp v2.txt s3://my-bucket/config.txt # VersionId: "bbb222"
# List all versions:
aws s3api list-object-versions --bucket my-bucket --prefix config.txt
# {
# "Versions": [
# { "Key": "config.txt", "VersionId": "bbb222", "IsLatest": true, "Size": 1024 },
# { "Key": "config.txt", "VersionId": "aaa111", "IsLatest": false, "Size": 512 }
# ]
# }
# Get a specific version:
aws s3api get-object --bucket my-bucket --key config.txt \
--version-id "aaa111" old-config.txt
# "Delete" an object (just adds a delete marker):
aws s3 rm s3://my-bucket/config.txt
# VersionId: "ccc333" (this is a delete marker)
# Object appears deleted, but old versions still exist!
# To truly delete, specify the version:
aws s3api delete-object --bucket my-bucket --key config.txt \
--version-id "aaa111" # permanently deletes this version
Versioning costs: Each version is a full copy, billed at the same rate. A 1 GB file overwritten 100 times = 100 GB stored. Use lifecycle policies to clean up old versions.
Lifecycle Policies
Lifecycle rules automate the transition and expiration of objects — the backbone of cost optimization in any S3-heavy architecture.
// lifecycle-rules.json (strip the // annotations before use; JSON has no comments)
// Apply to the bucket with:
// aws s3api put-bucket-lifecycle-configuration \
//   --bucket my-data-lake --lifecycle-configuration file://lifecycle-rules.json
{
"Rules": [
{
"ID": "hot-to-warm-after-30d",
"Status": "Enabled",
"Filter": { "Prefix": "logs/" },
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 90,
"StorageClass": "GLACIER"
},
{
"Days": 365,
"StorageClass": "DEEP_ARCHIVE"
}
],
"Expiration": {
"Days": 2555 // delete after 7 years (compliance)
}
},
{
"ID": "cleanup-incomplete-uploads",
"Status": "Enabled",
"Filter": { "Prefix": "" },
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
},
{
"ID": "expire-old-versions",
"Status": "Enabled",
"Filter": { "Prefix": "" },
"NoncurrentVersionTransitions": [
{
"NoncurrentDays": 30,
"StorageClass": "GLACIER"
}
],
"NoncurrentVersionExpiration": {
"NoncurrentDays": 365
}
},
{
"ID": "delete-expired-markers",
"Status": "Enabled",
"Filter": { "Prefix": "" },
"Expiration": {
"ExpiredObjectDeleteMarker": true
}
}
]
}
The AbortIncompleteMultipartUpload rule above is essential — we’ve seen companies paying thousands per month for orphaned multipart fragments they didn’t know existed.
Event Notifications (S3 → SNS/SQS/Lambda)
S3 can emit events when objects are created, deleted, restored, or replicated. This is the foundation of event-driven architectures built on object storage.
Event Types
# Supported events:
s3:ObjectCreated:* # any create (Put, Post, Copy, CompleteMultipartUpload)
s3:ObjectCreated:Put # specific PUT
s3:ObjectCreated:Post
s3:ObjectCreated:Copy
s3:ObjectCreated:CompleteMultipartUpload
s3:ObjectRemoved:* # any delete
s3:ObjectRemoved:Delete
s3:ObjectRemoved:DeleteMarkerCreated
s3:ObjectRestore:Post # Glacier restore initiated
s3:ObjectRestore:Completed # Glacier restore completed
s3:Replication:* # cross-region replication events
s3:LifecycleTransition # object transitioned between storage classes
s3:IntelligentTiering # automatic tier change
Notification Targets & Patterns
# Notification configuration (via AWS CLI)
aws s3api put-bucket-notification-configuration \
--bucket media-uploads \
--notification-configuration '{
"LambdaFunctionConfigurations": [
{
"Id": "thumbnail-generator",
"LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456:function:gen-thumb",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [
{"Name": "prefix", "Value": "uploads/images/"},
{"Name": "suffix", "Value": ".jpg"}
]
}
}
}
],
"QueueConfigurations": [
{
"Id": "transcode-queue",
"QueueArn": "arn:aws:sqs:us-east-1:123456:video-transcode-queue",
"Events": ["s3:ObjectCreated:*"],
"Filter": {
"Key": {
"FilterRules": [
{"Name": "prefix", "Value": "uploads/video/"}
]
}
}
}
],
"TopicConfigurations": [
{
"Id": "audit-trail",
"TopicArn": "arn:aws:sns:us-east-1:123456:s3-audit-topic",
"Events": ["s3:ObjectRemoved:*"]
}
]
}'
Event-driven pipeline example:
- User uploads video to uploads/video/clip.mp4; the s3:ObjectCreated event fires
- SQS queue decouples producers from workers, buffers burst uploads, provides retry
- Worker transcodes to HLS (720p, 1080p, 4K)
- Processed files land in processed/video/clip/
- SNS notifies the CDN + updates the database catalog
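The Lambda half of such a pipeline is small. A sketch of a handler that extracts bucket and key from the S3 event record (the processing step is elided):
import urllib.parse

def handler(event, context):
    """Triggered by s3:ObjectCreated:*; one record per created object."""
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Keys arrive URL-encoded (spaces become '+'):
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        size = record['s3']['object'].get('size', 0)
        print(f"new object: s3://{bucket}/{key} ({size} bytes)")
        # ... fetch the object, transcode, write to processed/, notify ...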
Content-Addressed Storage
Content-addressed storage (CAS) identifies objects by the cryptographic hash of their content rather than a user-assigned name. This creates a natural deduplication layer and provides integrity verification “for free.”
# Content-addressed key = hash of the content
import hashlib
def store_content_addressed(s3_client, bucket, data: bytes) -> str:
"""Store data using SHA-256 hash as the key."""
content_hash = hashlib.sha256(data).hexdigest()
key = f"cas/{content_hash[:2]}/{content_hash[2:4]}/{content_hash}"
# ↑ Two-level prefix to avoid hot partitions
# Check if already exists (dedup!)
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return key  # already stored, skip upload
    except s3_client.exceptions.ClientError as e:
        if e.response['Error']['Code'] != '404':
            raise  # a real error (permissions, throttling); don't mask it
        # doesn't exist: fall through and upload it
s3_client.put_object(
Bucket=bucket,
Key=key,
Body=data,
ContentType='application/octet-stream',
Metadata={
'content-hash': content_hash,
'hash-algorithm': 'sha256'
}
)
return key
# Usage:
key = store_content_addressed(s3, 'my-cas-bucket', video_bytes)
# key = "cas/a3/b1/a3b1c2d3e4f5...64-char-hex"
# Verification on read:
obj = s3.get_object(Bucket='my-cas-bucket', Key=key)
data = obj['Body'].read()
assert hashlib.sha256(data).hexdigest() == key.split('/')[-1]
# ↑ integrity guaranteed — if hash matches, data is uncorrupted
Where CAS is used:
- Git — every blob, tree, and commit is stored by its SHA-1 hash (newer Git can use SHA-256)
- Docker/OCI images — each layer is a content-addressed blob in a registry
- IPFS — distributed file system where files are addressed by CID (content identifier)
- Backup systems — Restic, Borg use CAS for deduplication across backups
- Data lakes — Apache Iceberg uses content hashing for snapshot manifests
Note that an S3 ETag is not a reliable content hash: for multipart uploads it is the MD5 of the concatenated per-part MD5s, suffixed with -N (number of parts). The x-amz-checksum-sha256 header (added in 2022) provides true content addressing.
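For illustration, the multipart ETag can be reproduced client-side when the exact part boundaries are known; a sketch:
import hashlib

def multipart_etag(parts: list[bytes]) -> str:
    """Multipart ETag: MD5 of the concatenated per-part MD5 digests, plus -N."""
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"

# Matches S3's ETag only when split at the identical part boundaries:
print(multipart_etag([b"part-one-bytes", b"part-two-bytes"]))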
S3 API: Key Operations
S3’s REST API is the de facto standard for object storage — virtually every cloud provider and open-source alternative implements this API.
# ──────── CRUD Operations ────────
# PUT Object
PUT /my-key HTTP/1.1
Host: my-bucket.s3.amazonaws.com
Content-Type: application/json
Content-Length: 1024
x-amz-storage-class: STANDARD_IA
x-amz-server-side-encryption: AES256
x-amz-meta-custom-field: my-value
Authorization: AWS4-HMAC-SHA256 Credential=...
{"data": "..."}
# GET Object
GET /my-key HTTP/1.1
Host: my-bucket.s3.amazonaws.com
Range: bytes=0-1048575 # partial read (first 1 MB)
# HEAD Object (metadata only, no data transfer)
HEAD /my-key HTTP/1.1
Host: my-bucket.s3.amazonaws.com
# Returns: Content-Length, Content-Type, ETag, Last-Modified, x-amz-meta-*
# DELETE Object
DELETE /my-key HTTP/1.1
Host: my-bucket.s3.amazonaws.com
# ──────── LIST Operations ────────
# List objects (v2, paginated)
GET /?list-type=2&prefix=photos/2024/&delimiter=/&max-keys=1000 HTTP/1.1
Host: my-bucket.s3.amazonaws.com
# Returns: CommonPrefixes (simulated "directories") + Contents (objects)
# ──────── COPY Object (server-side, no download) ────────
PUT /destination-key HTTP/1.1
Host: dest-bucket.s3.amazonaws.com
x-amz-copy-source: source-bucket/source-key
# ──────── Batch Operations ────────
# S3 Batch Operations can process billions of objects:
# - Copy objects between buckets
# - Set tags, ACLs, or metadata
# - Invoke Lambda per object
# - Restore from Glacier
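LIST responses cap at 1,000 keys per page. In boto3, a paginator walks the continuation tokens for you; a sketch against a hypothetical bucket:
import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# Iterate every key under a prefix, 1,000 per underlying request:
pages = paginator.paginate(Bucket='my-bucket', Prefix='photos/2024/', Delimiter='/')
for page in pages:
    for cp in page.get('CommonPrefixes', []):   # the simulated "directories"
        print('DIR ', cp['Prefix'])
    for obj in page.get('Contents', []):
        print('OBJ ', obj['Key'], obj['Size'])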
Performance Limits
| Metric | Limit | Notes |
|---|---|---|
| GET requests per prefix | 5,500/sec | Per prefix per partition |
| PUT/POST/DELETE per prefix | 3,500/sec | Per prefix per partition |
| Max object size | 5 TB | Via multipart upload |
| Max single PUT size | 5 GB | Use multipart for larger |
| Max metadata per object | 2 KB | User-defined key-value pairs |
| Max parts per multipart | 10,000 | Part size: 5 MB–5 GB |
| Max buckets per account | 100 (soft) | Raisable to 1,000 |
| Max objects per bucket | Unlimited | Billions in production |
A note on key naming and the per-prefix limits: historically you scaled request rates by spreading keys across many prefixes (e.g. hash-prefixed keys like a3b1/data.json instead of sequential keys like 2024-03-15/data.json). Since 2018, S3 handles this automatically for most workloads — the “randomize prefix” advice is largely outdated but still relevant for extreme throughput.
MinIO: Self-Hosted S3-Compatible Storage
MinIO is the leading open-source, S3-compatible object storage server. It implements the full S3 API, meaning any application built for S3 can switch to MinIO with zero code changes — just change the endpoint URL.
Architecture
# MinIO can run as a single binary or distributed cluster
# Single-node (development):
minio server /data --console-address ":9001"
# Exposes S3 API on :9000, web console on :9001
# Distributed mode (production) — 4 nodes × 4 disks each = 16 drives:
# Run the same command on each of the 4 nodes:
minio server http://node{1...4}:9000/mnt/disk{1...4}/data
# This creates an erasure-coded cluster:
# - RS(8,8) by default: 8 data + 8 parity shards per object
# - Tolerates loss of up to 8 drives (half the cluster)
# - Storage efficiency: 50% usable (compared to 33% with 3× replication)
# Scaling out — 16 nodes × 4 disks = 64 drives:
minio server http://node{1...16}:9000/mnt/disk{1...4}/data
# MinIO splits the 64 drives into multiple erasure sets (at most 16 drives
# per set); each set independently tolerates the loss of half its drives
Usage with Standard S3 SDKs
# Python — boto3 works with MinIO by changing endpoint only:
import boto3
s3 = boto3.client('s3',
endpoint_url='http://minio.internal:9000',
aws_access_key_id='minioadmin',
aws_secret_access_key='minioadmin',
region_name='us-east-1' # required but not used
)
# All standard S3 operations work:
s3.create_bucket(Bucket='my-app-data')
s3.put_object(Bucket='my-app-data', Key='users/123.json', Body=b'{"name":"Alice"}')
obj = s3.get_object(Bucket='my-app-data', Key='users/123.json')
print(obj['Body'].read()) # b'{"name":"Alice"}'
# Go — MinIO's own SDK:
// import "github.com/minio/minio-go/v7"
client, _ := minio.New("minio.internal:9000", &minio.Options{
Creds: credentials.NewStaticV4("minioadmin", "minioadmin", ""),
Secure: false,
})
client.PutObject(ctx, "my-app-data", "users/123.json",
strings.NewReader(`{"name":"Alice"}`), -1,
minio.PutObjectOptions{ContentType: "application/json"})
# Kubernetes deployment (Helm):
helm repo add minio https://charts.min.io/
helm install minio minio/minio \
--set replicas=4 \
--set persistence.size=1Ti \
--set resources.requests.memory=16Gi
When to use MinIO over S3:
- Data sovereignty — data must stay on-premises or in specific jurisdictions
- Cost at scale — 10+ PB is cheaper on own hardware ($0.005/GB/mo vs S3’s $0.023)
- Low-latency access — co-locate storage with compute (same rack, <1ms vs 50ms S3 latency)
- Air-gapped environments — no internet connectivity (defense, healthcare)
- Development/testing — local S3-compatible server for integration tests
Design Patterns & Best Practices
Key Design Patterns
# ❌ Bad: Sequential keys create hot partitions
2024-03-15-000001.json
2024-03-15-000002.json
2024-03-15-000003.json
# All requests hit the same partition → throttling at high throughput
# ✅ Good: Hash-prefixed keys distribute across partitions
a3b1/2024-03-15-000001.json # hash of timestamp or UUID prefix
7f2e/2024-03-15-000002.json
c9d4/2024-03-15-000003.json
# ✅ Better: Use natural high-cardinality prefixes
users/a3b1c2d3/profile.json # user ID as prefix
events/2024/03/15/14/30/evt-uuid.json # hour-level partitioning
# ✅ Best (modern S3): Use random UUIDs as keys
550e8400-e29b-41d4-a716-446655440000.parquet
# S3 auto-partitions since 2018, but UUIDs still help at extreme scale
Security Patterns
# 1. Bucket policy: deny unencrypted uploads
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyUnencryptedUploads",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-bucket/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "aws:kms"
}
}
},
{
"Sid": "DenyHTTP",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
"Condition": {
"Bool": { "aws:SecureTransport": "false" }
}
}
]
}
# 2. Enable S3 Block Public Access (account-level)
aws s3control put-public-access-block \
--account-id 123456789012 \
--public-access-block-configuration \
"BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# 3. Enable access logging
aws s3api put-bucket-logging --bucket my-bucket \
--bucket-logging-status '{
"LoggingEnabled": {
"TargetBucket": "my-access-logs",
"TargetPrefix": "s3-logs/my-bucket/"
}
}'
Summary
- Object storage uses a flat namespace (bucket + key) and HTTP API — trades random-write/low-latency for unlimited scale and 11-nines durability.
- S3 architecture has four layers: front-end (REST), index (metadata), placement, and storage (erasure-coded chunks on disk).
- Strong read-after-write consistency (since Dec 2020) — no more stale reads for overwrites or deletes.
- Erasure coding RS(10,6) achieves 11+ nines durability at only 1.6× storage overhead, vs 3.0× for triple replication.
- Storage classes (Standard → IA → Glacier → Deep Archive) can reduce costs by 80%+ with lifecycle policies.
- Multipart upload enables parallel, resumable uploads for objects up to 5 TB (at most 10,000 parts of 5 MB–5 GB each).
- Pre-signed URLs provide time-limited, credential-free access — essential for direct client uploads.
- Event notifications (S3 → Lambda/SQS/SNS/EventBridge) power event-driven pipelines for media processing, ETL, and auditing.
- Content-addressed storage (CAS) uses content hashes as keys for natural deduplication and integrity verification.
- MinIO provides S3-compatible storage on your own hardware — zero code changes, just swap the endpoint URL.