#Design E-Commerce Product Listing (Amazon-Scale)
#1. Problem Statement & Clarifications
#Functional Requirements
- Product Catalog – Sellers list products with title, description, images, price, variants (size/color), and category
- Search & Browse – Users search by keyword, filter by category/price/rating/brand, sort results
- Product Detail Page (PDP) – Display full product info, seller details, reviews, related products
- Inventory Tracking – Real-time stock availability per variant per warehouse
- Pricing Engine – Dynamic pricing, deals, coupons, seller-specific pricing
#Non-Functional Requirements
| Requirement | Target |
|---|---|
| Read Latency | P99 < 200ms for search, < 100ms for PDP |
| Write Latency | Product updates reflected in search < 30s |
| Availability | 99.99% – downtime = lost revenue |
| Scale | 500M products, 1B daily searches, 100M DAU |
| Consistency | Eventually consistent for catalog; strong for inventory/pricing |
#Out of Scope
- Cart, checkout, payment processing (separate system)
- Order management, shipping, returns
- Seller onboarding & KYC
- Recommendation ML models (we consume their output)
#Assumptions
- Multi-tenant: millions of sellers, one platform
- Global deployment across 3+ regions
- Products have 1–50 variants each
- Average product has 5 images
#2. Back-of-Envelope Estimation
#Traffic Estimates
DAU: 100M users
Searches/day: 1B (10 searches/user avg)
PDP views/day: 2B (users click ~2 results per search)
Search QPS: 1B / 86400 ≈ 12K QPS (avg), 36K QPS (peak 3x)
PDP QPS: 2B / 86400 ≈ 23K QPS (avg), 70K QPS (peak)
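The read-path figures can be sanity-checked with quick arithmetic (the 3x peak factor is the assumption stated above; numbers are rounded the same way as in the estimates):

```python
# Quick arithmetic check of the read-path QPS estimates.
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3  # assumed peak-to-average ratio

searches_per_day = 1_000_000_000
pdp_views_per_day = 2_000_000_000

search_qps = searches_per_day / SECONDS_PER_DAY   # ~11.6K, rounded to 12K
pdp_qps = pdp_views_per_day / SECONDS_PER_DAY     # ~23.1K, rounded to 23K

print(f"search: {search_qps:,.0f} avg / {search_qps * PEAK_FACTOR:,.0f} peak")
print(f"pdp:    {pdp_qps:,.0f} avg / {pdp_qps * PEAK_FACTOR:,.0f} peak")
```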
Product writes: 5M/day (new listings + updates) ≈ 60 QPS
#Storage Estimates
Products: 500M
Avg product size: ~10KB (metadata + description)
Product metadata: 500M × 10KB = 5TB
Images: 500M × 5 images × 500KB = 1.25PB (object storage)
Search index: ~2TB (inverted index + facets)
Reviews: ~3TB
Total: ~1.3PB (dominated by images)
#Bandwidth
Search response: ~50KB (20 results with thumbnails)
Peak search BW: 36K × 50KB = 1.8 GB/s
PDP response: ~200KB (images served separately via CDN)
Peak PDP BW: 70K × 200KB = 14 GB/s
Image CDN BW: ~50 GB/s (offloaded to CDN)
#Cache Estimates
Top 20% products = 100M products (Pareto principle)
Cache size: 100M × 10KB = 1TB (distributed Redis cluster)
Cache hit ratio: ~95% for popular products
#3. API Design
#Search API
GET /api/v1/products/search
?q=wireless+headphones
&category=electronics
&price_min=20&price_max=200
&brand=sony,bose
&rating_min=4
&sort=relevance|price_asc|price_desc|rating|newest
&page=1&page_size=20
Response 200:
{
"results": [
{
"product_id": "P123",
"title": "Sony WH-1000XM5",
"price": { "amount": 349.99, "currency": "USD", "was": 399.99 },
"rating": { "avg": 4.7, "count": 12453 },
"thumbnail_url": "https://cdn.example.com/...",
"badges": ["Best Seller", "Prime"],
"in_stock": true
}
],
"facets": {
"brands": [{"name":"Sony","count":234}, ...],
"price_ranges": [{"range":"$0-$50","count":89}, ...]
},
"total": 1847,
"page": 1
}
#Product Detail API
GET /api/v1/products/{product_id}
Response 200:
{
"product_id": "P123",
"title": "Sony WH-1000XM5",
"description": "...",
"category_path": ["Electronics", "Audio", "Headphones"],
"variants": [
{ "variant_id": "V1", "color": "Black", "price": 349.99,
"inventory": { "status": "IN_STOCK", "quantity_hint": "50+" },
"images": ["url1", "url2"] }
],
"seller": { "id": "S456", "name": "Sony Official", "rating": 4.9 },
"reviews_summary": { "avg": 4.7, "count": 12453, "distribution": {"5": 8200, "4": 2800, "3": 900, "2": 350, "1": 203} },
"related_products": ["P789", "P012"]
}
#Product Write API (Seller-facing)
POST /api/v1/seller/products
PUT /api/v1/seller/products/{product_id}
Body:
{
"title": "...",
"description": "...",
"category_id": "CAT_123",
"variants": [...],
"images": [<multipart uploads>]
}
#Inventory Update API (Internal)
PATCH /api/v1/inventory/{variant_id}
{
"warehouse_id": "WH_01",
"quantity_delta": -1,
"operation": "RESERVE|RELEASE|DEDUCT"
}
#4. Data Model
#Product Catalog (PostgreSQL – sharded by product_id)
```sql
-- Core product (immutable-ish, versioned)
CREATE TABLE products (
    product_id   BIGINT PRIMARY KEY,   -- Snowflake ID
    seller_id    BIGINT NOT NULL,
    title        VARCHAR(500),
    description  TEXT,
    category_id  INT,
    brand        VARCHAR(200),
    -- PostgreSQL has no inline ENUM column type; use a CHECK (or CREATE TYPE)
    status       VARCHAR(10) CHECK (status IN ('DRAFT','ACTIVE','INACTIVE','DELETED')),
    created_at   TIMESTAMP,
    updated_at   TIMESTAMP,
    version      INT DEFAULT 1
);

-- Variants (size/color combos)
CREATE TABLE product_variants (
    variant_id    BIGINT PRIMARY KEY,
    product_id    BIGINT REFERENCES products(product_id),
    sku           VARCHAR(50) UNIQUE,
    attributes    JSONB,   -- {"color":"Black","size":"M"}
    price_cents   BIGINT,
    compare_price BIGINT,  -- strikethrough price, also in cents
    weight_grams  INT,
    status        VARCHAR(10) CHECK (status IN ('ACTIVE','INACTIVE'))
);

-- Images
CREATE TABLE product_images (
    image_id   BIGINT PRIMARY KEY,
    product_id BIGINT,
    variant_id BIGINT NULL,   -- NULL = applies to all variants
    url        VARCHAR(500),
    position   SMALLINT,      -- display order
    alt_text   VARCHAR(300)
);
```
#Inventory (DynamoDB / Cassandra – high write throughput)
Partition Key: variant_id
Sort Key: warehouse_id
{
variant_id: "V1",
warehouse_id: "WH_01",
available: 142,
reserved: 18,
version: 347 // optimistic concurrency
}
#Search Index (Elasticsearch)
{
"product_id": "P123",
"title": "Sony WH-1000XM5 Wireless Headphones",
"title_ngram": "...",
"description": "...",
"category_path": ["Electronics", "Audio", "Headphones"],
"brand": "Sony",
"price": 349.99,
"rating_avg": 4.7,
"rating_count": 12453,
"in_stock": true,
"badges": ["Best Seller"],
"seller_id": "S456",
"attributes": { "color": ["Black","Silver"], "connectivity": "Bluetooth" },
"created_at": "2025-01-15T00:00:00Z"
}
#Access Patterns
| Query | Store | Index/Key |
|---|---|---|
| Search by keyword + filters | Elasticsearch | Full-text + filters |
| Get product by ID | PostgreSQL + Redis cache | PK: product_id |
| Get variants for product | PostgreSQL | IX: product_id |
| Check inventory | DynamoDB | PK: variant_id, SK: warehouse_id |
| Products by seller | PostgreSQL | IX: seller_id |
| Products by category | Elasticsearch | Term filter |
#5. High-Level Design (HLD) & Scale Evolution
A system like this is rarely built at "Amazon-scale" from day one. Here is how the architecture evolves as traffic and data grow.
#Stage 1: MVP / Startup Scale (10K DAU, 1M Products)
At this scale, simplicity and speed of iteration are prioritized over massive horizontal scalability.
Architecture:
- Single Monolith App: Handles search, catalog, and inventory.
- Relational DB (PostgreSQL): Stores products, inventory, and handles full-text search (using pg_trgm or built-in text search).
- No Cache / Message Queue: Direct DB queries for all operations.
Bottlenecks that force evolution:
- Search queries become too slow and inaccurate as the catalog grows.
- The single database becomes a single point of failure and read/write bottleneck.
- Product Details Page (PDP) load times increase due to complex DB joins.
#Stage 2: Growth Scale (1M DAU, 50M Products)
We introduce specialized data stores and caching to handle read-heavy traffic and improve search relevance.
Key Architectural Changes:
- Introduce Elasticsearch: Offload search from PostgreSQL to a dedicated search engine.
- Introduce Redis: Cache PDP responses, session data, and popular search queries.
- Microservices Split: Separate Search Service and Catalog/Inventory Service to scale independently.
- CDN: Serve all static images via CloudFront instead of application servers.
Bottlenecks that force evolution:
- Write contention on inventory during flash sales (DB locks degrade performance).
- Rebuilding the search index synchronously impacts catalog write latency.
- Single PostgreSQL database can no longer hold all product data efficiently without sharding.
#Stage 3: Amazon-Scale (100M DAU, 500M Products)
This is the target architecture defined in the requirements. We move to a fully decoupled, event-driven architecture with CQRS.
#Architecture Diagram (Amazon-Scale)
```
                        +---------------+
                        |      CDN      | <- Images, static assets
                        |  (CloudFront) |
                        +-------+-------+
                                |
                     +----------v----------+
                     |     API Gateway     |
                     | (Rate limit, Auth,  |
                     |  Request routing)   |
                     +----------+----------+
                                |
          +---------------------+---------------------+
          v                     v                     v
+-------------------+  +-----------------+  +-----------------+
|  Search Service   |  | Catalog Service |  |  Inventory Svc  |
|                   |  |                 |  |                 |
| - Query parsing   |  | - CRUD products |  | - Stock check   |
| - Spell correct   |  | - Variant mgmt  |  | - Reserve       |
| - Faceting        |  | - Image upload  |  | - Deduct        |
| - Ranking         |  | - Validation    |  |                 |
+---------+---------+  +--------+--------+  +--------+--------+
          |                     |                    |
          v                     v                    v
+-------------------+  +-----------------+  +-----------------+
|   Elasticsearch   |  |   PostgreSQL    |  |    DynamoDB     |
|      Cluster      |  |    (Sharded)    |  |                 |
+-------------------+  +--------+--------+  +-----------------+
                                |
                      +---------v---------+
                      |    Redis Cache    |
                      | (Product + Price) |
                      +-------------------+

+-----------------------------------------------------------+
|                    Event Bus (Kafka)                      |
|  Topics: product.updated | inventory.changed |            |
|          price.changed | index.rebuild                    |
+-----------------------------------------------------------+
                             |
                  +----------v----------+
                  |  Index Builder Svc  |
                  |  (Kafka Consumer)   |
                  |  Denormalizes data  |
                  |  Writes to ES       |
                  +---------------------+
```
#Component Breakdown
| Component | Responsibility | Tech Choice |
|---|---|---|
| API Gateway | Load balancing, rate limiting, auth | Kong / AWS API GW |
| Search Service | Query parsing, spell-check, ranking | Custom + Elasticsearch |
| Catalog Service | Product CRUD, validation | Java/Go microservice |
| Inventory Service | Stock management with optimistic locking | Go microservice |
| Pricing Service | Dynamic pricing, deals, coupons | Separate service |
| Index Builder | Denormalize catalog+inventory+pricing → ES | Kafka consumer |
| Image Service | Upload, resize, optimize, CDN invalidation | S3 + Lambda |
#Data Flow
Write Path (Seller lists a product):
Seller → API GW → Catalog Service → PostgreSQL (write)
  → Kafka (product.created event)
  → Index Builder consumes event
  → Joins with inventory + pricing data
  → Writes to Elasticsearch
  → Invalidates Redis cache

Read Path (User searches):
User → API GW → Search Service → Elasticsearch (query)
  → Hydrate with Redis cache (prices, stock)
  → Return ranked results

Read Path (User views PDP):
User → API GW → Catalog Service → Redis cache (hit? return)
  → Cache miss → PostgreSQL
  → Inventory Service → DynamoDB (stock)
  → Assemble response → Cache in Redis
#6. Deep Dive – Core Components
#Search Service – Detailed Design
Query Pipeline:
Raw Query → Tokenize → Spell Correct → Synonym Expand
  → Category Predict → Build ES Query → Execute
  → Re-rank (ML) → Hydrate prices/stock → Return

Elasticsearch Query Strategy:
- `bool` query with `must` (keyword match) + `filter` (category, price, brand)
- `function_score` to boost by: relevance, sales velocity, rating, recency
- Aggregations for facet counts (brand, price ranges, ratings)
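A sketch of a request body implementing this strategy, built as a plain dict. Field names follow the index document in the Data Model section; the boost functions and parameters are illustrative, not production-tuned:

```python
# Illustrative Elasticsearch request body: bool query + filters,
# function_score boosting, and aggregations for facet counts.
query_body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [{"match": {"title": "wireless headphones"}}],
                    "filter": [
                        {"term": {"brand": "Sony"}},
                        {"range": {"price": {"gte": 20, "lte": 200}}},
                        {"range": {"rating_avg": {"gte": 4}}},
                    ],
                }
            },
            "functions": [
                # Boost by rating and (log-dampened) review count.
                {"field_value_factor": {"field": "rating_avg", "factor": 0.2}},
                {"field_value_factor": {"field": "rating_count", "modifier": "log1p"}},
            ],
            "boost_mode": "sum",
        }
    },
    "aggs": {
        "brands": {"terms": {"field": "brand"}},
        "price_ranges": {"histogram": {"field": "price", "interval": 50}},
    },
    "size": 20,
}
```

The same body would be passed to the search client's `search(...)` call; filters are non-scoring, so Elasticsearch can cache them across queries.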
Relevance Ranking Factors:
Score = TextRelevance × 0.3
      + SalesVelocity × 0.25
      + Rating × 0.2
      + SellerQuality × 0.1
      + Recency × 0.05
      + StockAvailability × 0.1

Spell Correction: Use ES suggest API with phrase suggester + custom dictionary from product catalog.
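Assuming each signal is normalized to [0, 1] before blending, the formula is a one-line weighted sum:

```python
# Weights from the ranking formula above; they deliberately sum to 1.0.
WEIGHTS = {
    "text_relevance": 0.30,
    "sales_velocity": 0.25,
    "rating": 0.20,
    "seller_quality": 0.10,
    "recency": 0.05,
    "stock_availability": 0.10,
}

def blended_score(signals: dict) -> float:
    """Linear blend of normalized ranking signals (each in [0, 1]).
    Missing signals default to 0 so partial feature coverage degrades gracefully."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
```

Because the weights sum to 1.0, a product that is perfect on every signal scores exactly 1.0, which keeps scores comparable across queries.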
#Catalog Service – Detailed Design
Product Versioning: Every update creates a new version. This enables:
- Rollback on bad seller edits
- Audit trail for pricing disputes
- Safe async index rebuilding from any version
Image Pipeline:
Upload → Virus Scan → Store Original in S3
  → Async: Generate thumbnails (150px, 300px, 600px, 1200px)
  → Store variants in S3
  → Update CDN mappings
  → Return CDN URLs
#Inventory Service – Detailed Design
Why DynamoDB? Single-digit ms latency, auto-scaling, built-in optimistic concurrency via conditional writes.
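The conditional-write semantics can be modeled in a few lines. This is an in-memory sketch of the compare-and-set behavior, not an AWS SDK call; the table layout mirrors the inventory item shown in the Data Model section:

```python
class ConditionalCheckFailed(Exception):
    """Raised when the condition expression no longer holds (caller retries)."""

# In-memory stand-in for the inventory table, keyed by (variant_id, warehouse_id).
table = {("V1", "WH_01"): {"available": 142, "reserved": 18, "version": 347}}

def reserve(variant_id, warehouse_id, qty, expected_version):
    """Compare-and-set: mirrors DynamoDB's conditional UpdateItem semantics."""
    item = table[(variant_id, warehouse_id)]
    if item["available"] < qty or item["version"] != expected_version:
        raise ConditionalCheckFailed  # stock or version changed under us
    item["available"] -= qty
    item["reserved"] += qty
    item["version"] += 1
    return dict(item)
```

A concurrent writer bumps `version`, so a caller holding a stale version fails the condition and must re-read before retrying; this is what prevents two requests from both taking the last unit.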
Reservation Pattern (for cart/checkout):
1. User adds to cart → RESERVE (available--, reserved++)
2. Checkout success → DEDUCT (reserved--)
3. Cart timeout 15m → RELEASE (available++, reserved--)

Conditional Write (prevents overselling):
UpdateItem:
  SET available = available - 1, reserved = reserved + 1, version = version + 1
  CONDITION: available > 0 AND version = :expected_version
#Scaling Strategy
| Component | Strategy |
|---|---|
| PostgreSQL | Shard by product_id (hash-based, 256 shards). Read replicas per shard for read-heavy PDP traffic |
| Elasticsearch | 50+ shards across nodes; time-based indices for new products, merged monthly |
| Redis | Cluster mode, 1TB+ across 50+ nodes; LRU eviction; separate cluster for sessions vs product cache |
| DynamoDB | Auto-scaling with on-demand mode for flash sales |
| Search Service | Horizontal scale; stateless pods behind LB |
| Index Builder | Scale Kafka consumer group partitions; batch writes to ES |
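The hash-based sharding from the table can be sketched as a stable-hash routing function. `crc32` is chosen purely for illustration; Python's built-in `hash()` is salted per process and therefore unusable for routing, and production systems often prefer consistent hashing to ease resharding:

```python
import zlib

NUM_SHARDS = 256  # shard count from the scaling table above

def shard_for(product_id: int) -> int:
    """Route a product_id to one of 256 PostgreSQL shards.
    crc32 is deterministic across processes and runs, unlike hash()."""
    return zlib.crc32(str(product_id).encode()) % NUM_SHARDS
```

Every service that touches the catalog must use the same function, otherwise reads and writes for the same product land on different shards.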
#Caching Strategy
```
+-----------------------------------------------------------+
| Layer 1: CDN (CloudFront)                                 |
|  - Images, category pages (TTL: 24h)                      |
|  - Search results for popular queries (TTL: 5m)           |
+-----------------------------------------------------------+
| Layer 2: Application Cache (Redis)                        |
|  - Product metadata (TTL: 10m, invalidate on update)      |
|  - Price snapshots (TTL: 1m - prices change frequently)   |
|  - Search facet counts (TTL: 5m)                          |
+-----------------------------------------------------------+
| Layer 3: Local Cache (in-process)                         |
|  - Category tree (TTL: 1h - rarely changes)               |
|  - Feature flags, config (TTL: 30s)                       |
+-----------------------------------------------------------+
```

Cache Invalidation: Event-driven via Kafka. When product/price/inventory changes → publish event → cache invalidation consumer deletes Redis keys.
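A minimal sketch of that invalidation consumer. The topic names come from the event bus above; the cache-key patterns and the `redis_delete` callable are illustrative stand-ins for a real Redis client's DEL:

```python
def keys_to_invalidate(event: dict) -> list[str]:
    """Map a Kafka event to the Redis keys it makes stale (key patterns assumed)."""
    topic = event["topic"]
    if topic == "product.updated":
        return [f"pdp:{event['product_id']}", f"price:{event['product_id']}"]
    if topic == "price.changed":
        return [f"price:{event['product_id']}"]
    if topic == "inventory.changed":
        return [f"stock:{event['variant_id']}"]
    return []  # unknown topics are ignored

def handle(event: dict, redis_delete) -> None:
    """Consume one event and delete the affected cache keys."""
    for key in keys_to_invalidate(event):
        redis_delete(key)
```

Deletes are idempotent, so the consumer can safely reprocess events after a rebalance; the Redis TTLs above act as a safety net if an event is lost.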
#Consistency Models
| Data | Model | Rationale |
|---|---|---|
| Product metadata | Eventually consistent (< 30s) | Stale title/description is acceptable briefly |
| Price | Eventually consistent (< 5s) | Wrong price = legal/trust issue; keep lag minimal |
| Inventory | Strongly consistent | Overselling = terrible UX; use conditional writes |
| Search index | Eventually consistent (< 30s) | Slight delay in searchability is acceptable |
#7. Low-Level Design – Core Functionality
#Key Algorithms
1. Search Query Parser
Input: "sony headphones under $200 with 4+ stars"
Output: {
  keywords: ["sony", "headphones"],
  filters: { price_max: 200, rating_min: 4, brand: "Sony" },
  intent: "product_search"
}

Uses NLP-based intent detection to extract structured filters from natural language.
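The production parser is NLP-based; a rule-based sketch illustrates the same input/output contract. The regex patterns and the brand dictionary are illustrative assumptions:

```python
import re

KNOWN_BRANDS = {"sony", "bose"}  # in production: a dictionary built from the catalog

def parse_query(raw: str) -> dict:
    """Extract structured filters from a natural-language query (rule-based sketch)."""
    filters, keywords = {}, []
    # Price cap: "under $200"
    m = re.search(r"under \$?(\d+)", raw)
    if m:
        filters["price_max"] = int(m.group(1))
        raw = raw.replace(m.group(0), "")
    # Rating floor: "4+ stars"
    m = re.search(r"(\d)\+ stars?", raw)
    if m:
        filters["rating_min"] = int(m.group(1))
        raw = raw.replace(m.group(0), "")
    for token in raw.lower().split():
        if token in KNOWN_BRANDS:
            filters["brand"] = token.capitalize()  # brand is both keyword and filter
        if token not in {"with", "and"}:
            keywords.append(token)
    return {"keywords": keywords, "filters": filters, "intent": "product_search"}
```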
2. Inventory Reservation (Optimistic Locking)
```python
def reserve_stock(variant_id, warehouse_id, quantity, max_retries=3):
    # dynamodb, OutOfStockError, ConditionalCheckFailed, and backoff are
    # service-level helpers (pseudocode).
    for attempt in range(max_retries):
        item = dynamodb.get_item(variant_id, warehouse_id)
        if item.available < quantity:
            raise OutOfStockError()
        try:
            dynamodb.update_item(
                key=(variant_id, warehouse_id),
                update="SET available = available - :qty, "
                       "reserved = reserved + :qty, version = version + 1",
                condition="version = :current_version AND available >= :qty",
                values={":qty": quantity, ":current_version": item.version},
            )
            return  # reservation succeeded
        except ConditionalCheckFailed:
            backoff(attempt)  # another request modified stock; re-read and retry
    raise ReservationContentionError()
```
3. Index Builder (CQRS Pattern)
Kafka Event (product.updated) →
1. Fetch latest product from PostgreSQL
2. Fetch current price from Pricing Service
3. Fetch inventory status from DynamoDB
4. Denormalize into search document
5. Upsert into Elasticsearch
6. Invalidate Redis cache for product_id
#Design Patterns Used
| Pattern | Where | Why |
|---|---|---|
| CQRS | Catalog (write) vs Search (read) | Optimize read and write paths independently |
| Event Sourcing | Product updates via Kafka | Decoupled index building, audit trail |
| Saga | Inventory reservation across checkout | No distributed transactions needed |
| Circuit Breaker | Search → ES, Catalog → DB | Prevent cascade failures |
| Bulkhead | Separate thread pools per downstream | Isolate failures |
| Cache-Aside | PDP reads | Most common; simple invalidation |
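The cache-aside pattern from the table, sketched for PDP reads. The `cache` and `db` dicts are in-memory stand-ins for Redis and PostgreSQL:

```python
# In-memory stand-ins: in production, `cache` is Redis and `db` is PostgreSQL.
cache: dict = {}
db = {"P123": {"title": "Sony WH-1000XM5", "price_cents": 34999}}

def get_product(product_id: str) -> dict:
    """Cache-aside read: check the cache, fall back to the DB, then populate.
    A real Redis SET would also attach the 10m TTL from the caching strategy."""
    product = cache.get(product_id)
    if product is not None:
        return product            # cache hit
    product = db[product_id]      # cache miss: read the authoritative store
    cache[product_id] = product   # populate so the next read is a hit
    return product
```

The application owns the caching logic, which keeps invalidation simple: the Kafka consumer just deletes the key and the next read repopulates it.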
#8. Failure Handling & Edge Cases
#What Happens When X Fails?
| Failure | Impact | Mitigation |
|---|---|---|
| Elasticsearch down | Search broken | Fallback to category browsing from PostgreSQL; pre-cached popular searches |
| PostgreSQL shard down | PDP for subset of products unavailable | Read replicas auto-failover; cache serves stale data |
| Redis down | Latency spike | Direct DB reads; local cache as L1 fallback |
| Kafka lag | Search index stale | Alert on lag > 60s; fallback to direct DB polling |
| DynamoDB throttle | Inventory checks slow | On-demand auto-scaling; queue overflow to SQS |
| CDN origin failure | Images don't load | Multi-origin failover; placeholder images |
#Flash Sale / Traffic Spike Handling
1. Pre-warm caches for sale products (30 min before)
2. Pre-scale search + catalog service pods
3. Rate limit non-essential APIs (reviews, recommendations)
4. Inventory: switch to queue-based reservation to absorb burst
5. Show "Almost Gone!" when stock < threshold (creates urgency + reduces oversell)
#Edge Cases
- Seller uploads 10,000 products at once → Rate limit to 100/min per seller; async bulk import via SQS
- Price shows $0 due to bug → Price validation layer: reject prices outside ±50% of historical range
- Same product listed by 100 sellers → Canonical product matching; show "Buy Box" winner
- Search returns 0 results → Relax filters progressively; suggest corrections; show related categories
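The "$0 price" guard can be a simple predicate in the validation layer. The helper name is illustrative; the ±50% band comes from the edge case above:

```python
def price_is_plausible(new_price_cents: int, historical_price_cents: int) -> bool:
    """Reject obviously wrong prices: non-positive values, or anything
    outside +/-50% of the product's historical price."""
    if new_price_cents <= 0:
        return False  # the "$0 price" bug
    low = 0.5 * historical_price_cents
    high = 1.5 * historical_price_cents
    return low <= new_price_cents <= high
```

Rejected updates would be queued for manual seller review rather than silently dropped, since legitimate clearance pricing can also trip the band.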
#9. Monitoring & Observability
#Key Metrics
| Metric | Target | Alert Threshold |
|---|---|---|
| Search P99 latency | < 200ms | > 500ms |
| PDP P99 latency | < 100ms | > 300ms |
| Search zero-result rate | < 5% | > 10% |
| Cache hit ratio | > 95% | < 85% |
| Inventory accuracy | > 99.9% | < 99% |
| Index lag (Kafka → ES) | < 30s | > 120s |
| Error rate (5xx) | < 0.1% | > 0.5% |
#Alerting Strategy
- P0 (Page immediately): Search service down, inventory overselling, payment-price mismatch
- P1 (15 min response): Cache hit ratio drop, ES index lag > 2 min, error rate spike
- P2 (Next business day): Storage approaching limits, slow query patterns, stale product count increasing
#SLAs / SLOs
Search API: 99.95% availability, P99 < 200ms
PDP API: 99.99% availability, P99 < 100ms
Inventory API: 99.99% availability, P99 < 50ms
Index freshness: 99% of updates reflected in < 30s
#10. Trade-off Summary
| Decision | Chose | Over | Because |
|---|---|---|---|
| Search engine | Elasticsearch | Solr, PostgreSQL FTS | Better scaling, richer aggregations, proven at scale |
| Catalog DB | PostgreSQL (sharded) | MongoDB | Relational integrity for products/variants; JSONB for flexible attributes |
| Inventory DB | DynamoDB | PostgreSQL, Redis | Single-digit ms latency + conditional writes; auto-scaling for flash sales |
| Read/Write separation | CQRS via Kafka | Single DB for both | Read (search) and write (catalog) have vastly different patterns and scale |
| Cache invalidation | Event-driven (Kafka) | TTL-only | Faster propagation for price/stock changes; TTL as safety net |
| Image storage | S3 + CDN | Self-hosted | Cost-effective at PB scale; global CDN for low latency |
| Consistency for inventory | Strong (conditional writes) | Eventual | Overselling is worse than occasional "out of stock" error |
| Consistency for search | Eventual (< 30s lag) | Strong | Users tolerate slight delay in searchability |
#11. Extensions & Follow-ups
#What Would You Add With More Time?
- Personalized Search Ranking – ML model using user purchase history, browsing behavior
- Visual Search – Upload an image → find similar products (CNN embeddings + ANN search)
- Real-time Price Comparison – Aggregate prices across sellers with live bidding for "Buy Box"
- Multi-region Active-Active – CRDTs for catalog data; regional ES clusters with cross-replication
- A/B Testing Framework – Test ranking algorithms, UI layouts, pricing strategies
- Fraud Detection – Flag fake reviews, counterfeit products, price manipulation
#How Would This Change at 100x Scale?
- 50B products: Tiered storage – hot products in SSD-backed ES, cold in S3-backed archive with lazy indexing
- 100B searches/day: Precomputed search results for top 10K queries; edge compute for personalization
- Inventory: Move to event-sourced log (Kafka Streams) with materialized views per warehouse region
- Search: Custom search engine (like Amazon's A9) replacing Elasticsearch for full control over ranking
#12. Cross-References
#Related Topics in This Repo
| Topic | Connection |
|---|---|
| Rate Limiter (#2) | API Gateway rate limiting for sellers and search abuse |
| Unique ID Generator (#4) | Snowflake IDs for product_id, variant_id |
| Notification System (#13) | Price drop alerts, back-in-stock notifications |
| Search Autocomplete (#14) | Typeahead for product search bar |
| Distributed Cache (#15) | Redis cluster design for product cache |
| Payment System (#18) | Checkout integration with price verification |
#Building Blocks Used
- Load Balancer – L7 for API routing
- CDN – Image delivery, static search pages
- PostgreSQL – Product catalog (sharded)
- Elasticsearch – Search index with faceting
- DynamoDB – Inventory with conditional writes
- Redis – Multi-layer caching
- Kafka – Event bus for CQRS, cache invalidation
- S3 – Image/media object storage