#Design E-Commerce Product Listing (Amazon-Scale)
#1. Problem Statement & Clarifications
#Functional Requirements
- Product Catalog – Sellers list products with title, description, images, price, variants (size/color), and category
- Search & Browse – Users search by keyword, filter by category/price/rating/brand, sort results
- Product Detail Page (PDP) – Display full product info, seller details, reviews, related products
- Inventory Tracking – Real-time stock availability per variant per warehouse
- Pricing Engine – Dynamic pricing, deals, coupons, seller-specific pricing
#Non-Functional Requirements
| Requirement | Target |
|---|---|
| Read Latency | P99 < 200ms for search, < 100ms for PDP |
| Write Latency | Product updates reflected in search < 30s |
| Availability | 99.99% – downtime = lost revenue |
| Scale | 500M products, 1B daily searches, 100M DAU |
| Consistency | Eventually consistent for catalog; strong for inventory/pricing |
#Out of Scope
- Cart, checkout, payment processing (separate system)
- Order management, shipping, returns
- Seller onboarding & KYC
- Recommendation ML models (we consume their output)
#Assumptions
- Multi-tenant: millions of sellers, one platform
- Global deployment across 3+ regions
- Products have 1–50 variants each
- Average product has 5 images
#2. Back-of-Envelope Estimation
#Traffic Estimates
DAU: 100M users
Searches/day: 1B (10 searches/user avg)
PDP views/day: 2B (users click ~2 results per search)
Search QPS: 1B / 86400 ≈ 12K QPS (avg), 36K QPS (peak 3x)
PDP QPS: 2B / 86400 ≈ 23K QPS (avg), 70K QPS (peak)
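The read-path figures can be sanity-checked with quick arithmetic (the 3x peak factor is the assumption stated above; numbers are rounded the same way as in the estimates):

```python
# Quick arithmetic check of the read-path QPS estimates.
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3  # assumed peak-to-average ratio

searches_per_day = 1_000_000_000
pdp_views_per_day = 2_000_000_000

search_qps = searches_per_day / SECONDS_PER_DAY   # ~11.6K, rounded to 12K
pdp_qps = pdp_views_per_day / SECONDS_PER_DAY     # ~23.1K, rounded to 23K

print(f"search: {search_qps:,.0f} avg / {search_qps * PEAK_FACTOR:,.0f} peak")
print(f"pdp:    {pdp_qps:,.0f} avg / {pdp_qps * PEAK_FACTOR:,.0f} peak")
```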
Product writes: 5M/day (new listings + updates) ≈ 60 QPS
#Storage Estimates
Products: 500M
Avg product size: ~10KB (metadata + description)
Product metadata: 500M × 10KB = 5TB
Images: 500M × 5 images × 500KB = 1.25PB (object storage)
Search index: ~2TB (inverted index + facets)
Reviews: ~3TB
Total: ~1.3PB (dominated by images)
#Bandwidth
Search response: ~50KB (20 results with thumbnails)
Peak search BW: 36K × 50KB = 1.8 GB/s
PDP response: ~200KB (images served separately via CDN)
Peak PDP BW: 70K × 200KB = 14 GB/s
Image CDN BW: ~50 GB/s (offloaded to CDN)
#Cache Estimates
Top 20% products = 100M products (Pareto principle)
Cache size: 100M × 10KB = 1TB (distributed Redis cluster)
Cache hit ratio: ~95% for popular products
#3. API Design
#Search API
GET /api/v1/products/search
?q=wireless+headphones
&category=electronics
&price_min=20&price_max=200
&brand=sony,bose
&rating_min=4
&sort=relevance|price_asc|price_desc|rating|newest
&page=1&page_size=20
Response 200:
{
"results": [
{
"product_id": "P123",
"title": "Sony WH-1000XM5",
"price": { "amount": 349.99, "currency": "USD", "was": 399.99 },
"rating": { "avg": 4.7, "count": 12453 },
"thumbnail_url": "https://cdn.example.com/...",
"badges": ["Best Seller", "Prime"],
"in_stock": true
}
],
"facets": {
"brands": [{"name":"Sony","count":234}, ...],
"price_ranges": [{"range":"$0-$50","count":89}, ...]
},
"total": 1847,
"page": 1
}
#Product Detail API
GET /api/v1/products/{product_id}
Response 200:
{
"product_id": "P123",
"title": "Sony WH-1000XM5",
"description": "...",
"category_path": ["Electronics", "Audio", "Headphones"],
"variants": [
{ "variant_id": "V1", "color": "Black", "price": 349.99,
"inventory": { "status": "IN_STOCK", "quantity_hint": "50+" },
"images": ["url1", "url2"] }
],
"seller": { "id": "S456", "name": "Sony Official", "rating": 4.9 },
"reviews_summary": { "avg": 4.7, "count": 12453, "distribution": {"5": 8200, "4": 2800, "3": 900, "2": 350, "1": 203} },
"related_products": ["P789", "P012"]
}
#Product Write API (Seller-facing)
POST /api/v1/seller/products
PUT /api/v1/seller/products/{product_id}
Body:
{
"title": "...",
"description": "...",
"category_id": "CAT_123",
"variants": [...],
"images": [<multipart uploads>]
}
#Inventory Update API (Internal)
PATCH /api/v1/inventory/{variant_id}
{
"warehouse_id": "WH_01",
"quantity_delta": -1,
"operation": "RESERVE|RELEASE|DEDUCT"
}
#4. Data Model
#Product Catalog (PostgreSQL – sharded by product_id)
```sql
-- Core product (immutable-ish, versioned)
CREATE TABLE products (
    product_id   BIGINT PRIMARY KEY,   -- Snowflake ID
    seller_id    BIGINT NOT NULL,
    title        VARCHAR(500),
    description  TEXT,
    category_id  INT,
    brand        VARCHAR(200),
    -- PostgreSQL has no inline ENUM column type; use a CHECK (or CREATE TYPE)
    status       VARCHAR(10) CHECK (status IN ('DRAFT','ACTIVE','INACTIVE','DELETED')),
    created_at   TIMESTAMP,
    updated_at   TIMESTAMP,
    version      INT DEFAULT 1
);

-- Variants (size/color combos)
CREATE TABLE product_variants (
    variant_id    BIGINT PRIMARY KEY,
    product_id    BIGINT REFERENCES products(product_id),
    sku           VARCHAR(50) UNIQUE,
    attributes    JSONB,   -- {"color":"Black","size":"M"}
    price_cents   BIGINT,
    compare_price BIGINT,  -- strikethrough price, also in cents
    weight_grams  INT,
    status        VARCHAR(10) CHECK (status IN ('ACTIVE','INACTIVE'))
);

-- Images
CREATE TABLE product_images (
    image_id   BIGINT PRIMARY KEY,
    product_id BIGINT,
    variant_id BIGINT NULL,   -- NULL = applies to all variants
    url        VARCHAR(500),
    position   SMALLINT,      -- display order
    alt_text   VARCHAR(300)
);
```
#Inventory (DynamoDB / Cassandra – high write throughput)
Partition Key: variant_id
Sort Key: warehouse_id
{
variant_id: "V1",
warehouse_id: "WH_01",
available: 142,
reserved: 18,
version: 347 // optimistic concurrency
}
#Search Index (Elasticsearch)
{
"product_id": "P123",
"title": "Sony WH-1000XM5 Wireless Headphones",
"title_ngram": "...",
"description": "...",
"category_path": ["Electronics", "Audio", "Headphones"],
"brand": "Sony",
"price": 349.99,
"rating_avg": 4.7,
"rating_count": 12453,
"in_stock": true,
"badges": ["Best Seller"],
"seller_id": "S456",
"attributes": { "color": ["Black","Silver"], "connectivity": "Bluetooth" },
"created_at": "2025-01-15T00:00:00Z"
}
#Access Patterns
| Query | Store | Index/Key |
|---|---|---|
| Search by keyword + filters | Elasticsearch | Full-text + filters |
| Get product by ID | PostgreSQL + Redis cache | PK: product_id |
| Get variants for product | PostgreSQL | IX: product_id |
| Check inventory | DynamoDB | PK: variant_id, SK: warehouse_id |
| Products by seller | PostgreSQL | IX: seller_id |
| Products by category | Elasticsearch | Term filter |
#5. High-Level Design (HLD) & Scale Evolution
A system like this is rarely built at "Amazon-scale" from day one. Here is how the architecture evolves as traffic and data grow.
#Stage 1: MVP / Startup Scale (10K DAU, 1M Products)
At this scale, simplicity and speed of iteration are prioritized over massive horizontal scalability.
Architecture:
- Single Monolith App: Handles search, catalog, and inventory.
- Relational DB (PostgreSQL): Stores products, inventory, and handles full-text search (using pg_trgm or built-in text search).
- No Cache / Message Queue: Direct DB queries for all operations.
Bottlenecks that force evolution:
- Search queries become too slow and inaccurate as the catalog grows.
- The single database becomes a single point of failure and read/write bottleneck.
- Product Details Page (PDP) load times increase due to complex DB joins.
#Stage 2: Growth Scale (1M DAU, 50M Products)
We introduce specialized data stores and caching to handle read-heavy traffic and improve search relevance.
Key Architectural Changes:
- Introduce Elasticsearch: Offload search from PostgreSQL to a dedicated search engine.
- Introduce Redis: Cache PDP responses, session data, and popular search queries.
- Microservices Split: Separate Search Service and Catalog/Inventory Service to scale independently.
- CDN: Serve all static images via CloudFront instead of application servers.
Bottlenecks that force evolution:
- Write contention on inventory during flash sales (DB locks degrade performance).
- Rebuilding the search index synchronously impacts catalog write latency.
- Single PostgreSQL database can no longer hold all product data efficiently without sharding.
#Stage 3: Amazon-Scale (100M DAU, 500M Products)
This is the target architecture defined in the requirements. We move to a fully decoupled, event-driven architecture with CQRS.
#Architecture Diagram (Amazon-Scale)
```
                        +---------------+
                        |      CDN      | <- Images, static assets
                        |  (CloudFront) |
                        +-------+-------+
                                |
                     +----------v----------+
                     |     API Gateway     |
                     | (Rate limit, Auth,  |
                     |  Request routing)   |
                     +----------+----------+
                                |
          +---------------------+---------------------+
          v                     v                     v
+-------------------+  +-----------------+  +-----------------+
|  Search Service   |  | Catalog Service |  |  Inventory Svc  |
|                   |  |                 |  |                 |
| - Query parsing   |  | - CRUD products |  | - Stock check   |
| - Spell correct   |  | - Variant mgmt  |  | - Reserve       |
| - Faceting        |  | - Image upload  |  | - Deduct        |
| - Ranking         |  | - Validation    |  |                 |
+---------+---------+  +--------+--------+  +--------+--------+
          |                     |                    |
          v                     v                    v
+-------------------+  +-----------------+  +-----------------+
|   Elasticsearch   |  |   PostgreSQL    |  |    DynamoDB     |
|      Cluster      |  |    (Sharded)    |  |                 |
+-------------------+  +--------+--------+  +-----------------+
                                |
                      +---------v---------+
                      |    Redis Cache    |
                      | (Product + Price) |
                      +-------------------+

+-----------------------------------------------------------+
|                    Event Bus (Kafka)                      |
|  Topics: product.updated | inventory.changed |            |
|          price.changed | index.rebuild                    |
+-----------------------------------------------------------+
                             |
                  +----------v----------+
                  |  Index Builder Svc  |
                  |  (Kafka Consumer)   |
                  |  Denormalizes data  |
                  |  Writes to ES       |
                  +---------------------+
```
#Component Breakdown
| Component | Responsibility | Tech Choice |
|---|---|---|
| API Gateway | Load balancing, rate limiting, auth | Kong / AWS API GW |
| Search Service | Query parsing, spell-check, ranking | Custom + Elasticsearch |
| Catalog Service | Product CRUD, validation | Java/Go microservice |
| Inventory Service | Stock management with optimistic locking | Go microservice |
| Pricing Service | Dynamic pricing, deals, coupons | Separate service |
| Index Builder | Denormalize catalog+inventory+pricing → ES | Kafka consumer |
| Image Service | Upload, resize, optimize, CDN invalidation | S3 + Lambda |
#Data Flow
Write Path (Seller lists a product):
Seller → API GW → Catalog Service → PostgreSQL (write)
  → Kafka (product.created event)
  → Index Builder consumes event
  → Joins with inventory + pricing data
  → Writes to Elasticsearch
  → Invalidates Redis cache

Read Path (User searches):
User → API GW → Search Service → Elasticsearch (query)
  → Hydrate with Redis cache (prices, stock)
  → Return ranked results

Read Path (User views PDP):
User → API GW → Catalog Service → Redis cache (hit? return)
  → Cache miss → PostgreSQL
  → Inventory Service → DynamoDB (stock)
  → Assemble response → Cache in Redis
#6. Deep Dive – Core Components
#Search Service – Detailed Design
Query Pipeline:
Raw Query → Tokenize → Spell Correct → Synonym Expand
  → Category Predict → Build ES Query → Execute
  → Re-rank (ML) → Hydrate prices/stock → Return

Elasticsearch Query Strategy:
- `bool` query with `must` (keyword match) + `filter` (category, price, brand)
- `function_score` to boost by: relevance, sales velocity, rating, recency
- Aggregations for facet counts (brand, price ranges, ratings)
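A sketch of a request body implementing this strategy, built as a plain dict. Field names follow the index document in the Data Model section; the boost functions and parameters are illustrative, not production-tuned:

```python
# Illustrative Elasticsearch request body: bool query + filters,
# function_score boosting, and aggregations for facet counts.
query_body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [{"match": {"title": "wireless headphones"}}],
                    "filter": [
                        {"term": {"brand": "Sony"}},
                        {"range": {"price": {"gte": 20, "lte": 200}}},
                        {"range": {"rating_avg": {"gte": 4}}},
                    ],
                }
            },
            "functions": [
                # Boost by rating and (log-dampened) review count.
                {"field_value_factor": {"field": "rating_avg", "factor": 0.2}},
                {"field_value_factor": {"field": "rating_count", "modifier": "log1p"}},
            ],
            "boost_mode": "sum",
        }
    },
    "aggs": {
        "brands": {"terms": {"field": "brand"}},
        "price_ranges": {"histogram": {"field": "price", "interval": 50}},
    },
    "size": 20,
}
```

The same body would be passed to the search client's `search(...)` call; filters are non-scoring, so Elasticsearch can cache them across queries.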
Relevance Ranking Factors:
Score = TextRelevance × 0.3
      + SalesVelocity × 0.25
      + Rating × 0.2
      + SellerQuality × 0.1
      + Recency × 0.05
      + StockAvailability × 0.1

Spell Correction: Use ES suggest API with phrase suggester + custom dictionary from product catalog.
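Assuming each signal is normalized to [0, 1] before blending, the formula is a one-line weighted sum:

```python
# Weights from the ranking formula above; they deliberately sum to 1.0.
WEIGHTS = {
    "text_relevance": 0.30,
    "sales_velocity": 0.25,
    "rating": 0.20,
    "seller_quality": 0.10,
    "recency": 0.05,
    "stock_availability": 0.10,
}

def blended_score(signals: dict) -> float:
    """Linear blend of normalized ranking signals (each in [0, 1]).
    Missing signals default to 0 so partial feature coverage degrades gracefully."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
```

Because the weights sum to 1.0, a product that is perfect on every signal scores exactly 1.0, which keeps scores comparable across queries.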
#Catalog Service – Detailed Design
Product Versioning: Every update creates a new version. This enables:
- Rollback on bad seller edits
- Audit trail for pricing disputes
- Safe async index rebuilding from any version
Image Pipeline:
Upload → Virus Scan → Store Original in S3
  → Async: Generate thumbnails (150px, 300px, 600px, 1200px)
  → Store variants in S3
  → Update CDN mappings
  → Return CDN URLs
#Inventory Service – Detailed Design
Why DynamoDB? Single-digit ms latency, auto-scaling, built-in optimistic concurrency via conditional writes.
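The conditional-write semantics can be modeled in a few lines. This is an in-memory sketch of the compare-and-set behavior, not an AWS SDK call; the table layout mirrors the inventory item shown in the Data Model section:

```python
class ConditionalCheckFailed(Exception):
    """Raised when the condition expression no longer holds (caller retries)."""

# In-memory stand-in for the inventory table, keyed by (variant_id, warehouse_id).
table = {("V1", "WH_01"): {"available": 142, "reserved": 18, "version": 347}}

def reserve(variant_id, warehouse_id, qty, expected_version):
    """Compare-and-set: mirrors DynamoDB's conditional UpdateItem semantics."""
    item = table[(variant_id, warehouse_id)]
    if item["available"] < qty or item["version"] != expected_version:
        raise ConditionalCheckFailed  # stock or version changed under us
    item["available"] -= qty
    item["reserved"] += qty
    item["version"] += 1
    return dict(item)
```

A concurrent writer bumps `version`, so a caller holding a stale version fails the condition and must re-read before retrying; this is what prevents two requests from both taking the last unit.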
Reservation Pattern (for cart/checkout):
1. User adds to cart → RESERVE (available--, reserved++)
2. Checkout success → DEDUCT (reserved--)
3. Cart timeout 15m → RELEASE (available++, reserved--)

Conditional Write (prevents overselling):
UpdateItem:
  SET available = available - 1, reserved = reserved + 1, version = version + 1
  CONDITION: available > 0 AND version = :expected_version
#Scaling Strategy
| Component | Strategy |
|---|---|
| PostgreSQL | Shard by product_id (hash-based, 256 shards). Read replicas per shard for read-heavy PDP traffic |
| Elasticsearch | 50+ shards across nodes; time-based indices for new products, merged monthly |
| Redis | Cluster mode, 1TB+ across 50+ nodes; LRU eviction; separate cluster for sessions vs product cache |
| DynamoDB | Auto-scaling with on-demand mode for flash sales |
| Search Service | Horizontal scale; stateless pods behind LB |
| Index Builder | Scale Kafka consumer group partitions; batch writes to ES |
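The hash-based sharding from the table can be sketched as a stable-hash routing function. `crc32` is chosen purely for illustration; Python's built-in `hash()` is salted per process and therefore unusable for routing, and production systems often prefer consistent hashing to ease resharding:

```python
import zlib

NUM_SHARDS = 256  # shard count from the scaling table above

def shard_for(product_id: int) -> int:
    """Route a product_id to one of 256 PostgreSQL shards.
    crc32 is deterministic across processes and runs, unlike hash()."""
    return zlib.crc32(str(product_id).encode()) % NUM_SHARDS
```

Every service that touches the catalog must use the same function, otherwise reads and writes for the same product land on different shards.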
#Caching Strategy
```
+-----------------------------------------------------------+
| Layer 1: CDN (CloudFront)                                 |
|  - Images, category pages (TTL: 24h)                      |
|  - Search results for popular queries (TTL: 5m)           |
+-----------------------------------------------------------+
| Layer 2: Application Cache (Redis)                        |
|  - Product metadata (TTL: 10m, invalidate on update)      |
|  - Price snapshots (TTL: 1m - prices change frequently)   |
|  - Search facet counts (TTL: 5m)                          |
+-----------------------------------------------------------+
| Layer 3: Local Cache (in-process)                         |
|  - Category tree (TTL: 1h - rarely changes)               |
|  - Feature flags, config (TTL: 30s)                       |
+-----------------------------------------------------------+
```

Cache Invalidation: Event-driven via Kafka. When product/price/inventory changes → publish event → cache invalidation consumer deletes Redis keys.
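A minimal sketch of that invalidation consumer. The topic names come from the event bus above; the cache-key patterns and the `redis_delete` callable are illustrative stand-ins for a real Redis client's DEL:

```python
def keys_to_invalidate(event: dict) -> list[str]:
    """Map a Kafka event to the Redis keys it makes stale (key patterns assumed)."""
    topic = event["topic"]
    if topic == "product.updated":
        return [f"pdp:{event['product_id']}", f"price:{event['product_id']}"]
    if topic == "price.changed":
        return [f"price:{event['product_id']}"]
    if topic == "inventory.changed":
        return [f"stock:{event['variant_id']}"]
    return []  # unknown topics are ignored

def handle(event: dict, redis_delete) -> None:
    """Consume one event and delete the affected cache keys."""
    for key in keys_to_invalidate(event):
        redis_delete(key)
```

Deletes are idempotent, so the consumer can safely reprocess events after a rebalance; the Redis TTLs above act as a safety net if an event is lost.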
#Consistency Models
| Data | Model | Rationale |
|---|---|---|
| Product metadata | Eventually consistent (< 30s) | Stale title/description is acceptable briefly |
| Price | Eventually consistent (< 5s) | Wrong price = legal/trust issue; keep lag minimal |
| Inventory | Strongly consistent | Overselling = terrible UX; use conditional writes |
| Search index | Eventually consistent (< 30s) | Slight delay in searchability is acceptable |
#7. Low-Level Design – Core Functionality
#Key Algorithms
1. Search Query Parser
Input: "sony headphones under $200 with 4+ stars"
Output: {
  keywords: ["sony", "headphones"],
  filters: { price_max: 200, rating_min: 4, brand: "Sony" },
  intent: "product_search"
}

Uses NLP-based intent detection to extract structured filters from natural language.
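The production parser is NLP-based; a rule-based sketch illustrates the same input/output contract. The regex patterns and the brand dictionary are illustrative assumptions:

```python
import re

KNOWN_BRANDS = {"sony", "bose"}  # in production: a dictionary built from the catalog

def parse_query(raw: str) -> dict:
    """Extract structured filters from a natural-language query (rule-based sketch)."""
    filters, keywords = {}, []
    # Price cap: "under $200"
    m = re.search(r"under \$?(\d+)", raw)
    if m:
        filters["price_max"] = int(m.group(1))
        raw = raw.replace(m.group(0), "")
    # Rating floor: "4+ stars"
    m = re.search(r"(\d)\+ stars?", raw)
    if m:
        filters["rating_min"] = int(m.group(1))
        raw = raw.replace(m.group(0), "")
    for token in raw.lower().split():
        if token in KNOWN_BRANDS:
            filters["brand"] = token.capitalize()  # brand is both keyword and filter
        if token not in {"with", "and"}:
            keywords.append(token)
    return {"keywords": keywords, "filters": filters, "intent": "product_search"}
```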
2. Inventory Reservation (Optimistic Locking)
```python
def reserve_stock(variant_id, warehouse_id, quantity, max_retries=3):
    # dynamodb, OutOfStockError, ConditionalCheckFailed, and backoff are
    # service-level helpers (pseudocode).
    for attempt in range(max_retries):
        item = dynamodb.get_item(variant_id, warehouse_id)
        if item.available < quantity:
            raise OutOfStockError()
        try:
            dynamodb.update_item(
                key=(variant_id, warehouse_id),
                update="SET available = available - :qty, "
                       "reserved = reserved + :qty, version = version + 1",
                condition="version = :current_version AND available >= :qty",
                values={":qty": quantity, ":current_version": item.version},
            )
            return  # reservation succeeded
        except ConditionalCheckFailed:
            backoff(attempt)  # another request modified stock; re-read and retry
    raise ReservationContentionError()
```
3. Index Builder (CQRS Pattern)
Kafka Event (product.updated) →
1. Fetch latest product from PostgreSQL
2. Fetch current price from Pricing Service
3. Fetch inventory status from DynamoDB
4. Denormalize into search document
5. Upsert into Elasticsearch
6. Invalidate Redis cache for product_id
#Design Patterns Used
| Pattern | Where | Why |
|---|---|---|
| CQRS | Catalog (write) vs Search (read) | Optimize read and write paths independently |
| Event Sourcing | Product updates via Kafka | Decoupled index building, audit trail |
| Saga | Inventory reservation across checkout | No distributed transactions needed |
| Circuit Breaker | Search → ES, Catalog → DB | Prevent cascade failures |
| Bulkhead | Separate thread pools per downstream | Isolate failures |
| Cache-Aside | PDP reads | Most common; simple invalidation |
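The cache-aside pattern from the table, sketched for PDP reads. The `cache` and `db` dicts are in-memory stand-ins for Redis and PostgreSQL:

```python
# In-memory stand-ins: in production, `cache` is Redis and `db` is PostgreSQL.
cache: dict = {}
db = {"P123": {"title": "Sony WH-1000XM5", "price_cents": 34999}}

def get_product(product_id: str) -> dict:
    """Cache-aside read: check the cache, fall back to the DB, then populate.
    A real Redis SET would also attach the 10m TTL from the caching strategy."""
    product = cache.get(product_id)
    if product is not None:
        return product            # cache hit
    product = db[product_id]      # cache miss: read the authoritative store
    cache[product_id] = product   # populate so the next read is a hit
    return product
```

The application owns the caching logic, which keeps invalidation simple: the Kafka consumer just deletes the key and the next read repopulates it.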
#8. Failure Handling & Edge Cases
#What Happens When X Fails?
| Failure | Impact | Mitigation |
|---|---|---|
| Elasticsearch down | Search broken | Fallback to category browsing from PostgreSQL; pre-cached popular searches |
| PostgreSQL shard down | PDP for subset of products unavailable | Read replicas auto-failover; cache serves stale data |
| Redis down | Latency spike | Direct DB reads; local cache as L1 fallback |
| Kafka lag | Search index stale | Alert on lag > 60s; fallback to direct DB polling |
| DynamoDB throttle | Inventory checks slow | On-demand auto-scaling; queue overflow to SQS |
| CDN origin failure | Images don't load | Multi-origin failover; placeholder images |
#Flash Sale / Traffic Spike Handling
1. Pre-warm caches for sale products (30 min before)
2. Pre-scale search + catalog service pods
3. Rate limit non-essential APIs (reviews, recommendations)
4. Inventory: switch to queue-based reservation to absorb burst
5. Show "Almost Gone!" when stock < threshold (creates urgency + reduces oversell)
#Edge Cases
- Seller uploads 10,000 products at once → Rate limit to 100/min per seller; async bulk import via SQS
- Price shows $0 due to bug → Price validation layer: reject prices outside ±50% of historical range
- Same product listed by 100 sellers → Canonical product matching; show "Buy Box" winner
- Search returns 0 results → Relax filters progressively; suggest corrections; show related categories
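The "$0 price" guard can be a simple predicate in the validation layer. The helper name is illustrative; the ±50% band comes from the edge case above:

```python
def price_is_plausible(new_price_cents: int, historical_price_cents: int) -> bool:
    """Reject obviously wrong prices: non-positive values, or anything
    outside +/-50% of the product's historical price."""
    if new_price_cents <= 0:
        return False  # the "$0 price" bug
    low = 0.5 * historical_price_cents
    high = 1.5 * historical_price_cents
    return low <= new_price_cents <= high
```

Rejected updates would be queued for manual seller review rather than silently dropped, since legitimate clearance pricing can also trip the band.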
#9. Monitoring & Observability
#Key Metrics
| Metric | Target | Alert Threshold |
|---|---|---|
| Search P99 latency | < 200ms | > 500ms |
| PDP P99 latency | < 100ms | > 300ms |
| Search zero-result rate | < 5% | > 10% |
| Cache hit ratio | > 95% | < 85% |
| Inventory accuracy | > 99.9% | < 99% |
| Index lag (Kafka → ES) | < 30s | > 120s |
| Error rate (5xx) | < 0.1% | > 0.5% |
#Alerting Strategy
- P0 (Page immediately): Search service down, inventory overselling, payment-price mismatch
- P1 (15 min response): Cache hit ratio drop, ES index lag > 2 min, error rate spike
- P2 (Next business day): Storage approaching limits, slow query patterns, stale product count increasing
#SLAs / SLOs
Search API: 99.95% availability, P99 < 200ms
PDP API: 99.99% availability, P99 < 100ms
Inventory API: 99.99% availability, P99 < 50ms
Index freshness: 99% of updates reflected in < 30s
#10. Trade-off Summary
| Decision | Chose | Over | Because |
|---|---|---|---|
| Search engine | Elasticsearch | Solr, PostgreSQL FTS | Better scaling, richer aggregations, proven at scale |
| Catalog DB | PostgreSQL (sharded) | MongoDB | Relational integrity for products/variants; JSONB for flexible attributes |
| Inventory DB | DynamoDB | PostgreSQL, Redis | Single-digit ms latency + conditional writes; auto-scaling for flash sales |
| Read/Write separation | CQRS via Kafka | Single DB for both | Read (search) and write (catalog) have vastly different patterns and scale |
| Cache invalidation | Event-driven (Kafka) | TTL-only | Faster propagation for price/stock changes; TTL as safety net |
| Image storage | S3 + CDN | Self-hosted | Cost-effective at PB scale; global CDN for low latency |
| Consistency for inventory | Strong (conditional writes) | Eventual | Overselling is worse than occasional "out of stock" error |
| Consistency for search | Eventual (< 30s lag) | Strong | Users tolerate slight delay in searchability |
#11. Extensions & Follow-ups
#What Would You Add With More Time?
- Personalized Search Ranking – ML model using user purchase history, browsing behavior
- Visual Search – Upload an image → find similar products (CNN embeddings + ANN search)
- Real-time Price Comparison – Aggregate prices across sellers with live bidding for "Buy Box"
- Multi-region Active-Active – CRDTs for catalog data; regional ES clusters with cross-replication
- A/B Testing Framework – Test ranking algorithms, UI layouts, pricing strategies
- Fraud Detection – Flag fake reviews, counterfeit products, price manipulation
#How Would This Change at 100x Scale?
- 50B products: Tiered storage – hot products in SSD-backed ES, cold in S3-backed archive with lazy indexing
- 100B searches/day: Precomputed search results for top 10K queries; edge compute for personalization
- Inventory: Move to event-sourced log (Kafka Streams) with materialized views per warehouse region
- Search: Custom search engine (like Amazon's A9) replacing Elasticsearch for full control over ranking
#12. Cross-References
#Related Topics in This Repo
| Topic | Connection |
|---|---|
| Rate Limiter (#2) | API Gateway rate limiting for sellers and search abuse |
| Unique ID Generator (#4) | Snowflake IDs for product_id, variant_id |
| Notification System (#13) | Price drop alerts, back-in-stock notifications |
| Search Autocomplete (#14) | Typeahead for product search bar |
| Distributed Cache (#15) | Redis cluster design for product cache |
| Payment System (#18) | Checkout integration with price verification |
#Building Blocks Used
- Load Balancer – L7 for API routing
- CDN – Image delivery, static search pages
- PostgreSQL – Product catalog (sharded)
- Elasticsearch – Search index with faceting
- DynamoDB – Inventory with conditional writes
- Redis – Multi-layer caching
- Kafka – Event bus for CQRS, cache invalidation
- S3 – Image/media object storage