kickoff
-
- https://chat.deepseek.com/a/chat/s/8483329c-7494-483f-bd9c-f296c81b084e

- Sharding vs Partitioning in RDBMS
- Partitioning splits a table into smaller segments within the same database server (by rows or columns) for performance and manageability.
- Sharding distributes data across multiple servers (horizontal scaling) to handle large-scale workloads.
A. RDBMS partitioning
- https://www.youtube.com/watch?v=oJj-pltxBUM&ab_channel=High-PerformanceProgramming - intro
- https://www.youtube.com/watch?v=VcTPmEJeKM4&ab_channel=AWSEvents - aws rds sharding
=== TOPICS asked in above deepseek chat =====
0. sharding vs partitioning in RDBMS.
1. Database - Sharding and partitioning in RDBMS
2. Database - Sharding and partitioning in noSQL
3. No SQL DB - ACID
4. RDBMS - ACID
5. Blue-Green Partitioning - implementation approach rather than a PostgreSQL feature
- Blue (production)
- Green (with new partitioning scheme)
Enables zero-downtime migration to partitioned tables. like in k8s
6 scenario1:
table-1 create without partition. after one year millions of records inserted. can we add partition later.
7 scenario2:
table-1 create with range partition. after one year millions of records inserted.
can we add update partiton to hash type from range type.
8 time-based partitioning if historical data grows rapidly. intQ
Example
- hash partition
CREATE TABLE users (
user_id INT,
username VARCHAR(50)
) PARTITION BY HASH (user_id);
-- Partition 0: Stores rows where `hash(user_id) % 4 == 0`
CREATE TABLE users_p1 PARTITION OF users
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- Partition 3: Stores rows where `hash(user_id) % 4 == 3`
CREATE TABLE users_p2 PARTITION OF users
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
- Time-Based Partitioning
-- Parent table (logical)
CREATE TABLE sales (
id SERIAL,
sale_date DATE,
customer_id INT,
amount DECIMAL(10,2)
) PARTITION BY RANGE (sale_date);
-- Yearly partitions
CREATE TABLE sales_2023 PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
... manually create more in future...
-- ===== Automatic Partition Creation =====
-- PostgreSQL example: Function to create next month's partition
CREATE OR REPLACE FUNCTION create_next_month_partition()
RETURNS TRIGGER AS $$
BEGIN
EXECUTE format(
'CREATE TABLE IF NOT EXISTS sales_%s PARTITION OF sales '
'FOR VALUES FROM (%L) TO (%L)',
to_char(NEW.sale_date, 'YYYY_MM'),
date_trunc('month', NEW.sale_date),
date_trunc('month', NEW.sale_date) + INTERVAL '1 month'
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
-- Trigger to run before INSERT
CREATE TRIGGER trg_sales_partition
BEFORE INSERT ON sales
FOR EACH ROW EXECUTE FUNCTION create_next_month_partition();
-- option-2: pg_cron
B. RDBMS Sharding
intro
- https://www.youtube.com/watch?v=be6PLMKKSto&ab_channel=Exponent
- Sharding splits a large database into smaller, independent chunks ("shards") distributed across multiple servers
- Horizontal Scaling
- global users table is split by region
- users table
- users_europe (PostgreSQL Server 1)
- users_europe (PostgreSQL Server 2)
- advantages
- β
Improved Performance (queries run on smaller datasets).
- β
Fault Isolation (one shard failing doesnβt crash the whole DB).
- postgres Doesnot provide automatic sharding
- do manually
- Citus (PostgreSQL Extension)
- useful for : Multi-tenant SaaS apps (isolate customer data).
Sharding Strategies
- first create/deploy manually db server/s. shard-1,Shard 2,...
- Application code:
- Key-Based (Hash) Sharding
- -- Shard 1: user_id % 4 = 0
- -- Shard 2: user_id % 4 = 1
- Range-Based Sharding
- -- Shard 1: order_id 1-1000
- -- Shard 2: order_id 1001-2000
- Directory-Based Sharding
- -- Lookup table:

- SELECT shard_location FROM shard_map WHERE user_id = 123;
Challenges of Sharding
- β Complexity (joins across shards are hard).
- β No ACID across shards (distributed transactions are slow).
- β Rebalancing (moving data between shards is tricky).