- https://chatgpt.com/c/675f4a6d-26b8-800d-b73e-a01c28a386b2¶
DynamoDB (serverless)¶
A. Intro¶
- No Sql
- unstructured
- rapidly evolve schemas
- All the data that is needed for a query is present in one row
- don’t perform aggregations such as “SUM”, “AVG”, …
-
scale horizontally
-
Fully managed
- no maintenance/patching
- no db admin
-
no provisioning db
-
high consistent performance DISTRIBUTED database
single-digit millisecond performance
, at any scaleMillions of requests
per secondstrillions of row
100s of TB
of storage- Scales to
massive workloads
-
It provides both provisioned (specify RCU & WCU) and on-demand (pay for what you use) capacity modes.
-
Allows event driven programming with DynamoDB Streams
B. Distributed DB (highly available)¶
Traditional relational DB:¶
- Vertical scaling (getting a more powerful CPU / RAM / IO)
limited
Horizontal scaling- increasing Read Replicas.
- but limited. eg max 16 read replica/s.
1. DynamoDB: Single-region
Table¶
- table data stored in multiple partitions.
- hashing algorithm( on PartitionKey) ==> decides which partition to go.
- partitionKey
unique
diverse
, to distribute data evenly on partition.- add suffix to partition key to make it more n more diverse.
- this is called write sharding.
- so use random. calculated suffix.
Partition¶
- dynamoDB
-
table-1
- partitions-1 (node-1 in different az/s) <--> 2-way replication b/w AZ/s <--> partitions-1(node-2)
- both can Read and Write
- no leader/primary concept.
- partitions-2
- ...
- ...
- scale out more partition/s
-
Number of partitions
2. DynamoDB: Global
Table¶
- table-1 (R/W) is
region-1
- table-1 (R/W) is
region-2
- 2-way replication b/w region/s.
- Enable DynamoDB Streams
- helps to replicate data across replica tables in other AWS Regions
C. DynamoDB streams¶
- udemy reference
- enable it explicitly and select one of below kind:
• KEYS_ONLY – only the key attributes of the modified item • NEW_IMAGE – the entire item, as it appears after it was modified • OLD_IMAGE – the entire item, as it appeared before it was modified • NEW_AND_OLD_IMAGES – both the new and the old images of the item
- internally has same concept of shards, same like in KDS
- we don’t provision shards, this is automated by AWS
-
Ordered stream of item-level modifications (create/update/delete) in a table.
-
has 24 hrs retention, so process it or send it somewhere in that window , if needed.
- read and process by directly
Lambda fn
- Define event source mapping 03_lambda-dva-02-trigger.md
-
send to
KDS
using kinesis-connectors- custom app ( aws-sdk or KCL) - read and process
- KDA
- KDF >> redshift / s3 / opensearch
- lambda fn
-
stream
Use cases
- react to changes in real-time (welcome email to users)
- Analytics
- Insert into derivative tables
- Insert into OpenSearch Service
-
cross-region replication
- if not using global dynamoDB
-
common architecture
D. Storage¶
- max item size :
400 KB
- storage classes ( like in s3 ):
- Standard
- Infrequent Access (IA)
- to save more cost
- storing large object in dynamoDB.
- use s3 to storage
- in table store s3 metadata.
E. Security¶
- encryption at rest/fly by KMS/TLS
- Resource-based policy for table.
- keep traffic inside vpc
- create vpce / gateway (privateLink)
- Integrated with IAM
- use Identity provider
- federated user
- Assign an IAM Role to federated user with a Condition:
- to limit their API access to DynamoDB
- dynamodb:
LeadingKeys
== partition_key or partition_key_prefix - row level access.
- user can access row with that partition_key
- dynamodb:
Attributes
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "dynamodb:GetItem", "dynamodb:Query" ], "Resource": "arn:aws:dynamodb:us-west-2:123456789012:table/YourTableName", "Condition": { "ForAllValues:StringEquals": { "dynamodb:Attributes": ["name", "email"] } } }, { "Effect": "Allow", "Action": "dynamodb:UpdateItem", "Resource": "arn:aws:dynamodb:us-west-2:123456789012:table/YourTableName", "Condition": { "ForAllValues:StringEquals": { "dynamodb:Attributes": ["status"] } } }, { "Effect": "Allow", "Action": [ "dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:DeleteItem" ], "Resource": "arn:aws:dynamodb:us-west-2:123456789012:table/YourTableName", "Condition": { "ForAllValues:StringEquals": { "dynamodb:LeadingKeys": "${aws:userid}" //"dynamodb:LeadingKeys": "${aws:departid}_*" } } } ] }
F. DR¶
- Automatic backup for last
35 days
. - take ondemand backup for longer retention.
- Enable Cross-region copy. no performace impact/ downtime.
- PITR - point in time recovery
export
(json,ion) data --> S3.import
(json,csv,ion) --> Dynamo DB
G. Copying DynamoDB Table (into s3)¶
1 AWS Data pipeline (fast)¶
- launches EMR bts.
"EMR" is a cloud-based big data platform that simplifies the process of running large-scale distributed data processing frameworks like - Apache Hadoop - Apache Spark - ...
2 backup and restore¶
- takes time
3 API call¶
- scan + putItem
- BatchWriteItem
- not recommended, but can do for small table.
H. Migration¶
- use AWS DMS to dynamoDB from:
- MongoDB
- SQL database : MySQL,Oracle, etc
- de-normalize
- convert to item
I. Architecture example:¶
- use dynamoDB for indexing S3 metadata.