Redshift (provisioned clusters, not serverless)

1. Intro

Cluster => leader node => compute nodes => 1000s of Spectrum nodes (for data in S3)
performance: up to 10x better than other data warehouses and Athena (vendor claim)
leader node receives queries from client applications, parses them, and develops query execution plans
- coordinates the parallel execution of these plans with the compute nodes
- aggregates the intermediate results
MPP (massively parallel processing) query engine
- runs complex SQL
- faster joins
- faster aggregations
- columnar storage with sort keys and zone maps (Redshift has no conventional indexes)
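
The split between leader and compute nodes can be illustrated with a toy simulation. This is only a sketch of the idea, not how Redshift is implemented: the names `compute_node` and `leader_node`, the round-robin distribution, and the thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows):
    """Each (simulated) compute node aggregates only its own slice of rows."""
    total = sum(amount for _, amount in rows)
    return total, len(rows)


def leader_node(rows, num_nodes=4):
    """Distribute rows, run partial aggregates in parallel, merge the results."""
    # Round-robin distribution of rows across the simulated compute nodes.
    slices = [rows[i::num_nodes] for i in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(compute_node, slices))
    # The leader combines the intermediate (sum, count) pairs.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return {"sum": total, "count": count, "avg": total / count if count else None}


sales = [(order_id, order_id * 10) for order_id in range(1, 101)]
result = leader_node(sales)
print(result)
```

Each node returns only a small partial result, so the leader merges a handful of values instead of scanning every row itself.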

Analysts can use Redshift SQL queries to access both:
- Redshift tables (recent / hot data)
- data in S3 (historical data), without moving it into Redshift
Redshift Spectrum runs queries on data stored in Amazon S3 without loading that data into the Redshift cluster.
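
Before S3 data can be queried this way, it has to be registered as an external schema and table. The helper below is a minimal sketch that only builds the DDL text. The function name `spectrum_setup_sql`, the column list, the Parquet format, and every argument value are hypothetical. Running the statements would need a real cluster, an IAM role, and a Glue Data Catalog database.

```python
def spectrum_setup_sql(schema, glue_db, iam_role_arn, table, s3_location):
    """Return DDL that exposes Parquet files in S3 as an external table."""
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{glue_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )
    # Hypothetical columns; adjust to the actual files in S3.
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        "    order_id BIGINT,\n"
        "    order_date DATE,\n"
        "    amount DECIMAL(12,2)\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )
    return [create_schema, create_table]


statements = spectrum_setup_sql(
    schema="spectrum",
    glue_db="archive_db",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    table="orders_archive",
    s3_location="s3://example-bucket/orders/",
)
for stmt in statements:
    print(stmt, end="\n\n")
```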

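single-AZ (by default)
Multi-AZ deployment (optional)
cross-region snapshot copy: must be explicitly enabled
incremental snapshots (only changes since the last snapshot), taken automatically about every 8 hours or after every 5 GB per node of changes.
- retention: 1 day by default, configurable up to 35 days (automated snapshots).
- stored in S3.
a snapshot is restored into a new cluster; snapshots can be copied to another region manually or automatically.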
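
The retention and cross-region settings above could be applied with boto3 roughly as follows. This is a sketch under assumptions: `configure_snapshots` and `RecordingClient` are hypothetical names, the cluster identifier and region are placeholders, and the stub client exists only so the example runs without AWS access. With real credentials, `boto3.client("redshift")` would be passed instead.

```python
def configure_snapshots(client, cluster_id, dest_region, retention_days=7):
    """Set automated snapshot retention and enable cross-region copy.

    `client` is expected to behave like boto3.client("redshift").
    """
    if not 1 <= retention_days <= 35:
        raise ValueError("retention_days must be between 1 and 35 in this sketch")
    # Retention of automated snapshots in the source region.
    client.modify_cluster(
        ClusterIdentifier=cluster_id,
        AutomatedSnapshotRetentionPeriod=retention_days,
    )
    # Automatic copy of snapshots to a second region.
    client.enable_snapshot_copy(
        ClusterIdentifier=cluster_id,
        DestinationRegion=dest_region,
        RetentionPeriod=retention_days,
    )


class RecordingClient:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


client = RecordingClient()
configure_snapshots(client, "example-cluster", "us-west-2", retention_days=35)
for name, kwargs in client.calls:
    print(name, kwargs)
```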

App (example)
data less than a year old --> Redshift --> analytic-report-1
data older than a year --> S3
analytic-report-2 references both --> S3 + Redshift
how to cross-reference S3: Redshift Spectrum
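
A report that spans both stores might be written as a single query that unions the hot table with an external table over S3. The sketch below only builds the SQL text. The table names, the `order_date` and `amount` columns, and the assumption that the external table has already been defined are illustrative, not taken from the notes.

```python
def combined_report_sql(hot_table, cold_table):
    """Build one query over recent rows in Redshift and older rows in S3."""
    cutoff = "DATEADD(year, -1, CURRENT_DATE)"  # one-year boundary
    return (
        "SELECT DATE_TRUNC('month', order_date) AS month, SUM(amount) AS revenue\n"
        "FROM (\n"
        f"    SELECT order_date, amount FROM {hot_table}\n"
        f"    WHERE order_date >= {cutoff}\n"
        "    UNION ALL\n"
        f"    SELECT order_date, amount FROM {cold_table}\n"
        f"    WHERE order_date < {cutoff}\n"
        ") AS all_orders\n"
        "GROUP BY 1\n"
        "ORDER BY 1;"
    )


report_sql = combined_report_sql("public.orders", "spectrum.orders_archive")
print(report_sql)
```

Filtering each branch on the same cutoff keeps rows from being counted twice if recent data also exists in S3.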
Amazon Redshift AQUA (Advanced Query Accelerator)
a distributed query acceleration layer that speeds up certain types of queries in Amazon Redshift, particularly complex analytical queries.
boost by up to 10x
addresses network bandwidth + CPU processing bottlenecks by processing data closer to storage
Datashare feature
Cross-Account Data Sharing for Amazon Redshift
https://aws.amazon.com/blogs/aws/cross-account-data-sharing-for-amazon-redshift/
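
As a rough sketch of what the datashare workflow looks like in SQL, the helpers below build the producer-side and consumer-side statements. The function names, share name, account IDs, and namespace value are placeholders. Cross-account shares also need to be authorized by the producer account and associated by the consumer account, which is not shown here.

```python
def producer_sql(share, schema, table, consumer_account):
    """Statements run on the producer cluster to create and grant a datashare."""
    return [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD TABLE {schema}.{table};",
        f"GRANT USAGE ON DATASHARE {share} TO ACCOUNT '{consumer_account}';",
    ]


def consumer_sql(local_db, share, producer_account, producer_namespace):
    """Statement run on the consumer cluster to mount the share as a database."""
    return (
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF ACCOUNT '{producer_account}' NAMESPACE '{producer_namespace}';"
    )


producer_statements = producer_sql("sales_share", "public", "orders", "111122223333")
consumer_statement = consumer_sql(
    "shared_sales", "sales_share", "123456789012", "example-namespace-guid"
)
print("\n".join(producer_statements))
print(consumer_statement)
```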