Redshift (Not serverless)¶
1. Intro¶
- OLAP (online analytic processing) database
- data warehousing
- Redshift's internal storage
- does NOT have "tiers" of storage classes like Amazon S3
- load PB of data and perform query/analysis
- like athena, Integrated with aws:glue (crawler > catalog)
- analytic query result goes to:
- S3
- Amazon QuickSight (dashboard)
Tableau
- ...
2. Data Source¶
- load data from below sources
- KDF
- s3
- program(Java:SDK): in-batches
- AWS glue/ETL
- ...
3. Cluster architecture¶
- Cluster => leader Node => compute Node => 1000s of spectrum
- performance
(10x)
, than other data warehouses and athena - leader node receives queries from client applications, parses the queries, and develops query execution plans
- coordinates the parallel execution of these plans with the compute nodes
- Aggregates the intermediate results
- parallel query engine
- run complex SQL
- faster-joins
- faster-aggregation
- uses indexes + Columnar storage
4. cross-referencing
¶
- Analysts can use Redshift SQL queries to access both :
- Redshift tables (
recent
data) / hot data - data in S3 (
historical
data) without moving data into redshift- Spectrum, allows you to run queries on data stored in Amazon S3,
- without having to move that data into your Redshift cluster.
5. DR¶
- single-AZ (by default)
- Multi-AZ replication
- cross region-replication - explicitly enable
- incremental-snapshot(only new change), in every 8 hr.
- retention: 35 days.
- stored in s3.
- restore snapshot/s into new region : manually/automate.
Exam
¶
- App
- less than year older data --> redshift -->
analytic-report-1
- older than year --> s3
analytic-report-2
, reference from --> s3 + redshift- how to cross-reference s3
- Amazon Redshift AQUA (Advanced Query Accelerator)
- distributed query acceleration layer designed to speed up certain types of queries in Amazon Redshift, particularly complex analytical queries.
- boost bt 10x
- resolves network bandwidth + cpu processing bottleneck
- Datashare feature
- Cross-Account Data Sharing for Amazon Redshift
- https://aws.amazon.com/blogs/aws/cross-account-data-sharing-for-amazon-redshift/