Glue (Serverless, ETL)
Intro
- AWS Glue is a fully managed extract, transform, and load (ETL) service.
- It makes it easy for customers to prepare and load their data for analytics.
- An AWS Glue job is meant to be used for batch ETL data processing.
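As a rough illustration of the batch model, the sketch below starts one run of an existing Glue job and polls until it reaches a terminal state. It assumes `glue` is a boto3 Glue client (for example `boto3.client("glue")`); the helper name `run_batch_job` and its parameters are made up for these notes, while `start_job_run` and `get_job_run` are the boto3 calls it relies on.

```python
import time

# States after which a Glue job run will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_batch_job(glue, job_name, arguments=None, poll_seconds=30):
    """Start one run of an existing Glue job and wait for it to finish.

    `glue` is assumed to be a boto3 Glue client.
    Returns the final JobRunState string, e.g. "SUCCEEDED".
    """
    kwargs = {"JobName": job_name}
    if arguments:
        # Job arguments are passed as a dict of "--key": "value" strings.
        kwargs["Arguments"] = arguments
    run_id = glue.start_job_run(**kwargs)["JobRunId"]

    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = run["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

With a real client this might be called as `run_batch_job(boto3.client("glue"), "my-etl-job")`, where `my-etl-job` is a placeholder for a job that already exists in the account.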
Glue Components
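- Glue Data Catalog: central metadata store (databases, table definitions, schemas).
- Glue Crawler: scans data sources and helps create metadata (table definitions) in the Data Catalog.
- Glue Elastic Views: virtual tables (materialized views) that combine data from multiple data stores.
- Glue DataBrew: clean and normalize data using pre-built transformations.
- Glue Job Bookmarks: prevent re-processing of old data by tracking what earlier job runs already processed.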
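To make the crawler and catalog relationship concrete, here is a minimal sketch using boto3 Glue client calls (`create_crawler`, `start_crawler`, `get_tables`). The helper names, and any crawler name, role ARN, database, or S3 path passed to them, are placeholders invented for illustration. The last constant shows the job argument that turns on job bookmarks; the job script also has to cooperate (for example by committing the job at the end) for bookmarks to take effect.

```python
def create_and_start_crawler(glue, crawler_name, role_arn, database, s3_path):
    """Create a crawler over one S3 path and start it.

    `glue` is assumed to be a boto3 Glue client. The crawl runs
    asynchronously; tables show up in the catalog once it finishes.
    """
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler_name)


def list_catalog_tables(glue, database):
    """Return {table_name: [(column_name, column_type), ...]} for a database."""
    tables = {}
    token = None
    while True:
        kwargs = {"DatabaseName": database}
        if token:
            kwargs["NextToken"] = token
        page = glue.get_tables(**kwargs)
        for table in page["TableList"]:
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            tables[table["Name"]] = [(c["Name"], c.get("Type")) for c in columns]
        # Keep paging until the service stops returning a NextToken.
        token = page.get("NextToken")
        if not token:
            return tables


# Job argument that enables job bookmarks for a run or a job definition.
BOOKMARK_ARGUMENTS = {"--job-bookmark-option": "job-bookmark-enable"}
```

The split into two helpers is deliberate: because the crawl is asynchronous, listing tables immediately after starting the crawler would usually return the catalog as it was before the crawl.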

Use case
- very common
3. Prepare data for analysis and load/store it into S3 as the target.
More
- Glue Studio: new GUI to create, run and monitor ETL jobs in Glue
- Glue Streaming ETL:
  - built on Apache Spark Structured Streaming
  - compatible with:
    - Kinesis Data Streams
    - Kafka
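A streaming ETL job is defined much like a batch job; the main visible difference in the job definition is the command name. The sketch below registers such a job with boto3's `create_job`, using `gluestreaming` as the command name. The helper name, the default Glue version, and every argument value are assumptions for illustration, and the streaming script itself (the part that reads from Kinesis or Kafka) is not shown.

```python
def create_streaming_job(glue, name, role_arn, script_s3_path, glue_version="4.0"):
    """Register a Glue streaming ETL job and return its name.

    `glue` is assumed to be a boto3 Glue client. `script_s3_path` points at
    an existing PySpark script in S3 that reads from the stream.
    """
    response = glue.create_job(
        Name=name,
        Role=role_arn,
        Command={
            # "gluestreaming" marks a streaming job; batch Spark jobs use "glueetl".
            "Name": "gluestreaming",
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        GlueVersion=glue_version,
    )
    return response["Name"]
```

Unlike a batch run, a streaming job keeps running after it is started until it is stopped or fails, so a polling loop that waits for a terminal state is usually not the right way to drive it.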