
B. Kinesis Data Firehose (KDF)

1 Intro

  • easiest way to load streaming data into data stores and analytics tools.
  • capture,
  • transform,
  • load streaming data
  • also: batch, compress, and encrypt 👈
  • near-real-time data delivery streams
  • set buffer-interval: 0-900 sec
    • if buffer-interval == 0 --> real time
    • if buffer-interval == 1 to 900 sec --> near real time
  • set buffer-size

    • min : 1 MB
    • default : 5 MB
    • KDF only buffers data; it has no permanent storage of its own.
    • no replay capability
  • serverless

  • fully managed,
  • no administration 👈
    • unlike KDS where we provision no. of shards.
  • auto scale
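The buffering rules above can be sketched as a small helper that validates the limits (interval 0-900 sec, size at least 1 MB) before building the `BufferingHints` structure that Firehose's `create_delivery_stream` API accepts. The helper name, stream name, and ARNs below are illustrative placeholders, and the actual boto3 call is left commented out since it needs live AWS credentials:

```python
# Sketch (assumptions: helper name, placeholder ARNs/stream name are mine).
# Builds a BufferingHints dict per the limits noted above.

def buffering_hints(size_mb: int, interval_s: int) -> dict:
    """Validate Firehose buffering limits and return a BufferingHints dict."""
    if not 0 <= interval_s <= 900:
        raise ValueError("buffer interval must be 0-900 seconds")
    if size_mb < 1:
        raise ValueError("buffer size must be at least 1 MB")
    return {"SizeInMBs": size_mb, "IntervalInSeconds": interval_s}

# interval == 0 would mean real-time delivery; 300 s (5 min) is near-real-time
hints = buffering_hints(5, 300)

# The hints plug into the destination config of create_delivery_stream:
# import boto3
# firehose = boto3.client("firehose")
# firehose.create_delivery_stream(
#     DeliveryStreamName="my-stream",                       # hypothetical
#     ExtendedS3DestinationConfiguration={
#         "RoleARN": "arn:aws:iam::123456789012:role/...",  # placeholder
#         "BucketARN": "arn:aws:s3:::my-bucket",            # placeholder
#         "BufferingHints": hints,
#     },
# )
```

Firehose treats these as hints: it delivers when either threshold (size or interval) is hit first.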

2 Source and Destinations

  • source: KDS, KCL/SDK, Kinesis Agent, AWS IoT 🎯
  • destinations (only 3 on the AWS side): S3, Redshift (OLAP DB), OpenSearch
  • CANNOT set up multiple consumers for a KDF stream, as it can dump data into only a single data repository at a time 👈
  • fact to remember 👈 🎯
  • When KDS is configured as the source of a KDF stream, then:

    • Firehose's PutRecord and PutRecordBatch operations are disabled
    • thus, Kinesis Agent cannot write to the KDF stream directly.

  • optional Lambda transformation + record-format conversion to Parquet/ORC
  • can put failed records into S3
  • write data in batches
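Writing in batches maps to Firehose's `PutRecordBatch` operation, which accepts at most 500 records per call. A minimal sketch of chunking records to respect that limit (the helper name is mine, not an official API; the boto3 call is commented out since it needs live credentials):

```python
# Sketch (assumption: helper name batch_records is illustrative).
# Chunks raw records into PutRecordBatch-sized batches (max 500 records/call).
from typing import Iterator

def batch_records(records: list, max_batch: int = 500) -> Iterator[list]:
    for i in range(0, len(records), max_batch):
        chunk = records[i:i + max_batch]
        # Firehose expects each record wrapped as {"Data": <bytes>}
        yield [{"Data": data} for data in chunk]

# Usage against a real stream (stream name is hypothetical):
# import boto3
# firehose = boto3.client("firehose")
# for batch in batch_records(my_records):
#     resp = firehose.put_record_batch(
#         DeliveryStreamName="my-stream", Records=batch
#     )
#     # resp["FailedPutCount"] > 0 means some records need retrying
```

Note that `PutRecordBatch` can partially fail: always check `FailedPutCount` in the response and retry only the failed records.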
