Amazon Kinesis Data Firehose: Developer Guide
Amazon's trademarks and trade dress may not be used in connection with any product or service that is not
Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or
discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may
or may not be affiliated with, connected to, or sponsored by Amazon.
Table of Contents
What Is Amazon Kinesis Data Firehose? ................................................................................................ 1
Key Concepts ............................................................................................................................. 1
Data Flow ................................................................................................................................. 1
Setting Up ........................................................................................................................................ 4
Sign Up for AWS ........................................................................................................................ 4
Optional: Download Libraries and Tools ........................................................................................ 4
Creating a Kinesis Data Firehose Delivery Stream ................................................................................... 5
Name and source ....................................................................................................................... 5
Process records .......................................................................................................................... 6
Select destination ...................................................................................................................... 6
Choose Amazon S3 for Your Destination ............................................................................... 6
Choose Amazon Redshift for Your Destination ....................................................................... 7
Choose Amazon ES for Your Destination ............................................................................... 8
Choose Splunk for Your Destination ..................................................................................... 9
Configure settings .................................................................................................................... 11
Testing Your Delivery Stream ............................................................................................................. 12
Prerequisites ............................................................................................................................ 12
Test Using Amazon S3 as the Destination .................................................................................... 12
Test Using Amazon Redshift as the Destination ............................................................................ 12
Test Using Amazon ES as the Destination .................................................................................... 13
Test Using Splunk as the Destination .......................................................................................... 13
Sending Data to a Kinesis Data Firehose Delivery Stream ...................................................................... 15
Writing Using Kinesis Data Streams ............................................................................................ 15
Writing Using the Agent ............................................................................................................ 16
Prerequisites .................................................................................................................... 17
Download and Install the Agent ......................................................................................... 17
Configure and Start the Agent ........................................................................................... 18
Agent Configuration Settings ............................................................................................. 18
Monitor Multiple File Directories and Write to Multiple Streams .............................................. 20
Use the Agent to Preprocess Data ...................................................................................... 21
Agent CLI Commands ....................................................................................................... 24
Writing Using the AWS SDK ....................................................................................................... 25
Single Write Operations Using PutRecord ............................................................................ 25
Batch Write Operations Using PutRecordBatch ..................................................................... 25
Writing Using CloudWatch Logs ................................................................................................. 26
Writing Using CloudWatch Events ............................................................................................... 26
Writing Using AWS IoT .............................................................................................................. 26
Security ........................................................................................................................................... 28
Data Protection ........................................................................................................................ 28
Server-Side Encryption with Kinesis Data Streams as the Data Source ...................................... 28
Server-Side Encryption with Direct PUT or Other Data Sources ............................................... 29
Controlling Access .................................................................................................................... 29
Grant Your Application Access to Your Kinesis Data Firehose Resources .................................... 30
Allow Kinesis Data Firehose to Assume an IAM Role .............................................................. 30
Grant Kinesis Data Firehose Access to an Amazon S3 Destination ............................................ 31
Grant Kinesis Data Firehose Access to an Amazon Redshift Destination .................................... 32
Grant Kinesis Data Firehose Access to an Amazon ES Destination ............................................ 34
Grant Kinesis Data Firehose Access to a Splunk Destination .................................................... 36
Access to Splunk in VPC .................................................................................................... 38
Cross-Account Delivery to an Amazon S3 Destination ............................................................ 38
Cross-Account Delivery to an Amazon ES Destination ............................................................ 39
Using Tags to Control Access ............................................................................................. 40
Monitoring ............................................................................................................................... 41
Compliance Validation .............................................................................................................. 42
Resilience ................................................................................................................................ 42
Disaster Recovery ............................................................................................................. 42
Infrastructure Security .............................................................................................................. 43
VPC Endpoints (PrivateLink) ............................................................................................... 43
Security Best Practices .............................................................................................................. 43
Implement least privilege access ........................................................................................ 43
Use IAM roles .................................................................................................................. 43
Implement Server-Side Encryption in Dependent Resources ................................................... 44
Use CloudTrail to Monitor API Calls .................................................................................... 44
Data Transformation ......................................................................................................................... 45
Data Transformation Flow ......................................................................................................... 45
Data Transformation and Status Model ....................................................................................... 45
Lambda Blueprints ................................................................................................................... 45
Data Transformation Failure Handling ......................................................................................... 46
Duration of a Lambda Invocation ............................................................................................... 47
Source Record Backup ............................................................................................................... 47
Record Format Conversion ................................................................................................................. 48
Record Format Conversion Requirements ..................................................................................... 48
Choosing the JSON Deserializer ................................................................................................. 48
Choosing the Serializer ............................................................................................................. 49
Converting Input Record Format (Console) ................................................................................... 49
Converting Input Record Format (API) ......................................................................................... 50
Record Format Conversion Error Handling ................................................................................... 50
Record Format Conversion Example ............................................................................................ 51
Integration with Kinesis Data Analytics ............................................................................................... 52
Create a Kinesis Data Analytics Application That Reads from a Delivery Stream ................................. 52
Write Data from a Kinesis Data Analytics Application to a Delivery Stream ....................................... 52
Data Delivery ................................................................................................................................... 53
Data Delivery Format ................................................................................................................ 53
Data Delivery Frequency ........................................................................................................... 54
Data Delivery Failure Handling ................................................................................................... 54
Amazon S3 Object Name Format ............................................................................................... 56
Index Rotation for the Amazon ES Destination ............................................................................. 56
Monitoring ....................................................................................................................................... 57
Monitoring with CloudWatch Metrics .......................................................................................... 57
Data Delivery CloudWatch Metrics ...................................................................................... 58
Data Ingestion Metrics ...................................................................................................... 62
API-Level CloudWatch Metrics ............................................................................................ 65
Data Transformation CloudWatch Metrics ............................................................................ 67
Format Conversion CloudWatch Metrics .............................................................................. 67
Server-Side Encryption (SSE) CloudWatch Metrics ................................................................. 68
Dimensions for Kinesis Data Firehose .................................................................................. 68
Kinesis Data Firehose Usage Metrics ................................................................................... 68
Accessing CloudWatch Metrics for Kinesis Data Firehose ........................................................ 69
Best Practices with CloudWatch Alarms ............................................................................... 69
Monitoring with CloudWatch Logs ...................................................................................... 70
Monitoring Agent Health ................................................................................................... 75
Logging Kinesis Data Firehose API Calls with AWS CloudTrail ................................................. 76
Custom Amazon S3 Prefixes .............................................................................................................. 81
The timestamp namespace ...................................................................................................... 81
The firehose namespace ........................................................................................................ 81
Semantic rules ......................................................................................................................... 82
Example prefixes ...................................................................................................................... 83
Using Kinesis Data Firehose with AWS PrivateLink ................................................................................ 84
Interface VPC endpoints (AWS PrivateLink) for Kinesis Data Firehose ............................................... 84
Using interface VPC endpoints (AWS PrivateLink) for Kinesis Data Firehose ...................................... 84
Availability ............................................................................................................................... 86
For more information about AWS big data solutions, see Big Data on AWS. For more information about
AWS streaming data solutions, see What is Streaming Data?
Key Concepts
As you get started with Kinesis Data Firehose, you can benefit from understanding the following
concepts:
Kinesis Data Firehose delivery stream
The underlying entity of Kinesis Data Firehose. You use Kinesis Data Firehose by creating a Kinesis
Data Firehose delivery stream and then sending data to it. For more information, see Creating an
Amazon Kinesis Data Firehose Delivery Stream (p. 5) and Sending Data to an Amazon Kinesis
Data Firehose Delivery Stream (p. 15).
record
The data of interest that your data producer sends to a Kinesis Data Firehose delivery stream. A
record can be as large as 1,000 KB.
data producer
Producers send records to Kinesis Data Firehose delivery streams. For example, a web server that
sends log data to a delivery stream is a data producer. You can also configure your Kinesis Data
Firehose delivery stream to automatically read data from an existing Kinesis data stream, and load
it into destinations. For more information, see Sending Data to an Amazon Kinesis Data Firehose
Delivery Stream (p. 15).
buffer size and buffer interval
Kinesis Data Firehose buffers incoming streaming data to a certain size or for a certain period of time
before delivering it to destinations. Buffer Size is in MBs and Buffer Interval is in seconds.
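A minimal data producer can be sketched with the AWS SDK for Python (Boto3). The stream name and payload below are hypothetical, and the size check mirrors the 1,000 KB record limit described above.

```python
import json

MAX_RECORD_BYTES = 1000 * 1024  # a record can be as large as 1,000 KB

def make_record(payload):
    """Serialize a dict into the {'Data': bytes} shape that PutRecord
    expects, enforcing the record size limit described above."""
    data = (json.dumps(payload) + "\n").encode("utf-8")  # newline-delimit records
    if len(data) > MAX_RECORD_BYTES:
        raise ValueError("record exceeds the 1,000 KB limit")
    return {"Data": data}

def send_record(stream_name, payload):
    """A minimal data producer; requires AWS credentials, so it is only
    defined here, not called."""
    import boto3  # imported lazily so make_record stays dependency-free
    firehose = boto3.client("firehose")
    return firehose.put_record(DeliveryStreamName=stream_name,
                               Record=make_record(payload))

record = make_record({"ticker": "QXZ", "price": 84.51})
```

Newline-delimiting each record is a common convention so that delivered Amazon S3 objects remain line-parseable; it is a choice of this sketch, not a Kinesis Data Firehose requirement.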
Data Flow
For Amazon S3 destinations, streaming data is delivered to your S3 bucket. If data transformation is
enabled, you can optionally back up source data to another Amazon S3 bucket.
For Amazon Redshift destinations, streaming data is delivered to your S3 bucket first. Kinesis Data
Firehose then issues an Amazon Redshift COPY command to load data from your S3 bucket to your
Amazon Redshift cluster. If data transformation is enabled, you can optionally back up source data to
another Amazon S3 bucket.
For Amazon ES destinations, streaming data is delivered to your Amazon ES cluster, and it can optionally
be backed up to your S3 bucket concurrently.
For Splunk destinations, streaming data is delivered to Splunk, and it can optionally be backed up to your
S3 bucket concurrently.
Setting Up
Tasks
• Sign Up for AWS (p. 4)
• Optional: Download Libraries and Tools (p. 4)
Sign Up for AWS
If you have an AWS account already, skip to the next task. If you don't have an AWS account, use the
following procedure to create one.
1. Open https://portal.aws.amazon.com/billing/signup.
2. Follow the online instructions.
Part of the sign-up procedure involves receiving a phone call and entering a verification code on the
phone keypad.
Optional: Download Libraries and Tools
• The Amazon Kinesis Data Firehose API Reference describes the basic set of operations that Kinesis
Data Firehose supports.
• The AWS SDKs for Go, Java, .NET, Node.js, Python, and Ruby include Kinesis Data Firehose support and
samples.
If your version of the AWS SDK for Java does not include samples for Kinesis Data Firehose, you can
also download the latest AWS SDK from GitHub.
• The AWS Command Line Interface supports Kinesis Data Firehose. The AWS CLI enables you to control
multiple AWS services from the command line and automate them through scripts.
Creating a Kinesis Data Firehose Delivery Stream
You can update the configuration of your delivery stream at any time after it’s created, using the Kinesis
Data Firehose console or UpdateDestination. Your Kinesis Data Firehose delivery stream remains in the
ACTIVE state while your configuration is updated, and you can continue to send data. The updated
configuration normally takes effect within a few minutes. The version number of a Kinesis Data Firehose
delivery stream increases by 1 after you update the configuration, and the new version number is
reflected in the delivered Amazon S3 object name. For more information, see Amazon S3 Object Name Format (p. 56).
The following topics describe how to create a Kinesis Data Firehose delivery stream:
Topics
• Name and source (p. 5)
• Process records (p. 6)
• Select destination (p. 6)
• Configure settings (p. 11)
Name and source
1. Sign in to the AWS Management Console and open the Kinesis console at https://console.aws.amazon.com/kinesis.
2. Choose Data Firehose in the navigation pane.
3. Choose Create delivery stream.
4. Enter values for the following fields:
Process records
This topic describes the Process records page of the Create Delivery Stream wizard in Amazon Kinesis
Data Firehose.
Process records
1. In the Transform source records with AWS Lambda section, provide values for the following field:
Record transformation
To create a Kinesis Data Firehose delivery stream that doesn't transform incoming data, choose
Disabled.
To specify a Lambda function for Kinesis Data Firehose to invoke and use to transform incoming
data before delivering it, choose Enabled. You can configure a new Lambda function using one
of the Lambda blueprints or choose an existing Lambda function. Your Lambda function must
contain the status model that is required by Kinesis Data Firehose. For more information, see
Amazon Kinesis Data Firehose Data Transformation (p. 45).
2. In the Convert record format section, provide values for the following field:
Record format conversion
To create a Kinesis Data Firehose delivery stream that doesn't convert the format of the
incoming data records, choose Disabled.
To convert the format of the incoming records, choose Enabled, then specify the output format
you want. You need to specify an AWS Glue table that holds the schema that you want Kinesis
Data Firehose to use to convert your record format. For more information, see Record Format
Conversion (p. 48).
For an example of how to set up record format conversion with AWS CloudFormation, see
AWS::KinesisFirehose::DeliveryStream.
3. Choose Next to go to the Select destination page.
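The status model that Kinesis Data Firehose requires from a transformation Lambda function (step 1 above) can be sketched as follows. The uppercasing transform and the hand-built event are illustrative only; each output record must echo the incoming recordId, report a result of Ok, Dropped, or ProcessingFailed, and carry base64-encoded data.

```python
import base64

def lambda_handler(event, context):
    """Sketch of a Firehose transformation Lambda. Every output record
    must echo the incoming recordId, report a result of Ok, Dropped, or
    ProcessingFailed, and return base64-encoded data."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"])
        transformed = payload.upper()  # placeholder transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}

# Illustrative invocation with a hand-built event
event = {"records": [{"recordId": "1",
                      "data": base64.b64encode(b"hello").decode("utf-8")}]}
result = lambda_handler(event, None)
```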
Select destination
This topic describes the Select destination page of the Create Delivery Stream wizard in Amazon Kinesis
Data Firehose.
Kinesis Data Firehose can send records to Amazon Simple Storage Service (Amazon S3), Amazon
Redshift, Amazon Elasticsearch Service (Amazon ES), or Splunk.
Topics
• Choose Amazon S3 for Your Destination (p. 6)
• Choose Amazon Redshift for Your Destination (p. 7)
• Choose Amazon ES for Your Destination (p. 8)
• Choose Splunk for Your Destination (p. 9)
Choose Amazon S3 for Your Destination
• On the Select destination page, enter values for the following fields:
Destination
Choose an S3 bucket that you own where the streaming data should be delivered. You can
create a new S3 bucket or choose an existing one.
Prefix
(Optional) To use the default prefix for Amazon S3 objects, leave this option blank. Kinesis
Data Firehose automatically uses a prefix in the "YYYY/MM/DD/HH" UTC time format for
delivered Amazon S3 objects. You can also override this default by specifying a custom prefix.
For more information, see Amazon S3 Object Name Format (p. 56) and Custom Amazon S3
Prefixes (p. 81).
Error prefix
(Optional) You can specify a prefix for Kinesis Data Firehose to use when delivering data
to Amazon S3 in error conditions. For more information, see Amazon S3 Object Name
Format (p. 56) and Custom Amazon S3 Prefixes (p. 81).
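The default "YYYY/MM/DD/HH" UTC prefix described above can be reproduced as follows. The delivery timestamp is hypothetical, and the trailing delimiter is an assumption about how the prefix separates from the object name.

```python
from datetime import datetime, timezone

def default_s3_prefix(delivery_time):
    """Build the default 'YYYY/MM/DD/HH' UTC prefix that Kinesis Data
    Firehose applies to delivered Amazon S3 objects."""
    return delivery_time.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")

prefix = default_s3_prefix(datetime(2019, 8, 27, 7, 15, tzinfo=timezone.utc))
```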
Choose Amazon Redshift for Your Destination
• On the Select destination page, enter values for the following fields:
Destination
The Amazon Redshift cluster to which S3 bucket data is copied. Configure the Amazon Redshift
cluster to be publicly accessible and unblock Kinesis Data Firehose IP addresses. For more
information, see Grant Kinesis Data Firehose Access to an Amazon Redshift Destination
(p. 32).
User name
An Amazon Redshift user with permissions to access the Amazon Redshift cluster. This user
must have the Amazon Redshift INSERT permission for copying data from the S3 bucket to the
Amazon Redshift cluster.
Password
The password for the user who has permissions to access the cluster.
Database
The Amazon Redshift database to which the data is copied.
Table
The Amazon Redshift table to which the data is copied.
Columns
(Optional) The specific columns of the table to which the data is copied. Use this option if the
number of columns defined in your Amazon S3 objects is less than the number of columns
within the Amazon Redshift table.
Intermediate S3 bucket
Kinesis Data Firehose delivers your data to your S3 bucket first and then issues an Amazon
Redshift COPY command to load the data into your Amazon Redshift cluster. Specify an S3
bucket that you own where the streaming data should be delivered. Create a new S3 bucket, or
choose an existing bucket that you own.
Kinesis Data Firehose doesn't delete the data from your S3 bucket after loading it to your
Amazon Redshift cluster. You can manage the data in your S3 bucket using a lifecycle
configuration. For more information, see Object Lifecycle Management in the Amazon Simple
Storage Service Developer Guide.
Prefix
(Optional) To use the default prefix for Amazon S3 objects, leave this option blank. Kinesis
Data Firehose automatically uses a prefix in "YYYY/MM/DD/HH" UTC time format for delivered
Amazon S3 objects. You can add to the start of this prefix. For more information, see Amazon S3
Object Name Format (p. 56).
COPY options
Parameters that you can specify in the Amazon Redshift COPY command. These might be
required for your configuration. For example, "GZIP" is required if Amazon S3 data compression
is enabled. "REGION" is required if your S3 bucket isn't in the same AWS Region as your Amazon
Redshift cluster. For more information, see COPY in the Amazon Redshift Database Developer
Guide.
COPY command
The Amazon Redshift COPY command. For more information, see COPY in the Amazon Redshift
Database Developer Guide.
Retry duration
Time duration (0–7200 seconds) for Kinesis Data Firehose to retry if data COPY to your Amazon
Redshift cluster fails. Kinesis Data Firehose retries every 5 minutes until the retry duration ends.
If you set the retry duration to 0 (zero) seconds, Kinesis Data Firehose does not retry upon a
COPY command failure.
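The COPY options rules above (GZIP when Amazon S3 data compression is enabled, REGION when the bucket and cluster Regions differ) can be sketched as a small helper; the Region value in the usage line is hypothetical.

```python
def build_copy_options(gzip_enabled=False, bucket_region=None):
    """Assemble COPY options per the rules above: GZIP when Amazon S3
    data compression is enabled, REGION when the intermediate bucket is
    in a different AWS Region than the Amazon Redshift cluster."""
    options = []
    if gzip_enabled:
        options.append("GZIP")
    if bucket_region:
        options.append("REGION '{}'".format(bucket_region))
    return " ".join(options)

opts = build_copy_options(gzip_enabled=True, bucket_region="us-west-2")
```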
Choose Amazon ES for Your Destination
1. On the Select destination page, enter values for the following fields:
Destination
The Amazon ES domain to which your data is delivered.
Index
The Elasticsearch index name to be used when indexing data to your Amazon ES cluster.
Index rotation
Choose whether and how often the Elasticsearch index should be rotated. If index rotation
is enabled, Kinesis Data Firehose appends the corresponding timestamp to the specified
index name and rotates. For more information, see Index Rotation for the Amazon ES
Destination (p. 56).
Type
The Amazon ES type name to be used when indexing data to your Amazon ES cluster. For
Elasticsearch 6.x, there can be only one type per index. If you try to specify a new type for
an existing index that already has another type, Kinesis Data Firehose returns an error during
runtime.
Retry duration
Time duration (0–7200 seconds) for Kinesis Data Firehose to retry if an index request to your
Amazon ES cluster fails. Kinesis Data Firehose retries every 5 minutes until the retry duration
ends. If you set the retry duration to 0 (zero) seconds, Kinesis Data Firehose does not retry upon
an index request failure.
Backup mode
You can choose to either back up failed records only or all records. If you choose failed
records only, any data that Kinesis Data Firehose can't deliver to your Amazon ES cluster or
that your Lambda function can't transform is backed up to the specified S3 bucket. If you
choose all records, Kinesis Data Firehose backs up all incoming source data to your S3 bucket
concurrently with data delivery to Amazon ES. For more information, see Data Delivery Failure
Handling (p. 54) and Data Transformation Failure Handling (p. 46).
Backup S3 bucket
An S3 bucket you own that is the target of the backup data. Create a new S3 bucket, or choose
an existing bucket that you own.
Backup S3 bucket prefix
(Optional) To use the default prefix for Amazon S3 objects, leave this option blank. Kinesis
Data Firehose automatically uses a prefix in "YYYY/MM/DD/HH" UTC time format for delivered
Amazon S3 objects. You can add to the start of this prefix. For more information, see Amazon S3
Object Name Format (p. 56).
2. Choose Next to go to the Configure settings (p. 11) page.
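The timestamp that index rotation appends to the index name can be sketched as follows. The rotation option names and timestamp formats in this table are assumptions for illustration; see Index Rotation for the Amazon ES Destination (p. 56) for the authoritative behavior.

```python
from datetime import datetime, timezone

# Assumed timestamp formats, for illustration only; see Index Rotation
# for the Amazon ES Destination (p. 56) for the authoritative behavior.
ROTATION_FORMATS = {
    "NoRotation": None,
    "OneHour": "%Y-%m-%d-%H",
    "OneDay": "%Y-%m-%d",
    "OneMonth": "%Y-%m",
}

def rotated_index(index_name, rotation, when):
    """Append the rotation timestamp to the configured index name."""
    fmt = ROTATION_FORMATS[rotation]
    if fmt is None:
        return index_name
    return "{}-{}".format(index_name,
                          when.astimezone(timezone.utc).strftime(fmt))

name = rotated_index("myindex", "OneDay",
                     datetime(2019, 8, 27, 7, tzinfo=timezone.utc))
```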
Choose Splunk for Your Destination
• On the Select destination page, provide values for the following fields:
Destination
Choose Splunk.
Splunk cluster endpoint
To determine the endpoint, see Configure Amazon Kinesis Firehose to Send Data to the Splunk
Platform in the Splunk documentation.
Splunk endpoint type
Choose Raw in most cases. Choose Event if you preprocessed your data using AWS Lambda
to send data to different indexes by event type. For information about what endpoint to use,
see Configure Amazon Kinesis Firehose to send data to the Splunk platform in the Splunk
documentation.
Authentication token
To set up a Splunk endpoint that can receive data from Kinesis Data Firehose, see Installation
and configuration overview for the Splunk Add-on for Amazon Kinesis Firehose in the Splunk
documentation. Save the token that you get from Splunk when you set up the endpoint for this
delivery stream, and add it here.
HEC acknowledgement timeout
Specify how long Kinesis Data Firehose waits for the index acknowledgement from Splunk. If
Splunk doesn’t send the acknowledgment before the timeout is reached, Kinesis Data Firehose
considers it a data delivery failure. Kinesis Data Firehose then either retries or backs up the data
to your Amazon S3 bucket, depending on the retry duration value that you set.
Retry duration
Specify how long Kinesis Data Firehose retries sending data to Splunk.
After sending data, Kinesis Data Firehose first waits for an acknowledgment from Splunk. If an
error occurs or the acknowledgment doesn’t arrive within the acknowledgment timeout period,
Kinesis Data Firehose starts the retry duration counter. It keeps retrying until the retry duration
expires. After that, Kinesis Data Firehose considers it a data delivery failure and backs up the
data to your Amazon S3 bucket.
Every time that Kinesis Data Firehose sends data to Splunk, whether it's the initial attempt or a
retry, it restarts the acknowledgment timeout counter and waits for an acknowledgment from
Splunk.
Even if the retry duration expires, Kinesis Data Firehose still waits for the acknowledgment
until it receives one or the acknowledgment timeout is reached. If the acknowledgment
times out, Kinesis Data Firehose determines whether there's time left in the retry counter. If
there is, it retries again and repeats this logic until it receives an acknowledgment or
determines that the retry time has expired.
If you don't want Kinesis Data Firehose to retry sending data, set this value to 0.
S3 backup mode
Choose whether to back up all the events that Kinesis Data Firehose sends to Splunk or only the
ones for which delivery to Splunk fails. If you require high data durability, turn on this backup
mode for all events. Also consider backing up all events initially, until you verify that your data is
getting indexed correctly in Splunk.
S3 backup bucket
An S3 bucket that you own that is the target of the backup data.
S3 backup bucket prefix
(Optional) To use the default prefix for Amazon S3 objects, leave this option blank. Kinesis
Data Firehose automatically uses a prefix in "YYYY/MM/DD/HH" UTC time format for delivered
Amazon S3 objects. You can add to the start of this prefix. For more information, see Amazon S3
Object Name Format (p. 56).
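The interaction between the HEC acknowledgment timeout and the retry duration described above can be sketched as a small simulation. The timings and per-attempt outcomes are illustrative inputs, not Kinesis Data Firehose internals.

```python
def simulate_splunk_delivery(acks, ack_timeout, retry_duration):
    """Toy model of the retry logic above. `acks` holds one boolean per
    send attempt: True if Splunk acknowledges within the timeout, False
    if the attempt errors or times out. Returns ('delivered', elapsed)
    or ('failed', elapsed), where elapsed is in seconds."""
    elapsed = ack_timeout                # initial attempt: wait for the ack
    if acks and acks[0]:
        return ("delivered", elapsed)
    deadline = elapsed + retry_duration  # the retry duration counter starts
    for acked in acks[1:]:
        if elapsed >= deadline:
            break                        # no retry time left
        elapsed += ack_timeout           # each retry restarts the ack timeout
        if acked:
            return ("delivered", elapsed)
    return ("failed", elapsed)           # data is backed up to Amazon S3
```

With a retry duration of 0, the first failed attempt immediately counts as a data delivery failure, matching the behavior described above.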
Configure settings
This topic describes the Configure settings page of the Create Delivery Stream wizard.
Configure settings
1. On the Configure settings page, provide values for the following fields:
Buffer size, Buffer interval
Kinesis Data Firehose buffers incoming data before delivering it to Amazon S3. You can choose
a buffer size (1–128 MBs) or buffer interval (60–900 seconds). The condition that is satisfied
first triggers data delivery to Amazon S3. If you enable data transformation, the buffer interval
applies from the time transformed data is received by Kinesis Data Firehose to the data delivery
to Amazon S3. If data delivery to the destination falls behind data writing to the delivery
stream, Kinesis Data Firehose raises the buffer size dynamically to catch up. This action helps
ensure that all data is delivered to the destination.
Compression
Choose GZIP, Snappy, or Zip data compression, or no data compression. Snappy or Zip
compression is not available for delivery streams with Amazon Redshift as the destination.
Encryption
Kinesis Data Firehose supports Amazon S3 server-side encryption with AWS Key Management
Service (AWS KMS) for encrypting delivered data in Amazon S3. You can choose to not encrypt
the data or to encrypt with a key from the list of AWS KMS keys that you own. For more
information, see Protecting Data Using Server-Side Encryption with AWS KMS–Managed Keys
(SSE-KMS).
Error logging
If data transformation is enabled, Kinesis Data Firehose can log the Lambda invocation, and
send data delivery errors to CloudWatch Logs. Then you can view the specific error logs if the
Lambda invocation or data delivery fails. For more information, see Monitoring Kinesis Data
Firehose Using CloudWatch Logs (p. 70).
IAM role
You can choose to create a new role where required permissions are assigned automatically,
or choose an existing role created for Kinesis Data Firehose. The role is used to grant Kinesis
Data Firehose access to your S3 bucket, AWS KMS key (if data encryption is enabled), and
Lambda function (if data transformation is enabled). The console might create a role with
placeholders. You can safely ignore or delete lines containing %FIREHOSE_BUCKET_NAME%,
%FIREHOSE_DEFAULT_FUNCTION%, or %FIREHOSE_DEFAULT_VERSION%. For more
information, see Grant Kinesis Data Firehose Access to an Amazon S3 Destination (p. 31).
2. Review the settings and choose Create Delivery Stream.
The new Kinesis Data Firehose delivery stream takes a few moments in the Creating state before it is
available. After your Kinesis Data Firehose delivery stream is in an Active state, you can start sending
data to it from your producer.
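The buffering behavior described above can be sketched as follows. This is an illustrative model only, not the service's implementation; the 5 MB size hint and 300-second interval hint are example values, not values read from your stream.

```python
# Sketch of the buffering-hint logic: delivery to the destination is
# triggered by whichever condition (size hint or interval hint) is
# satisfied first.
def should_flush(buffered_bytes, buffer_age_seconds,
                 size_hint_mb=5, interval_hint_s=300):
    size_reached = buffered_bytes >= size_hint_mb * 1024 * 1024
    interval_reached = buffer_age_seconds >= interval_hint_s
    return size_reached or interval_reached

# 6 MB buffered after only 10 seconds: the size hint triggers delivery first.
flush_on_size = should_flush(6 * 1024 * 1024, 10)
# 1 KB buffered after 300 seconds: the interval hint triggers delivery.
flush_on_interval = should_flush(1024, 300)
```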
{"TICKER_SYMBOL":"QXZ","SECTOR":"HEALTHCARE","CHANGE":-0.05,"PRICE":84.51}
Note that standard Amazon Kinesis Data Firehose charges apply when your delivery stream transmits the
data, but there is no charge when the data is generated. To stop incurring these charges, you can stop
the sample stream from the console at any time.
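Each demo record is a JSON document like the sample shown above. If you later send similar records from your own producer, keep in mind that Kinesis Data Firehose concatenates records without adding delimiters, so producers typically append a newline themselves. A minimal sketch (the field values come from the sample record above; the commented `put_record` call and stream name are illustrative):

```python
import json

# Build a record payload in the same shape as the sample above. Kinesis Data
# Firehose does not insert delimiters between records, so a trailing newline
# is appended here to keep downstream records separable.
sample = {"TICKER_SYMBOL": "QXZ", "SECTOR": "HEALTHCARE",
          "CHANGE": -0.05, "PRICE": 84.51}
payload = json.dumps(sample) + "\n"

# With boto3 (assumed available), this payload could be sent as:
# client.put_record(DeliveryStreamName="my-stream",
#                   Record={"Data": payload.encode("utf-8")})
```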
Contents
• Prerequisites (p. 12)
• Test Using Amazon S3 as the Destination (p. 12)
• Test Using Amazon Redshift as the Destination (p. 12)
• Test Using Amazon ES as the Destination (p. 13)
• Test Using Splunk as the Destination (p. 13)
Prerequisites
Before you begin, create a delivery stream. For more information, see Creating an Amazon Kinesis Data
Firehose Delivery Stream (p. 5).
1. Your delivery stream expects a table to be present in your Amazon Redshift cluster. Connect to
Amazon Redshift through a SQL interface and run the following statement to create a table that
accepts the sample data.
Test Using Splunk as the Destination
4. Check whether the data is being delivered to your Splunk index. Example search terms in Splunk
are sourcetype="aws:firehose:json" and index="name-of-your-splunk-index". For
more information about how to search for events in Splunk, see Search Manual in the Splunk
documentation.
If the test data doesn't appear in your Splunk index, check your Amazon S3 bucket for failed events.
Also see Data Not Delivered to Splunk.
5. When you finish testing, choose Stop sending demo data to stop incurring usage charges.
Writing Using Kinesis Data Streams
Topics
• Writing to Kinesis Data Firehose Using Kinesis Data Streams (p. 15)
• Writing to Kinesis Data Firehose Using Kinesis Agent (p. 16)
• Writing to Kinesis Data Firehose Using the AWS SDK (p. 25)
• Writing to Kinesis Data Firehose Using CloudWatch Logs (p. 26)
• Writing to Kinesis Data Firehose Using CloudWatch Events (p. 26)
• Writing to Kinesis Data Firehose Using AWS IoT (p. 26)
1. Sign in to the AWS Management Console and open the Kinesis Data Firehose console at https://
console.aws.amazon.com/firehose/.
2. Choose Create Delivery Stream. On the Name and source page, provide values for the following
fields:
Source
Choose Kinesis stream to configure a Kinesis Data Firehose delivery stream that uses a Kinesis
data stream as a data source. You can then use Kinesis Data Firehose to read data easily from an
existing data stream and load it into destinations.
To use a Kinesis data stream as a source, choose an existing stream in the Kinesis stream list, or
choose Create new to create a new Kinesis data stream. After you create a new stream, choose
Refresh to update the Kinesis stream list. If you have a large number of streams, filter the list
using Filter by name.
Note
When you configure a Kinesis data stream as the source of a Kinesis Data Firehose
delivery stream, the Kinesis Data Firehose PutRecord and PutRecordBatch
operations are disabled. To add data to your Kinesis Data Firehose delivery stream in
this case, use the Kinesis Data Streams PutRecord and PutRecords operations.
Kinesis Data Firehose starts reading data from the LATEST position of your Kinesis stream. For
more information about Kinesis Data Streams positions, see GetShardIterator. Kinesis Data
Firehose calls the Kinesis Data Streams GetRecords operation once per second for each shard.
More than one Kinesis Data Firehose delivery stream can read from the same Kinesis stream.
Other Kinesis applications (consumers) can also read from the same stream. Each call from any
Kinesis Data Firehose delivery stream or other consumer application counts against the overall
throttling limit for the shard. To avoid getting throttled, plan your applications carefully. For
more information about Kinesis Data Streams limits, see Amazon Kinesis Streams Limits.
3. Choose Next to advance to the Process records (p. 6) page.
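The shared-throughput arithmetic above can be sketched as follows. The limit of five read transactions per second per shard is taken from the Kinesis Data Streams limits referenced above; verify it for your Region before relying on it.

```python
# Rough capacity check for consumers sharing one shard.
READS_PER_SECOND_PER_SHARD = 5  # Kinesis Data Streams per-shard read limit

def remaining_read_capacity(firehose_streams, other_consumer_calls_per_s):
    # Each Kinesis Data Firehose delivery stream calls GetRecords once per
    # second for each shard; other consumers add their own call rate.
    used = firehose_streams * 1 + other_consumer_calls_per_s
    return READS_PER_SECOND_PER_SHARD - used

# Two delivery streams plus one consumer polling twice per second
# leaves one read per second of headroom on the shard.
headroom = remaining_read_capacity(2, 2)
```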
By default, records are parsed from each file based on the newline ('\n') character. However, the agent
can also be configured to parse multi-line records (see Agent Configuration Settings (p. 18)).
You can install the agent on Linux-based server environments such as web servers, log servers, and
database servers. After installing the agent, configure it by specifying the files to monitor and the
delivery stream for the data. After the agent is configured, it durably collects data from the files and
reliably sends it to the delivery stream.
Topics
• Prerequisites (p. 17)
• Download and Install the Agent (p. 17)
• Configure and Start the Agent (p. 18)
• Agent Configuration Settings (p. 18)
• Monitor Multiple File Directories and Write to Multiple Streams (p. 20)
• Use the Agent to Preprocess Data (p. 21)
Prerequisites
• Your operating system must be either Amazon Linux AMI with version 2015.09 or later, or Red Hat
Enterprise Linux version 7 or later.
• If you are using Amazon EC2 to run your agent, launch your EC2 instance.
• Manage your AWS credentials using one of the following methods:
• Specify an IAM role when you launch your EC2 instance.
• Specify AWS credentials when you configure the agent (see the entries for awsAccessKeyId and
awsSecretAccessKey in the configuration table under the section called “Agent Configuration
Settings” (p. 18)).
• Edit /etc/sysconfig/aws-kinesis-agent to specify your AWS Region and AWS access keys.
• If your EC2 instance is in a different AWS account, create an IAM role to provide access to the Kinesis
Data Firehose service. Specify that role when you configure the agent (see assumeRoleARN
and assumeRoleExternalId). Use one of the previous methods to specify the AWS
credentials of a user in the other account who has permission to assume this role.
• The IAM role or AWS credentials that you specify must have permission to perform the Kinesis
Data Firehose PutRecordBatch operation for the agent to send data to your delivery stream. If you
enable CloudWatch monitoring for the agent, permission to perform the CloudWatch PutMetricData
operation is also needed. For more information, see Controlling Access with Amazon Kinesis Data
Firehose (p. 29), Monitoring Kinesis Agent Health (p. 75), and Authentication and Access Control
for Amazon CloudWatch.
Configure and Start the Agent
1. Open and edit the configuration file (as superuser if using default file access permissions):
/etc/aws-kinesis/agent.json
In this configuration file, specify the files ( "filePattern" ) from which the agent collects data,
and the name of the delivery stream ( "deliveryStream" ) to which the agent sends data. The file
name is a pattern, and the agent recognizes file rotations. You can rotate files or create new files no
more than once per second. The agent uses the file creation time stamp to determine which files
to track and tail into your delivery stream. Creating new files or rotating files more frequently than
once per second does not allow the agent to differentiate properly between them.
{
"flows": [
{
"filePattern": "/tmp/app.log*",
"deliveryStream": "yourdeliverystream"
}
]
}
The default AWS Region is us-east-1. If you are using a different Region, add the
firehose.endpoint setting to the configuration file, specifying the endpoint for your Region. For
more information, see Agent Configuration Settings (p. 18).
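For example, a configuration for a delivery stream in EU (Ireland) might add the endpoint as follows. The endpoint value here is illustrative; check the current list of Kinesis Data Firehose endpoints for your Region.

```json
{
  "firehose.endpoint": "firehose.eu-west-1.amazonaws.com",
  "flows": [
    {
      "filePattern": "/tmp/app.log*",
      "deliveryStream": "yourdeliverystream"
    }
  ]
}
```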
2. Start the agent manually:
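On the supported Linux distributions, the agent is managed as a system service. The commands below are the ones documented for the agent package; verify the service name on your distribution.

```shell
# Start the agent as a service.
sudo service aws-kinesis-agent start

# Optionally configure the agent to start on system startup.
sudo chkconfig aws-kinesis-agent on
```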
The agent is now running as a system service in the background. It continuously monitors the specified
files and sends data to the specified delivery stream. Agent activity is logged in
/var/log/aws-kinesis-agent/aws-kinesis-agent.log.
Whenever you change the configuration file, you must stop and start the agent, using the following
commands:
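The stop and start commands for the agent service (as documented for the agent package) are:

```shell
sudo service aws-kinesis-agent stop
sudo service aws-kinesis-agent start
```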
Agent Configuration Settings
assumeRoleARN The Amazon Resource Name (ARN) of the role to be assumed by the user.
For more information, see Delegate Access Across AWS Accounts Using IAM
Roles in the IAM User Guide.
assumeRoleExternalId An optional identifier that determines who can assume the role. For more
information, see How to Use an External ID in the IAM User Guide.
awsAccessKeyId AWS access key ID that overrides the default credentials. This setting takes
precedence over all other credential providers.
awsSecretAccessKey AWS secret key that overrides the default credentials. This setting takes
precedence over all other credential providers.
cloudwatch.emitMetrics Enables the agent to publish metrics to CloudWatch when set to true.
Default: true
cloudwatch.endpoint The regional endpoint for CloudWatch.
Default: monitoring.us-east-1.amazonaws.com
firehose.endpoint The regional endpoint for Kinesis Data Firehose.
Default: firehose.us-east-1.amazonaws.com
aggregatedRecordSizeBytes To make the agent aggregate records and then put them to the delivery
stream in one operation, specify this setting. Set it to the size that you want
the aggregate record to have before the agent puts it to the delivery stream.
filePattern [Required] A glob for the files that need to be monitored by the agent. Any
file that matches this pattern is picked up by the agent automatically and
monitored. For all files matching this pattern, grant read permission to
aws-kinesis-agent-user. For the directory containing the files, grant read
and execute permissions to aws-kinesis-agent-user.
Important
The agent picks up any file that matches this pattern. To ensure
that the agent doesn't pick up unintended records, choose this
pattern carefully.
initialPosition The initial position from which the agent starts parsing a file. Valid values are
START_OF_FILE and END_OF_FILE.
maxBufferAgeMillis The maximum time, in milliseconds, for which the agent buffers data before
sending it to the delivery stream.
maxBufferSizeBytes The maximum size, in bytes, for which the agent buffers data before sending
it to the delivery stream.
maxBufferSizeRecords The maximum number of records for which the agent buffers data before
sending it to the delivery stream.
Default: 500
minTimeBetweenFilePollsMillis The time interval, in milliseconds, at which the agent polls and
parses the monitored files for new data.
Default: 100
multiLineStartPattern The pattern for identifying the start of a record. A record is made of a line
that matches the pattern and any following lines that don't match the
pattern. The valid values are regular expressions. By default, each new line in
the log files is parsed as one record.
skipHeaderLines The number of lines for the agent to skip parsing at the beginning of
monitored files.
Default: 0 (zero)
truncatedRecordTerminator The string that the agent uses to truncate a parsed record when the
record size exceeds the Kinesis Data Firehose record size limit (1,000 KB).
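As a sketch of how multiLineStartPattern groups lines into records, the following mimics the behavior described above; it is not the agent's actual code, and the sample lines are invented for illustration.

```python
import re

def group_records(lines, start_pattern):
    """Group lines into records: a record begins at a line matching
    start_pattern and includes the following non-matching lines."""
    pattern = re.compile(start_pattern)
    records, current = [], []
    for line in lines:
        # A matching line starts a new record, flushing the previous one.
        if pattern.match(line) and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records

lines = ["[SEQUENCE=1] start", "  detail a", "[SEQUENCE=2] start", "  detail b"]
records = group_records(lines, r"\[SEQUENCE=")
```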
{
"cloudwatch.emitMetrics": true,
"kinesis.endpoint": "https://your/kinesis/endpoint",
"firehose.endpoint": "https://your/firehose/endpoint",
"flows": [
{
"filePattern": "/tmp/app1.log*",
"kinesisStream": "yourkinesisstream"
},
{
"filePattern": "/tmp/app2.log*",
"deliveryStream": "yourfirehosedeliverystream"
}
]
}
For more detailed information about using the agent with Amazon Kinesis Data Streams, see Writing to
Amazon Kinesis Data Streams with Kinesis Agent.
The agent supports the following processing options. Because the agent is open source, you can further
develop and extend its processing options. You can download the agent from Kinesis Agent.
Processing Options
SINGLELINE
Converts a multi-line record to a single-line record by removing newline characters, leading spaces,
and trailing spaces.
{
"optionName": "SINGLELINE"
}
CSVTOJSON
Converts a record from delimiter-separated format to JSON format.
{
"optionName": "CSVTOJSON",
"customFieldNames": [ "field1", "field2", ... ],
"delimiter": "yourdelimiter"
}
customFieldNames
[Required] The field names used as keys in each JSON key value pair. For example, if you specify
["f1", "f2"], the record "v1, v2" is converted to {"f1":"v1","f2":"v2"}.
delimiter
The string used as the delimiter in the record. The default is a comma (,).
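The conversion can be sketched as follows. This mimics the documented example ("v1, v2" with field names ["f1", "f2"]); it is not the agent's source code.

```python
import json

def csv_to_json(record, field_names, delimiter=","):
    # Split on the delimiter and strip surrounding whitespace, then pair
    # each value with the corresponding custom field name.
    values = [v.strip() for v in record.split(delimiter)]
    return json.dumps(dict(zip(field_names, values)), separators=(",", ":"))

converted = csv_to_json("v1, v2", ["f1", "f2"])
```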
LOGTOJSON
Converts a record from a log format to JSON format. The supported log formats are Apache
Common Log, Apache Combined Log, Apache Error Log, and RFC3164 Syslog.
{
"optionName": "LOGTOJSON",
"logFormat": "logformat",
"matchPattern": "yourregexpattern",
"customFieldNames": [ "field1", "field2", … ]
}
logFormat
[Required] The log entry format. The following are possible values:
• COMMONAPACHELOG — The Apache Common Log format. Each log entry has the
following pattern by default: "%{host} %{ident} %{authuser} [%{datetime}]
\"%{request}\" %{response} %{bytes}".
• COMBINEDAPACHELOG — The Apache Combined Log format. Each log entry has the
following pattern by default: "%{host} %{ident} %{authuser} [%{datetime}]
\"%{request}\" %{response} %{bytes} %{referrer} %{agent}".
• APACHEERRORLOG — The Apache Error Log format. Each log entry has the following pattern
by default: "[%{timestamp}] [%{module}:%{severity}] [pid %{processid}:tid
%{threadid}] [client: %{client}] %{message}".
• SYSLOG — The RFC3164 Syslog format. Each log entry has the following pattern by default:
"%{timestamp} %{hostname} %{program}[%{processid}]: %{message}".
matchPattern
Overrides the default pattern for the specified log format. Use this setting to extract values
from log entries if they use a custom format. If you specify matchPattern, you must also
specify customFieldNames.
customFieldNames
The custom field names used as keys in each JSON key value pair. You can use this setting to
define field names for values extracted from matchPattern, or override the default field
names of predefined log formats.
Here is one example of a LOGTOJSON configuration for an Apache Common Log entry converted to JSON
format:
{
"optionName": "LOGTOJSON",
"logFormat": "COMMONAPACHELOG"
}
Before conversion:

64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291

After conversion:
{"host":"64.242.88.10","ident":null,"authuser":null,"datetime":"07/Mar/2004:16:10:02 -0800","request":"GET /mailman/listinfo/hsdivision HTTP/1.1","response":"200","bytes":"6291"}
{
"optionName": "LOGTOJSON",
"logFormat": "COMMONAPACHELOG",
"customFieldNames": ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
}
With this configuration setting, the same Apache Common Log entry from the previous example is
converted to JSON format as follows:
{"f1":"64.242.88.10","f2":null,"f3":null,"f4":"07/Mar/2004:16:10:02 -0800","f5":"GET /mailman/listinfo/hsdivision HTTP/1.1","f6":"200","f7":"6291"}
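A rough approximation of the LOGTOJSON option for COMMONAPACHELOG is shown below. This is a sketch, not the agent's implementation; the log line is reconstructed from the converted example above, and the regular expression is a simplified stand-in for the documented pattern.

```python
import re

# "-" placeholders in the Common Log format become JSON null in the output.
COMMON_LOG = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$')
DEFAULT_FIELDS = ["host", "ident", "authuser", "datetime",
                  "request", "response", "bytes"]

def log_to_json(line, field_names=DEFAULT_FIELDS):
    match = COMMON_LOG.match(line)
    if match is None:
        return None
    return {name: (None if value == "-" else value)
            for name, value in zip(field_names, match.groups())}

entry = log_to_json('64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] '
                    '"GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291')
```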
The following flow configuration converts an Apache Common Log entry to a single-line record in JSON
format:
{
"flows": [
{
"filePattern": "/tmp/app.log*",
"deliveryStream": "my-delivery-stream",
"dataProcessingOptions": [
{
"optionName": "LOGTOJSON",
"logFormat": "COMMONAPACHELOG"
}
]
}
]
}
The following flow configuration parses multi-line records whose first line starts with "[SEQUENCE=".
Each record is first converted to a single-line record. Then, values are extracted from the record based on
a tab delimiter. Extracted values are mapped to specified customFieldNames values to form a single-
line record in JSON format.
{
"flows": [
{
"filePattern": "/tmp/app.log*",
"deliveryStream": "my-delivery-stream",
"multiLineStartPattern": "\\[SEQUENCE=",
"dataProcessingOptions": [
{
"optionName": "SINGLELINE"
},
{
"optionName": "CSVTOJSON",
"customFieldNames": [ "field1", "field2", "field3" ],
"delimiter": "\\t"
}
]
}
]
}
Here is one example of a LOGTOJSON configuration for an Apache Common Log entry converted to JSON
format, with the last field (bytes) omitted:
{
"optionName": "LOGTOJSON",
"logFormat": "COMMONAPACHELOG",
"matchPattern": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3})",
"customFieldNames": ["host", "ident", "authuser", "datetime", "request", "response"]
}
Before conversion:
After conversion:
{"host":"123.45.67.89","ident":null,"authuser":null,"datetime":"27/Oct/2000:09:27:09 -0400","request":"GET /java/javaResources.html HTTP/1.0","response":"200"}
Writing Using the AWS SDK
These examples do not represent production-ready code, in that they do not check for all possible
exceptions, or account for all possible security or performance considerations.
The Kinesis Data Firehose API offers two operations for sending data to your delivery stream:
PutRecord and PutRecordBatch. PutRecord sends one data record per call, and PutRecordBatch
sends multiple data records per call.
Topics
• Single Write Operations Using PutRecord (p. 25)
• Batch Write Operations Using PutRecordBatch (p. 25)
For more code context, see the sample code included in the AWS SDK. For information about request and
response syntax, see the relevant topic in Amazon Kinesis Data Firehose API Operations.
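PutRecordBatch accepts up to 500 records per call, so producers commonly chunk larger record sets before sending. A hedged sketch follows; the client call is shown commented, and "my-stream" is a placeholder name.

```python
import json

MAX_BATCH_RECORDS = 500  # PutRecordBatch limit on records per call

def chunk(records, limit=MAX_BATCH_RECORDS):
    """Split records into batches that respect the per-call record limit."""
    return [records[i:i + limit] for i in range(0, len(records), limit)]

records = [{"Data": (json.dumps({"seq": i}) + "\n").encode("utf-8")}
           for i in range(1200)]
batches = chunk(records)

# Each batch would then be sent with boto3 (assumed available):
# response = client.put_record_batch(DeliveryStreamName="my-stream",
#                                    Records=batch)
# Inspect response["FailedPutCount"] and retry any failed entries.
```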
To create a target for a CloudWatch Events rule that sends events to an existing delivery
stream
1. Sign in to the AWS Management Console and open the CloudWatch console at https://
console.aws.amazon.com/cloudwatch/.
2. Choose Create rule.
3. On the Step 1: Create rule page, for Targets, choose Add target, and then choose Firehose delivery
stream.
4. For Delivery stream, choose an existing Kinesis Data Firehose delivery stream.
For more information about creating CloudWatch Events rules, see Getting Started with Amazon
CloudWatch Events.
To create an action that sends events to an existing Kinesis Data Firehose delivery stream
1. When creating a rule in the AWS IoT console, on the Create a rule page, under Set one or more
actions, choose Add action.
2. Choose Send messages to an Amazon Kinesis Firehose stream.
3. Choose Configure action.
4. For Stream name, choose an existing Kinesis Data Firehose delivery stream.
5. For Separator, choose a separator character to be inserted between records.
6. For IAM role name, choose an existing IAM role or choose Create a new role.
7. Choose Add action.
For more information about creating AWS IoT rules, see AWS IoT Rule Tutorials.
Security is a shared responsibility between AWS and you. The shared responsibility model describes this
as security of the cloud and security in the cloud:
• Security of the cloud – AWS is responsible for protecting the infrastructure that runs AWS services
in the AWS Cloud. AWS also provides you with services that you can use securely. The effectiveness
of our security is regularly tested and verified by third-party auditors as part of the AWS compliance
programs. To learn about the compliance programs that apply to Kinesis Data Firehose, see AWS
Services in Scope by Compliance Program.
• Security in the cloud – Your responsibility is determined by the AWS service that you use. You are also
responsible for other factors including the sensitivity of your data, your organization’s requirements,
and applicable laws and regulations.
This documentation helps you understand how to apply the shared responsibility model when using
Kinesis Data Firehose. The following topics show you how to configure Kinesis Data Firehose to meet
your security and compliance objectives. You'll also learn how to use other AWS services that can help
you to monitor and secure your Kinesis Data Firehose resources.
Topics
• Data Protection in Amazon Kinesis Data Firehose (p. 28)
• Controlling Access with Amazon Kinesis Data Firehose (p. 29)
• Monitoring Amazon Kinesis Data Firehose (p. 41)
• Compliance Validation for Amazon Kinesis Data Firehose (p. 42)
• Resilience in Amazon Kinesis Data Firehose (p. 42)
• Infrastructure Security in Kinesis Data Firehose (p. 43)
• Security Best Practices for Kinesis Data Firehose (p. 43)
When you send data from your data producers to your data stream, Kinesis Data Streams encrypts your
data using an AWS Key Management Service (AWS KMS) key before storing the data at rest. When your
Kinesis Data Firehose delivery stream reads the data from your data stream, Kinesis Data Streams first
decrypts the data and then sends it to Kinesis Data Firehose. Kinesis Data Firehose buffers the data in
memory based on the buffering hints that you specify. It then delivers the data to your
destinations without storing the unencrypted data at rest.
For information about how to enable server-side encryption for Kinesis Data Streams, see Using Server-
Side Encryption in the Amazon Kinesis Data Streams Developer Guide.
You can also enable SSE when you create the delivery stream. To do that, specify
DeliveryStreamEncryptionConfigurationInput when you invoke CreateDeliveryStream.
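With boto3, the request might be shaped as follows. This is a sketch: the stream name and key ARN are placeholders, the call itself is shown commented, and a real CreateDeliveryStream request also requires a destination configuration (for example, an extended S3 destination).

```python
# Parameters for CreateDeliveryStream with server-side encryption enabled.
# KeyType and KeyARN are fields of DeliveryStreamEncryptionConfigurationInput;
# the ARN below is a placeholder, not a real key.
params = {
    "DeliveryStreamName": "my-delivery-stream",
    "DeliveryStreamType": "DirectPut",
    "DeliveryStreamEncryptionConfigurationInput": {
        "KeyType": "CUSTOMER_MANAGED_CMK",
        "KeyARN": "arn:aws:kms:us-east-1:111122223333:key/key-id",
    },
}
# With boto3 (assumed available):
# boto3.client("firehose").create_delivery_stream(**params)
```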
When the CMK is of type CUSTOMER_MANAGED_CMK, if the Amazon Kinesis Data Firehose service is
unable to decrypt records because of a KMSNotFoundException, a KMSInvalidStateException,
a KMSDisabledException, or a KMSAccessDeniedException, the service waits up to 24 hours (the
retention period) for you to resolve the problem. If the problem persists beyond the retention period, the
service skips those records that have passed the retention period and couldn't be decrypted, and then
discards the data. Amazon Kinesis Data Firehose provides the following four CloudWatch metrics that
you can use to track the four AWS KMS exceptions:
• KMSKeyAccessDenied
• KMSKeyDisabled
• KMSKeyInvalidState
• KMSKeyNotFound
For more information about these four metrics, see the section called “Monitoring with CloudWatch
Metrics” (p. 57).
Important
To encrypt your delivery stream, use symmetric CMKs. Kinesis Data Firehose doesn't support
asymmetric CMKs. For information about symmetric and asymmetric CMKs, see About
Symmetric and Asymmetric CMKs in the AWS Key Management Service developer guide.
Contents
• Grant Your Application Access to Your Kinesis Data Firehose Resources (p. 30)
Grant Your Application Access to Your Kinesis Data Firehose Resources
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"firehose:DeleteDeliveryStream",
"firehose:PutRecord",
"firehose:PutRecordBatch",
"firehose:UpdateDestination"
],
"Resource": [
"arn:aws:firehose:region:account-id:deliverystream/delivery-stream-name"
]
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "firehose.amazonaws.com"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId":"account-id"
}
}
}
]
}
For information about how to modify the trust relationship of a role, see Modifying a Role.
Use the following access policy to enable Kinesis Data Firehose to access your S3 bucket and AWS KMS
key. If you don't own the S3 bucket, add s3:PutObjectAcl to the list of Amazon S3 actions. This grants
the bucket owner full access to the objects delivered by Kinesis Data Firehose. This policy also has a
statement that allows access to Amazon Kinesis Data Streams. If you don't use Kinesis Data Streams as
your data source, you can remove that statement.
{
"Version": "2012-10-17",
"Statement":
[
{
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::bucket-name",
"arn:aws:s3:::bucket-name/*"
]
},
{
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:GetShardIterator",
"kinesis:GetRecords",
"kinesis:ListShards"
],
"Resource": "arn:aws:kinesis:region:account-id:stream/stream-name"
},
{
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:GenerateDataKey"
],
"Resource": [
"arn:aws:kms:region:account-id:key/key-id"
],
"Condition": {
"StringEquals": {
"kms:ViaService": "s3.region.amazonaws.com"
},
"StringLike": {
"kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::bucket-name/prefix*"
}
}
},
{
"Effect": "Allow",
"Action": [
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:region:account-id:log-group:log-group-name:log-stream:log-stream-name"
]
},
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction",
"lambda:GetFunctionConfiguration"
],
"Resource": [
"arn:aws:lambda:region:account-id:function:function-name:function-version"
]
}
]
}
For more information about allowing other AWS services to access your AWS resources, see Creating a
Role to Delegate Permissions to an AWS Service in the IAM User Guide.
Topics
• IAM Role and Access Policy (p. 32)
• VPC Access to an Amazon Redshift Cluster (p. 34)
Use the following access policy to enable Kinesis Data Firehose to access your S3 bucket and AWS KMS
key. If you don't own the S3 bucket, add s3:PutObjectAcl to the list of Amazon S3 actions, which
grants the bucket owner full access to the objects delivered by Kinesis Data Firehose. This policy also has
a statement that allows access to Amazon Kinesis Data Streams. If you don't use Kinesis Data Streams as
your data source, you can remove that statement.
{
"Version": "2012-10-17",
"Statement":
[
{
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::bucket-name",
"arn:aws:s3:::bucket-name/*"
]
},
{
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:GenerateDataKey"
],
"Resource": [
"arn:aws:kms:region:account-id:key/key-id"
],
"Condition": {
"StringEquals": {
"kms:ViaService": "s3.region.amazonaws.com"
},
"StringLike": {
"kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::bucket-name/prefix*"
}
}
},
{
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:GetShardIterator",
"kinesis:GetRecords",
"kinesis:ListShards"
],
"Resource": "arn:aws:kinesis:region:account-id:stream/stream-name"
},
{
"Effect": "Allow",
"Action": [
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:region:account-id:log-group:log-group-name:log-stream:log-stream-name"
]
},
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction",
"lambda:GetFunctionConfiguration"
],
"Resource": [
"arn:aws:lambda:region:account-id:function:function-name:function-version"
]
}
]
}
For more information about allowing other AWS services to access your AWS resources, see Creating a
Role to Delegate Permissions to an AWS Service in the IAM User Guide.
For more information about how to unblock IP addresses, see the step Authorize Access to the Cluster in
the Amazon Redshift Getting Started guide.
Grant Kinesis Data Firehose Access to an Amazon ES Destination
Kinesis Data Firehose also sends data delivery errors to your CloudWatch log group and streams. Kinesis
Data Firehose uses an IAM role to access the specified Elasticsearch domain, S3 bucket, AWS KMS key,
and CloudWatch log group and streams. You are required to have an IAM role when creating a delivery
stream.
Use the following access policy to enable Kinesis Data Firehose to access your S3 bucket, Amazon ES
domain, and AWS KMS key. If you do not own the S3 bucket, add s3:PutObjectAcl to the list of
Amazon S3 actions, which grants the bucket owner full access to the objects delivered by Kinesis Data
Firehose. This policy also has a statement that allows access to Amazon Kinesis Data Streams. If you
don't use Kinesis Data Streams as your data source, you can remove that statement.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::bucket-name",
"arn:aws:s3:::bucket-name/*"
]
},
{
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:GenerateDataKey"
],
"Resource": [
"arn:aws:kms:region:account-id:key/key-id"
],
"Condition": {
"StringEquals": {
"kms:ViaService": "s3.region.amazonaws.com"
},
"StringLike": {
"kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::bucket-name/prefix*"
}
}
},
{
"Effect": "Allow",
"Action": [
"es:DescribeElasticsearchDomain",
"es:DescribeElasticsearchDomains",
"es:DescribeElasticsearchDomainConfig",
"es:ESHttpPost",
"es:ESHttpPut"
],
"Resource": [
"arn:aws:es:region:account-id:domain/domain-name",
"arn:aws:es:region:account-id:domain/domain-name/*"
]
},
{
"Effect": "Allow",
"Action": [
"es:ESHttpGet"
],
"Resource": [
"arn:aws:es:region:account-id:domain/domain-name/_all/_settings",
"arn:aws:es:region:account-id:domain/domain-name/_cluster/stats",
"arn:aws:es:region:account-id:domain/domain-name/index-name*/_mapping/type-name",
"arn:aws:es:region:account-id:domain/domain-name/_nodes",
"arn:aws:es:region:account-id:domain/domain-name/_nodes/stats",
"arn:aws:es:region:account-id:domain/domain-name/_nodes/*/stats",
"arn:aws:es:region:account-id:domain/domain-name/_stats",
"arn:aws:es:region:account-id:domain/domain-name/index-name*/_stats"
]
},
{
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:GetShardIterator",
"kinesis:GetRecords",
"kinesis:ListShards"
],
"Resource": "arn:aws:kinesis:region:account-id:stream/stream-name"
},
{
"Effect": "Allow",
"Action": [
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:region:account-id:log-group:log-group-name:log-stream:log-stream-name"
]
},
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction",
"lambda:GetFunctionConfiguration"
],
"Resource": [
"arn:aws:lambda:region:account-id:function:function-name:function-version"
]
}
]
}
For more information about allowing other AWS services to access your AWS resources, see Creating a
Role to Delegate Permissions to an AWS Service in the IAM User Guide.
Grant Kinesis Data Firehose Access to a Splunk Destination
You are required to have an IAM role when creating a delivery stream. Kinesis Data Firehose assumes that
IAM role and gains access to the specified bucket, key, and CloudWatch log group and streams.
Use the following access policy to enable Kinesis Data Firehose to access your S3 bucket. If you don't own
the S3 bucket, add s3:PutObjectAcl to the list of Amazon S3 actions, which grants the bucket owner
full access to the objects delivered by Kinesis Data Firehose. This policy also grants Kinesis Data Firehose
access to CloudWatch for error logging and to AWS Lambda for data transformation. The policy also has
a statement that allows access to Amazon Kinesis Data Streams. If you don't use Kinesis Data Streams as
your data source, you can remove that statement. Kinesis Data Firehose doesn't use IAM to access Splunk; to access Splunk, it uses your HEC (HTTP Event Collector) token.
{
"Version": "2012-10-17",
"Statement":
[
{
"Effect": "Allow",
"Action": [
"s3:AbortMultipartUpload",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::bucket-name",
"arn:aws:s3:::bucket-name/*"
]
},
{
"Effect": "Allow",
"Action": [
"kms:Decrypt",
"kms:GenerateDataKey"
],
"Resource": [
"arn:aws:kms:region:account-id:key/key-id"
],
"Condition": {
"StringEquals": {
"kms:ViaService": "s3.region.amazonaws.com"
},
"StringLike": {
"kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::bucket-name/prefix*"
}
}
},
{
"Effect": "Allow",
"Action": [
"kinesis:DescribeStream",
"kinesis:GetShardIterator",
"kinesis:GetRecords",
"kinesis:ListShards"
],
"Resource": "arn:aws:kinesis:region:account-id:stream/stream-name"
},
{
"Effect": "Allow",
"Action": [
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:region:account-id:log-group:log-group-name:log-stream:*"
]
},
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction",
"lambda:GetFunctionConfiguration"
],
"Resource": [
"arn:aws:lambda:region:account-id:function:function-name:function-version"
]
}
]
}
For more information about allowing other AWS services to access your AWS resources, see Creating a
Role to Delegate Permissions to an AWS Service in the IAM User Guide.
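Because the Splunk hop is authorized by the HEC token rather than by IAM, you can verify Splunk connectivity independently of AWS. The following Python sketch builds (but does not send) a test event against the standard HEC collector path; the endpoint and token values are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical HEC endpoint and token; substitute your own values.
HEC_URL = "https://splunk.example.com:8088/services/collector"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

# The Authorization header carries the HEC token; no AWS credentials are
# involved in the Splunk hop.
req = urllib.request.Request(
    HEC_URL,
    data=json.dumps({"event": "connectivity test"}).encode("utf-8"),
    headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to send; requires a reachable HEC endpoint
```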
Cross-Account Delivery to an Amazon S3 Destination
The following steps describe how to configure a Kinesis Data Firehose delivery stream owned by account A to deliver data to an Amazon S3 bucket owned by account B.
1. Create an IAM role under account A using the steps described in Grant Kinesis Data Firehose Access to an Amazon S3 Destination.
Note
The Amazon S3 bucket specified in the access policy is owned by account B in this case.
Make sure you add s3:PutObjectAcl to the list of Amazon S3 actions in the access policy,
which grants account B full access to the objects delivered by Amazon Kinesis Data Firehose.
2. To allow access from the IAM role previously created, create an S3 bucket policy under account
B. The following code is an example of the bucket policy. For more information, see Using Bucket
Policies and User Policies.
"Version": "2012-10-17",
"Id": "PolicyID",
"Statement": [
{
"Sid": "StmtID",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::accountA-id:role/iam-role-name"
},
"Action": [
"s3:AbortMultipartUpload",
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:PutObject",
"s3:PutObjectAcl"
],
"Resource": [
"arn:aws:s3:::bucket-name",
"arn:aws:s3:::bucket-name/*"
]
}
]
}
3. Create a Kinesis Data Firehose delivery stream under account A using the IAM role that you created
in step 1.
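The bucket policy in step 2 can also be applied programmatically. The following Python sketch builds the account B bucket policy shown above and, optionally, attaches it with the AWS SDK for Python (boto3); the account ID, role name, and bucket are hypothetical placeholders.

```python
import json

bucket = "bucket-name"
firehose_role_arn = "arn:aws:iam::111111111111:role/iam-role-name"  # account A role

# The bucket policy from step 2, including s3:PutObjectAcl so that account B
# gets full access to objects that Kinesis Data Firehose delivers.
policy = {
    "Version": "2012-10-17",
    "Id": "PolicyID",
    "Statement": [{
        "Sid": "StmtID",
        "Effect": "Allow",
        "Principal": {"AWS": firehose_role_arn},
        "Action": [
            "s3:AbortMultipartUpload",
            "s3:GetBucketLocation",
            "s3:GetObject",
            "s3:ListBucket",
            "s3:ListBucketMultipartUploads",
            "s3:PutObject",
            "s3:PutObjectAcl",
        ],
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
    }],
}

def apply_bucket_policy():
    """Attach the policy under account B (requires account B credentials)."""
    import boto3  # AWS SDK for Python
    boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```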
Cross-Account Delivery to an Amazon ES Destination
1. Create an IAM role under account A using the steps described in the section called “Grant Kinesis Data Firehose Access to an Amazon ES Destination” (p. 34).
2. To allow access from the IAM role that you created in the previous step, create an Amazon ES policy
under account B. The following JSON is an example.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::Account-A-ID:role/firehose_delivery_role "
},
"Action": "es:ESHttpGet",
"Resource": [
"arn:aws:es:us-east-1:Account-B-ID:domain/cross-account-cluster/_all/
_settings",
"arn:aws:es:us-east-1:Account-B-ID:domain/cross-account-cluster/_cluster/
stats",
"arn:aws:es:us-east-1:Account-B-ID:domain/cross-account-cluster/roletest*/
_mapping/roletest",
"arn:aws:es:us-east-1:Account-B-ID:domain/cross-account-cluster/_nodes",
"arn:aws:es:us-east-1:Account-B-ID:domain/cross-account-cluster/_nodes/stats",
"arn:aws:es:us-east-1:Account-B-ID:domain/cross-account-cluster/_nodes/*/
stats",
"arn:aws:es:us-east-1:Account-B-ID:domain/cross-account-cluster/_stats",
"arn:aws:es:us-east-1:Account-B-ID:domain/cross-account-cluster/roletest*/
_stats"
]
}
]
}
3. Create a Kinesis Data Firehose delivery stream under account A using the IAM role that you created
in step 1. When you create the delivery stream, use the AWS CLI or the Kinesis Data Firehose APIs
and specify the ClusterEndpoint field instead of DomainARN for Amazon ES.
Note
To create a delivery stream in one AWS account with an Amazon ES destination in a different
account, you must use the AWS CLI or the Kinesis Data Firehose APIs. You can't use the AWS
Management Console to create this kind of cross-account configuration.
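As a sketch of the API-based approach that the note describes, the destination configuration specifies ClusterEndpoint rather than DomainARN. All names, the endpoint, and the ARNs below are hypothetical placeholders.

```python
# Destination configuration for the cross-account case: ClusterEndpoint is
# specified instead of DomainARN. All names and ARNs are hypothetical.
es_destination = {
    "RoleARN": "arn:aws:iam::111111111111:role/firehose_delivery_role",
    "ClusterEndpoint": "https://vpc-cross-account-cluster-abc123.us-east-1.es.amazonaws.com",
    "IndexName": "roletest",
    "IndexRotationPeriod": "OneDay",
    "S3Configuration": {  # required S3 backup configuration
        "RoleARN": "arn:aws:iam::111111111111:role/firehose_delivery_role",
        "BucketARN": "arn:aws:s3:::my-backup-bucket",
    },
}

def create_cross_account_stream():
    """Create the stream under account A (requires account A credentials)."""
    import boto3  # AWS SDK for Python
    return boto3.client("firehose").create_delivery_stream(
        DeliveryStreamName="cross-account-es-stream",
        ElasticsearchDestinationConfiguration=es_destination,
    )
```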
Using Tags to Control Access
CreateDeliveryStream
For the CreateDeliveryStream operation, use the aws:RequestTag condition key. In the following example, MyKey and MyValue represent the key and corresponding value for a tag.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "firehose:CreateDeliveryStream",
"Resource": "*",
"Condition": {
"StringEquals": {
"aws:RequestTag/MyKey": "MyValue"
}
}
}
]
}
UntagDeliveryStream
For the UntagDeliveryStream operation, use the aws:TagKeys condition key. In the following
example, MyKey is an example tag key.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "firehose:UntagDeliveryStream",
"Resource": "*",
"Condition": {
"ForAnyValue:StringEquals": {
"aws:TagKeys": "MyKey"
}
}
}
]
}
ListDeliveryStreams
You can't use tag-based access control with ListDeliveryStreams. The following example policy denies the DescribeDeliveryStream operation for any delivery stream whose MyKey tag has the value MyValue.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": "firehose:DescribeDeliveryStream",
"Resource": "*",
"Condition": {
"Null": {
"firehose:ResourceTag/MyKey": "MyValue"
}
}
]
}
Compliance Validation
For a list of AWS services in scope of specific compliance programs, see AWS Services in Scope by
Compliance Program. For general information, see AWS Compliance Programs.
You can download third-party audit reports using AWS Artifact. For more information, see Downloading
Reports in AWS Artifact.
Your compliance responsibility when using Kinesis Data Firehose is determined by the sensitivity of your
data, your company's compliance objectives, and applicable laws and regulations. If your use of Kinesis
Data Firehose is subject to compliance with standards such as HIPAA, PCI, or FedRAMP, AWS provides
resources to help:
• Security and Compliance Quick Start Guides – These deployment guides discuss architectural
considerations and provide steps for deploying security- and compliance-focused baseline
environments on AWS.
• Architecting for HIPAA Security and Compliance Whitepaper – This whitepaper describes how
companies can use AWS to create HIPAA-compliant applications.
• AWS Compliance Resources – This collection of workbooks and guides might apply to your industry
and location.
• AWS Config – This AWS service assesses how well your resource configurations comply with internal
practices, industry guidelines, and regulations.
• AWS Security Hub – This AWS service provides a comprehensive view of your security state within AWS
that helps you check your compliance with security industry standards and best practices.
For more information about AWS Regions and Availability Zones, see AWS Global Infrastructure.
In addition to the AWS global infrastructure, Kinesis Data Firehose offers several features to help support
your data resiliency and backup needs.
Disaster Recovery
Kinesis Data Firehose runs in a serverless mode and handles host degradation, Availability Zone unavailability, and other infrastructure-related issues by performing automatic migration. When this happens, Kinesis Data Firehose ensures that the delivery stream is migrated without any loss of data.
Infrastructure Security
You use AWS published API calls to access Kinesis Data Firehose through the network. Clients must
support Transport Layer Security (TLS) 1.0 or later. We recommend TLS 1.2 or later. Clients must also
support cipher suites with perfect forward secrecy (PFS) such as Ephemeral Diffie-Hellman (DHE) or
Elliptic Curve Ephemeral Diffie-Hellman (ECDHE). Most modern systems such as Java 7 and later support
these modes.
Additionally, requests must be signed by using an access key ID and a secret access key that is associated
with an IAM principal. Or you can use the AWS Security Token Service (AWS STS) to generate temporary
security credentials to sign requests.
As a best practice, use an IAM role to manage temporary credentials for your producer and client applications to access Kinesis Data Firehose delivery streams. When you use a role, you don't have to use long-term credentials (such as a user name and password or access keys) to access other resources.
For more information, see the following topics in the IAM User Guide:
• IAM Roles
• Common Scenarios for Roles: Users, Applications, and Services
Using the information collected by CloudTrail, you can determine the request that was made to Kinesis
Data Firehose, the IP address from which the request was made, who made the request, when it was
made, and additional details.
For more information, see the section called “Logging Kinesis Data Firehose API Calls with AWS
CloudTrail” (p. 76).
Data Transformation Flow
recordId
The record ID is passed from Kinesis Data Firehose to Lambda during the invocation. The
transformed record must contain the same record ID. Any mismatch between the ID of the original
record and the ID of the transformed record is treated as a data transformation failure.
result
The status of the data transformation of the record. The possible values are: Ok (the record was
transformed successfully), Dropped (the record was dropped intentionally by your processing logic),
and ProcessingFailed (the record could not be transformed). If a record has a status of Ok or
Dropped, Kinesis Data Firehose considers it successfully processed. Otherwise, Kinesis Data Firehose
considers it unsuccessfully processed.
data
The transformed data payload, after base64-encoding.
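The recordId/result/data contract can be illustrated with a minimal Python handler. This is a sketch, not one of the official blueprints, and the upper-casing transform is purely illustrative.

```python
import base64

def lambda_handler(event, context):
    """Transform Kinesis Data Firehose records: upper-case each payload."""
    output = []
    for record in event["records"]:
        try:
            payload = base64.b64decode(record["data"]).decode("utf-8")
            transformed = payload.upper().encode("utf-8")
            output.append({
                "recordId": record["recordId"],  # must match the incoming record ID
                "result": "Ok",                  # Ok | Dropped | ProcessingFailed
                "data": base64.b64encode(transformed).decode("utf-8"),
            })
        except Exception:
            # Mark the record as unsuccessfully processed; Kinesis Data Firehose
            # then applies its failure handling to this record.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```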
Lambda Blueprints
There are blueprints that you can use to create a Lambda function for data transformation. Some
of these blueprints are in the AWS Lambda console and some are in the AWS Serverless Application
Repository.
To see the blueprints that are available in the AWS Lambda console
1. Sign in to the AWS Management Console and open the AWS Lambda console at https://console.aws.amazon.com/lambda/.
2. Choose Create function, and then choose Use a blueprint.
3. In the Blueprints field, search for the keyword firehose to find the Kinesis Data Firehose Lambda
blueprints.
To see the blueprints that are available in the AWS Serverless Application Repository
You can also create a Lambda function without using a blueprint. See Getting Started with AWS Lambda.
Data Transformation Failure Handling
If the status of the data transformation of a record is ProcessingFailed, Kinesis Data Firehose treats
the record as unsuccessfully processed. For this type of failure, you can emit error logs to Amazon
CloudWatch Logs from your Lambda function. For more information, see Accessing Amazon CloudWatch
Logs for AWS Lambda in the AWS Lambda Developer Guide.
If data transformation fails, the unsuccessfully processed records are delivered to your S3 bucket in the
processing-failed folder. The records have the following format:
{
"attemptsMade": "count",
"arrivalTimestamp": "timestamp",
"errorCode": "code",
"errorMessage": "message",
"attemptEndingTimestamp": "timestamp",
"rawData": "data",
"lambdaArn": "arn"
}
attemptsMade
The number of invocation requests attempted.
arrivalTimestamp
The time that the record was received by Kinesis Data Firehose.
errorCode
The HTTP error code returned by Lambda.
errorMessage
The error message returned by Lambda.
attemptEndingTimestamp
The time that Kinesis Data Firehose stopped attempting Lambda invocations.
rawData
The base64-encoded record data.
lambdaArn
The Amazon Resource Name (ARN) of the Lambda function.
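A record from the processing-failed folder can be inspected with a few lines of Python; the rawData field holds the original record, base64-encoded. All field values below are hypothetical.

```python
import base64
import json

# A sample failure record, as delivered to the processing-failed folder.
# All values are hypothetical placeholders.
line = json.dumps({
    "attemptsMade": "4",
    "arrivalTimestamp": "1598033809000",
    "errorCode": "code",
    "errorMessage": "message",
    "attemptEndingTimestamp": "1598033944000",
    "rawData": base64.b64encode(b'{"ticker":"AMZN"}').decode("utf-8"),
    "lambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:transform:1",
})

record = json.loads(line)
original = base64.b64decode(record["rawData"])  # recover the original payload
```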
For information about what Kinesis Data Firehose does if such an error occurs, see the section called
“Data Transformation Failure Handling” (p. 46).
Topics
• Record Format Conversion Requirements (p. 48)
• Choosing the JSON Deserializer (p. 48)
• Choosing the Serializer (p. 49)
• Converting Input Record Format (Console) (p. 49)
• Converting Input Record Format (API) (p. 50)
• Record Format Conversion Error Handling (p. 50)
• Record Format Conversion Example (p. 51)
Record Format Conversion Requirements
Kinesis Data Firehose requires the following three elements to convert the format of your record data:
• A deserializer to read the JSON of your input data – You can choose one of two types of deserializers: Apache Hive JSON SerDe or OpenX JSON SerDe.
• A schema to determine how to interpret that data – Use AWS Glue to create a schema in the AWS
Glue Data Catalog. Kinesis Data Firehose then references that schema and uses it to interpret your
input data. You can use the same schema to configure both Kinesis Data Firehose and your analytics
software. For more information, see Populating the AWS Glue Data Catalog in the AWS Glue Developer
Guide.
• A serializer to convert the data to the target columnar storage format (Parquet or ORC) – You can
choose one of two types of serializers: ORC SerDe or Parquet SerDe.
Important
If you enable record format conversion, you can't set your Kinesis Data Firehose destination
to be Amazon Elasticsearch Service (Amazon ES), Amazon Redshift, or Splunk. With format
conversion enabled, Amazon S3 is the only destination that you can use for your Kinesis Data
Firehose delivery stream.
You can convert the format of your data even if you aggregate your records before sending them to
Kinesis Data Firehose.
Choosing the JSON Deserializer
The OpenX JSON SerDe can convert periods (.) to underscores (_). It can also convert JSON keys to
lowercase before deserializing them. For more information about the options that are available with this
deserializer through Kinesis Data Firehose, see OpenXJsonSerDe.
If you're not sure which deserializer to choose, use the OpenX JSON SerDe, unless you have time stamps
that it doesn't support.
If you have time stamps in formats other than those listed previously, use the Apache Hive JSON SerDe.
When you choose this deserializer, you can specify the time stamp formats to use. To do this, follow
the pattern syntax of the Joda-Time DateTimeFormat format strings. For more information, see Class
DateTimeFormat.
You can also use the special value millis to parse time stamps in epoch milliseconds. If you don't
specify a format, Kinesis Data Firehose uses java.sql.Timestamp::valueOf by default.
The Hive SerDe doesn't convert nested JSON into strings. For example, if you have {"a":
{"inner":1}}, it doesn't treat {"inner":1} as a string.
Converting Input Record Format (Console)
1. Sign in to the AWS Management Console, and open the Kinesis Data Firehose console at https://console.aws.amazon.com/firehose/.
2. Choose a Kinesis Data Firehose delivery stream to update, or create a new delivery stream by
following the steps in Creating an Amazon Kinesis Data Firehose Delivery Stream (p. 5).
3. Under Convert record format, set Record format conversion to Enabled.
4. Choose the output format that you want. For more information about the two options, see Apache
Parquet and Apache ORC.
5. Choose an AWS Glue table to specify a schema for your source records. Set the Region, database,
table, and table version.
Converting Input Record Format (API)
• In BufferingHints, you can't set SizeInMBs to a value less than 64 if you enable record format
conversion. Also, when format conversion isn't enabled, the default value is 5. The value becomes 128
when you enable it.
• You must set CompressionFormat in ExtendedS3DestinationConfiguration or in
ExtendedS3DestinationUpdate to UNCOMPRESSED. The default value for CompressionFormat is
UNCOMPRESSED. Therefore, you can also leave it unspecified in ExtendedS3DestinationConfiguration.
The data still gets compressed as part of the serialization process, using Snappy compression by
default. The framing format for Snappy that Kinesis Data Firehose uses in this case is compatible
with Hadoop. This means that you can use the results of the Snappy compression and run
queries on this data in Athena. For the Snappy framing format that Hadoop relies on, see
BlockCompressorStream.java. When you configure the serializer, you can choose other types of
compression.
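Putting the API constraints above together, an ExtendedS3DestinationConfiguration with format conversion enabled might look like the following sketch. All ARNs, database, and table names are hypothetical placeholders.

```python
# A sketch of the ExtendedS3DestinationConfiguration fields discussed above.
# All ARNs, database, and table names are hypothetical placeholders.
extended_s3_config = {
    "RoleARN": "arn:aws:iam::111122223333:role/firehose_delivery_role",
    "BucketARN": "arn:aws:s3:::bucket-name",
    "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 300},  # SizeInMBs >= 64
    "CompressionFormat": "UNCOMPRESSED",  # required when format conversion is enabled
    "DataFormatConversionConfiguration": {
        "Enabled": True,
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
        "SchemaConfiguration": {  # AWS Glue table that defines the schema
            "RoleARN": "arn:aws:iam::111122223333:role/firehose_delivery_role",
            "DatabaseName": "my_glue_db",
            "TableName": "my_glue_table",
            "Region": "us-east-1",
        },
    },
}
```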
Record Format Conversion Error Handling
If record format conversion fails, the unsuccessfully processed records are delivered to your S3 bucket with metadata in the following format:
{
"attemptsMade": long,
"arrivalTimestamp": long,
"lastErrorCode": string,
"lastErrorMessage": string,
"attemptEndingTimestamp": long,
"rawData": string,
"sequenceNumber": string,
"subSequenceNumber": long,
"dataCatalogTable": {
"catalogId": string,
"databaseName": string,
"tableName": string,
"region": string,
"versionId": string,
"catalogArn": string
}
}
Topics
• Data Delivery Format (p. 53)
• Data Delivery Frequency (p. 54)
• Data Delivery Failure Handling (p. 54)
• Amazon S3 Object Name Format (p. 56)
• Index Rotation for the Amazon ES Destination (p. 56)
Data Delivery Format
For data delivery to Amazon Redshift, Kinesis Data Firehose first delivers incoming data to your S3
bucket in the format described earlier. Kinesis Data Firehose then issues an Amazon Redshift COPY
command to load the data from your S3 bucket to your Amazon Redshift cluster. Ensure that after
Kinesis Data Firehose concatenates multiple incoming records to an Amazon S3 object, the Amazon S3
object can be copied to your Amazon Redshift cluster. For more information, see Amazon Redshift COPY
Command Data Format Parameters.
For data delivery to Amazon ES, Kinesis Data Firehose buffers incoming records based on the
buffering configuration of your delivery stream. It then generates an Elasticsearch bulk request to
index multiple records to your Elasticsearch cluster. Make sure that your record is UTF-8 encoded
and flattened to a single-line JSON object before you send it to Kinesis Data Firehose. Also, the
rest.action.multi.allow_explicit_index option for your Elasticsearch cluster must be set to
true (default) to take bulk requests with an explicit index that is set per record. For more information, see
Amazon ES Configure Advanced Options in the Amazon Elasticsearch Service Developer Guide.
For data delivery to Splunk, Kinesis Data Firehose concatenates the bytes that you send. If you want
delimiters in your data, such as a new line character, you must insert them yourself. Make sure that
Splunk is configured to parse any such delimiters.
Data Delivery Frequency
Amazon S3
The frequency of data delivery to Amazon S3 is determined by the Amazon S3 Buffer size and
Buffer interval value that you configured for your delivery stream. Kinesis Data Firehose buffers
incoming data before it delivers it to Amazon S3. You can configure the values for Amazon S3
Buffer size (1–128 MB) or Buffer interval (60–900 seconds). The condition satisfied first triggers
data delivery to Amazon S3. When data delivery to the destination falls behind data writing to the
delivery stream, Kinesis Data Firehose raises the buffer size dynamically. It can then catch up and
ensure that all data is delivered to the destination.
Amazon Redshift
The frequency of data COPY operations from Amazon S3 to Amazon Redshift is determined by how
fast your Amazon Redshift cluster can finish the COPY command. If there is still data to copy, Kinesis
Data Firehose issues a new COPY command as soon as the previous COPY command is successfully
finished by Amazon Redshift.
Amazon Elasticsearch Service
The frequency of data delivery to Amazon ES is determined by the Elasticsearch Buffer size and
Buffer interval values that you configured for your delivery stream. Kinesis Data Firehose buffers
incoming data before delivering it to Amazon ES. You can configure the values for Elasticsearch
Buffer size (1–100 MB) or Buffer interval (60–900 seconds), and the condition satisfied first triggers
data delivery to Amazon ES.
Splunk
Kinesis Data Firehose buffers incoming data before delivering it to Splunk. The buffer size is 5 MB,
and the buffer interval is 60 seconds. The condition satisfied first triggers data delivery to Splunk.
The buffer size and interval aren't configurable. These numbers are optimal.
Data Delivery Failure Handling
Amazon S3
Data delivery to your S3 bucket might fail for various reasons. For example, the bucket might not
exist anymore, the IAM role that Kinesis Data Firehose assumes might not have access to the bucket,
the network failed, or similar events. Under these conditions, Kinesis Data Firehose keeps retrying
for up to 24 hours until the delivery succeeds. The maximum data storage time of Kinesis Data
Firehose is 24 hours. If data delivery fails for more than 24 hours, your data is lost.
Amazon Redshift
For an Amazon Redshift destination, you can specify a retry duration (0–7200 seconds) when
creating a delivery stream.
Data delivery to your Amazon Redshift cluster might fail for several reasons. For example, you might
have an incorrect cluster configuration of your delivery stream, a cluster under maintenance, or a
network failure. Under these conditions, Kinesis Data Firehose retries for the specified time duration
and skips that particular batch of Amazon S3 objects. The skipped objects' information is delivered
to your S3 bucket as a manifest file in the errors/ folder, which you can use for manual backfill.
For information about how to COPY data manually with manifest files, see Using a Manifest to
Specify Data Files.
Amazon Elasticsearch Service
For the Amazon ES destination, you can specify a retry duration (0–7200 seconds) when creating a
delivery stream.
Data delivery to your Amazon ES cluster might fail for several reasons. For example, you might have
an incorrect Amazon ES cluster configuration of your delivery stream, an Amazon ES cluster under
maintenance, a network failure, or similar events. Under these conditions, Kinesis Data Firehose
retries for the specified time duration and then skips that particular index request. The skipped
documents are delivered to your S3 bucket in the elasticsearch_failed/ folder, which you can
use for manual backfill. Each document has the following JSON format:
{
"attemptsMade": "(number of index requests attempted)",
"arrivalTimestamp": "(the time when the document was received by Firehose)",
"errorCode": "(http error code returned by Elasticsearch)",
"errorMessage": "(error message returned by Elasticsearch)",
"attemptEndingTimestamp": "(the time when Firehose stopped attempting index
request)",
"esDocumentId": "(intended Elasticsearch document ID)",
"esIndexName": "(intended Elasticsearch index name)",
"esTypeName": "(intended Elasticsearch type name)",
"rawData": "(base64-encoded document data)"
}
Splunk
When Kinesis Data Firehose sends data to Splunk, it waits for an acknowledgment from Splunk. If
an error occurs, or the acknowledgment doesn’t arrive within the acknowledgment timeout period,
Kinesis Data Firehose starts the retry duration counter. It keeps retrying until the retry duration
expires. After that, Kinesis Data Firehose considers it a data delivery failure and backs up the data to
your Amazon S3 bucket.
Every time Kinesis Data Firehose sends data to Splunk, whether it's the initial attempt or a retry, it
restarts the acknowledgement timeout counter. It then waits for an acknowledgement to arrive from
Splunk. Even if the retry duration expires, Kinesis Data Firehose still waits for the acknowledgment
until it receives it or the acknowledgement timeout is reached. If the acknowledgment times out,
Kinesis Data Firehose checks to determine whether there's time left in the retry counter. If there is
time left, it retries again and repeats the logic until it receives an acknowledgment or determines
that the retry time has expired.
A failure to receive an acknowledgement isn't the only type of data delivery error that can occur. For
information about the other types of data delivery errors, see Splunk Data Delivery Errors. Any data
delivery error triggers the retry logic if your retry duration is greater than 0.
{
"attemptsMade": 0,
"arrivalTimestamp": 1506035354675,
"errorCode": "Splunk.AckTimeout",
"errorMessage": "Did not receive an acknowledgement from HEC before the HEC
acknowledgement timeout expired. Despite the acknowledgement timeout, it's possible
the data was indexed successfully in Splunk. Kinesis Firehose backs up in Amazon S3
data for which the acknowledgement timeout expired.",
"attemptEndingTimestamp": 13626284715507,
"rawData":
"MiAyNTE2MjAyNzIyMDkgZW5pLTA1ZjMyMmQ1IDIxOC45Mi4xODguMjE0IDE3Mi4xNi4xLjE2NyAyNTIzMyAxNDMzIDYgMSA0M
"EventId": "49577193928114147339600778471082492393164139877200035842.0"
}
Index Rotation for the Amazon ES Destination
Depending on the rotation option you choose, Kinesis Data Firehose appends a portion of the UTC
arrival timestamp to your specified index name. It rotates the appended timestamp accordingly. The
following example shows the resulting index name in Amazon ES for each index rotation option, where
the specified index name is myindex and the arrival timestamp is 2016-02-25T13:00:00Z.
RotationPeriod IndexName
NoRotation myindex
OneHour myindex-2016-02-25-13
OneDay myindex-2016-02-25
OneWeek myindex-2016-w08
OneMonth myindex-2016-02
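The rotation options in the table can be reproduced with standard date formatting. In the following Python sketch, the week number is assumed to be the ISO week, which matches the myindex-2016-w08 example above.

```python
from datetime import datetime, timezone

def rotated_index(name: str, ts: datetime, period: str) -> str:
    """Sketch of the index rotation shown in the table above (ISO week assumed)."""
    if period == "NoRotation":
        return name
    if period == "OneHour":
        return f"{name}-{ts:%Y-%m-%d-%H}"
    if period == "OneDay":
        return f"{name}-{ts:%Y-%m-%d}"
    if period == "OneWeek":
        return f"{name}-{ts:%Y}-w{ts.isocalendar()[1]:02d}"
    if period == "OneMonth":
        return f"{name}-{ts:%Y-%m}"
    raise ValueError(f"unknown rotation period: {period}")
```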
Monitoring with CloudWatch Metrics
• Amazon CloudWatch metrics (p. 57)— Kinesis Data Firehose sends Amazon CloudWatch custom
metrics with detailed monitoring for each delivery stream.
• Amazon CloudWatch Logs (p. 70)— Kinesis Data Firehose sends CloudWatch custom logs with
detailed monitoring for each delivery stream.
• Kinesis Agent (p. 75)— Kinesis Agent publishes custom CloudWatch metrics to help assess whether
the agent is working as expected.
• API logging and history (p. 76)— Kinesis Data Firehose uses AWS CloudTrail to log API calls and
store the data in an Amazon S3 bucket, and to maintain API call history.
The metrics that you configure for your Kinesis Data Firehose delivery streams and agents are
automatically collected and pushed to CloudWatch every five minutes. Metrics are archived for two
weeks; after that period, the data is discarded.
The metrics collected for Kinesis Data Firehose delivery streams are free of charge. For information about
Kinesis agent metrics, see Monitoring Kinesis Agent Health (p. 75).
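As an illustration of pulling these metrics programmatically, the parameters for a CloudWatch get_metric_statistics call might look like the following sketch; the delivery stream name is a hypothetical placeholder.

```python
from datetime import datetime, timedelta, timezone

# Parameters for retrieving the IncomingBytes metric for one delivery stream.
# The stream name is a hypothetical placeholder.
params = {
    "Namespace": "AWS/Firehose",
    "MetricName": "IncomingBytes",
    "Dimensions": [{"Name": "DeliveryStreamName", "Value": "my-delivery-stream"}],
    "StartTime": datetime.now(timezone.utc) - timedelta(hours=1),
    "EndTime": datetime.now(timezone.utc),
    "Period": 300,  # metrics are pushed every five minutes
    "Statistics": ["Sum"],
}

def fetch_metric():
    """Call CloudWatch (requires AWS credentials); not executed here."""
    import boto3  # AWS SDK for Python
    return boto3.client("cloudwatch").get_metric_statistics(**params)
```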
Topics
• Data Delivery CloudWatch Metrics (p. 58)
• Data Ingestion Metrics (p. 62)
• API-Level CloudWatch Metrics (p. 65)
• Data Transformation CloudWatch Metrics (p. 67)
• Format Conversion CloudWatch Metrics (p. 67)
• Server-Side Encryption (SSE) CloudWatch Metrics (p. 68)
• Dimensions for Kinesis Data Firehose (p. 68)
• Kinesis Data Firehose Usage Metrics (p. 68)
• Accessing CloudWatch Metrics for Kinesis Data Firehose (p. 69)
• Best Practices with CloudWatch Alarms (p. 69)
• Monitoring Kinesis Data Firehose Using CloudWatch Logs (p. 70)
• Monitoring Kinesis Agent Health (p. 75)
• Logging Kinesis Data Firehose API Calls with AWS CloudTrail (p. 76)
Data Delivery CloudWatch Metrics
Delivery to Amazon ES
Metric Description
DeliveryToElasticsearch.Success The sum of the successfully indexed records over the sum of records that were attempted.
Units: Count
DeliveryToS3.DataFreshness The age (from getting into Kinesis Data Firehose to now) of the oldest record in Kinesis Data Firehose. Any record older than this age has been delivered to the S3 bucket. Kinesis Data Firehose emits this metric only when you enable backup for all documents.
Units: Seconds
DeliveryToS3.DataFreshness The age (from getting into Kinesis Data Firehose to now) of the oldest record in Kinesis Data Firehose. Any record older than this age has been delivered to the S3 bucket.
Units: Seconds
BackupToS3.DataFreshness Age (from getting into Kinesis Data Firehose to now) of the oldest record in Kinesis Data Firehose. Any record older than this age has been delivered to the Amazon S3 bucket for backup. Kinesis Data Firehose emits this metric when data transformation and backup to Amazon S3 are enabled.
Units: Seconds
Delivery to Amazon S3
The metrics in the following table are related to delivery to Amazon S3 when it is the main destination of
the delivery stream.
Metric Description
DeliveryToS3.DataFreshness The age (from getting into Kinesis Data Firehose to now) of the oldest record in Kinesis Data Firehose. Any record older than this age has been delivered to the S3 bucket.
Units: Seconds
BackupToS3.DataFreshness Age (from getting into Kinesis Data Firehose to now) of the oldest record in Kinesis Data Firehose. Any record older than this age has been delivered to the Amazon S3 bucket for backup. Kinesis Data Firehose emits this metric when backup is enabled (which is only possible when data transformation is also enabled).
Units: Seconds
Delivery to Splunk
Metric Description
DeliveryToSplunk.DataFreshness Age (from getting into Kinesis Data Firehose to now) of the oldest record in Kinesis Data Firehose. Any record older than this age has been delivered to Splunk.
Units: Seconds
DeliveryToSplunk.Success The sum of the successfully indexed records over the sum of records that were attempted.
Units: Count
BackupToS3.DataFreshness Age (from getting into Kinesis Data Firehose to now) of the oldest record in Kinesis Data Firehose. Any record older than this age has been delivered to the Amazon S3 bucket for backup. Kinesis Data Firehose emits this metric when the delivery stream is configured to back up all documents.
Units: Seconds
Data Ingestion Metrics
Metric Description

DataReadFromKinesisStream.Bytes When the data source is a Kinesis data stream, this metric indicates the number of bytes read from that data stream. This number includes rereads due to failovers.

Units: Bytes

DeliveryToElasticsearch.Success The sum of the successfully indexed records over the sum of records that were attempted.

IncomingBytes The number of bytes ingested into the Kinesis Data Firehose stream over the specified time period.

Units: Bytes

KinesisMillisBehindLatest When the data source is a Kinesis data stream, this metric indicates the number of milliseconds that the last read record is behind the newest record in the Kinesis data stream.

Units: Milliseconds

API-Level CloudWatch Metrics
Data Transformation CloudWatch Metrics

Metric Description

ExecuteProcessing.Duration The time it takes for each Lambda function invocation performed by Kinesis Data Firehose.

Units: Milliseconds

ExecuteProcessing.Success The sum of the successful Lambda function invocations over the sum of the total Lambda function invocations.

SucceedProcessing.Records The number of successfully processed records over the specified time period.

Units: Count

SucceedProcessing.Bytes The number of successfully processed bytes over the specified time period.

Units: Bytes

Server-Side Encryption (SSE) CloudWatch Metrics
Service quota usage metrics are in the AWS/Usage namespace and are collected every minute.
Currently, the only metric name in this namespace that CloudWatch publishes is ResourceCount. This
metric is published with the dimensions Service, Class, Type, and Resource.
The following dimensions are used to refine the usage metrics that are published by Kinesis Data
Firehose.
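As a sketch of how such a usage metric can be queried, the following builds a GetMetricStatistics request that mirrors the dimensions described above. The one-hour window and Maximum statistic are illustrative choices, and the "None" value for the Class dimension is an assumption based on how AWS/Usage metrics are typically published.

```python
# Sketch: query the ResourceCount usage metric for Kinesis Data Firehose.
# The dimension values mirror those described in the text; the "None"
# value for Class is an assumption to verify in the CloudWatch console.
USAGE_QUERY = {
    "Namespace": "AWS/Usage",
    "MetricName": "ResourceCount",
    "Dimensions": [
        {"Name": "Service", "Value": "Firehose"},
        {"Name": "Class", "Value": "None"},
        {"Name": "Type", "Value": "Resource"},
        {"Name": "Resource", "Value": "DeliveryStreams"},
    ],
    "Period": 60,  # the metric is collected every minute
    "Statistics": ["Maximum"],
}

def fetch_usage(query=USAGE_QUERY):
    """Fetch the last hour of usage data. Requires boto3 and
    cloudwatch:GetMetricStatistics permission."""
    import boto3
    from datetime import datetime, timedelta, timezone
    end = datetime.now(timezone.utc)
    return boto3.client("cloudwatch").get_metric_statistics(
        StartTime=end - timedelta(hours=1), EndTime=end, **query)
```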
Accessing CloudWatch Metrics for Kinesis Data Firehose
Dimension Description
Resource The name of the AWS resource. Currently, when the Service
dimension is Firehose, the only valid value for Resource is
DeliveryStreams.
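Delivery stream metrics in the AWS/Firehose namespace can also be alarmed on through the CloudWatch API. The following is a minimal sketch of an alarm on DeliveryToS3.DataFreshness; the stream name, alarm name, and threshold are placeholders to tune to your buffering configuration.

```python
# Sketch: alarm when the oldest undelivered record is more than 15 minutes
# old. "my-delivery-stream", the alarm name, and the threshold are
# placeholders, not values from this guide.
FRESHNESS_ALARM = {
    "AlarmName": "firehose-s3-data-freshness",
    "Namespace": "AWS/Firehose",
    "MetricName": "DeliveryToS3.DataFreshness",
    "Dimensions": [
        {"Name": "DeliveryStreamName", "Value": "my-delivery-stream"},
    ],
    "Statistic": "Maximum",
    "Period": 300,
    "EvaluationPeriods": 3,
    "Threshold": 900.0,  # seconds
    "ComparisonOperator": "GreaterThanThreshold",
}

def create_freshness_alarm(params=FRESHNESS_ALARM):
    """Create the alarm. Requires boto3 and cloudwatch:PutMetricAlarm."""
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(**params)
```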
• DeliveryToS3.DataFreshness
• DeliveryToSplunk.DataFreshness
• DeliveryToElasticsearch.DataFreshness

For information about troubleshooting when alarms go to the ALARM state, see Troubleshooting (p. 98).

Monitoring with CloudWatch Logs
If you enable Kinesis Data Firehose error logging in the Kinesis Data Firehose console, a log group and
corresponding log streams are created for the delivery stream on your behalf. The format of the log
group name is /aws/kinesisfirehose/delivery-stream-name, where delivery-stream-name
is the name of the corresponding delivery stream. The log stream name is S3Delivery, RedshiftDelivery,
or ElasticsearchDelivery, depending on the delivery destination. Lambda invocation errors for data
transformation are also logged to the log stream used for data delivery errors.
For example, if you create a delivery stream "MyStream" with Amazon Redshift as the destination and
enable Kinesis Data Firehose error logging, the following are created on your behalf: a log group named
/aws/kinesisfirehose/MyStream and two log streams named S3Delivery and RedshiftDelivery.
In this example, the S3Delivery log stream is used for logging errors related to delivery failure to the
intermediate S3 bucket. The RedshiftDelivery log stream is used for logging errors related to Lambda
invocation failure and delivery failure to your Amazon Redshift cluster.
If you enable Kinesis Data Firehose error logging through the AWS CLI or an AWS SDK using the
CloudWatchLoggingOptions configuration, you must create a log group and a log stream in advance.
We recommend reserving that log group and log stream for Kinesis Data Firehose error logging
exclusively. Also ensure that the associated IAM policy has "logs:PutLogEvents" permission. For more
information, see Controlling Access with Amazon Kinesis Data Firehose (p. 29).
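As a minimal sketch of that CLI/SDK path, the following pre-creates the log group and log stream and builds the corresponding CloudWatchLoggingOptions value. The names follow the console's convention but are placeholders here.

```python
# Sketch: pre-create the log group and stream that the CLI/SDK path
# requires, then reference them from CloudWatchLoggingOptions. The names
# below are placeholders that follow the console's naming convention.
LOGGING_OPTIONS = {
    "Enabled": True,
    "LogGroupName": "/aws/kinesisfirehose/my-delivery-stream",
    "LogStreamName": "S3Delivery",
}

def create_log_resources(options=LOGGING_OPTIONS):
    """Create the log group and stream in advance. Requires boto3 plus
    logs:CreateLogGroup and logs:CreateLogStream permissions."""
    import boto3
    logs = boto3.client("logs")
    logs.create_log_group(logGroupName=options["LogGroupName"])
    logs.create_log_stream(
        logGroupName=options["LogGroupName"],
        logStreamName=options["LogStreamName"],
    )
```

Pass a dict like LOGGING_OPTIONS as the CloudWatchLoggingOptions field of the destination configuration when calling CreateDeliveryStream or UpdateDestination.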
Note that Kinesis Data Firehose does not guarantee that all delivery error logs are sent to CloudWatch
Logs. In circumstances where delivery failure rate is high, Kinesis Data Firehose samples delivery error
logs before sending them to CloudWatch Logs.
There is a nominal charge for error logs sent to CloudWatch Logs. For more information, see Amazon
CloudWatch Pricing.
Contents
• Data Delivery Errors (p. 71)
• Lambda Invocation Errors (p. 74)
• Accessing CloudWatch Logs for Kinesis Data Firehose (p. 75)
Data Delivery Errors
• Amazon S3 Data Delivery Errors (p. 71)
• Amazon Redshift Data Delivery Errors (p. 71)
• Splunk Data Delivery Errors (p. 73)
• Amazon Elasticsearch Service Data Delivery Errors (p. 74)
Amazon S3 Data Delivery Errors

S3.KMS.NotFoundException "The provided AWS KMS key was not found. If you are using what you believe to be a valid AWS KMS key with the correct role, check if there is a problem with the account to which the AWS KMS key is attached."

S3.KMS.RequestLimitExceeded "The KMS request per second limit was exceeded while attempting to encrypt S3 objects. Increase the request per second limit."

For more information, see Limits in the AWS Key Management Service Developer Guide.

S3.AccessDenied "Access was denied. Ensure that the trust policy for the provided IAM role allows Kinesis Data Firehose to assume the role, and the access policy allows access to the S3 bucket."

S3.AccountProblem "There is a problem with your AWS account that prevents the operation from completing successfully. Contact AWS Support."

S3.AllAccessDisabled "Access to the account provided has been disabled. Contact AWS Support."

S3.InvalidPayer "Access to the account provided has been disabled. Contact AWS Support."

S3.NotSignedUp "The account is not signed up for Amazon S3. Sign the account up or use a different account."

S3.NoSuchBucket "The specified bucket does not exist. Create the bucket or use a different bucket that does exist."

S3.MethodNotAllowed "The specified method is not allowed against this resource. Modify the bucket's policy to allow the correct Amazon S3 operation permissions."

InternalError "An internal error occurred while attempting to deliver data. Delivery will be retried; if the error persists, then it will be reported to AWS for resolution."
Amazon Redshift Data Delivery Errors

Redshift.TableNotFound "The table to which to load data was not found. Ensure that the specified table exists."

Redshift.AuthenticationFailed "The provided user name and password failed authentication. Provide a valid user name and password."

Redshift.AccessDenied "Access was denied. Ensure that the trust policy for the provided IAM role allows Kinesis Data Firehose to assume the role."

Redshift.S3BucketAccessDenied "The COPY command was unable to access the S3 bucket. Ensure that the access policy for the provided IAM role allows access to the S3 bucket."

Redshift.DataLoadFailed "Loading data into the table failed. Check STL_LOAD_ERRORS system table for details."

Redshift.ColumnNotFound "A column in the COPY command does not exist in the table. Specify a valid column name."

For more information, see the Amazon Redshift COPY command in the Amazon Redshift Database Developer Guide.

Redshift.ConnectionFailed "The connection to the specified Amazon Redshift cluster failed. Ensure that security settings allow Kinesis Data Firehose connections, that the cluster or database specified in the Amazon Redshift destination configuration or JDBC URL is correct, and that the cluster is available."

Redshift.IncorrectOrMissingRegion "Amazon Redshift attempted to use the wrong region endpoint for accessing the S3 bucket. Either specify a correct region value in the COPY command options or ensure that the S3 bucket is in the same region as the Amazon Redshift database."

Redshift.IncorrectJsonPathsFile "The provided jsonpaths file is not in a supported JSON format. Retry the command."
Redshift.InsufficientPrivilege "The user does not have permissions to load data into the table. Check the Amazon Redshift user permissions for the INSERT privilege."

Redshift.ReadOnlyCluster "The query cannot be executed because the system is in resize mode. Try the query again later."
Redshift.DiskFull "Data could not be loaded because the disk is full. Increase the capacity of
the Amazon Redshift cluster or delete unused data to free disk space."
InternalError "An internal error occurred while attempting to deliver data. Delivery will be
retried; if the error persists, then it will be reported to AWS for resolution."
Splunk Data Delivery Errors

Splunk.ProxyWithoutStickySessions "If you have a proxy (ELB or other) between Kinesis Data Firehose and the HEC node, you must enable sticky sessions to support HEC ACKs."

Splunk.DisabledToken "The HEC token is disabled. Enable the token to allow data delivery to Splunk."

Splunk.InvalidToken "The HEC token is invalid. Update Kinesis Data Firehose with a valid HEC token."

Splunk.InvalidDataFormat "The data is not formatted correctly. To see how to properly format data for Raw or Event HEC endpoints, see Splunk Event Data."
Splunk.InvalidIndex "The HEC token or input is configured with an invalid index. Check your
index configuration and try again."
Splunk.ServerError "Data delivery to Splunk failed due to a server error from the HEC node.
Kinesis Data Firehose will retry sending the data if the retry duration in your
Kinesis Data Firehose is greater than 0. If all the retries fail, Kinesis Data
Firehose backs up the data to Amazon S3."
Splunk.DisabledAck "Indexer acknowledgement is disabled for the HEC token. Enable indexer
acknowledgement and try again. For more info, see Enable indexer
acknowledgement."
Splunk.AckTimeout "Did not receive an acknowledgement from HEC before the HEC
acknowledgement timeout expired. Despite the acknowledgement timeout,
it's possible the data was indexed successfully in Splunk. Kinesis Data
Firehose backs up in Amazon S3 data for which the acknowledgement
timeout expired."
Splunk.ConnectionTimeout "The connection to Splunk timed out. This might be a transient error and the request will be retried. Kinesis Data Firehose backs up the data to Amazon S3 if all retries fail."

Splunk.InvalidEndpoint "Could not connect to the HEC endpoint. Make sure that the HEC endpoint URL is valid and reachable from Kinesis Data Firehose."
Splunk.SSLUnverified "Could not connect to the HEC endpoint. The host does not match the certificate provided by the peer. Make sure that the certificate and the host are valid."
Splunk.SSLHandshake "Could not connect to the HEC endpoint. Make sure that the certificate and
the host are valid."
Lambda Invocation Errors

Lambda.AssumeRoleAccessDenied "Access was denied. Ensure that the trust policy for the provided IAM role allows Kinesis Data Firehose to assume the role."

Lambda.InvokeAccessDenied "Access was denied. Ensure that the access policy allows access to the Lambda function."

Lambda.JsonProcessingException "There was an error parsing returned records from the Lambda function. Ensure that the returned records follow the status model required by Kinesis Data Firehose."

For more information, see Data Transformation and Status Model (p. 45).

For more information, see AWS Lambda Limits in the AWS Lambda Developer Guide.

Lambda.DuplicatedRecordId "Multiple records were returned with the same record ID. Ensure that the Lambda function returns unique record IDs for each record."

For more information, see Data Transformation and Status Model (p. 45).

Lambda.MissingRecordId "One or more record IDs were not returned. Ensure that the Lambda function returns all received record IDs."

For more information, see Data Transformation and Status Model (p. 45).

Lambda.ResourceNotFound "The specified Lambda function does not exist. Use a different function that does exist."
Lambda.SubnetIPAddressLimitReachedException "AWS Lambda was not able to set up the VPC access for the Lambda function because one or more configured subnets have no available IP addresses. Increase the IP address limit."

For more information, see Amazon VPC Limits - VPC and Subnets in the Amazon VPC User Guide.

Lambda.ENILimitReachedException "AWS Lambda was not able to create an Elastic Network Interface (ENI) in the VPC, specified as part of the Lambda function configuration, because the limit for network interfaces has been reached. Increase the network interface limit."

For more information, see Amazon VPC Limits - Network Interfaces in the Amazon VPC User Guide.

Accessing CloudWatch Logs for Kinesis Data Firehose

1. Sign in to the AWS Management Console and open the Kinesis console at https://console.aws.amazon.com/kinesis.
2. Choose Data Firehose in the navigation pane.
3. On the navigation bar, choose an AWS Region.
4. Choose a delivery stream name to go to the delivery stream details page.
5. Choose Error Log to view a list of error logs related to data delivery failure.

Monitoring Agent Health
Metrics such as number of records and bytes sent are useful to understand the rate at which the agent
is submitting data to the Kinesis Data Firehose delivery stream. When these metrics fall below expected
thresholds by some percentage or drop to zero, it could indicate configuration issues, network errors, or
agent health issues. Metrics such as on-host CPU and memory consumption and agent error counters
indicate data producer resource usage, and provide insights into potential configuration or host errors.
Finally, the agent also logs service exceptions to help investigate agent issues.
The agent metrics are reported in the region specified in the agent configuration setting
cloudwatch.endpoint. For more information, see Agent Configuration Settings (p. 18).
There is a nominal charge for metrics emitted from Kinesis Agent, which are enabled by default. For
more information, see Amazon CloudWatch Pricing.
Metric Description
BytesSent The number of bytes sent to the Kinesis Data Firehose delivery stream over
the specified time period.
Units: Bytes
RecordSendAttempts The number of records attempted (either first time, or as a retry) in a call to
PutRecordBatch over the specified time period.
Units: Count
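Agent metrics can be pulled from CloudWatch the same way as delivery stream metrics, and the guidance above (watch for throughput dropping to zero) reduces to a small check over the returned datapoints. The "AWSKinesisAgent" namespace below is an assumption to verify in your CloudWatch console.

```python
# Sketch: read the agent's BytesSent metric for the last hour and flag a
# stalled agent. The "AWSKinesisAgent" namespace is an assumption; confirm
# it in the CloudWatch console for your agent version.
AGENT_QUERY = {
    "Namespace": "AWSKinesisAgent",
    "MetricName": "BytesSent",
    "Period": 300,
    "Statistics": ["Sum"],
}

def fetch_agent_bytes_sent(query=AGENT_QUERY):
    """Requires boto3 and cloudwatch:GetMetricStatistics."""
    import boto3
    from datetime import datetime, timedelta, timezone
    end = datetime.now(timezone.utc)
    return boto3.client("cloudwatch").get_metric_statistics(
        StartTime=end - timedelta(hours=1), EndTime=end, **query)

def throughput_stalled(datapoints):
    """True if every datapoint in the window reports zero bytes sent,
    which suggests configuration, network, or agent health issues."""
    return bool(datapoints) and all(dp["Sum"] == 0 for dp in datapoints)
```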
Logging Kinesis Data Firehose API Calls with AWS CloudTrail

To learn more about CloudTrail, including how to configure and enable it, see the AWS CloudTrail User Guide.
For an ongoing record of events in your AWS account, including events for Kinesis Data Firehose, create
a trail. A trail enables CloudTrail to deliver log files to an Amazon S3 bucket. By default, when you create
a trail in the console, the trail applies to all AWS Regions. The trail logs events from all Regions in the
AWS partition and delivers the log files to the Amazon S3 bucket that you specify. Additionally, you can
configure other AWS services to further analyze and act upon the event data collected in CloudTrail logs.
For more information, see the AWS CloudTrail User Guide.
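The trail creation described above can be sketched with the SDK as follows. The trail and bucket names are placeholders, and the S3 bucket must already carry a bucket policy that permits CloudTrail to write to it.

```python
# Sketch: create a multi-Region trail and start logging. "firehose-api-trail"
# and "my-cloudtrail-bucket" are placeholders; the bucket needs a CloudTrail
# bucket policy before this call will succeed.
TRAIL = {
    "Name": "firehose-api-trail",
    "S3BucketName": "my-cloudtrail-bucket",
    "IsMultiRegionTrail": True,  # log events from all Regions, as in the text
}

def create_trail(params=TRAIL):
    """Requires boto3 plus cloudtrail:CreateTrail and
    cloudtrail:StartLogging permissions."""
    import boto3
    ct = boto3.client("cloudtrail")
    ct.create_trail(**params)
    ct.start_logging(Name=params["Name"])
```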
Kinesis Data Firehose supports logging the following actions as events in CloudTrail log files:
• CreateDeliveryStream
• DeleteDeliveryStream
• DescribeDeliveryStream
• ListDeliveryStreams
• ListTagsForDeliveryStream
• TagDeliveryStream
• StartDeliveryStreamEncryption
• StopDeliveryStreamEncryption
• UntagDeliveryStream
• UpdateDestination
Every event or log entry contains information about who generated the request. The identity
information helps you determine the following:
• Whether the request was made with root or AWS Identity and Access Management (IAM) user
credentials.
• Whether the request was made with temporary security credentials for a role or federated user.
• Whether the request was made by another AWS service.
The following example shows a CloudTrail log entry that demonstrates the CreateDeliveryStream,
DescribeDeliveryStream, ListDeliveryStreams, UpdateDestination, and
DeleteDeliveryStream actions.
{
"Records":[
{
"eventVersion":"1.02",
"userIdentity":{
"type":"IAMUser",
"principalId":"AKIAIOSFODNN7EXAMPLE",
"arn":"arn:aws:iam::111122223333:user/CloudTrail_Test_User",
"accountId":"111122223333",
"accessKeyId":"AKIAI44QH8DHBEXAMPLE",
"userName":"CloudTrail_Test_User"
},
"eventTime":"2016-02-24T18:08:22Z",
"eventSource":"firehose.amazonaws.com",
"eventName":"CreateDeliveryStream",
"awsRegion":"us-east-1",
"sourceIPAddress":"127.0.0.1",
"userAgent":"aws-internal/3",
"requestParameters":{
"deliveryStreamName":"TestRedshiftStream",
"redshiftDestinationConfiguration":{
"s3Configuration":{
"compressionFormat":"GZIP",
"prefix":"prefix",
"bucketARN":"arn:aws:s3:::firehose-cloudtrail-test-bucket",
"roleARN":"arn:aws:iam::111122223333:role/Firehose",
"bufferingHints":{
"sizeInMBs":3,
"intervalInSeconds":900
},
"encryptionConfiguration":{
"kMSEncryptionConfig":{
"aWSKMSKeyARN":"arn:aws:kms:us-east-1:key"
}
}
},
"clusterJDBCURL":"jdbc:redshift://example.abc123.us-
west-2.redshift.amazonaws.com:5439/dev",
"copyCommand":{
"copyOptions":"copyOptions",
"dataTableName":"dataTable"
},
"password":"",
"username":"",
"roleARN":"arn:aws:iam::111122223333:role/Firehose"
}
},
"responseElements":{
"deliveryStreamARN":"arn:aws:firehose:us-east-1:111122223333:deliverystream/
TestRedshiftStream"
},
"requestID":"958abf6a-db21-11e5-bb88-91ae9617edf5",
"eventID":"875d2d68-476c-4ad5-bbc6-d02872cfc884",
"eventType":"AwsApiCall",
"recipientAccountId":"111122223333"
},
{
"eventVersion":"1.02",
"userIdentity":{
"type":"IAMUser",
"principalId":"AKIAIOSFODNN7EXAMPLE",
"arn":"arn:aws:iam::111122223333:user/CloudTrail_Test_User",
"accountId":"111122223333",
"accessKeyId":"AKIAI44QH8DHBEXAMPLE",
"userName":"CloudTrail_Test_User"
},
"eventTime":"2016-02-24T18:08:54Z",
"eventSource":"firehose.amazonaws.com",
"eventName":"DescribeDeliveryStream",
"awsRegion":"us-east-1",
"sourceIPAddress":"127.0.0.1",
"userAgent":"aws-internal/3",
"requestParameters":{
"deliveryStreamName":"TestRedshiftStream"
},
"responseElements":null,
"requestID":"aa6ea5ed-db21-11e5-bb88-91ae9617edf5",
"eventID":"d9b285d8-d690-4d5c-b9fe-d1ad5ab03f14",
"eventType":"AwsApiCall",
"recipientAccountId":"111122223333"
},
{
"eventVersion":"1.02",
"userIdentity":{
"type":"IAMUser",
"principalId":"AKIAIOSFODNN7EXAMPLE",
"arn":"arn:aws:iam::111122223333:user/CloudTrail_Test_User",
"accountId":"111122223333",
"accessKeyId":"AKIAI44QH8DHBEXAMPLE",
"userName":"CloudTrail_Test_User"
},
"eventTime":"2016-02-24T18:10:00Z",
"eventSource":"firehose.amazonaws.com",
"eventName":"ListDeliveryStreams",
"awsRegion":"us-east-1",
"sourceIPAddress":"127.0.0.1",
"userAgent":"aws-internal/3",
"requestParameters":{
"limit":10
},
"responseElements":null,
"requestID":"d1bf7f86-db21-11e5-bb88-91ae9617edf5",
"eventID":"67f63c74-4335-48c0-9004-4ba35ce00128",
"eventType":"AwsApiCall",
"recipientAccountId":"111122223333"
},
{
"eventVersion":"1.02",
"userIdentity":{
"type":"IAMUser",
"principalId":"AKIAIOSFODNN7EXAMPLE",
"arn":"arn:aws:iam::111122223333:user/CloudTrail_Test_User",
"accountId":"111122223333",
"accessKeyId":"AKIAI44QH8DHBEXAMPLE",
"userName":"CloudTrail_Test_User"
},
"eventTime":"2016-02-24T18:10:09Z",
"eventSource":"firehose.amazonaws.com",
"eventName":"UpdateDestination",
"awsRegion":"us-east-1",
"sourceIPAddress":"127.0.0.1",
"userAgent":"aws-internal/3",
"requestParameters":{
"destinationId":"destinationId-000000000001",
"deliveryStreamName":"TestRedshiftStream",
"currentDeliveryStreamVersionId":"1",
"redshiftDestinationUpdate":{
"roleARN":"arn:aws:iam::111122223333:role/Firehose",
"clusterJDBCURL":"jdbc:redshift://example.abc123.us-
west-2.redshift.amazonaws.com:5439/dev",
"password":"",
"username":"",
"copyCommand":{
"copyOptions":"copyOptions",
"dataTableName":"dataTable"
},
"s3Update":{
"bucketARN":"arn:aws:s3:::firehose-cloudtrail-test-bucket-update",
"roleARN":"arn:aws:iam::111122223333:role/Firehose",
"compressionFormat":"GZIP",
"bufferingHints":{
"sizeInMBs":3,
"intervalInSeconds":900
},
"encryptionConfiguration":{
"kMSEncryptionConfig":{
"aWSKMSKeyARN":"arn:aws:kms:us-east-1:key"
}
},
"prefix":"arn:aws:s3:::firehose-cloudtrail-test-bucket"
}
}
},
"responseElements":null,
"requestID":"d549428d-db21-11e5-bb88-91ae9617edf5",
"eventID":"1cb21e0b-416a-415d-bbf9-769b152a6585",
"eventType":"AwsApiCall",
"recipientAccountId":"111122223333"
},
{
"eventVersion":"1.02",
"userIdentity":{
"type":"IAMUser",
"principalId":"AKIAIOSFODNN7EXAMPLE",
"arn":"arn:aws:iam::111122223333:user/CloudTrail_Test_User",
"accountId":"111122223333",
"accessKeyId":"AKIAI44QH8DHBEXAMPLE",
"userName":"CloudTrail_Test_User"
},
"eventTime":"2016-02-24T18:10:12Z",
"eventSource":"firehose.amazonaws.com",
"eventName":"DeleteDeliveryStream",
"awsRegion":"us-east-1",
"sourceIPAddress":"127.0.0.1",
"userAgent":"aws-internal/3",
"requestParameters":{
"deliveryStreamName":"TestRedshiftStream"
},
"responseElements":null,
"requestID":"d85968c1-db21-11e5-bb88-91ae9617edf5",
"eventID":"dd46bb98-b4e9-42ff-a6af-32d57e636ad1",
"eventType":"AwsApiCall",
"recipientAccountId":"111122223333"
}
]
}
You can use expressions of the following forms in your custom prefix: !{namespace:value}, where namespace can be either firehose or timestamp, as explained in the following sections.

If a prefix ends with a slash, it appears as a folder in the Amazon S3 bucket. For more information, see Amazon S3 Object Name Format in the Amazon Kinesis Data Firehose Developer Guide.

The timestamp namespace
When evaluating timestamps, Kinesis Data Firehose uses the approximate arrival timestamp of the oldest
record that's contained in the Amazon S3 object being written.
If you use the timestamp namespace more than once in the same prefix expression, every instance
evaluates to the same instant in time.
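To make the evaluation concrete, the following is an illustrative re-implementation of how a !{namespace:value} expression expands, using the oldest record's approximate arrival timestamp. It is a simplified sketch, not the service's actual code: it maps only the yyyy/MM/dd/HH pattern letters and ignores the single-quote literal rule.

```python
import re
from datetime import datetime, timezone

def evaluate_prefix(prefix, oldest_record_ts, error_output_type=None):
    """Approximate expansion of !{namespace:value} expressions in an S3
    prefix. Simplified sketch: handles the timestamp and firehose
    namespaces for the common pattern letters only."""
    def repl(match):
        ns, value = match.group(1), match.group(2)
        if ns == "timestamp":
            # Map the Joda-style pattern letters used in the guide's
            # default prefix onto strftime equivalents.
            fmt = (value.replace("yyyy", "%Y").replace("MM", "%m")
                        .replace("dd", "%d").replace("HH", "%H"))
            return oldest_record_ts.strftime(fmt)
        if ns == "firehose" and value == "error-output-type":
            return error_output_type or ""
        raise ValueError("unknown expression: " + match.group(0))
    return re.sub(r"!\{(\w+):([^}]*)\}", repl, prefix)

# Every timestamp expression in a prefix evaluates against the same instant.
ts = datetime(2016, 2, 24, 18, tzinfo=timezone.utc)
print(evaluate_prefix("logs/!{timestamp:yyyy/MM/dd}/", ts))
# prints logs/2016/02/24/
```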
Semantic rules
The following rules apply to Prefix and ErrorOutputPrefix expressions.
• For the timestamp namespace, any character that isn't in single quotes is evaluated. In other words,
any string escaped with single quotes in the value field is taken literally.
• If you specify a prefix that doesn't contain a timestamp namespace expression, Kinesis Data Firehose appends the expression !{timestamp:yyyy/MM/dd/HH/} to the value in the Prefix field.
• The sequence !{ can only appear in !{namespace:value} expressions.
• ErrorOutputPrefix can be null only if Prefix contains no expressions. In this case, Prefix
evaluates to <specified-prefix>YYYY/MM/DDD/HH/ and ErrorOutputPrefix evaluates to
<specified-prefix><error-output-type>YYYY/MM/DDD/HH/. DDD represents the day of the
year.
• If you specify an expression for ErrorOutputPrefix, you must include at least one instance of !{firehose:error-output-type}.
• Prefix can't contain !{firehose:error-output-type}.
• Neither Prefix nor ErrorOutputPrefix can be greater than 512 characters after they're evaluated.
• If the destination is Amazon Redshift, Prefix must not contain expressions and
ErrorOutputPrefix must be null.
• When the destination is Amazon Elasticsearch Service or Splunk, and no ErrorOutputPrefix is
specified, Kinesis Data Firehose uses the Prefix field for failed records.
• When the destination is Amazon S3, and you specify an Amazon S3 backup configuration, the Prefix
and ErrorOutputPrefix in the Amazon S3 destination configuration are used for successful records
and failed records, respectively. In the Amazon S3 backup configuration, the Prefix is used for
backing up the raw data, whereas the ErrorOutputPrefix is ignored.
Example prefixes
Prefix and ErrorOutputPrefix examples
Interface VPC endpoints (AWS PrivateLink) for Kinesis Data Firehose
The following example shows how you can set up an AWS Lambda function in a VPC and create a VPC
endpoint to allow the function to communicate securely with the Kinesis Data Firehose service. In this
example, you use a policy that allows the Lambda function to list the delivery streams in the current
Region but not to describe any delivery stream.
1. Sign in to the AWS Management Console and open the Amazon VPC console at https://
console.aws.amazon.com/vpc/.
2. In the VPC Dashboard, choose Endpoints.
3. Choose Create Endpoint.
4. In the list of service names, choose com.amazonaws.your_region.kinesis-firehose.
5. Choose the VPC and one or more subnets in which to create the endpoint.
6. Choose one or more security groups to associate with the endpoint.
7. For Policy, choose Custom and paste the following policy:
{
"Statement": [
{
"Sid": "Allow-only-specific-PrivateAPIs",
"Principal": "*",
"Action": [
"firehose:ListDeliveryStreams"
],
"Effect": "Allow",
"Resource": [
"*"
]
},
{
"Sid": "Deny-specific-PrivateAPIs",
"Principal": "*",
"Action": [
"firehose:DescribeDeliveryStream"
],
"Effect": "Deny",
"Resource": [
"*"
]
}
]
}
import json
import boto3
import os
from botocore.exceptions import ClientError

REGION = os.environ['AWS_REGION']
client = boto3.client(
    'firehose',
    REGION
)

print("Calling list_delivery_streams with ListDeliveryStreams allowed policy.")
delivery_stream_request = client.list_delivery_streams()
print("Successfully returned list_delivery_streams request %s." % (
    delivery_stream_request
))

describe_access_denied = False
try:
    print("Calling describe_delivery_stream with DescribeDeliveryStream denied policy.")
    delivery_stream_info = client.describe_delivery_stream(
        DeliveryStreamName='test-describe-denied')
except ClientError as e:
    error_code = e.response['Error']['Code']
    print("Caught %s." % (error_code))
    if error_code == 'AccessDeniedException':
        describe_access_denied = True
    if not describe_access_denied:
        raise
    else:
        print("Access denied test succeeded.")
Calling describe_delivery_stream.
AccessDeniedException
Availability
Interface VPC endpoints are currently supported within the following Regions:
• US East (Ohio)
• US East (N. Virginia)
• US West (N. California)
• US West (Oregon)
• Asia Pacific (Mumbai)
• Asia Pacific (Seoul)
• Asia Pacific (Singapore)
• Asia Pacific (Sydney)
Topics
• Tag Basics (p. 88)
• Tracking Costs Using Tagging (p. 88)
• Tag Restrictions (p. 89)
• Tagging Delivery Streams Using the Amazon Kinesis Data Firehose API (p. 89)
Tag Basics

You can use tags to categorize your Kinesis Data Firehose delivery streams. For example, you can categorize delivery streams by purpose, owner, or environment. Because you define the key and value for each tag, you can create a custom set of categories to meet your specific needs. For example, you might define a set of tags that helps you track delivery streams by owner and associated application. You can manage tags programmatically, as described in Tagging Delivery Streams Using the Amazon Kinesis Data Firehose API (p. 89).
Tag Restrictions
The following restrictions apply to tags in Kinesis Data Firehose.
Basic restrictions
• Each tag key must be unique. If you add a tag with a key that's already in use, your new tag overwrites
the existing key-value pair.
• You can't start a tag key with aws: because this prefix is reserved for use by AWS. AWS creates tags
that begin with this prefix on your behalf, but you can't edit or delete them.
• Tag keys must be between 1 and 128 Unicode characters in length.
• Tag keys must consist of the following characters: Unicode letters, digits, white space, and the
following special characters: _ . / = + - @.
Tagging Delivery Streams Using the Amazon Kinesis Data Firehose API

You can use the Amazon Kinesis Data Firehose API to complete the following tasks:

• TagDeliveryStream
• ListTagsForDeliveryStream
• UntagDeliveryStream
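As a minimal sketch of these operations with the SDK, the following tags a delivery stream and reads the tags back. The stream name and the tag keys and values are placeholders.

```python
# Sketch: tag a delivery stream and list its tags. "my-delivery-stream"
# and the tag keys/values are placeholders chosen for illustration.
TAGS = [
    {"Key": "Environment", "Value": "production"},
    {"Key": "Owner", "Value": "data-platform"},
]

def tag_stream(stream_name="my-delivery-stream", tags=TAGS):
    """Requires boto3 plus firehose:TagDeliveryStream and
    firehose:ListTagsForDeliveryStream permissions."""
    import boto3
    firehose = boto3.client("firehose")
    firehose.tag_delivery_stream(DeliveryStreamName=stream_name, Tags=tags)
    return firehose.list_tags_for_delivery_stream(
        DeliveryStreamName=stream_name)
```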
The following diagram shows the flow of data that is demonstrated in this tutorial.
As the diagram shows, first you send the Amazon VPC flow logs to Amazon CloudWatch. Then from
CloudWatch, the data goes to a Kinesis Data Firehose delivery stream. Kinesis Data Firehose then invokes
an AWS Lambda function to decompress the data, and sends the decompressed log data to Splunk.
Prerequisites
Before you begin, ensure that you have the following prerequisites:
• AWS account — If you don't have an AWS account, create one at http://aws.amazon.com. For more
information, see Setting Up for Amazon Kinesis Data Firehose (p. 4).
• AWS CLI — Parts of this tutorial require that you use the AWS Command Line Interface (AWS CLI).
To install the AWS CLI, see Installing the AWS Command Line Interface in the AWS Command Line
Interface User Guide.
• HEC token — In your Splunk deployment, set up an HTTP Event Collector (HEC) token with the source
type aws:cloudwatchlogs:vpcflow. For more information, see Installation and configuration
overview for the Splunk Add-on for Amazon Kinesis Firehose in the Splunk documentation.
Topics
• Step 1: Send Log Data from Amazon VPC to Amazon CloudWatch (p. 91)
• Step 2: Create a Kinesis Data Firehose Delivery Stream with Splunk as a Destination (p. 93)
• Step 3: Send the Data from Amazon CloudWatch to Kinesis Data Firehose (p. 96)
• Step 4: Check the Results in Splunk and in Kinesis Data Firehose (p. 97)
To create a CloudWatch log group to receive your Amazon VPC flow logs
1. Sign in to the AWS Management Console and open the CloudWatch console at https://
console.aws.amazon.com/cloudwatch/.
2. In the navigation pane, choose Log groups.
3. Choose Actions, and then choose Create log group.
4. Enter the name VPCtoSplunkLogGroup, and choose Create log group.
8. In the new window that appears, keep IAM Role set to Create a new IAM Role. In the Role Name
box, enter VPCtoSplunkWritetoCWRole. Then choose Allow.
9. Return to the Create flow log browser tab, and refresh the IAM role* box. Then choose
VPCtoSplunkWritetoCWRole in the list.
10. Choose Create, and then choose Close.
11. Back on the Amazon VPC dashboard, choose Your VPCs in the navigation pane. Then select the
check box next to your VPC.
12. Scroll down and choose the Flow Logs tab, and look for the flow log that you created in the
preceding steps. Ensure that its status is Active. If it is not, review the previous steps.
Proceed to Step 2: Create a Kinesis Data Firehose Delivery Stream with Splunk as a Destination (p. 93).
The logs that CloudWatch sends to the delivery stream are in a compressed format. However, Kinesis
Data Firehose can't send compressed logs to Splunk. Therefore, when you create the delivery stream
in the following procedure, you enable data transformation and configure an AWS Lambda function to
uncompress the log data. Kinesis Data Firehose then sends the uncompressed data to Splunk.
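CloudWatch Logs delivers each record base64-encoded and gzip-compressed, with a JSON envelope inside. The transformation that the following procedure configures must reverse both steps. A minimal sketch of that decoding (illustrative only, not the blueprint code that the procedure selects):

```python
import base64
import gzip
import json

def decompress_record(record_data):
    """Decode one CloudWatch Logs record delivered through Kinesis Data
    Firehose: base64-decode, then gunzip, then parse the JSON envelope."""
    payload = gzip.decompress(base64.b64decode(record_data))
    return json.loads(payload)
```

The returned envelope contains a `logEvents` list whose `message` fields hold the individual VPC flow log lines.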
7. On the AWS Lambda console, for the function name, enter VPCtoSplunkLambda.
8. In the description text under Execution role, choose the IAM console link to create a custom role.
This opens the AWS Identity and Access Management (IAM) console.
9. In the IAM console, choose Lambda.
10. Choose Next: Permissions.
11. Choose Create policy.
12. Choose the JSON tab and replace the existing JSON with the following. Be sure to replace the your-region and your-aws-account-id placeholders with your AWS Region code and account ID. Don't include hyphens in the account ID. For a list of AWS Region codes, see AWS Regions and Endpoints.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:GetLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "firehose:PutRecordBatch"
      ],
      "Resource": [
        "arn:aws:firehose:your-region:your-aws-account-id:deliverystream/VPCtoSplunkStream"
      ]
    }
  ]
}
This policy allows the Lambda function to put data back into the delivery stream by invoking the
PutRecordBatch operation. This step is needed because a Lambda function can only return up to 6
MiB of data every time Kinesis Data Firehose invokes it. If the size of the uncompressed data exceeds
6 MiB, the function invokes PutRecordBatch to put some of the data back into the delivery stream
for future processing.
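The size check behind that behavior can be sketched in a few lines. The helper below is hypothetical (the actual blueprint logic differs in detail); it divides decompressed records into those that fit in one Lambda response and those to re-ingest with firehose:PutRecordBatch:

```python
SIX_MIB = 6 * 1024 * 1024  # Lambda synchronous response payload limit

def split_by_return_limit(decompressed_records, limit=SIX_MIB):
    """Return (to_return, to_reingest): records that fit within the
    response limit, and the overflow to put back into the delivery
    stream with PutRecordBatch for future processing."""
    to_return, to_reingest, total = [], [], 0
    for record in decompressed_records:
        if total + len(record) <= limit:
            to_return.append(record)
            total += len(record)
        else:
            to_reingest.append(record)
    return to_return, to_reingest
```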
13. Back in the Create role window, refresh the list of policies, then choose VPCtoSplunkLambdaPolicy
by selecting the box to its left.
14. Choose Next: Tags.
15. Choose Next: Review.
16. For Role Name, enter VPCtoSplunkLambdaRole, then choose Create role.
17. Back in the Lambda console, refresh the list of existing roles, then select
VPCtoSplunkLambdaRole.
18. Scroll down and choose Create function.
19. In the Lambda function pane, scroll down to the Basic settings section, and increase the timeout to
3 minutes.
20. Scroll up and choose Save.
21. Back in the Choose Lambda blueprint dialog box, choose Close.
22. On the delivery stream creation page, under the Transform source records with AWS Lambda
section, choose the refresh button. Then choose VPCtoSplunkLambda in the list of functions.
23. Scroll down and choose Next.
24. For Destination*, choose Splunk.
25. For Splunk cluster endpoint, see the information at Configure Amazon Kinesis Firehose to send
data to the Splunk platform in the Splunk documentation.
26. Keep Splunk endpoint type set to Raw endpoint.
27. Enter the value (and not the name) of your Splunk HTTP Event Collector (HEC) token.
28. For S3 backup mode*, choose Backup all events.
29. Choose an existing Amazon S3 bucket (or create a new one if you want), and choose Next.
30. On the Configure settings page, scroll down to the IAM role section, and choose Create new or
choose.
31. In the IAM role list, choose Create a new IAM role. For Role Name, enter
VPCtoSplunkLambdaFirehoseRole, and then choose Allow.
32. Choose Next, and review the configuration that you chose for the delivery stream. Then choose
Create delivery stream.
Proceed to Step 3: Send the Data from Amazon CloudWatch to Kinesis Data Firehose (p. 96).
Step 3: Send Data to the Delivery Stream
In this procedure, you use the AWS Command Line Interface (AWS CLI) to create a CloudWatch Logs
subscription that sends log events to your delivery stream.
1. Save the following trust policy to a local file, and name the file
VPCtoSplunkCWtoFHTrustPolicy.json. Be sure to replace the your-region placeholder with
your AWS Region code.
{
  "Statement": {
    "Effect": "Allow",
    "Principal": { "Service": "logs.your-region.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }
}
3. Save the following access policy to a local file, and name the file VPCtoSplunkCWtoFHAccessPolicy.json. Be sure to replace the your-region and your-aws-account-id placeholders with your AWS Region code and account ID.
{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["firehose:*"],
      "Resource": ["arn:aws:firehose:your-region:your-aws-account-id:deliverystream/VPCtoSplunkStream"]
    },
    {
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": ["arn:aws:iam::your-aws-account-id:role/VPCtoSplunkCWtoFHRole"]
    }
  ]
}
5. Replace the your-region and your-aws-account-id placeholders in the following AWS CLI
command with your AWS Region code and account ID, and then run the command.
Proceed to Step 4: Check the Results in Splunk and in Kinesis Data Firehose (p. 97).
Important
After you verify your results, delete any AWS resources that you don't need to keep, so as not to
incur ongoing charges.
Troubleshooting Amazon Kinesis Data Firehose
If the delivery stream uses DirectPut, check the IncomingBytes and IncomingRecords metrics to see if there's incoming traffic. If you are using the PutRecord or PutRecordBatch operations, make sure that you catch exceptions and retry. We recommend a retry policy with exponential backoff, jitter, and several retries. Also, if you use the PutRecordBatch API, make sure your code checks the value of FailedPutCount in the response even when the API call succeeds.
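A retry loop of that shape might look like the following sketch. It assumes a boto3-style Firehose client; only the records whose per-record result carries an ErrorCode are resent, with exponential backoff plus jitter between attempts:

```python
import random
import time

def put_with_retries(firehose, stream_name, records, max_attempts=5):
    """Send records with PutRecordBatch, retrying only the failed
    records. `firehose` is a boto3 Firehose client (or any object with
    the same put_record_batch signature). Returns records that still
    failed after all attempts."""
    pending = list(records)
    for attempt in range(max_attempts):
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=pending)
        if response["FailedPutCount"] == 0:
            return []  # everything delivered
        # Per-record results line up positionally with the request.
        pending = [record for record, result
                   in zip(pending, response["RequestResponses"])
                   if "ErrorCode" in result]
        time.sleep(min((2 ** attempt) + random.random(), 20.0))
    return pending
```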
If the delivery stream uses a Kinesis data stream as its source, check the IncomingBytes
and IncomingRecords metrics for the source data stream. Additionally, ensure that the
DataReadFromKinesisStream.Bytes and DataReadFromKinesisStream.Records metrics are
being emitted for the delivery stream.
For information about tracking delivery errors using CloudWatch, see the section called “Monitoring with
CloudWatch Logs” (p. 70).
Issues
• Data Not Delivered to Amazon S3 (p. 98)
• Data Not Delivered to Amazon Redshift (p. 99)
• Data Not Delivered to Amazon Elasticsearch Service (p. 100)
• Data Not Delivered to Splunk (p. 100)
• Delivery Stream Not Available as a Target for CloudWatch Logs, CloudWatch Events, or AWS IoT
Action (p. 101)
• Data Freshness Metric Increasing or Not Emitted (p. 101)
• Record Format Conversion to Apache Parquet Fails (p. 102)
• No Data at Destination Despite Good Metrics (p. 103)
Data Not Delivered to Amazon S3
• Check the Kinesis Data Firehose IncomingBytes and IncomingRecords metrics to make sure that data is sent to your Kinesis Data Firehose delivery stream successfully. For more information, see Monitoring Kinesis Data Firehose Using CloudWatch Metrics (p. 57).
• If data transformation with Lambda is enabled, check the Kinesis Data Firehose
ExecuteProcessingSuccess metric to make sure that Kinesis Data Firehose has tried to invoke
your Lambda function. For more information, see Monitoring Kinesis Data Firehose Using CloudWatch
Metrics (p. 57).
• Check the Kinesis Data Firehose DeliveryToS3.Success metric to make sure that Kinesis Data
Firehose has tried putting data to your Amazon S3 bucket. For more information, see Monitoring
Kinesis Data Firehose Using CloudWatch Metrics (p. 57).
• Enable error logging if it is not already enabled, and check error logs for delivery failure. For more
information, see Monitoring Kinesis Data Firehose Using CloudWatch Logs (p. 70).
• Make sure that the Amazon S3 bucket that is specified in your Kinesis Data Firehose delivery stream
still exists.
• If data transformation with Lambda is enabled, make sure that the Lambda function that is specified in
your delivery stream still exists.
• Make sure that the IAM role that is specified in your Kinesis Data Firehose delivery stream has access to
your S3 bucket and your Lambda function (if data transformation is enabled). For more information,
see Grant Kinesis Data Firehose Access to an Amazon S3 Destination (p. 31).
• If you're using data transformation, make sure that your Lambda function never returns responses
whose payload size exceeds 6 MB. For more information, see Amazon Kinesis Data Firehose Data
Transformation.
Data Not Delivered to Amazon Redshift
Data is delivered to your S3 bucket before loading into Amazon Redshift. If the data was not delivered to your S3 bucket, see Data Not Delivered to Amazon S3 (p. 98).
• Check the Kinesis Data Firehose DeliveryToRedshift.Success metric to make sure that Kinesis
Data Firehose has tried to copy data from your S3 bucket to the Amazon Redshift cluster. For more
information, see Monitoring Kinesis Data Firehose Using CloudWatch Metrics (p. 57).
• Enable error logging if it is not already enabled, and check error logs for delivery failure. For more
information, see Monitoring Kinesis Data Firehose Using CloudWatch Logs (p. 70).
• Check the Amazon Redshift STL_CONNECTION_LOG table to see if Kinesis Data Firehose can make
successful connections. In this table, you should be able to see connections and their status based
on a user name. For more information, see STL_CONNECTION_LOG in the Amazon Redshift Database
Developer Guide.
• If the previous check shows that connections are being established, check the Amazon Redshift
STL_LOAD_ERRORS table to verify the reason for the COPY failure. For more information, see
STL_LOAD_ERRORS in the Amazon Redshift Database Developer Guide.
• Make sure that the Amazon Redshift configuration in your Kinesis Data Firehose delivery stream is
accurate and valid.
• Make sure that the IAM role that is specified in your Kinesis Data Firehose delivery stream can
access the S3 bucket that Amazon Redshift copies data from, and also the Lambda function for data
transformation (if data transformation is enabled). For more information, see Grant Kinesis Data
Firehose Access to an Amazon S3 Destination (p. 31).
• If your Amazon Redshift cluster is in a virtual private cloud (VPC), make sure that the cluster allows
access from Kinesis Data Firehose IP addresses. For more information, see Grant Kinesis Data Firehose
Access to an Amazon Redshift Destination (p. 32).
• Make sure that the Amazon Redshift cluster is publicly accessible.
• If you're using data transformation, make sure that your Lambda function never returns responses
whose payload size exceeds 6 MB. For more information, see Amazon Kinesis Data Firehose Data
Transformation.
Data Not Delivered to Amazon Elasticsearch Service
Data can be backed up to your Amazon S3 bucket concurrently. If data was not delivered to your S3
bucket, see Data Not Delivered to Amazon S3 (p. 98).
• Check the Kinesis Data Firehose IncomingBytes and IncomingRecords metrics to make sure that
data is sent to your Kinesis Data Firehose delivery stream successfully. For more information, see
Monitoring Kinesis Data Firehose Using CloudWatch Metrics (p. 57).
• If data transformation with Lambda is enabled, check the Kinesis Data Firehose
ExecuteProcessingSuccess metric to make sure that Kinesis Data Firehose has tried to invoke
your Lambda function. For more information, see Monitoring Kinesis Data Firehose Using CloudWatch
Metrics (p. 57).
• Check the Kinesis Data Firehose DeliveryToElasticsearch.Success metric to make sure that
Kinesis Data Firehose has tried to index data to the Amazon ES cluster. For more information, see
Monitoring Kinesis Data Firehose Using CloudWatch Metrics (p. 57).
• Enable error logging if it is not already enabled, and check error logs for delivery failure. For more
information, see Monitoring Kinesis Data Firehose Using CloudWatch Logs (p. 70).
• Make sure that the Amazon ES configuration in your delivery stream is accurate and valid.
• If data transformation with Lambda is enabled, make sure that the Lambda function that is specified in
your delivery stream still exists.
• Make sure that the IAM role that is specified in your delivery stream can access your Amazon ES cluster
and Lambda function (if data transformation is enabled). For more information, see Grant Kinesis Data
Firehose Access to an Amazon ES Destination (p. 34).
• If you're using data transformation, make sure that your Lambda function never returns responses
whose payload size exceeds 6 MB. For more information, see Amazon Kinesis Data Firehose Data
Transformation.
Data Not Delivered to Splunk
• If your Splunk platform is in a VPC, make sure that Kinesis Data Firehose can access it. For more information, see Access to Splunk in VPC.
• If you use an AWS load balancer, make sure that it is a Classic Load Balancer. Kinesis Data Firehose
does not support Application Load Balancers or Network Load Balancers. Also, enable duration-based
sticky sessions with cookie expiration disabled. For information about how to do this, see Duration-
Based Session Stickiness.
• Review the Splunk platform requirements. The Splunk add-on for Kinesis Data Firehose requires
Splunk platform version 6.6.X or later. For more information, see Splunk Add-on for Amazon Kinesis
Firehose.
• If you have a proxy (Elastic Load Balancing or other) between Kinesis Data Firehose and the HTTP
Event Collector (HEC) node, enable sticky sessions to support HEC acknowledgements (ACKs).
• Make sure that you are using a valid HEC token.
• Ensure that the HEC token is enabled. See Enable and disable Event Collector tokens.
• Check whether the data that you're sending to Splunk is formatted correctly. For more information,
see Format events for HTTP Event Collector.
• Make sure that the HEC token and input event are configured with a valid index.
• When an upload to Splunk fails due to a server error from the HEC node, the request is automatically
retried. If all retries fail, the data gets backed up to Amazon S3. Check if your data appears in Amazon
S3, which is an indication of such a failure.
• Make sure that you enabled indexer acknowledgment on your HEC token. For more information, see
Enable indexer acknowledgement.
• Increase the value of HECAcknowledgmentTimeoutInSeconds in the Splunk destination
configuration of your Kinesis Data Firehose delivery stream.
• Increase the value of DurationInSeconds under RetryOptions in the Splunk destination
configuration of your Kinesis Data Firehose delivery stream.
• Check your HEC health.
• If you're using data transformation, make sure that your Lambda function never returns responses
whose payload size exceeds 6 MB. For more information, see Amazon Kinesis Data Firehose Data
Transformation.
• Make sure that the Splunk parameter named ackIdleCleanup is set to true. It is false by default. To
set this parameter to true, do the following:
• For a managed Splunk Cloud deployment, submit a case using the Splunk support portal. In this
case, ask Splunk support to enable the HTTP event collector, set ackIdleCleanup to true in
inputs.conf, and create or modify a load balancer to use with this add-on.
• For a distributed Splunk Enterprise deployment, set the ackIdleCleanup parameter to true
in the inputs.conf file. For *nix users, this file is located under $SPLUNK_HOME/etc/apps/
splunk_httpinput/local/. For Windows users, it is under %SPLUNK_HOME%\etc\apps
\splunk_httpinput\local\.
• For a single-instance Splunk Enterprise deployment, set the ackIdleCleanup parameter to
true in the inputs.conf file. For *nix users, this file is located under $SPLUNK_HOME/etc/
apps/splunk_httpinput/local/. For Windows users, it is under %SPLUNK_HOME%\etc\apps
\splunk_httpinput\local\.
• See Troubleshoot the Splunk Add-on for Amazon Kinesis Firehose.
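For reference, the ackIdleCleanup setting described above lives in the HTTP Event Collector stanza of inputs.conf. A minimal fragment is shown below; verify the stanza name and file location against your Splunk version before editing:

```ini
# $SPLUNK_HOME/etc/apps/splunk_httpinput/local/inputs.conf
[http]
ackIdleCleanup = true
```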
Data Freshness Metric Increasing or Not Emitted
If you enable backup for all events or all documents, monitor two separate data-freshness metrics: one for the main destination and one for the backup.
If the data-freshness metric isn't being emitted, this means that there is no active delivery for the
delivery stream. This happens when data delivery is completely blocked or when there's no incoming
data.
If the data-freshness metric is constantly increasing, this means that data delivery is falling behind. This
can happen for one of the following reasons.
• The destination can't handle the rate of delivery. If Kinesis Data Firehose encounters transient errors due to high traffic, delivery might fall behind. This can happen with destinations other than Amazon S3, such as Amazon Elasticsearch Service, Amazon Redshift, or Splunk. Ensure that your destination has enough capacity to handle the incoming traffic.
• The destination is slow. Data delivery might fall behind if Kinesis Data Firehose encounters high
latency. Monitor the destination's latency metric.
• The Lambda function is slow. This might lead to a data delivery rate that is less than the data ingestion rate for the delivery stream. If possible, improve the efficiency of the Lambda function.
For instance, if the function does network IO, use multiple threads or asynchronous IO to increase
parallelism. Also, consider increasing the memory size of the Lambda function so that the CPU
allocation can increase accordingly. This might lead to faster Lambda invocations. For information
about configuring Lambda functions, see Configuring AWS Lambda Functions.
• There are failures during data delivery. For information about how to monitor errors using Amazon
CloudWatch Logs, see the section called “Monitoring with CloudWatch Logs” (p. 70).
• If the data source of the delivery stream is a Kinesis data stream, throttling might be happening. Check
the ThrottledGetRecords, ThrottledGetShardIterator, and ThrottledDescribeStream
metrics. If there are multiple consumers attached to the Kinesis data stream, consider the following:
• If the ThrottledGetRecords and ThrottledGetShardIterator metrics are high, we recommend that you increase the number of shards provisioned for the data stream.
• If the ThrottledDescribeStream metric is high, we recommend that you add the kinesis:ListShards permission to the role configured in KinesisStreamSourceConfiguration.
• Low buffering hints for the destination. This might increase the number of round trips that Kinesis
Data Firehose needs to make to the destination, which might cause delivery to fall behind. Consider
increasing the value of the buffering hints. For more information, see BufferingHints.
• A high retry duration might cause delivery to fall behind when the errors are frequent. Consider
reducing the retry duration. Also, monitor the errors and try to reduce them. For information about
how to monitor errors using Amazon CloudWatch Logs, see the section called “Monitoring with
CloudWatch Logs” (p. 70).
• If the destination is Splunk and DeliveryToSplunk.DataFreshness is high but
DeliveryToSplunk.Success looks good, the Splunk cluster might be busy. Free the Splunk cluster
if possible. Alternatively, contact AWS Support and request an increase in the number of channels that
Kinesis Data Firehose is using to communicate with the Splunk cluster.
Record Format Conversion to Apache Parquet Fails
When the AWS Glue crawler indexes the DynamoDB set data types (StringSet, NumberSet, and BinarySet), it stores them in the data catalog as SET<STRING>, SET<BIGINT>, and SET<BINARY>, respectively. However, for Kinesis Data Firehose to convert the data records to the Apache Parquet
format, it requires Apache Hive data types. Because the set types aren't valid Apache Hive data types,
conversion fails. To get conversion to work, update the data catalog with Apache Hive data types. You
can do that by changing set to array in the data catalog.
To change one or more data types from set to array in an AWS Glue data catalog
1. Sign in to the AWS Management Console and open the AWS Glue console at https://
console.aws.amazon.com/glue/.
2. In the left pane, under the Data catalog heading, choose Tables.
3. In the list of tables, choose the name of the table where you need to modify one or more data types.
This takes you to the details page for the table.
4. Choose the Edit schema button in the top right corner of the details page.
5. In the Data type column choose the first set data type.
6. In the Column type drop-down list, change the type from set to array.
7. In the ArraySchema field, enter array<string>, array<int>, or array<binary>, depending on
the appropriate type of data for your scenario.
8. Choose Update.
9. Repeat the previous steps to convert other set types to array types.
10. Choose Save.
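If you have many tables, the console steps above can be scripted. The following is a rough sketch, not a definitive implementation: the type mapping mirrors the values the procedure tells you to enter, and the boto3 update_table call is illustrative only, since it must be restricted to the mutable fields of the table definition. Verify it against your table's actual schema before running.

```python
# Mapping from the Glue SET types that the crawler writes to the Hive
# array types that the procedure above enters in the console.
SET_TO_ARRAY = {
    "set<string>": "array<string>",
    "set<bigint>": "array<int>",
    "set<binary>": "array<binary>",
}

def to_hive_type(glue_type):
    """Return the Hive-compatible type for a Glue column type,
    leaving non-set types unchanged."""
    return SET_TO_ARRAY.get(glue_type.lower(), glue_type)

def convert_table(database, table_name, region="your-region"):
    """Rewrite every SET column of a Glue table as the matching array
    type. Sketch only: update_table rejects read-only fields, so copy
    just the mutable parts of the table definition."""
    import boto3  # assumed available in your environment
    glue = boto3.client("glue", region_name=region)
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    for column in table["StorageDescriptor"]["Columns"]:
        column["Type"] = to_hive_type(column["Type"])
    table_input = {k: v for k, v in table.items()
                   if k in ("Name", "StorageDescriptor", "PartitionKeys",
                            "TableType", "Parameters")}
    glue.update_table(DatabaseName=database, TableInput=table_input)
```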
Amazon Kinesis Data Firehose Quota
• By default, each account can have up to 50 Kinesis Data Firehose delivery streams per Region. If you
exceed this number, a call to CreateDeliveryStream results in a LimitExceededException
exception. To increase this quota, you can use Service Quotas if it's available in your Region. For
information about using Service Quotas, see Requesting a Quota Increase. If Service Quotas isn't
available in your region, you can use the Amazon Kinesis Data Firehose Limits form to request an
increase.
• When Direct PUT is configured as the data source, each Kinesis Data Firehose delivery stream provides
the following combined quota for PutRecord and PutRecordBatch requests:
• For US East (N. Virginia), US West (Oregon), and Europe (Ireland): 5,000 records/second, 2,000
requests/second, and 5 MiB/second.
• For US East (Ohio), US West (N. California), AWS GovCloud (US-East), AWS GovCloud (US-West), Asia
Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific
(Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (London), Europe (Paris),
Europe (Stockholm), Middle East (Bahrain), and South America (São Paulo): 1,000 records/second,
1,000 requests/second, and 1 MiB/second.
To request an increase in quota, use the Amazon Kinesis Data Firehose Limits form. The three quotas scale proportionally. For example, if you increase the throughput quota in US East (N. Virginia), US West (Oregon), or Europe (Ireland) to 10 MiB/second, the other two quotas increase to 4,000 requests/second and 10,000 records/second.
Important
If the increased quota is much higher than the running traffic, it causes small delivery batches
to destinations. This is inefficient and can result in higher costs at the destination services.
Be sure to increase the quota only to match current running traffic, and increase the quota
further if traffic increases.
Note
When Kinesis Data Streams is configured as the data source, this quota doesn't apply, and
Kinesis Data Firehose scales up and down with no limit.
• Each Kinesis Data Firehose delivery stream stores data records for up to 24 hours in case the delivery
destination is unavailable.
• The maximum size of a record sent to Kinesis Data Firehose, before base64-encoding, is 1,000 KiB.
• The PutRecordBatch operation can take up to 500 records per call or 4 MiB per call, whichever is
smaller. This quota cannot be changed.
• The following operations can provide up to five invocations per second: CreateDeliveryStream,
DeleteDeliveryStream, DescribeDeliveryStream, ListDeliveryStreams,
UpdateDestination, TagDeliveryStream, UntagDeliveryStream,
ListTagsForDeliveryStream, StartDeliveryStreamEncryption,
StopDeliveryStreamEncryption.
• The buffer size hints range from 1 MiB to 128 MiB for Amazon S3 delivery. For Amazon Elasticsearch Service (Amazon ES) delivery, they range from 1 MiB to 100 MiB. For AWS Lambda processing, you can set a buffering hint between 1 MiB and 3 MiB using the BufferSizeInMBs processor parameter. The size threshold is applied to the buffer before compression. These options are treated as hints; Kinesis Data Firehose might choose different values when that is optimal.
• The buffer interval hints range from 60 seconds to 900 seconds.
• For delivery from Kinesis Data Firehose to Amazon Redshift, only publicly accessible Amazon Redshift
clusters are supported.
• The retry duration range is from 0 seconds to 7,200 seconds for Amazon Redshift and Amazon ES
delivery.
• Kinesis Data Firehose supports Elasticsearch versions 1.5, 2.3, 5.1, 5.3, 5.5, 5.6, as well as all 6.* and 7.*
versions.
• Kinesis Data Firehose doesn't support delivery to Elasticsearch domains in a virtual private cloud (VPC).
• When the destination is Amazon S3, Amazon Redshift, or Amazon ES, Kinesis Data Firehose allows
up to 5 outstanding Lambda invocations per shard. For Splunk, the quota is 10 outstanding Lambda
invocations per shard.
• You can use a CMK of type CUSTOMER_MANAGED_CMK to encrypt up to 500 delivery streams.
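The PutRecordBatch quota noted above (500 records or 4 MiB per call, whichever is smaller) can be respected client-side with a small batching helper. This sketch assumes each record is a bytes payload:

```python
MAX_RECORDS = 500              # PutRecordBatch per-call record limit
MAX_BYTES = 4 * 1024 * 1024    # PutRecordBatch per-call size limit

def batch_records(records):
    """Yield lists of records sized for one PutRecordBatch call:
    at most 500 records and at most 4 MiB per batch."""
    batch, size = [], 0
    for data in records:
        if batch and (len(batch) == MAX_RECORDS or size + len(data) > MAX_BYTES):
            yield batch
            batch, size = [], 0
        batch.append(data)
        size += len(data)
    if batch:
        yield batch
```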
Document History
The following list describes important changes to the Amazon Kinesis Data Firehose documentation.

• Added a topic on custom prefixes (December 20, 2018). Describes the expressions that you can use when building a custom prefix for data that is delivered to Amazon S3. See Custom Amazon S3 Prefixes (p. 81).
• Added new Kinesis Data Firehose tutorial (October 30, 2018). Added a tutorial that demonstrates how to send Amazon VPC flow logs to Splunk through Kinesis Data Firehose. See Tutorial: Sending VPC Flow Logs to Splunk Using Amazon Kinesis Data Firehose (p. 90).
• Added four new Kinesis Data Firehose Regions (June 27, 2018). Added Paris, Mumbai, Sao Paulo, and London. For more information, see Amazon Kinesis Data Firehose Quota (p. 104).
• Added two new Kinesis Data Firehose Regions (June 13, 2018). Added Seoul and Montreal. For more information, see Amazon Kinesis Data Firehose Quota (p. 104).
• New Kinesis Streams as Source feature (August 18, 2017). Added Kinesis Streams as a potential source for records for a Firehose delivery stream. For more information, see Name and source (p. 5).
• Update to console documentation (July 19, 2017). The delivery stream creation wizard was updated. For more information, see Creating an Amazon Kinesis Data Firehose Delivery Stream (p. 5).
• New data transformation (December 19, 2016). You can configure Kinesis Data Firehose to transform your data before data delivery. For more information, see Amazon Kinesis Data Firehose Data Transformation (p. 45).
• New Amazon Redshift COPY retry (May 18, 2016). You can configure Kinesis Data Firehose to retry a COPY command to your Amazon Redshift cluster if it fails. For more information, see Creating an Amazon Kinesis Data Firehose Delivery Stream (p. 5), Amazon Kinesis Data Firehose Data Delivery (p. 53), and Amazon Kinesis Data Firehose Quota (p. 104).
• New Kinesis Data Firehose destination, Amazon Elasticsearch Service (April 19, 2016). You can create a delivery stream with Amazon Elasticsearch Service as the destination. For more information, see Creating an Amazon Kinesis Data Firehose Delivery Stream (p. 5), Amazon Kinesis Data Firehose Data Delivery (p. 53), and Grant Kinesis Data Firehose Access to an Amazon ES Destination (p. 34).
• New enhanced CloudWatch metrics and troubleshooting features (April 19, 2016). Updated Monitoring Amazon Kinesis Data Firehose (p. 57) and Troubleshooting Amazon Kinesis Data Firehose (p. 98).
• New enhanced Kinesis agent (April 11, 2016). Updated Writing to Kinesis Data Firehose Using Kinesis Agent (p. 16).
• New Kinesis agents (October 2, 2015). Added Writing to Kinesis Data Firehose Using Kinesis Agent (p. 16).
• Initial release (October 4, 2015). Initial release of the Amazon Kinesis Data Firehose Developer Guide.
AWS Glossary
For the latest AWS terminology, see the AWS Glossary in the AWS General Reference.