IMPORTANT: This documentation has been discontinued. Read the updated Kafka Trigger documentation on our new documentation portal.
Kafka Trigger is responsible for the consumption of messages from a Kafka broker.
To use this trigger, contact our Support Team to request access.
This trigger has two configurable offset commit strategies:
1. Commit with no delivery guarantee
All messages received by the trigger are sent to the pipeline faster, but with no delivery guarantee (that is, the trigger does not wait for the pipeline's response to confirm that the message was processed). With auto commit enabled, the trigger uses Kafka's default commit implementation. The message dispatch can be configured in two ways:
Batch message dispatch
All the messages received by the consumer polling are sent together in an array. For example, if a poll returns 10 messages, the trigger sends one array with these 10 messages.
One-by-one message dispatch
The trigger goes through the whole array and dispatches the messages to the pipeline one at a time. For example, if a poll returns 10 messages, the trigger makes 10 dispatches to the pipeline, each with a single message.
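The difference between the two dispatch modes can be sketched as follows. This is a minimal illustration in Python, not the trigger's implementation; `plan_dispatches` and its `send_batch` flag are hypothetical names that stand in for the Send Batch parameter.

```python
from typing import Any, List


def plan_dispatches(polled: List[Any], send_batch: bool) -> List[List[Any]]:
    """Return the payloads sent to the pipeline for one poll.

    Hypothetical helper: each inner list represents one dispatch.
    """
    if not polled:
        return []
    if send_batch:
        # Batch dispatch: the whole poll goes out as a single array.
        return [list(polled)]
    # One-by-one dispatch: one pipeline call per message.
    return [[message] for message in polled]


polled = [f"msg-{i}" for i in range(10)]
print(len(plan_dispatches(polled, send_batch=True)))   # 1 dispatch carrying 10 messages
print(len(plan_dispatches(polled, send_batch=False)))  # 10 dispatches carrying 1 message each
```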
2. Commit with delivery guarantee
The trigger is responsible for committing the offsets, which happens after it receives a success response from the pipeline. Only batch dispatch is possible in this strategy: all the messages received by the consumer polling are sent together in an array.
Example: if a poll returns 10 messages, the trigger sends one array with these 10 messages.
IMPORTANT: Kafka may redistribute consumers and/or partitions. If this happens between the pipeline response and its return to the trigger, the offsets will be committed. This may result in lost or duplicate messages.
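The commit-after-success flow described above can be sketched with a small simulation. This is only an illustration under stated assumptions: `FakeConsumer` and `poll_once` are hypothetical names, no real Kafka client is involved, and the fake consumer re-reads from the last committed offset, which simplifies how a real consumer tracks its position.

```python
from typing import Callable, Dict, List


class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer; it only tracks one offset."""

    def __init__(self, records: List[Dict]) -> None:
        self._records = records
        self.committed = 0  # next offset to be read

    def poll(self, max_records: int) -> List[Dict]:
        # Simplification: always read from the last committed offset.
        return self._records[self.committed:self.committed + max_records]

    def commit(self, next_offset: int) -> None:
        self.committed = next_offset


def poll_once(consumer: FakeConsumer,
              pipeline: Callable[[List[Dict]], bool],
              max_poll_records: int = 10) -> bool:
    """Send one batch to the pipeline and commit only if it reports success."""
    batch = consumer.poll(max_poll_records)
    if not batch:
        return False
    if pipeline(batch):
        consumer.commit(batch[-1]["offset"] + 1)
        return True
    # Failure or timeout: offsets stay uncommitted.
    return False


records = [{"offset": i, "data": f"msg-{i}"} for i in range(10)]
consumer = FakeConsumer(records)

poll_once(consumer, lambda batch: False)  # pipeline failed
print(consumer.committed)                 # 0: nothing was committed

poll_once(consumer, lambda batch: True)   # pipeline succeeded
print(consumer.committed)                 # 10: the whole batch was committed
```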
Autocommit "false" and Batch Mode "true"
In this option, a poll can return an array of messages whose maximum size is defined by Max Poll Records. The messages are committed only after the pipeline returns a successful transaction. If the pipeline execution times out, the messages are not committed.
Autocommit "false" and Batch Mode "false"
In this option, each poll sends only 1 message instead of an array of messages. The message dispatch/receipt throughput decreases, but the guarantee of successful processing is greater, which means no messages are lost.
IMPORTANT: if the topic is rebalanced in the Kafka broker while messages are being processed and the consumer has to take on other partitions, the messages will be committed even if there's an error at the end of the pipeline execution. As a result, those messages won't be processed in the following poll. To avoid this issue, use the Autocommit "false" and Batch Mode "false" configuration.
Take a look at the configuration parameters of the trigger:
Account: name of the account to be used.
Truststore: if a truststore is needed to perform the SSL handshake using private certificates, create a CERTIFICATE-CHAIN account and provide the concatenated certificates. Optionally, enter in the “password” field the password to be set when the truststore is created.
Keystore: if a keystore is needed to perform mutual SSL authentication, create a CERTIFICATE-CHAIN account and provide the complete chain with the concatenated certificates and the private key to be used for the mutual SSL authentication. If there’s a private key, it must be provided in the “password” field.
Brokers: brokers of the server (HOST:PORT) used to send records. To specify multiple hosts, separate them with commas. Example: HOST1:PORT1,HOST2:PORT2,...,HOSTn:PORTn
Topic: name of the topic from which the records are retrieved.
Protocol: protocol used to communicate with the brokers.
Consumer Group Name: a unique string that identifies the consumer group to which this consumer belongs.
Auto Commit: if “true”, the message is committed automatically as soon as the trigger receives it; otherwise, the trigger commits manually after the pipeline confirms the processing.
Send Batch: can only be used with autoCommit. If "true", a poll with more than 1 message is sent as an array; otherwise, only 1 message is sent at a time.
Key As Avro: if "true", the keys of the received records will be interpreted in Avro format, otherwise they will be interpreted as String.
Payload As Avro: if "true", the payloads (values) of the received records will be interpreted in Avro format, otherwise they will be interpreted as String.
Schema Registry URL: URL of the Schema Registry. This field is displayed only if at least one of the Key As Avro and Payload As Avro options is enabled.
Schema Registry Account: account for authentication with Schema Registry (Basic account or Oauth-Bearer).
Schema Registry Truststore: if a truststore is needed to perform the SSL handshake using private certificates, create a CERTIFICATE-CHAIN account and provide the concatenated certificates. Optionally, enter in the “password” field the password to be set when the truststore is created.
Schema Registry Keystore: if a keystore is needed to perform mutual SSL authentication, create a CERTIFICATE-CHAIN account and provide the complete chain with the concatenated certificates and the private key to be used for mutual SSL authentication. If there is a private key, it must be provided in the “password” field.
Max Poll Records: maximum number of records retrieved by a single poll.
Include Headers: if the option is enabled, the message headers will be included in the pipeline input payload.
Binary Headers: if the option is enabled, the input header values will be considered as binary and presented as a base64 representation. This option will be displayed only when Include Headers is enabled as well.
Headers Charset: name of the character encoding used for the header values (default: UTF-8). This option is displayed only when Include Headers is also enabled.
Maximum Timeout: how long a pipeline can be executed (in milliseconds).
Kerberos Service Name: value defined in the sasl.kerberos.service.name property configured on the Kafka broker (server side).
Partition Numbers: specifies the numbers of the partitions from which Kafka Trigger will consume messages. More than one partition can be configured. If this property isn’t configured, Kafka Trigger consumes from all the topic partitions.
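The Brokers value in the list above is a comma-separated list of HOST:PORT pairs. As a rough sketch of how such a value can be validated before it is entered, the hypothetical helper below splits it into pairs. The host names are placeholders, and tolerating whitespace around entries is an assumption of this sketch, not documented trigger behavior.

```python
from typing import List, Tuple


def parse_brokers(brokers: str) -> List[Tuple[str, int]]:
    """Split "HOST1:PORT1,HOST2:PORT2" into (host, port) pairs.

    Hypothetical helper for illustration only.
    """
    pairs = []
    for entry in brokers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # rpartition splits on the last colon, so the port is the final segment.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected HOST:PORT, got {entry!r}")
        pairs.append((host, int(port)))
    if not pairs:
        raise ValueError("at least one broker is required")
    return pairs


print(parse_brokers("kafka-1.example.com:9092,kafka-2.example.com:9093"))
```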
IMPORTANT: we accept no more than 5MB of messages per poll. Transmitting large messages is not a standard use of Kafka; we recommend setting the message.max.bytes property in the broker to a maximum of 1MB. Data in Avro format also counts toward this size limit. The Kafka Trigger key and payload settings must match the settings of the topics consumed by the trigger. If Key As Avro is enabled, all keys of the records to be consumed must be in Avro format. If Payload As Avro is enabled, all payloads (values) of the records to be consumed must be in Avro format.
Avro format support is currently in Beta phase.
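To get a feel for the 5MB per-poll limit mentioned above, the following sketch adds up the payload sizes of a hypothetical batch. It assumes that "5MB" means 5 * 1024 * 1024 bytes and that only payload bytes count; how the trigger actually measures the size is not stated here, so treat both as assumptions.

```python
from typing import Iterable

# Assumption: "5MB" is read as 5 MiB and only payload bytes are counted.
MAX_POLL_BYTES = 5 * 1024 * 1024


def poll_size_bytes(payloads: Iterable[bytes]) -> int:
    """Total size in bytes of the payloads returned by one poll."""
    return sum(len(payload) for payload in payloads)


def fits_in_one_poll(payloads: Iterable[bytes]) -> bool:
    """True if the batch stays within the assumed per-poll limit."""
    return poll_size_bytes(payloads) <= MAX_POLL_BYTES


small_batch = [b"x" * 1024] * 10           # ten 1 KiB messages
large_batch = [b"x" * (1024 * 1024)] * 6   # six 1 MiB messages
print(fits_in_one_poll(small_batch))  # True
print(fits_in_one_poll(large_batch))  # False
```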
The consumer configuration has a direct impact on the message input and output throughput when Kafka Trigger is activated. The ideal scenario is to have the same number of consumers and partitions configured for a given topic.
If there are more consumers than partitions, the excess consumers stay idle until the number of partitions increases. When that happens, Kafka starts the consumer rebalancing process.
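The idle-consumer effect can be illustrated with a simplified assignment. The sketch below uses a plain round-robin distribution, which is an assumption made for illustration and not necessarily the assignment strategy Kafka applies; `assign_partitions` and the consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Distribute partition numbers over consumers in round-robin order."""
    if not consumers:
        return {}
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    for partition in range(partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 3 partitions, 4 consumers: the fourth consumer has nothing to read.
print(assign_partitions(3, ["c1", "c2", "c3", "c4"]))

# After the topic grows to 4 partitions, every consumer gets one partition.
print(assign_partitions(4, ["c1", "c2", "c3", "c4"]))
```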
The Consumer Group Name is the consumer group through which your pipeline subscribes to the Kafka topic. A topic can have "n" consumer groups, and each of them can have "n" consumers that consume the topic's records.
Let's say there's a topic named kafka-topic, a pipeline that uses a trigger configured with the consumer group (Consumer Group Name) named digibee, and a second pipeline that uses a trigger configured with the same topic but with a consumer group named digibee-2. In this case, both pipelines will receive the same messages.
Now let's say there's a topic named kafka-topic, a pipeline that uses a trigger configured with the consumer group (Consumer Group Name) named digibee, and a second pipeline that uses a trigger configured with the same topic and the same consumer group (digibee). Both pipelines will receive messages from this topic. However, Kafka is in charge of balancing the partitions between the consumers registered in the two triggers. In this case, the pipelines will receive messages in an interleaved way, according to the partition distribution.
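The two scenarios above can be simulated in a few lines. This is a toy model, not Kafka's behavior in detail: `deliver` is a hypothetical function, each record is a (partition, value) pair, and partitions are split between the members of a group by a simple modulo rule chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def deliver(records: List[Tuple[int, str]],
            triggers: Dict[str, str]) -> Dict[str, List[str]]:
    """Simulate delivery. `triggers` maps pipeline name -> consumer group name."""
    members_by_group: Dict[str, List[str]] = defaultdict(list)
    for pipeline, group in triggers.items():
        members_by_group[group].append(pipeline)

    received: Dict[str, List[str]] = {pipeline: [] for pipeline in triggers}
    for partition, value in records:
        # Every group sees every record; inside a group, one member owns the partition.
        for members in members_by_group.values():
            owner = members[partition % len(members)]
            received[owner].append(value)
    return received


records = [(0, "a"), (1, "b"), (0, "c"), (1, "d")]

# Different groups: both pipelines receive all four messages.
print(deliver(records, {"pipeline-1": "digibee", "pipeline-2": "digibee-2"}))

# Same group: the messages are split according to the partition distribution.
print(deliver(records, {"pipeline-1": "digibee", "pipeline-2": "digibee"}))
```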
Authentication using Kerberos
To use Kerberos authentication in Kafka Trigger, the “krb5.conf” configuration file must be registered in the Realm parameter. If you haven't done this yet, get in touch with us through the chat service. After that, all you have to do is correctly set up a Kerberos-type account and use it in the component.
Message format in the pipeline input
Pipelines associated with Kafka trigger receive the following message as input:
"data": <STRING message content>,
"topic": <STRING The topic from which the record is received>,
"offset": <LONG The position of the record in the corresponding Kafka partition>,
"partition": <INT The partition from which the record is received>,
"success": <BOOLEAN Indicates whether the individual message was successfully consumed or not>,
"header1": "value1", … (when included)
"success": <BOOLEAN Indicates whether all the messages were successfully consumed or not>