diff --git a/docs/how/kafka-config.md b/docs/how/kafka-config.md
index 06c7418f167136..1bfdb5a7f7f5cf 100644
--- a/docs/how/kafka-config.md
+++ b/docs/how/kafka-config.md
@@ -70,6 +70,45 @@ How Metadata Events relate to these topics is discussed at more length in [Metad
 We've included environment variables to customize the name each of these topics, for cases where an organization has naming rules for your topics.
 
+### Handling Large Kafka Messages
+
+When ingesting large metadata records, such as those produced by Snowflake sources, you may encounter `org.apache.kafka.common.errors.RecordTooLargeException`.
+
+#### Explanation of the Error
+
+The `RecordTooLargeException` is thrown when a message sent to Kafka exceeds the configured maximum message size, whether on the producer, broker/topic, or consumer side. This is common when dealing with large metadata records.
+
+#### Step-by-Step Resolution Guide
+
+1. **Increase Kafka Configuration Limits**:
+   - Raise the producer's `max.request.size` by setting the `SPRING_KAFKA_PRODUCER_PROPERTIES_MAX_REQUEST_SIZE` environment variable.
+   - Raise the consumer's `max.partition.fetch.bytes` by setting the `SPRING_KAFKA_CONSUMER_PROPERTIES_MAX_PARTITION_FETCH_BYTES` environment variable.
+
+2. **Update Kafka Topic Configuration**:
+   - Set the `max.message.bytes` configuration on the affected Kafka topics to allow larger messages.
+
+3. **Helm Chart Configuration**:
+   - For Helm deployments, set these values in the `values.yaml` file:
+     ```yaml
+     kafka:
+       maxMessageBytes: "10485760" # 10MB
+       producer:
+         maxRequestSize: "10485760" # 10MB
+       consumer:
+         maxPartitionFetchBytes: "10485760" # 10MB
+     ```
+
+4. **Compression**:
+   - Enable message compression by setting `KAFKA_PRODUCER_COMPRESSION_TYPE` to `snappy` or another supported codec, which can keep large records under the size limits.
+
+5. **Check for Updates**:
+   - Ensure you are using the latest version of DataHub, which may include improvements or fixes for handling large messages.
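+For Docker Compose deployments, the same limits can be applied through container environment variables. The sketch below is illustrative only: it assumes a Confluent-style broker image (where `KAFKA_MESSAGE_MAX_BYTES` maps to the broker's `message.max.bytes`) and the service names `datahub-gms` and `broker`; adjust names and sizes to your setup:
+
+```yaml
+# Illustrative docker-compose override; the 10MB values are an example, not a requirement
+services:
+  datahub-gms:
+    environment:
+      - SPRING_KAFKA_PRODUCER_PROPERTIES_MAX_REQUEST_SIZE=10485760
+      - SPRING_KAFKA_CONSUMER_PROPERTIES_MAX_PARTITION_FETCH_BYTES=10485760
+  broker:
+    environment:
+      - KAFKA_MESSAGE_MAX_BYTES=10485760
+```
+
+For an existing topic, `max.message.bytes` can also be raised with Kafka's own CLI, e.g. `kafka-configs.sh --bootstrap-server <broker> --alter --entity-type topics --entity-name <topic> --add-config max.message.bytes=10485760` (the broker and topic placeholders are yours to fill in).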
+
+#### Sources and Further Reading
+
+- [Slack discussion on Kafka configuration](https://datahubspace.slack.com/archives/C029A3M079U/p1701877527.416859)
+- [DataHub environment variables documentation](https://github.com/datahub-project/datahub/blob/master/docs/deploy/environment-vars.md)
+
 ### Metadata Service (datahub-gms)
 
 The following are environment variables you can use to configure topic names used in the Metadata Service container: