Apache Kafka Implementation & Optimization
Apache Kafka’s performance depends heavily on how well the system is tuned and utilized. In this guide, we outline best practices for high-volume, clustered environments and general use cases.
It’s important to note that these recommendations are broad in nature. To achieve the best results for your specific scenario, it’s essential to continuously monitor Kafka’s performance and tailor the implementation to suit your unique requirements.


Java Settings
- Java Version: Use Java 11 for improved performance and security. Always opt for the latest patch to address known vulnerabilities.
JVM Configuration:
-Xmx6g -Xms6g
-XX:MetaspaceSize=96m
-XX:+UseG1GC
-XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35
-XX:G1HeapRegionSize=16M
-XX:MinMetaspaceFreeRatio=50
-XX:MaxMetaspaceFreeRatio=80
-XX:+ExplicitGCInvokesConcurrent
These settings are optimized for better memory management, garbage collection performance, and application responsiveness. Adjust the heap size (-Xmx and -Xms) based on your specific workload.
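If you start the broker with the scripts that ship with Kafka, these flags can be supplied through environment variables; in the stock scripts, setting KAFKA_JVM_PERFORMANCE_OPTS replaces their defaults, so include everything you need. A minimal sketch, assuming the standard kafka-server-start.sh launcher:
export KAFKA_HEAP_OPTS="-Xmx6g -Xms6g"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+ExplicitGCInvokesConcurrent"
bin/kafka-server-start.sh config/server.properties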
OS Settings
After configuring the JVM memory settings, allocate the remaining RAM to the operating system for page caching. Kafka relies on the OS’s page cache for efficient reads and writes.
- Supported Platforms: Kafka supports a variety of Unix-based systems, and it is most commonly tested on Linux (we use CentOS for reasons beyond Kafka itself).
- File Descriptor Limits: Kafka requires file descriptors for managing log segments and open connections. We recommend setting the file descriptor limit to at least 100,000 for the broker processes (see the sketch after this list).
- Max Socket Buffer Size: Kafka can dynamically increase the socket buffer size to optimize data transfer, especially in high-performance or cross-data-center scenarios; make sure the OS maximums allow this.
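How these OS limits are raised varies by distribution; as a rough sketch on a typical Linux host (the user name, file paths, and buffer sizes are illustrative assumptions, not prescriptions):
# /etc/security/limits.conf — raise the open-file limit for the account running the broker (assumed here to be "kafka")
kafka soft nofile 100000
kafka hard nofile 100000
# /etc/sysctl.d/99-kafka.conf — raise the maximum socket buffer sizes for high-latency or cross-data-center links
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
Apply the sysctl changes with sysctl --system (or a reboot) and restart the broker so the new limits take effect.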
Disk and File System Configuration
Disk and file system setup is one of the areas where performance can be easily impacted.
- Dedicated Drives per Partition: Use a single drive or RAID array for each partition. Avoid sharing drives between partitions or other applications, as this can disrupt Kafka’s ability to perform sequential reads and writes efficiently.
- Multiple Disks: Kafka supports configuring multiple drives via the log.dirs setting in server.properties; it will round-robin new partitions across the configured directories for load distribution (see the sketch after this list).
- Disk Usage Monitoring: Set up alerts to monitor disk usage for each drive dedicated to Kafka. If a disk becomes full, it can severely impact performance.
- RAID Considerations: RAID is typically used for its redundancy features, but keep in mind that while a RAID array is rebuilding, disk performance can degrade enough that Kafka nodes appear to be down.
- Log Flush Management: Stick with the default flush settings, which effectively disable application-level fsync and leave flushing to the OS page cache.
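A minimal server.properties sketch for the points above (the mount points are placeholders; the flush lines are shown commented out only to emphasize that the defaults are kept):
# One log directory per dedicated drive; Kafka spreads partitions across them
log.dirs=/data/kafka-1,/data/kafka-2,/data/kafka-3
# Keep the default flush behaviour: leave these unset so flushing is delegated to the OS page cache
#log.flush.interval.messages=
#log.flush.interval.ms=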
File System Choice
We recommend using XFS for Kafka’s file system. It is well-suited for Kafka’s performance needs, as it has built-in auto-tuning capabilities that enhance performance out of the box.
Zookeeper
- Avoid Co-locating Kafka and Zookeeper: Do not run Zookeeper on the same machines as Kafka brokers. It’s better to separate the services for stability and performance.
- JVM Allocation for Zookeeper: Monitor your Zookeeper instance’s JVM usage. While 3 GB is a good starting point, always base the allocation on actual usage.
- Zookeeper Version: Always use the version of Zookeeper that ships with the Kafka release you’re using, rather than relying on an OS package. This ensures compatibility and stability.
- Zookeeper Monitoring: Use JMX metrics to keep an eye on Zookeeper’s health and performance.
- Simplify the Zookeeper Cluster: Keep your Zookeeper cluster small and simple, and use it exclusively for Kafka-related tasks to minimize complexity and maximize reliability (a minimal configuration sketch follows this list).
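For illustration, a small dedicated three-node ensemble using the Zookeeper bundled with Kafka might be configured as follows (hostnames, the data directory, and the 3 GB heap are assumptions to adapt; each node also needs a myid file in dataDir):
# config/zookeeper.properties (identical on every node)
dataDir=/var/lib/zookeeper
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
server.1=zk1.example.internal:2888:3888
server.2=zk2.example.internal:2888:3888
server.3=zk3.example.internal:2888:3888
Each node can then be started with an explicit heap and JMX enabled for monitoring:
export KAFKA_HEAP_OPTS="-Xmx3g -Xms3g"
export JMX_PORT=9999
bin/zookeeper-server-start.sh config/zookeeper.properties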
Topic and Partition Strategy
The number of partitions plays a crucial role in the parallelism and scalability of Kafka consumers. More partitions allow more consumers to process data in parallel, improving throughput. However, excessive partitions can lead to increased latency, as consumers need to keep track of more partitions.
- Number of Partitions: Size the partition count to your consumers’ throughput. If a consumer can handle 1,000 events per second (EPS) and you need to consume 20,000 EPS, create 20 partitions (an example topic creation command is shown below).
- End-to-End Latency: While more partitions can boost throughput, they can also add to the end-to-end latency, which is the time taken from a producer publishing a message to a consumer reading it.
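For example, with recent Kafka releases a topic sized this way could be created with the bundled tooling (topic name, broker address, and replication factor are placeholders; older releases use --zookeeper instead of --bootstrap-server):
bin/kafka-topics.sh --create \
  --bootstrap-server broker1:9092 \
  --topic events \
  --partitions 20 \
  --replication-factor 3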
For more detailed advice on how to create topics and determine the ideal number of partitions, refer to our posts on Creating a Kafka Topic and How Many Partitions Are Needed?.
Kafka Broker Configuration
- Heap Size for Brokers: Set the Kafka broker’s heap size by exporting KAFKA_HEAP_OPTS in your environment.
- Log Retention Settings: Use the log.retention.hours parameter to control when Kafka deletes old messages. We typically set this to 72 hours (3 days), but this can be adjusted based on your use case and disk space considerations.
- Maximum Message Size: The message.max.bytes setting controls the maximum size of a message that Kafka will accept. Ensure that replica.fetch.max.bytes is configured to be equal to or greater than this value.
- Topic Deletion: The delete.topic.enable setting allows users to delete Kafka topics. In older Kafka releases this defaults to false (newer releases enable it by default). Note that topic deletion was introduced in Kafka 0.9.
- Leader Election: The unclean.leader.election.enable setting controls whether a replica that is not fully in sync can be elected leader. Enabling it prioritizes availability over durability (older Kafka releases enabled it by default; newer releases default to false). Where durability matters more, keep it disabled so an out-of-sync broker is never elected leader (a sample server.properties snippet covering these settings follows this list).
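Pulled together, the broker-side settings discussed above might look like this in server.properties (the sizes shown are illustrative, not defaults):
# Retain messages for 3 days
log.retention.hours=72
# Largest record the broker will accept; replicas must be able to fetch at least this much
message.max.bytes=1048576
replica.fetch.max.bytes=1048576
# Allow topics to be deleted through the admin tooling
delete.topic.enable=true
# Prefer durability: never elect an out-of-sync replica as leader
unclean.leader.election.enable=false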
Critical Kafka Configurations
- batch.size: This setting controls the maximum size, in bytes, of a batch of records the producer will accumulate before sending to the broker. Tuning this value helps balance throughput and latency, with larger batches improving throughput at the expense of higher latency.
- linger.ms: This setting controls how long the producer will wait for a batch to fill before sending it; once the time elapses, whatever has accumulated is sent anyway. A higher linger.ms can improve batching efficiency but may increase latency.
- compression.type: Kafka supports multiple compression types, with Snappy being the highest-performing option among the default choices. It provides a good balance between compression ratio and speed.
- max.in.flight.requests.per.connection: This setting controls the number of requests that can be sent on a connection before waiting for a response. Adjust this value if the order of message delivery is not critical, as increasing it can improve throughput by allowing multiple requests to be pipelined.
- acks: The acknowledgment configuration determines how many broker acknowledgments are required before a producer considers a message successfully written. For maximum reliability, we recommend setting this to acks=all (or acks=-1), ensuring that all in-sync replicas acknowledge the message before it’s considered committed (an example appears after this list).
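As an illustration, these settings might appear as follows in a producer configuration (the numbers are illustrative starting points rather than recommendations for every workload):
# Wait up to 20 ms to fill a batch of up to 64 KB
batch.size=65536
linger.ms=20
# Snappy offers a good balance between speed and compression ratio
compression.type=snappy
# Values greater than 1 enable pipelining but may reorder records on retries
max.in.flight.requests.per.connection=5
# Require acknowledgment from all in-sync replicas
acks=all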
Performance Considerations for Kafka Producers
- Single Partition Producers: Sending data to a single partition is typically faster than sending to multiple partitions. If the consumer’s parallelism allows, targeting a single partition can optimize throughput and reduce the overhead of managing multiple partitions.
- Batching Data: Batching messages together before sending them improves producer performance. Although the ideal batch size may vary based on your workload, a minimum of 1 KB per batch is recommended for good performance. Conduct performance tests to find the batch size that works best for your use case (a sample test command follows this list).
- Producer Throughput: If your producer is maxing out its throughput and there is available CPU and network capacity, consider increasing the number of producer processes to distribute the load more effectively.
- Avoid Triggering linger.ms for Small Batches: If linger.ms is set too low (e.g., less than 20 ms) or if batches are smaller than 1 KB, performance can degrade significantly. Aim for linger.ms values of 20 ms or higher, depending on the latency you can tolerate.
- Max In-Flight Requests: Setting max.in.flight.requests.per.connection to a value greater than 1 enables pipelining and can significantly improve throughput. However, be aware that this can lead to message ordering issues if a retry is needed, as Kafka might resend messages in a different order. If you set this value too high, it can also hurt throughput due to increased overhead.
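The producer performance tool bundled with Kafka is a convenient way to run such tests; a typical invocation might look like this (the topic, broker address, and producer properties are placeholders to vary during testing):
bin/kafka-producer-perf-test.sh \
  --topic events \
  --num-records 1000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=broker1:9092 batch.size=65536 linger.ms=20 compression.type=snappy acks=all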
Performance Considerations for Kafka Consumers
- Optimize Consumer Libraries and Code: On the consumer side, the most significant performance improvements typically come from using more efficient libraries or optimizing the consumer code itself. Make sure you’re using well-optimized Kafka client libraries and focus on reducing unnecessary processing within your consumer logic.
- Align Consumers with Partitions: For optimal performance, ensure that the number of consumer threads or processes does not exceed the number of partitions. While a single consumer can read from multiple partitions, each partition can only be consumed by one consumer at a time. Therefore, keeping the number of consumers equal to or fewer than the number of partitions will help ensure that all partitions are being processed efficiently without overloading any single consumer (the command after this list shows one way to inspect the assignments).
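To verify the alignment in a running cluster, the consumer-groups tool bundled with Kafka reports which consumer owns each partition and its current lag (the group name and broker address are placeholders):
bin/kafka-consumer-groups.sh \
  --bootstrap-server broker1:9092 \
  --describe \
  --group my-consumer-group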
By following these guidelines, you can help maximize throughput and minimize consumer-side bottlenecks in your Kafka deployment.