Navigating through the dense jungle of event-based systems can be daunting, especially when trying to decide between different message brokers for your messaging needs. With the myriad of options available, how do you determine which message broker is most suitable for your use case? In this blog post, we aim to wield the machete to clear a path through the jungle and determine which robust option is the best fit for your specific needs.
First, let’s define what event-based systems are and identify the business problems they can address. Afterward, we’ll continue to clear our path by introducing the potential message brokers that can help navigate through the complexities of event-based systems. RabbitMQ and Kafka are both powerful contenders in the world of messaging platforms, each with its unique strengths. We’ll use our compass to navigate through the key features and differences of these message brokers, cutting through the complexity to uncover their core advantages. Afterward, we’ll provide a roadmap to guide you toward the message broker that is better suited for your specific applications. Finally, we’ll summarize our findings to help you navigate this decision effectively.
Grab your backpack and get ready to enter the jungle!
As we cross the jungle, we’ll come across a river with a bridge. A little further down the river, there is a small village. Since the path from the bridge to the village is long and overgrown, the residents often use the river to warn the inhabitants about dangerous animals nearby. For this purpose, they use empty coconuts containing a note with information about which animal has been sighted. They throw these coconuts into the river. In the village, someone is always ready to collect the coconuts with the information from the river and can warn everyone in the village. That’s the basic concept of event-based systems.
Event-based systems consist of producers, a message broker and consumers. In our analogy, the producer is the person who throws the coconut into the river. They generate an event (the coconut with information about which animal is nearby). This event is then processed by the message broker (the river). In our scenario, the coconut is simply made available to the villagers after a short travel down the river. The villagers are the consumers. They consume the event and act based on the information it contains. In our case, they warn the other villagers about dangerous animals.
Producers and consumers function autonomously from one another. Our villager can toss as many coconuts into the river as desired, without worrying about how the village will subsequently handle the information from those coconuts. Villagers are free to collect the coconuts at their convenience; the coconuts simply float in the river until someone picks them up. This form of communication is referred to as asynchronous communication.
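To make the roles concrete, here is a toy, in-memory sketch in Python. The `River` class is a hypothetical stand-in for a message broker, not any real library: the producer publishes an event without knowing who will read it, and the consumer polls at its own pace.

```python
from collections import deque

class River:
    """A toy message broker: producers drop events in, consumers pull them out
    whenever they like. Neither side knows about the other."""
    def __init__(self):
        self._events = deque()

    def publish(self, event):
        # The villager upstream throws a coconut into the river.
        self._events.append(event)

    def poll(self):
        # A villager downstream picks up the next coconut, if there is one.
        return self._events.popleft() if self._events else None

river = River()
river.publish({"animal": "jaguar", "location": "near the bridge"})  # producer
warning = river.poll()                                              # consumer
print(f"Warning: {warning['animal']} sighted {warning['location']}!")
```

The producer returns immediately after `publish`; whether the event is consumed a millisecond or an hour later is entirely the consumer’s business. That decoupling is the essence of asynchronous communication.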
Apart from our small jungle example here, event-based systems find wide application across diverse domains where real-time responsiveness and scalability are paramount. These systems are extensively used in financial services, e-commerce, real-time analytics, Internet of Things (IoT) deployments, telecommunications, and beyond. They efficiently handle events such as transactions, market updates, sensor data, and network events, enabling timely processing, seamless integration, and adaptive responses to dynamic conditions. Event-driven architectures play a pivotal role in modernizing and optimizing operations by ensuring data flows smoothly across distributed systems while supporting high availability and fault tolerance.
Intrigued by the methods employed by the villagers, we continue through the jungle. Along the way, we encounter rivers and villagers who have established similar systems. Event-based systems appear to be highly favored here in the jungle. Similarly, in the real world, there are numerous slightly different technologies revolving around event-driven architectures.
Similar to rivers, message brokers vary widely in form and function. Examining open-source technologies, we find options like RabbitMQ, Apache Kafka, ActiveMQ, and Apache Storm. The major cloud providers have also developed their own systems: Amazon Kinesis, Azure Event Hubs, and Google Cloud Dataflow. While some of these technologies serve as basic message brokers, others provide comprehensive streaming platforms.
RabbitMQ and Apache Kafka are two of the most widely used technologies in the field of event-based systems. Both can accommodate a wide range of use cases and are available as open-source solutions. Moreover, they can be implemented and used independently of specific cloud providers. They boast extensive ecosystems and have been developed and used in practice for a long time. Therefore, it’s worthwhile for us to take a closer look at these two technologies for our selection.
Apache Kafka was developed by LinkedIn and released in 2010. In 2014, the core developers from LinkedIn founded Confluent, which is now the primary contributor to Kafka. Kafka can function as more than just a simple message broker; it can also be utilized as a distributed event streaming platform through its ecosystem. In our jungle metaphor, Kafka is not just a river transporting coconuts. It also provides APIs to throw coconuts into the river or retrieve them from various sources (Kafka Connect) and to transform the information within the coconuts while they flow down the river (Kafka Streams).
In our analogy, the rivers carrying the coconuts, which represent messages, are known as topics. Producers send messages to a topic, which stores them on its local disk. Consumers can subscribe to these topics and proceed with further processing of the events. Kafka operates under a “dumb broker, smart consumers” paradigm. In this setup, the river itself does very little, while the villagers are responsible for handling all the processing logic with the events. Poor villagers!
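The “dumb broker, smart consumers” idea can be sketched with a toy model (plain Python, not real Kafka client code): a topic is just an append-only log, messages are retained rather than removed on read, and each consumer, not the broker, remembers its own read position (offset).

```python
class Topic:
    """Toy model of a Kafka topic: an append-only log.
    The broker only stores messages; it tracks nothing about consumers."""
    def __init__(self):
        self.log = []

    def produce(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the newly appended message

class Consumer:
    """The 'smart' side: each consumer keeps its own offset."""
    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        if self.offset < len(self.topic.log):
            msg = self.topic.log[self.offset]
            self.offset += 1
            return msg
        return None

sightings = Topic()
sightings.produce("jaguar")
sightings.produce("anaconda")

villager_a = Consumer(sightings)
villager_b = Consumer(sightings)
assert villager_a.poll() == "jaguar"  # each consumer reads independently
assert villager_b.poll() == "jaguar"  # messages are retained, not consumed away
```

Note that both villagers read the same message: because the log is retained and offsets live in the consumers, any number of independent readers can process the full stream at their own pace.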
RabbitMQ was released in 2007 and has been mainly developed by VMware since 2010. RabbitMQ is better described as a traditional message broker. In contrast to Kafka, RabbitMQ operates on the “smart broker, dumb consumer” principle. The coconuts first move to an intersection (Exchange) and are then redirected to the appropriate river (Queue). Imagine engineering the river by splitting it into multiple individual rivers. This architecture allows us to implement complex message routing. Producers send their messages to the exchange first, which then forwards the messages to the appropriate queue. As with Kafka, consumers can then subscribe to a queue and continue processing the events.
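The “smart broker” routing can also be sketched as a toy model (plain Python, not the real RabbitMQ/AMQP API): a direct exchange holds bindings from routing keys to queues and decides, on publish, which queue each message lands in. Consumers just read their queue.

```python
from collections import defaultdict, deque

class DirectExchange:
    """Toy model of a RabbitMQ direct exchange: the broker routes each
    message to the queues bound with a matching routing key."""
    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> bound queues

    def bind(self, routing_key, queue):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        # The routing logic lives in the broker, not in the consumers.
        for queue in self.bindings[routing_key]:
            queue.append(message)

dangerous, harmless = deque(), deque()
exchange = DirectExchange()
exchange.bind("danger", dangerous)
exchange.bind("safe", harmless)

exchange.publish("danger", "jaguar near the bridge")
exchange.publish("safe", "toucan in the trees")
# Each consumer only sees its own pre-filtered queue.
```

Real RabbitMQ offers several exchange types (direct, fanout, topic, headers) for richer routing patterns, but the division of labor is the same: the broker filters and routes, and the consumer stays simple.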
When comparing RabbitMQ and Kafka, it’s essential to consider several key factors that influence their suitability for different use cases. These factors include throughput, latency, ecosystem, protocols, and operational experience. Throughput refers to the volume of messages each system can handle efficiently. Think about the number of coconuts the river can transport. Latency measures the delay in message processing, which can be compared to the speed of the river’s flow. The ecosystem encompasses the tools and integrations available for each platform. Protocols describe the communication methods each system uses, and operational experience covers the ease of managing and maintaining the systems in a production environment. Understanding these aspects will help in determining which messaging system best meets specific requirements.
Kafka is heavily optimized for high throughput. This is partly due to Kafka using its own binary protocol and offering the capability to batch entire sets of events, which allows for processing very large amounts of data efficiently and compactly. Think of Kafka as the widest river capable of processing the most coconuts. In contrast, RabbitMQ handles lower throughput.
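Why does batching help throughput so much? Each network round trip carries fixed overhead, so shipping many events per request amortizes that cost. The following sketch (a hypothetical `BatchingProducer`, not Kafka’s actual client) shows the idea by counting how many “send” operations 1,000 messages require.

```python
class BatchingProducer:
    """Sketch of batching: buffer messages and ship them in bulk,
    so per-request overhead is paid once per batch, not once per message."""
    def __init__(self, send, batch_size=100):
        self.send = send              # stands in for one network round trip
        self.batch_size = batch_size
        self.buffer = []
        self.requests = 0             # how many round trips we actually made

    def produce(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
            self.requests += 1
            self.buffer = []

sent = []
producer = BatchingProducer(sent.append, batch_size=100)
for i in range(1000):
    producer.produce(f"event-{i}")
producer.flush()
# 1,000 messages, but only 10 "network" round trips
```

The trade-off is latency: a message may wait in the buffer until the batch fills (or a timeout fires, in real Kafka via settings such as `linger.ms`), which is exactly why the next section favors RabbitMQ when latency matters most.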
While Kafka represents the widest river capable of processing the largest volumes of coconuts, RabbitMQ acts as a fast-moving river that swiftly delivers coconuts to the villagers. RabbitMQ proves advantageous in scenarios demanding low latency.
Kafka offers a broad ecosystem, ranging from numerous connectors (Kafka Connect), which allow, for example, change data capture (CDC) events to be produced directly into a topic, to Kafka Streams for data transformation. Kafka Streams acts as an additional application that consumes data like a consumer, transforms it, and produces the results back into a topic. This makes it straightforward to integrate data processing into the event-based system. Apache Flink integration within the Kafka ecosystem enables further data transformations. Moreover, Confluent provides ksqlDB, a SQL-like interface for interacting with the Kafka cluster.
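The consume-transform-produce pattern behind Kafka Streams can be illustrated with a toy sketch (plain Python, not the Streams API): read records from an input topic, maintain some state while aggregating, and emit results to an output topic, much like the canonical `groupByKey().count()` example.

```python
from collections import Counter

def count_by_key(records):
    """Toy sketch of a stateful stream transformation: consume records,
    update a running count per key, emit each updated count downstream."""
    counts = Counter()
    output_topic = []
    for animal in records:
        counts[animal] += 1
        output_topic.append((animal, counts[animal]))  # emit updated count
    return output_topic

sightings_topic = ["jaguar", "toucan", "jaguar"]
counts_topic = count_by_key(sightings_topic)
# counts_topic == [("jaguar", 1), ("toucan", 1), ("jaguar", 2)]
```

The real Kafka Streams library adds what this sketch omits: fault-tolerant state stores, repartitioning, and exactly-once processing, which is precisely the heavy lifting you would otherwise build yourself.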
On the other hand, RabbitMQ offers a significantly smaller ecosystem. While it can connect to databases through Debezium, it lacks the extensive array of connectors that Kafka provides. However, RabbitMQ uses standard protocols, which reduces the need for specialized knowledge compared to Kafka’s proprietary protocol. RabbitMQ also lacks an equivalent to Kafka Streams, meaning data transformations must be implemented through consumers and other technologies, often requiring more effort.
As mentioned earlier, Kafka uses its own binary protocol. Despite its many connectors, this makes Kafka more complex to integrate with than systems built on standard protocols. In contrast, RabbitMQ supports a range of standard protocols, such as AMQP, STOMP, and MQTT. This simplifies the integration of external systems that already speak these protocols.
The developer experience largely depends on the underlying philosophy of the brokers. With Kafka’s “dumb broker, smart consumer” paradigm, developers face more effort in implementing producers and consumers since they must handle all the logic. Conversely, this makes it easier for the platform team to configure Kafka, as the broker itself lacks complex logic.
In contrast, RabbitMQ follows a different approach. Here, it is easier for developers to create producers and consumers. However, the platform team faces more work in correctly configuring the routing rules for the queues. Overall, Kafka places much greater emphasis on its consumers and producers, resulting in more complex code for the development team.
Both Kafka and RabbitMQ offer extensive configuration options, allowing them to be optimized for a wide range of use cases. However, RabbitMQ is more forgiving of configuration errors. The RabbitMQ Management UI makes it easier to investigate these errors, whereas Kafka requires debugging primarily through CLI commands unless external monitoring tools are used.
Both technologies are platform-independent. They can be utilized as managed services from various cloud providers or self-hosted. Additionally, Kubernetes operators are available for both (Kafka, RabbitMQ), enabling quick and easy cluster deployment. Overall, RabbitMQ is quicker and easier to learn and manage compared to Kafka.
If you want a concise summary of Kafka and RabbitMQ, you can put it like this: Kafka is the wide, high-throughput river with a rich ecosystem that pushes the processing logic onto producers and consumers, while RabbitMQ is the fast, low-latency river with a smart broker that handles the routing and is easier to get started with.
Based on our evaluation criteria, you can select your preferred message broker as follows:
| Criterion | Favored message broker |
|---|---|
| Optimize for throughput | Kafka |
| Optimize for latency | RabbitMQ |
| Available ecosystem | Kafka |
| Available protocols | RabbitMQ |
| Overall developer experience | RabbitMQ |
| Overall operational experience | RabbitMQ |
After navigating through the jungle of event-based systems and message brokers, it can be concluded that the choice of technology always needs to be tailored to the specific use case. Certainly, there are additional factors to consider when making a technological decision, such as whether your company already has trained developers who can start immediately, or whether they would first need training. I hope this blog post has provided an overview of the key considerations and helped guide your decision-making process.