Kafka is a powerful streaming and messaging platform that is often used to build large-scale systems. There are many use cases for which Kafka is very well suited, but from time to time Kafka is conflated with concepts it was not originally designed for. One of these concepts is event sourcing.

The Apache Software Foundation describes Kafka on its website as follows:

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

(Source: https://kafka.apache.org/)

Personally, I like vendor descriptions like this because they aim to emphasize exactly what kind of problems a specific tool solves. However, in this case, there is not a single word about event sourcing. A closer look at the documentation does reveal a small paragraph about event sourcing, but it lacks detail.

It happens frequently that I stumble across posts, videos, and sometimes even books that mention that Kafka can be used for event sourcing. While that is not completely wrong, it still raises the question of whether a streaming platform like Kafka is actually able to perform the tasks of an event store. This is exactly what I would like to discuss in this post.

What is an Event Store?

An event store is a type of database whose primary focus is on storing and retrieving events. In contrast to other database types such as relational databases, event stores “natively” support the append-only semantics and immutability principles of events.

Remember, events cannot be modified once they are created because they represent facts that have already happened within a domain. This is the reason why events are immutable.
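As a minimal sketch of these two principles, here is a hypothetical in-memory event stream in Python (the names `Event` and `EventStream` are illustrative, not taken from any real event store): events are immutable records, and the stream only ever grows.

```python
from dataclasses import dataclass

# Events are immutable facts: frozen=True forbids modification after creation.
@dataclass(frozen=True)
class Event:
    type: str
    data: dict

class EventStream:
    """Append-only: events can be added, but never modified or removed."""
    def __init__(self, stream_id: str):
        self.stream_id = stream_id
        self._events = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    @property
    def events(self) -> tuple:
        return tuple(self._events)  # expose a read-only view

stream = EventStream("order-42")
stream.append(Event("OrderPlaced", {"total": 99}))
# stream.events[0].type = "Cancelled"  # would raise FrozenInstanceError
```

A real event store enforces the same semantics at the storage level rather than in application code.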

The Relation between Event Sourcing and Event Stores

Event Sourcing, in its most basic form, is nothing more than a persistence mechanism. Instead of writing the current state of an object to, let’s say, a relational database, we store a series of events associated with that object in a specific database: the Event Store.

In summary, event sourcing is the persistence mechanism, while the event store is the actual database where the events for each object are persisted.
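To make the mechanism concrete, here is a hypothetical sketch (event types and field names are invented for illustration): instead of persisting a bank account’s current balance, we store its events and replay them to rebuild the state.

```python
# Replay stored events to reconstruct an entity's current state.
def apply(state: dict, event: dict) -> dict:
    if event["type"] == "AccountOpened":
        return {"balance": 0}
    if event["type"] == "MoneyDeposited":
        return {**state, "balance": state["balance"] + event["amount"]}
    if event["type"] == "MoneyWithdrawn":
        return {**state, "balance": state["balance"] - event["amount"]}
    return state  # ignore unknown event types

# The persisted series of events for one account:
events = [
    {"type": "AccountOpened"},
    {"type": "MoneyDeposited", "amount": 100},
    {"type": "MoneyWithdrawn", "amount": 30},
]

state = {}
for event in events:
    state = apply(state, event)

print(state)  # {'balance': 70}
```

The current state is never stored directly; it is always derived from the event history.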

Note: While there are databases built from the ground up for event sourcing, I have also seen event stores built on top of established databases, such as PostgreSQL.

If you want to learn more about event sourcing, feel free to check out my other blog posts or refer to blog articles from EventStoreDB.

Kafka + Event Sourcing?

Kafka is a great and powerful tool when it comes to streaming systems, but is it also ideal for event sourcing? The next subsections aim to give an idea of whether Kafka can actually fulfill the typical requirements of an event store.

Optimistic Concurrency Control

A common requirement of a database made for event sourcing is to detect concurrent writes. This is typically achieved by using some kind of optimistic concurrency control. Optimistic concurrency control ensures that decisions are made based on the current state of an object, which is a guarantee that many databases provide. One way to implement this is by using a version number that the event store can use to detect whether an event stream was modified between the read and write action. This approach is successfully used in EventStoreDB, for example.
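The version-number approach can be sketched in a few lines of Python. This is a hypothetical in-memory store, loosely inspired by the expected-version check that EventStoreDB performs, not its actual API: the writer passes along the version it read, and the append fails if the stream has moved on since.

```python
# Hypothetical sketch of version-based optimistic concurrency control.
class ConcurrencyError(Exception):
    pass

class InMemoryEventStore:
    def __init__(self):
        self._streams = {}  # stream_id -> list of events

    def read(self, stream_id):
        events = self._streams.get(stream_id, [])
        return events, len(events) - 1  # events plus current version

    def append(self, stream_id, events, expected_version):
        current = len(self._streams.get(stream_id, [])) - 1
        if current != expected_version:
            # The stream was modified between our read and this write.
            raise ConcurrencyError(f"expected {expected_version}, got {current}")
        self._streams.setdefault(stream_id, []).extend(events)

store = InMemoryEventStore()
store.append("order-1", ["OrderPlaced"], expected_version=-1)  # -1: new stream
_, version = store.read("order-1")
store.append("order-1", ["OrderShipped"], expected_version=version)

# A stale writer that still holds the old version now fails:
try:
    store.append("order-1", ["OrderCancelled"], expected_version=-1)
except ConcurrencyError:
    print("concurrent write detected")
```

The key point is that the conflict is detected at write time, without holding any locks while the business logic runs.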

Kafka does not support optimistic concurrency control. However, another blog article mentioned that it is possible to lock a whole partition, which would at least allow for pessimistic concurrency control. Either way, optimistic locking is a much more scalable and performant solution than pessimistic locking. Other viable workarounds, such as a “database frontend”, can mitigate this issue but often introduce additional complexity.

Efficient Loading of Single Entities

A very common use case of an event store is to load an entity by its identifier. An event store persists an entity’s events in an event stream. When we want to load and reconstruct an entity’s state in memory, we have to load the events contained in the corresponding event stream. Event stores such as EventStoreDB are optimized for this use case.

In contrast, Kafka deals with topics. A topic typically represents a whole type of entity, for example, customers, orders, or sessions. Unfortunately, there is no practical way to load a specific entity by its identifier within a topic. We can assume linear complexity here: in the worst case, every event in the topic has to be read and filtered.
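The difference between the two access patterns can be illustrated with a hypothetical sketch (the data and function names are invented): a topic-like log interleaves the events of many entities, while an event store keys each stream by its identifier.

```python
# A topic-like log mixes events of all entities of one type.
topic = [
    {"entity_id": "customer-1", "type": "CustomerRegistered"},
    {"entity_id": "customer-2", "type": "CustomerRegistered"},
    {"entity_id": "customer-1", "type": "EmailChanged"},
    # ... potentially millions of events belonging to other customers ...
]

def load_from_topic(topic, entity_id):
    """O(n) in the size of the whole topic: every event is inspected."""
    return [e for e in topic if e["entity_id"] == entity_id]

# An event store keys streams by ID, so the lookup is direct:
streams = {
    "customer-1": [{"type": "CustomerRegistered"}, {"type": "EmailChanged"}],
}

def load_from_store(streams, stream_id):
    """Reads only the events of the requested stream."""
    return streams[stream_id]
```

Both calls return the same events for `customer-1`, but only the first one pays for every other customer in the topic.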

What feels more natural to you: Loading a single event stream by its ID, or (potentially) loading a whole topic and iterating over that topic to filter out events for a given entity?

Consistent Writes

If a series of events is appended to an event stream, we want to make sure that all our events are persisted in one atomic transaction. Partially written or duplicated events would lead to consistency issues. This kind of transaction support is a matter of course for event sourcing solutions like EventStoreDB. In the case of EventStoreDB, this concept is better known as atomic writes.

Kafka has actually supported transactions for a few years now. Even so, more work is required to accomplish the same effect compared to using an event store solution that provides this guarantee out of the box.
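The all-or-nothing guarantee can be sketched as follows; this is a hypothetical in-memory illustration (the class `AtomicStream` is invented), not how any real event store implements atomic writes internally. The whole batch is validated before anything is committed, so a failure cannot leave the stream partially written.

```python
# Hypothetical sketch of an atomic append: either every event of the batch
# becomes visible, or none of them do.
class AtomicStream:
    def __init__(self):
        self._committed = []

    def append_batch(self, events):
        staged = list(events)  # stage the whole batch first
        for event in staged:
            if "type" not in event:
                # Reject before committing: nothing has been written yet.
                raise ValueError("invalid event; batch rejected")
        self._committed += staged  # single commit step: all or nothing

stream = AtomicStream()
stream.append_batch([{"type": "OrderPlaced"}, {"type": "OrderPaid"}])

try:
    stream.append_batch([{"type": "OrderShipped"}, {"broken": True}])
except ValueError:
    pass

print(len(stream._committed))  # 2 -- the failed batch left no partial write
```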

Storing and Publishing the Same Events?

Using events as a persistence mechanism and publishing events to notify other components about a change are two different scenarios. The first scenario typically deals with domain events, while the latter is subject to integration events. Still, there are a lot of cases where the same events are used for persistence and integration.

Events that are stored as part of the persistence mechanism (event sourcing) have nothing to do with the events that are used to notify other components about a state change in the system. Note that these potential components could be other service boundaries.

Event sourcing is just an implementation detail. The question to ask is whether we really want to expose those implementation details outside our service boundary. It feels a bit like pushing a service’s data store into other services’ faces and saying: “Here! Take all my data and do whatever you want with it.”

From my experience, we don’t want other components or services to read data directly from our database. Instead, we want to control which data to expose and which to keep inside our own boundary through a well-defined contract, e.g., an API.

Use the Right Tool for the Job

If you decide to use event sourcing for parts of your system, you probably have a good reason to do so. In such cases, the use of proper tooling is recommended to avoid additional complexity. Compared to more traditional approaches, event sourcing is simply different, and sometimes even confusing if not fully understood. Using tools that were just not made for event sourcing won’t make our lives easier.

Again, Kafka is a great streaming and messaging platform, and it is designed to solve a specific set of problems. This set of problems does not necessarily include how event-sourced entities should be stored. Conversely, an event store is designed for a different set of problems and likewise cannot (or should not) be used for the use cases Kafka was designed for.

In my opinion, using the right tool for the job should be the preferred choice, given that all organizational as well as technical constraints allow the use of these tools.

Conclusion
In conclusion, can Kafka be used for event sourcing? Yes.
Is Kafka an ideal solution for event sourcing? No.

Kafka is a powerful tool, but not an ideal solution for event sourcing, because event stores come with requirements that are simply harder for a streaming platform like Kafka to meet.

In the end, one has to weigh the tradeoffs and decide whether to use an established event sourcing solution or Kafka. What remains is the following: a great messaging platform cannot also be the best event store, and vice versa.

Further reading

If you want to read more about this topic and dedicated event store implementations, check out the following links:

  • https://www.eventstore.com/
  • https://martendb.io/
  • https://event-driven.io/en/event_streaming_is_not_event_sourcing/