PeerSpot user
Hadoop Technical Lead (Assistant Consultant) at a tech services company with 10,001+ employees
Real User
This is the base streaming component of our IoT platform. It needs a separate cluster and a separate administrator.

What is most valuable?

  • Distributed
  • Persistence
  • Offset management by consumer

How has it helped my organization?

This is the base streaming component of our IoT platform.

In case of disaster recovery, we mirror the data in the cluster by maintaining the offsets and store the data within Hadoop 2.8 HDFS.

What needs improvement?

  • It needs a separate cluster and a separate administrator to manage the Kafka cluster, adding an extra cost.
  • It is challenging when data is moved to a mirror cluster, in the case of disaster recovery. It doesn't keep the offset.
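One way to picture the offset problem is sketched below. It assumes the mirroring job re-produces each record into the target cluster, so the mirror assigns fresh offsets and a position committed on the source only makes sense after translation. The `Log` class and `offset_map` are hypothetical stand-ins, not Kafka APIs.

```python
class Log:
    """Single-partition log whose first retained offset may be greater than 0."""

    def __init__(self, base_offset=0):
        self.base_offset = base_offset
        self.records = []

    def append(self, value):
        """Store a record and return the offset it was assigned."""
        self.records.append(value)
        return self.base_offset + len(self.records) - 1

    def read(self, offset):
        return self.records[offset - self.base_offset]


# Source cluster: older records have expired, so retained offsets start at 100.
source = Log(base_offset=100)
for i in range(5):
    source.append(f"reading-{i}")

# Mirror cluster: records are re-produced, so they get fresh offsets from 0.
mirror = Log(base_offset=0)
offset_map = {}  # source offset -> mirror offset, maintained by the mirroring job
for src_offset in range(100, 105):
    offset_map[src_offset] = mirror.append(source.read(src_offset))

committed_on_source = 103  # consumer's position before failover
resume_on_mirror = offset_map[committed_on_source]
print(resume_on_mirror, mirror.read(resume_on_mirror))
```

Without a mapping like this, a consumer that fails over to the mirror has no reliable way to resume from where it stopped.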

For how long have I used the solution?

I have used this solution for one year.

Buyer's Guide
Apache Kafka
June 2025
Learn what your peers think about Apache Kafka. Get advice and tips from experienced pros sharing their opinions. Updated: June 2025.
857,028 professionals have used our research since 2012.

How are customer service and support?

The open source community is very strong. Also, distributors like Cloudera and Hortonworks provide paid support.

Which solution did I use previously and why did I switch?

For big data, we did not have a previous solution. I have used Microsoft MQ for building traditional systems.

How was the initial setup?

The setup was straightforward.

What's my experience with pricing, setup cost, and licensing?

It is open source; the main cost is a cluster administrator.

Which other solutions did I evaluate?

We did not look at anything else. At that time, this was already accepted by the industry for streaming data processing.

What other advice do I have?

If the Hadoop distribution is MapR, then consider MapR Streaming. MapR Streaming has overcome these fundamental issues: it stores data within MapR-FS itself, so there is no extra cluster overhead, though it comes with a licensing cost.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
FounderC32bc - PeerSpot reviewer
Founder, CEO at a tech vendor with 1-10 employees
Real User
The ability to partition data is valuable. There are far superior and cheaper alternatives in cloud-based solutions
Pros and Cons
  • "The ability to partition data on Kafka is valuable."
  • "The product is good, but it needs implementation and on-going support. The whole cloud engagement model has made the adoption of Kafka better due to PaaS (Amazon Kinesis, a fully managed service by AWS)."

How has it helped my organization?

We have used Kafka for streaming customer web clicks from live sessions to understand customer behavioral patterns.

What is most valuable?

The ability to partition data on Kafka is valuable. But Kafka needs support and management. It is better to have it fully managed on the cloud.
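As a rough illustration of what partitioning means for a clickstream, the sketch below routes click events by session key so that all events of one session stay in order on one partition. The event data and partition count are made up, and CRC32 is only a stand-in for the hash Kafka's default partitioner applies to keys (murmur2).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for a click-event topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition.

    CRC32 is used here only as a stand-in hash that is stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical click events: (session_id, page)
clicks = [
    ("session-a", "/home"),
    ("session-b", "/home"),
    ("session-a", "/cart"),
    ("session-c", "/search"),
    ("session-a", "/checkout"),
]

partitions = defaultdict(list)
for session_id, page in clicks:
    partitions[partition_for(session_id)].append((session_id, page))

for p in sorted(partitions):
    print(p, partitions[p])
```

Because the partition depends only on the key, each partition can be handled by a different consumer without splitting a session across consumers.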

The only reason I give Kafka as a product a low rating is that there are far superior and cheaper alternatives in cloud-based solutions, where we save money on manpower, electricity, servers, datacenters, networking, etc.

In fact, this is the view I have of pretty much all open source software compared to cloud-based services. They just make things cheaper, faster, more scalable, and more manageable. Kafka is good, but Kafka as a cloud service is awesome!!

This is a relative rating (compared to cloud services), not that something is wrong with Kafka. I hope that is clear.

What needs improvement?

The product is good, but it needs implementation and on-going support. The whole cloud engagement model has made the adoption of Kafka better due to PaaS (Amazon Kinesis, a fully managed service by AWS).

What do I think about the stability of the solution?

No issues here with stability.

What do I think about the scalability of the solution?

Ah, scalability!!! We need to set up multiple servers again for handling the load, which makes Kafka not scalable, unless you subscribe to cloud services.

How are customer service and technical support?

Support is Apache community-based, so your issue is not really prioritized if you have a business problem. This is why most enterprise customers pay for cloud services.

Which solution did I use previously and why did I switch?

We didn’t have a previous solution. We started with Kafka and then switched to Amazon Kinesis (a managed streaming PaaS that fills a similar role to Kafka). I think Microsoft Azure also released a competing service.

How was the initial setup?

The setup was straightforward.

What's my experience with pricing, setup cost, and licensing?

Licensing issues are not applicable. Apache licensing makes it simple with almost zero cost for the software itself.

Which other solutions did I evaluate?

We unsuccessfully, and kind of foolishly, tried Apache Camel. They were not similar in services, so we moved to Kafka rightfully, and then to AWS cloud ultimately.

What other advice do I have?

If you have a dedicated Kafka resource to implement and manage the services, then go for Apache Kafka. Otherwise, do consider cloud-based services from AWS or Azure.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
it_user660627 - PeerSpot reviewer
Senior Software Engineering Consultant at a tech services company with 51-200 employees
Consultant
It offers high throughput with built-in fault-tolerance and replication.
Pros and Cons
  • "Kafka, as compared with other messaging system options, is great for large scale message processing applications. It offers high throughput with built-in fault-tolerance and replication."
  • "Kafka requires non-trivial expertise with DevOps to deploy in production at scale. The organization needs to understand ZooKeeper and Kafka and should consider using additional tools, such as MirrorMaker, so that the organization can survive an availability zone or a region going down."

How has it helped my organization?

I used Kafka with a client to decouple applications with different availability profiles. Before using a messaging-based architecture with Kafka as the messaging system, the client used a coordinator application to fire off various posts to as many as eight other applications. With an application that's impacting at least a customer a second in airports, where the customers demand that the system always works, there were issues with ensuring high availability.

A typical way to calculate system availability is: Availability = Uptime/(Uptime + Downtime). Hence, where there are two applications involved with a 99% availability, the total system availability degrades quickly: 99% * 99% = 98.01%.

With eight applications, total availability caused issues. However, only two systems needed to provide real-time responses, while other systems were for payment processing, CRM, promotions, etc. It was OK if those systems were not up to date in real time.
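The arithmetic above extends to any number of applications that must all be up for a request to succeed. A minimal sketch, assuming failures are independent and every application has the same 99% availability (the function name is illustrative):

```python
def serial_availability(availabilities):
    """Availability of a request that needs every listed application to be up.

    Assumes failures are independent, so the individual availabilities multiply.
    """
    total = 1.0
    for a in availabilities:
        total *= a
    return total


two_apps = serial_availability([0.99] * 2)    # the 99% * 99% case from the text
eight_apps = serial_availability([0.99] * 8)  # coordinator posting to eight apps

print(f"2 apps: {two_apps:.4%}")
print(f"8 apps: {eight_apps:.4%}")
```

Under these assumptions, eight applications at 99% each give a combined availability of roughly 92.3%.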

Kafka allowed the client to have temporal decoupling for writes, i.e., the flaky third-party CRM system did not need to be available at that moment for us to respond to a user with a successful response. The availability concerns shifted to Kafka, which is a better trade-off because it's built for this.

Another benefit, though not required, was the addition of logical decoupling between applications. Additional consumers could be built to overlay concerns of analytics, but the systems responsible for creating the entities on a given topic did not need to be aware of the analytics applications. This simplifies the interaction between applications and concerns of an organization.

Another benefit of this architecture is that testing is simplified. A given application needs to be tested to obey a contract of reading a message and producing another message. A Kafka topic acts as the boundary for an integration test.

What is most valuable?

Kafka, as compared with other messaging system options, is great for large scale message processing applications. It offers high throughput with built-in fault-tolerance and replication.

Messaging systems in general allow for logical and temporal decoupling between applications. Given Kafka's high availability, it's a great option to use if applications require availability, but not real-time processing.

If a downstream system is offline, messages can queue up and process when possible, but the user may not necessarily need to be aware of any issues.
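The queue-up-and-catch-up behavior can be sketched with an in-memory list standing in for a topic. `Topic` and `CrmConsumer` are hypothetical names for illustration only; a real deployment would use a Kafka client library instead.

```python
class Topic:
    """In-memory stand-in for a single-partition topic (an append-only list)."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the new message


class CrmConsumer:
    """Hypothetical downstream system that may be offline for a while."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0  # next offset to read
        self.online = False
        self.processed = []

    def poll(self):
        """Process everything written since the last poll, if online."""
        if not self.online:
            return 0
        pending = self.topic.log[self.offset:]
        self.processed.extend(pending)
        self.offset += len(pending)
        return len(pending)


topic = Topic()
crm = CrmConsumer(topic)

# The user-facing application succeeds as soon as the write is accepted,
# even though the CRM consumer is offline.
for order_id in range(3):
    topic.append({"order_id": order_id})
backlog_while_offline = len(topic.log) - crm.offset

crm.online = True
caught_up = crm.poll()
print(backlog_while_offline, caught_up, crm.processed)
```

The writer never waits for the consumer; the backlog is simply processed once the downstream system comes back.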

A messaging-based architecture becomes important as a set of micro-services need to scale with high availability. Kafka is a great choice for messaging with such architecture.

What needs improvement?

Kafka requires non-trivial expertise with DevOps to deploy in production at scale. The organization needs to understand ZooKeeper and Kafka and should consider using additional tools, such as MirrorMaker, so that the organization can survive an availability zone or a region going down.

Shifting availability concerns to Kafka means that it cannot go down. It's important to understand the partitioning model and replication needs before relying on it for critical business functions. I'd suggest using it with a feature toggle for a non-critical path in production and learning from failure before relying on it.

While Kafka is built to scale, that does not mean that applications can start as many consumers or producers as they like without considering how the Kafka brokers will perform. Scaling out brokers needs to be considered before publishing millions of messages.

What do I think about the stability of the solution?

Generally, there were no stability issues. However, there was one scare in production when a consumer rebalance took 30 minutes and messages were not being processed during that time.

What do I think about the scalability of the solution?

We have not yet had scalability issues!

How are customer service and technical support?

There are specialized consulting companies in this space and there are online resources to read. That may help companies get past hurdles.

Which solution did I use previously and why did I switch?

No, we did not use a previous messaging system.

How was the initial setup?

The setup was complex. One must consider setting up ZooKeeper, Kafka, and multi-zone/region availability, as well as the typical associated functions for running it all in production. This includes monitoring, message schema changes (consider Avro), encrypting messages if that is a concern, and potentially authorization for different topics, depending on the sensitivity of the data.

If an organization uses Kafka as the first messaging system, then the approach for application design must also shift significantly.

What's my experience with pricing, setup cost, and licensing?

It is open source software.

Which other solutions did I evaluate?

The client evaluated alternatives before I arrived, so I cannot comment on the evaluation.

What other advice do I have?

Consider using a managed Kafka service, such as from Heroku.

If messaging is not a central component of the business and vendor lock-in is less of a concern, consider using something like Amazon's Kinesis. This can more rapidly provide the benefits of a messaging service without the pain of understanding it deeply, setting it up, and managing it.

It's important to use a lean approach to understand how it will break in production.

Implement a non-critical transaction with it.

Perhaps use a feature toggle within a facade and implement the behavior with the old approach and with Kafka to reduce risk.

Add it to one or two applications and monitor how it goes.

Figure out security, monitoring, scaling, schema migration, etc., before using it as a critical component in an application.
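The feature-toggle facade suggested above could look roughly like the sketch below. `legacy_send` and `kafka_send` are hypothetical callables supplied by the application, and falling back to the old path when the Kafka path fails is one possible design choice, not something the advice prescribes.

```python
class OrderNotifier:
    """Facade hiding whether notifications go over the old path or Kafka.

    `use_kafka` is the feature toggle; both send functions are hypothetical.
    """

    def __init__(self, legacy_send, kafka_send, use_kafka=False):
        self._legacy_send = legacy_send
        self._kafka_send = kafka_send
        self.use_kafka = use_kafka

    def notify(self, event):
        """Send the event and report which path handled it."""
        if self.use_kafka:
            try:
                self._kafka_send(event)
                return "kafka"
            except Exception:
                # Assumed policy: fall back to the old approach on failure.
                pass
        self._legacy_send(event)
        return "legacy"


legacy_calls, kafka_calls = [], []


def broken_kafka_send(event):
    raise ConnectionError("broker unavailable")


notifier = OrderNotifier(legacy_calls.append, kafka_calls.append)
first = notifier.notify({"id": 1})   # toggle off: legacy path
notifier.use_kafka = True
second = notifier.notify({"id": 2})  # toggle on: Kafka path

failing = OrderNotifier(legacy_calls.append, broken_kafka_send, use_kafka=True)
third = failing.notify({"id": 3})    # Kafka path fails: falls back to legacy

print(first, second, third)
```

Callers only see `notify`, so the toggle can be flipped for one non-critical transaction without touching the rest of the application.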

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Principal Software Architect at a tech services company with 11-50 employees
Consultant
Does real-time streaming and persistence into distributed nodes. It provides a mechanism to create, publish, and subscribe.

What is most valuable?

Real-time streaming and persistence into distributed nodes. It provides a simple mechanism to create, publish, and subscribe.

How has it helped my organization?

We are using Kafka as part of our product. It is one of the messaging layers used for interaction between the various software modules. This provides a clear separation of modules, which we leverage to develop and test the modules independently.

What needs improvement?

The management tools are still maturing. When we have thousands of topics, they are hard to visualize.

For how long have I used the solution?

I’ve been using Kafka for two years.

What do I think about the stability of the solution?

We have not encountered any stability issues.

What do I think about the scalability of the solution?

We have to balance the nodes when topics are partitioned across cluster nodes. Because Kafka assumes they are of equal size, some nodes may end up with an uneven load. Reassignment moves all the partitions of the specified topics, which may be an issue when not planned for.

How are customer service and technical support?

We have the source code to make changes if necessary.

Which solution did I use previously and why did I switch?

Kafka lent itself well to our product offering. It supports all the necessary requirements for a real-time pipeline.

How was the initial setup?

Setting up was easy with ZooKeeper.

What's my experience with pricing, setup cost, and licensing?

With paid support from Confluent, you get the additional benefit of Kafka Connect.

Which other solutions did I evaluate?

We used Akka Streams for faster communication, but it would require additional configuration and setup for persistence. Kafka provides those by default.

What other advice do I have?

Kafka provides distributed persistence and streaming layers. As a consumer, the user has flexibility in how messages are consumed, but has to handle resilience in their own code. It requires ZooKeeper.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
it_user653562 - PeerSpot reviewer
Solutions Architect at a consultancy with 1,001-5,000 employees
Consultant
Has the ability to write data at one velocity and have subscribing consumers read at different velocities.
Pros and Cons
  • "Apache Kafka is actually a distributed commit log. That is different than most messaging and queuing systems before it."
  • "The GUI tools for monitoring and support are still very basic and not very rich. There is no help in determining a shard key for performance."

How has it helped my organization?

Kafka has a guaranteed delivery mechanism that is very easy to set up. When starting out with minimal hardware, it can handle very large data volumes. When prototyping and creating a proof of concept, Kafka has helped to speed up the timeline from the prototype all the way to production volumes.

What is most valuable?

Apache Kafka is actually a distributed commit log. That is different than most messaging and queuing systems before it. I find the ability to write data at one velocity and have subscribing consumers read at different velocities to be the best feature.
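The commit-log idea can be sketched in a few lines: records are appended once and never removed by reading, and each consumer tracks its own offset. `CommitLog`, the consumer names, and the batch sizes are hypothetical and only illustrate the concept.

```python
class CommitLog:
    """Append-only log; readers track their own position (offset)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def read(self, offset, max_records):
        """Return up to max_records starting at offset; nothing is removed."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for i in range(10):
    log.append(f"event-{i}")

# Each consumer keeps its own offset, so a slow reader does not hold back a fast one.
offsets = {"fast": 0, "slow": 0}
batch_size = {"fast": 5, "slow": 2}  # hypothetical per-poll batch sizes

for _ in range(2):  # two polling rounds
    for name in offsets:
        batch = log.read(offsets[name], batch_size[name])
        offsets[name] += len(batch)

print(offsets)  # {'fast': 10, 'slow': 4}
```

Since reading does not delete anything, a consumer can also reset its offset to an earlier position and read the same records again.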

What needs improvement?

The GUI tools for monitoring and support are still very basic and not very rich. There is no help in determining a shard key for performance.

What do I think about the stability of the solution?

We did not have any issues with stability.

What do I think about the scalability of the solution?

We did not have any issues with scalability.

How are customer service and technical support?

  • Kafka is open source from LinkedIn and support comes from the community of users.
  • You can go with Confluent, the company that was founded by the original engineers from LinkedIn.
  • You can go with a cloud hosting service, like AWS EMR or Azure HDInsight.


    Which solution did I use previously and why did I switch?

    We used traditional message queues and file semaphores. There was a lot of overhead with asynchronous messages being put into an order and making sure nothing got dropped. It required a lot of code and maintenance.

    How was the initial setup?

    Since it is open source, you are on your own for setup. However, the tutorials from the Apache foundation and online sources have been an immense help.

    Getting started is very easy. Handling very large volumes of data and choosing appropriate sharding, however, is difficult. There are fewer resources for tuning and best practices.

    What's my experience with pricing, setup cost, and licensing?

    When starting to look at a distributed message system, look for a cloud solution first. It is an easier entry point than an on-premises hardware solution. A lot of the complexity has already been taken care of. Both AWS and Azure have supported Kafka clusters that can be provisioned very easily.

    Which other solutions did I evaluate?

    We looked at RabbitMQ and Spark Streaming.

    What other advice do I have?

    Be sure to define the use cases as best as possible at first.

    Kafka is very good, but it is complex to support. Its maximum message size is configurable, whereas native cloud options have fixed size limitations.

    Be sure to understand what messages will be sent and how many discrete topics will be needed.

    Be aware that you must code both producers and consumers.

    The bulk of the work is with the consumer.

    The Apache distribution of Kafka is purely open source. There are essentially no tools other than command-line options to monitor broker and topic health. Third-party tools, some free and some paid, can help with that, but they require you to install agents on the servers hosting Kafka and to open ports for JMX (MBeans) in the scripts that start the Kafka services. You also have to monitor ZooKeeper, which is very memory intensive. Cloud offerings that provide the whole modern data architecture stack, like AWS EMR and Azure HDInsight, as well as Hortonworks and Cloudera, provide a console GUI as part of each of their offerings. Confluent, a company founded by the LinkedIn engineers who designed Kafka, also has a paid enterprise offering with much better tools for maintaining the Kafka cluster. But with Apache Kafka and the community alone, you are on your own.

    Disclosure: My company does not have a business relationship with this vendor other than being a customer.
    PeerSpot user
    it_user660630 - PeerSpot reviewer
    SDET II at a tech services company with 5,001-10,000 employees
    Consultant
    Replication and partitioning are valuable features.

    What is most valuable?

    • Replication, partitioning, and reliability are the most valuable features.
    • Even if one of the brokers in my cluster fails, the replication factor of a topic makes sure that I still have the data available for processing, so I won't lose any of it.
    • Partitioning enables me to process requests in parallel, which helps in reaching the required throughput.
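The effect of the replication factor can be illustrated with a simplified placement sketch. It assumes replicas are placed round-robin on consecutive brokers; real Kafka placement is more involved, and the broker names are hypothetical.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin).

    A simplification for illustration, not Kafka's actual placement logic.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [
            brokers[(p + r) % len(brokers)] for r in range(replication_factor)
        ]
    return assignment


def surviving_replicas(assignment, failed_broker):
    """Return the replicas of each partition that are left after one broker fails."""
    return {
        p: [b for b in replicas if b != failed_broker]
        for p, replicas in assignment.items()
    }


brokers = ["broker-1", "broker-2", "broker-3"]  # hypothetical broker names
assignment = assign_replicas(num_partitions=6, brokers=brokers, replication_factor=2)
after_failure = surviving_replicas(assignment, failed_broker="broker-2")

print(after_failure)
```

With a replication factor of 2 on three brokers, every partition keeps at least one replica when a single broker is lost, while the six partitions can still be consumed in parallel.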

    What needs improvement?

    One area for improvement is OS memory management. When there are too many partitions, Kafka can run into memory issues. Although this is a very rare scenario, it can happen.

    For how long have I used the solution?

    I have been using this product for a year now.

    What do I think about the stability of the solution?

    There were no stability issues.

    What do I think about the scalability of the solution?

    Kafka is a highly scalable product. We have not faced any scalability issues so far.

    How is customer service and technical support?

    Since it's an open source product, no technical support is available. However, the open source community is very active.

    How was the initial setup?

    The initial setup was straightforward. Just go through the Kafka documentation and it will be up and running in no time.

    What's my experience with pricing, setup cost, and licensing?

    Since it's an open source product, there is no pricing for it.

    Disclosure: My company does not have a business relationship with this vendor other than being a customer.
    PeerSpot user
    it_user647457 - PeerSpot reviewer
    Head of Engineering
    Vendor
    Interactions among micro-services are used as input to our analytics infrastructure.
    Pros and Cons
    • "Ease of use."
    • "Stability of the API and the technical support could be improved."

    How has it helped my organization?

    Kafka was at the base of our system architecture. The system was designed as an event-based architecture. Almost all the interactions among micro-services go through Kafka, and the same data are used as input to our analytics infrastructure.

    What is most valuable?

    • Scalability
    • Reliability
    • Ease of use

    What needs improvement?

    Stability of the API and the technical support could be improved.

    The Kafka API is changing quite radically with the different releases. There are many new improvements and that's good. But the inherent cost of adapting to a new version of the platform was worrying me at the time.

    The documentation was sometimes misleading, since it was describing some feature in the new version of the API rather than the one we were using.

    What do I think about the stability of the solution?

    We did not encounter any issues with stability.

    What do I think about the scalability of the solution?

    We did not encounter any issues with scalability.

    How are customer service and technical support?

    We were not completely satisfied with the technical support. We subscribed to the Confluent professional platform to receive guidance and support on development and deployment. Whilst the development side is quite well covered by their consultants, the deployment and administration is not at the same level.

    Which solution did I use previously and why did I switch?

    The previous solution was not really an equivalent one. I have been using several messaging systems, but Kafka fits us better for a more scalable system.

    How was the initial setup?

    The initial setup was straightforward.

    What's my experience with pricing, setup cost, and licensing?

    I would not subscribe to the Confluent platform, but rather stay on the free open source version. The extra cost wasn't justified.

    Which other solutions did I evaluate?

    We didn't evaluate other options, as we already had a positive experience across the team with Kafka. Everybody agreed to work with it.

    We were considering Kinesis too, since we were running on AWS. We preferred to opt for a tool with which people were more familiar.

    What other advice do I have?

    The product is easy to use. However, to leverage its power, there is a need for good knowledge of event based processing. I suggest using the massive amount of material shared by the Confluent team, or what is available online.

    Disclosure: My company does not have a business relationship with this vendor other than being a customer.
    PeerSpot user
    Deputy General Manager, DevOps Manager at a comms service provider with 10,001+ employees
    Real User
    One of the best features which I have worked with is replay.

    What is most valuable?

    One of the best features which I have worked with is replay.

    How has it helped my organization?

    Real-time log aggregation, which was previously done with rsync, has been moved to the Kafka infrastructure along with other real-time streams.

    What needs improvement?

    • GUI for Kafka infrastructure monitoring and deployment

    For how long have I used the solution?

    I have used it for two years.

    What was my experience with deployment of the solution?

    Documentation is quite comprehensive.

    What do I think about the stability of the solution?

    I found it very stable.

    What do I think about the scalability of the solution?

    No issues with scalability.

    How are customer service and technical support?

    Customer Service:

    We used the open-source version.

    Technical Support:

    We used the open-source version.

    Which solution did I use previously and why did I switch?

    We previously used rsync, which was not real-time.

    How was the initial setup?

    Initial setup was mostly intuitive (based on rsync).

    What about the implementation team?

    Implementation was in-house based on the open-source version.

    What was our ROI?

    Target was to achieve real-time service.

    Which other solutions did I evaluate?

    Before choosing this product, we did not evaluate other options.

    Disclosure: My company does not have a business relationship with this vendor other than being a customer.
    PeerSpot user