We have different use cases for different clients. For example, a banking sector client wants to use Apache Kafka for fraud detection and messaging back to the client. For instance, if there's a fraudulent swipe of a credit or debit card, we stream near real-time data (usually three to five minutes old) into the platform. Then, we use a look-up model to relay the data to the messaging queue to the customers. This is just one use case.
We also have data science engineers who use Kafka to consume the data (usually the last five to seven minutes of transactions) to detect fraudulent transactions for the bank's internal consumption. This is not relayed back to the customer. This client is based in the Middle East, and they have some interesting use cases.
Apache Kafka Overview
Apache Kafka Buyer's Guide
Download the Apache Kafka Buyer's Guide including reviews and more. Updated: June 2023
What is Apache Kafka?
Apache Kafka is a highly regarded open-source, distributed event streaming platform and Message Queue (MQ) software solution that is valued and trusted worldwide by many of the top Fortune 100 companies. It is considered one of the most reliable MQ software solutions available in the marketplace today.
Enterprise organizations rely on streaming platforms and MQ software solutions to process the continuous flow of high-performance data pipelines, mission-critical applications, and data integration. Apache Kafka makes it easy to process and distribute messages from one application to another from multiple environments with super-fast speeds and very high reliability.
Additionally, in place of the usual command-line processes for administration and management tasks, Apache Kafka supplies five core APIs for both Scala and Java:
- Kafka Streams API can be used to facilitate stream processing applications and microservices. Input is seamlessly read from one or more topics and will initiate output to one or more topics, easily converting the input streams to output streams.
- Kafka Connect API enables users to develop and run reusable data import/export connectors that are able to read and write streams of events from external operating systems and applications, making integration with Apache Kafka simple and streamlined.
- Consumer API allows users to subscribe and read one or more topics and to process the stream of events produced to them.
- Admin API gives users the ability to examine and manage brokers, topics, and other Kafka objects.
- Producer API enables users to write and publish a stream of events to one or more Kafka topics.
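The producer/consumer split above can be sketched with a minimal in-memory stand-in. This is purely illustrative: real applications use the official Java/Scala clients or a community library, and the `Topic` class here is an assumption for the sketch, not Kafka's actual API.

```python
class Topic:
    """Toy in-memory stand-in for a Kafka topic (illustrative, not the real API)."""
    def __init__(self, partitions=1):
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Producer-API idea: the key picks the partition, so records
        # with the same key keep their relative order.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def consume(self, partition, offset=0):
        # Consumer-API idea: read forward from a given offset.
        return self.partitions[partition][offset:]

topic = Topic(partitions=3)
p = topic.produce("card-1234", "swipe@10:01")
topic.produce("card-1234", "swipe@10:02")

# Same key, same partition, order preserved.
events = [v for _, v in topic.consume(p)]
print(events)  # ['swipe@10:01', 'swipe@10:02']
```

The per-key partitioning is why Kafka can guarantee ordering within a partition but not across a whole topic.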
Apache Kafka Benefits
Apache Kafka has many valuable benefits. Some of its most valuable benefits include:
- Load Shifting
- Scalability
- Decoupling
- High Throughput
- High Availability
- Safe Permanent Storage
- Excellent Integration Capabilities
- Large, Reliable, Open-Source Community
- Mission Critical
- Wide Array of Available Learning Opportunities
Not only is Apache Kafka a robust messaging queue, it is also a tremendously durable and reliable streaming platform, fully capable of securely delivering more than one million messages per second, which amounts to trillions of successfully delivered messages in a day.
Reviews from Real Users
“From my experience with Apache Kafka, one of the most notable advantages is its ability to maintain a comprehensive record of historical data that includes every update, alteration, and version of information, unlike a conventional relational database. This feature allows for seamless tracking and analysis of the progression and transformation of the data over time, enabling users to easily review and analyze the history of the information.” Dimitrios Z., Enterprise Architect at Smals vzw
“We are currently on a legacy version and have found that the latest version of Kafka has solved many of the issues we were facing, such as sequencing, memory management, and more. Additionally, the fact that it is open source is a major benefit.” Pratul S., Software Engineer at a financial services firm
“The solution has improved our functionality; it's one of the best streaming platforms I've used.” Sreekar N., Co-Founder at Attaika
Apache Kafka Customers
Uber, Netflix, Activision, Spotify, Slack, Pinterest
Apache Kafka Reviews
Group Manager at a media company with 201-500 employees
Real-time processing and reliable for data integrity
Pros and Cons
- "Kafka can process messages in real-time, making it useful for applications that require near-instantaneous processing."
- "Data pulling and restart ability need improving."
What is our primary use case?
How has it helped my organization?
We are still in the cluster build phase. Based on the use cases captured during the advisory phase, there will be roughly a 40/60 mix of users: 40% internal data science and IT teams, and 60% end users. In total, we expect about 25 users. Of these, around 15 business users will make decisions based on reports generated from Kafka analytics data. The remaining users are internal; they analyze this data daily to identify more use cases from a predictive and AI perspective for the future of the banking domain.
Moreover, our current client is an enterprise business: a globally renowned bank that has entered Saudi Arabia.
What is most valuable?
One of the major features we are currently exploring, drawing on my previous experience as well, is a multiple pub/sub model architecture. We have one data hub, the IBM DB2 system that the bank uses for daytime transaction tracking from OLTP systems, and we want to use that data on different platforms. So, we are trying to use Kafka in a model where it publishes onto multiple messaging queues. These different messaging queues belong to different business units, since we are segregating the data lake we are building into different domains. HR data, for example, is highly sensitive, so that team doesn't want to share it with other business units. We are working on a common-publisher, multiple-subscriber model, which I feel is much easier to implement using Kafka.
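The common-publisher, multiple-subscriber pattern can be sketched with a toy in-memory broker. This is an illustrative stand-in, not a Kafka API: in a real deployment each business unit would be its own consumer group on a Kafka topic, and every group independently sees the full stream.

```python
class Broker:
    """Toy single-topic broker: each consumer group tracks its own offset,
    so every group independently reads the full stream (pub/sub fan-out)."""
    def __init__(self):
        self.log = []
        self.offsets = {}  # group id -> next offset to read

    def publish(self, record):
        self.log.append(record)

    def poll(self, group):
        start = self.offsets.get(group, 0)
        batch = self.log[start:]
        self.offsets[group] = len(self.log)
        return batch

broker = Broker()
# One publisher (e.g. change data from the core DB2 system) feeds all domains.
broker.publish({"domain": "hr", "event": "salary-update"})
broker.publish({"domain": "risk", "event": "txn-flagged"})

# Each business unit is its own consumer group: both receive every record,
# and each filters down to its own domain.
hr_view = [r for r in broker.poll("hr-group") if r["domain"] == "hr"]
risk_view = [r for r in broker.poll("risk-group") if r["domain"] == "risk"]
print(hr_view, risk_view)
```

In practice the domain segregation would more likely be done with separate topics and topic-level ACLs, but the fan-out principle is the same.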
The other part we are trying to implement, which is in its very early stages, is to see if we can make it future-ready. Right now, in the Middle East, there are not many cloud providers like GCP, AWS, and Azure; it is all on-premise. But they will be there within the next two or three years. So, we are trying to see if we can have these Kafka models working from a future perspective wherein, instead of dumping some of the data into a data lake, we can dump it directly into solutions like GCP BigQuery for real-time analytics. This applies only to the real-time analytics use cases; this data will definitely also be in the data lake, as that is the intention of keeping it.

But using Kafka, we are trying to see if we can make these subscribers ready to use platforms like GCP BigQuery for real-time analytics. It's still in the early stages, but those are the use cases.
What needs improvement?
One of the major areas for improvement is the polling mechanism. Sometimes, when the data volume is too large and I have a polling period of, let's say, one minute, there can be issues due to technical glitches, data anomalies, or platform-related issues such as cluster restarts. These issues tend to stall the message flow, and the restartability needs to be improved, especially when data volumes are very high.
If there are obstructions due to technical glitches or platform issues, sometimes we have to manually clean up or clear the queue before it eventually recovers. It does restart on its own, but it takes too much time to catch up. A year ago, I couldn't find a solution to make it more agile in terms of catching up quickly and staying real-time in case of any downtime.
This was one area where I couldn't find a solution when I connected with Cloudera and Apache. One of our messaging tools was sending a couple of million records. We found it tough when there were any cluster downtimes or issues with the subscribers consuming data.
For future releases, one feature I would like to see is a more robust solution in terms of restart ability. It should be able to handle platform issues and data issues and restart seamlessly. It should not cause a cascading effect if there is any downtime.
Another feature that would be helpful is if they could add monitoring features as they have for their other services. A UI where I can monitor the capacity of the managed queue and resources I need to utilize more to make it ready for future data volumes. It would be great to have analytics on the overall performance of Kafka to plan for data volumes and messaging use. Currently, we plan the cluster resources manually based on data volumes for Kafka. If they can have a UI for resource planning based on data volume, that could be a great addition.
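The core metric such a monitoring UI would surface is consumer lag: per partition, the log-end offset minus the group's last committed offset. The arithmetic itself is simple; the sketch below uses made-up offset values, and in practice the numbers would come from Kafka's Admin API or a lag-monitoring tool such as Burrow.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = log-end offset minus the group's committed offset."""
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

# Hypothetical offsets for a three-partition topic.
end = {0: 1500, 1: 980, 2: 2100}
committed = {0: 1500, 1: 950, 2: 1800}

lag = consumer_lag(end, committed)
print(lag)                 # {0: 0, 1: 30, 2: 300}
print(sum(lag.values()))   # 330 messages behind in total
```

Watching total lag over time is exactly the "catch-up problem" signal described above: after a five-minute downtime, lag spikes and should trend back toward zero.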
For how long have I used the solution?
I have been using Apache Kafka for five years. In the current project, we're setting up a cluster. We'll be doing the service installations next week. It's a private cloud-based implementation, and I'm leading the end-to-end implementation. In my previous project, we mainly used Kafka for streaming real-time SAP data into the analytics platform for a technology client.
But for the current banking sector client, we're setting up a 58-node cluster and reserving six nodes for Kafka because we have a lot of streaming use cases.
What do I think about the stability of the solution?
I would rate the stability of Apache Kafka a six out of ten due to the polling period and high data volumes; there is a catch-up problem. If there is a five-minute downtime, it can have a cascading effect.
When it comes to data availability, that is, how available the data is on the messaging queue, I would rate it a little lower due to the polling mechanism and data getting stuck when volumes are high. From the data availability perspective, I would rate it between six and seven. The major reason is the sheer volume of data that comes in day in and day out. If resources are not allocated correctly to the Kafka messaging queue, it sometimes gets stuck, and once it is stuck, it has a cascading effect on catching up to the real-time data. It is only because of this issue that I rate it between six and seven.
However, from an overall data security perspective and ensuring that the data is consistent across the system, I would rate it around nine. If your PubSub model is written correctly, you can be assured that the data will not be lost. It will either be in the messaging queues or your landing tables or staging tables, but it will not be lost, at least if you have written it correctly.
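The "data will not be lost if written correctly" property comes from committing offsets only after a record has safely landed. Below is a toy sketch of that commit-after-process pattern; the `DurableConsumer` class is an illustrative stand-in, not the real consumer API.

```python
class DurableConsumer:
    """Toy commit-after-process consumer: if processing fails before the
    offset is committed, the same record is re-read on the next poll,
    so data is delayed but not lost (at-least-once delivery)."""
    def __init__(self, log):
        self.log = log
        self.committed = 0  # last committed offset

    def poll_once(self, handler):
        if self.committed >= len(self.log):
            return None
        record = self.log[self.committed]
        handler(record)       # land the record in a staging table first...
        self.committed += 1   # ...and only then commit the offset
        return record

log = ["txn-1", "txn-2"]
landed, attempts = [], {"txn-1": 0}
consumer = DurableConsumer(log)

def flaky_handler(rec):
    # Simulate a cluster restart on the first attempt at txn-1.
    if rec == "txn-1":
        attempts["txn-1"] += 1
        if attempts["txn-1"] == 1:
            raise RuntimeError("cluster restart")
    landed.append(rec)

try:
    consumer.poll_once(flaky_handler)
except RuntimeError:
    pass  # offset was not committed, so txn-1 is still pending

consumer.poll_once(flaky_handler)  # re-delivers txn-1
consumer.poll_once(flaky_handler)  # then txn-2
print(landed)  # ['txn-1', 'txn-2']
```

This is why the data ends up either in the messaging queue or in the landing/staging tables, but not lost: the commit is only advanced past records that have been durably handled.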
What do I think about the scalability of the solution?
I would rate the scalability of Apache Kafka somewhere around seven out of ten. I'm not going on the higher side because a lot of manual work is involved in upgrading Kafka. You have to estimate the overall capacity, not just in data but also in other use cases running on the same cluster. In the cloud, it is easier as you don't have to worry about the turnaround time of your cluster setup. But in an on-premise setup, you need to add more nodes, RAM resources, and storage based on the increasing data volume.
Also, it would help if you ensured that the streaming use cases running on Kafka are not impacting other use cases like batch or archival use cases. Because of this manual activity of overall estimation, I will still keep it somewhere around six to seven. But regarding scalability in terms of horizontal or vertical scalability from the data perspective, I feel more comfortable with Kafka compared to other available streaming solutions.
How are customer service and support?
I have noticed a drastic improvement in the last five to seven months. Last year, the turnaround time for certain cases was around 14 days unless you escalated, and issues used to take two, three, or even four follow-ups. Moreover, the solutions provided were often just resource-upgrade recommendations, which I felt were not always the best.
However, in the last month, I have seen Cloudera coming through with two-to-three-day turnaround times, even for low-severity issues. I'm unsure if it is region-specific, but I assume it is, as they have region-specific teams. Sometimes, however, you cannot always depend on them: the solutions provided can feel like trial and error, and when you work for bigger enterprises you cannot always go with them because there is a cost associated.
For example, in one of my recent cases, we implemented Ranger policies as part of the security setup on our cluster. We were stuck at some point and asked for technical support. They provided a solution that was just patchwork. When we did our own analysis and took it to the bank's security team for review, we found their solution inadequate. They told us to set up role-based access control through Ranger, where AD users should be synced with Ranger and access control policies set up. However, their solution only covered local Ranger-level policies for Linux users. That is not a solution because, at the enterprise level, everything is integrated with AD and authenticated by AD.
I would rate Cloudera's support a five to six out of ten, though the turnaround time has improved since last year: we used to wait two to three days for critical solutions, but now it is much better. I worked on the US region in my previous project, and the turnaround time there was not as expected either. It all depends on the licensing: if you have a premium license from Cloudera, they assign a professional services guide to your project and you get better support. If you do not, you have to go through the process of raising cases and waiting for their support team. Overall, it is not on par with providers like Azure and GCP.
How would you rate customer service and support?
Neutral
How was the initial setup?
The initial two months were for capacity estimation, where we worked with the client's different business teams to understand the data volumes and use cases. Then, the next four to five months went into procurement, where we had to work with infrastructure teams and vendors to understand the servers and networks required for the cluster.
The actual cluster setup took us two months, and it was a little longer due to a shortage of expertise on the client's networking team. We had to handle everything ourselves since it was an on-premise setup with physical servers and network connections. Currently, we are in the security review phase, and once it completes, we will start implementing various use cases like task and batch processing, archival, etc.
Based on my experience with Apache Kafka implementations and clusters, I would rate the setup somewhere between seven and eight out of ten.
What about the implementation team?
We deployed the solution on-premises because it is a Middle East client, which is where most of my admin experience over the last two years comes from. So even if I move to the cloud, it will be much easier because I have seen cluster implementation from scratch and how it is done. I was involved from the very first stage: working with Cloudera on the sizing, then working with the infrastructure and networking teams on the implementation and network structuring, and then setting up the cluster ourselves. I have a team of around seventeen people here. What we are implementing is Cloudera's private-cloud-based solution, which is a future-ready solution for the cloud.

So once cloud service providers such as GCP, AWS, and Azure enter the region, especially Saudi Arabia, our cluster will always be ready to be upgraded to the cloud, because it sits on a private-cloud-based solution; we can add cloud-native hosts and nodes to our cluster at any time. At the same time, because it is a banking client, there are restrictions on the geographies in which the data can reside. The physical cluster provides a solution from a future-readiness perspective: once GCP, Azure, and AWS set up their data centers in Saudi Arabia, we can keep some of our data nodes in the bank's data center and have other sets of nodes or VMs in the cloud service provider's data center.
What was our ROI?
I have seen that the ROI is very good when implemented correctly and used for a period of time. I have seen, from a POC perspective, data getting churned in a couple of months, and the amount of insights generated was overwhelming.
I have also seen some critical decisions taken at an enterprise level based on that data, decisions which earlier used to take years. Because of the time it took, the intention of doing those analytics used to lose its flavor. But now, people can make decisions in a few months based on these streaming analytics use cases through Kafka. And they see that if those decisions had been taken earlier, they could have added four to five percent to their year-on-year profits.
However, too many things are involved because of the overall use case perspective, data perspective, underlying cluster sizing, and sourcing. One has to think from a holistic point of view, not just from a business point of view.
What's my experience with pricing, setup cost, and licensing?
I have experience in private cluster implementation. When you use Apache Kafka with Cloudera, the pricing is included in your Cloudera license, which is based on the number of nodes, the storage cost, and other components; Kafka is one of the solutions offered under that license. Compared with cloud offerings, if you don't have a good volume of data and use cases, you will not realize the benefits, as the initial cost of setting up the cluster and the license can be as much as $760k for a small cluster of ten to twenty nodes. You need at least 20-30 GBs of data and real use cases before you can utilize and profit from the license and the data platform. Kafka is just one piece of it.
When it comes to the cloud, pricing is also at the solution level, so you can compare it at the Kafka level; still, I don't have much information on that for where I am currently implementing the solution. We opted for the solution only after we did the cost-benefit analysis. We realized that by bringing in Cloudera along with Kafka, we would be able to replace two or three existing systems, including Teradata, Oracle, Informatica, and IBM DataStage. Only then were we able to realize the benefit for the bank; otherwise, Cloudera would be much more expensive, especially in the short term. With distributed computing, the concept of the Delta Lake is coming in, and RDBMS systems like Teradata and distributed systems like data lakes will coexist. Not all use cases will be solved, but cloud solutions like Azure come as a package, and you need not worry about maintaining different physical systems in your enterprise. That's where the cost-benefit analysis from a data perspective becomes very important.
At the end of the day, we bring in big data systems only when the data volumes are high. When the data volumes are low, the cost-benefit analysis can easily show that systems like Oracle or Teradata can run it just fine.
What other advice do I have?
From an architecture and solution design perspective, I would say that before going for streaming solutions, we should analyze the data, which might be old, and decide if it's a streaming use case or not. Often, people think it's a streaming use case, but when they perform analytics on top of it, they realize they can't do a month-to-date or year-to-date analysis. So, it's essential to think again from the data basics perspective before going to Kafka.
Overall, from the product and solution perspective, I would rate it a nine out of ten based on my personal experience.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementor
Last updated: Apr 26, 2023
Enterprise Architect at Smals vzw
Effective event sequencing, seamless system interactions, and beneficial data management
Pros and Cons
- "There are numerous possibilities that can be explored. While it may be challenging to fully comprehend the potential advantages, one key aspect is the ability to establish a proper sequence of events rather than simply dealing with a jumbled group of occurrences. These events possess their own timestamps, even if they were not initially provided with one, and are arranged in a chronological order that allows for a clear understanding of the progression of the events."
- "There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, all under enterprise licenses and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools, such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as the enterprise solutions."
What is our primary use case?
Apache Kafka is used as more than just a messaging bus; it also serves as a database to store information. It functions as a streamer, similar to ETL, manipulating and transforming events before migrating them to other systems for use. It can also act as a cache. We use Apache Kafka as a broker, streamer, and source of truth for multiple systems because of its ability to retain events for at least 10 days. It provides both synchronous and asynchronous communication, making it a complex system that is easier to understand through diagrams or sketches.
We use reactive frameworks.
How has it helped my organization?
From my experience with Apache Kafka, one of the most notable advantages is its ability to maintain a comprehensive record of historical data that includes every update, alteration, and version of information, unlike a conventional relational database. This feature allows for seamless tracking and analysis of the progression and transformation of the data over time, enabling users to easily review and analyze the history of the information.
The solution has the capability for various systems to effortlessly interact with one another without prior knowledge of their existence, current operational status, or specific configurations. By utilizing service buses and dynamic integration, data can be distributed across networks and retrieved in a way that is most suitable for each system's requirements. In addition, Apache Kafka allows for the modification of data to provide diverse clients, consumers, or observers with unique and varying data. The replication of data can produce multiple versions, and this data can be adjusted to fit various needs. With the use of probes, one can alter the behavior of the transformation process, thereby changing the way in which data is transformed and the output produced. Overall, working with Apache Kafka has brought about an array of benefits, enabling seamless system interactions and allowing for the customization and modification of data to meet individual requirements.
What is most valuable?
There are numerous possibilities that can be explored. While it may be challenging to fully comprehend the potential advantages, one key aspect is the ability to establish a proper sequence of events rather than simply dealing with a jumbled group of occurrences. These events possess their own timestamps, even if they were not initially provided with one, and are arranged in a chronological order that allows for a clear understanding of the progression of the events.
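That ordering behavior can be illustrated with a small sketch: events that arrive without a producer timestamp get one assigned at ingestion (Kafka's broker-side LogAppendTime behaves similarly), after which the stream can be read back in chronological order. The `ingest` helper below is hypothetical, for illustration only.

```python
import time

def ingest(events, clock=time.time):
    """Stamp events that arrived without a timestamp, then keep them as
    (timestamp, payload) pairs so the stream can be ordered and replayed."""
    return [(e["ts"] if e.get("ts") is not None else clock(), e["payload"])
            for e in events]

raw = [
    {"ts": 100.0, "payload": "created"},
    {"ts": None,  "payload": "updated"},   # no producer-supplied timestamp
    {"ts": 105.0, "payload": "deleted"},
]

# A fixed clock keeps the example deterministic.
stamped = ingest(raw, clock=lambda: 102.5)
history = [payload for _, payload in sorted(stamped)]
print(history)  # ['created', 'updated', 'deleted']
```

Sorting by the assigned timestamps recovers the progression of the data over time, which is the history-tracking advantage described above.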
What needs improvement?
There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, all under enterprise licenses and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools, such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as the enterprise solutions.
One additional area that I think could benefit from improvement is the deployment process on OpenShift. This particular deployment is quite challenging and requires the activation of certain security measures as well as integration with other systems. It's not a straightforward process and typically requires engineers who are highly skilled and have extensive experience with Apache Kafka to carry out these tasks. Therefore, I believe that there is a need for progress in this area, and some tools that can provide information, assistance, and help make the whole process easier would be greatly appreciated.
For how long have I used the solution?
I have been using Apache Kafka for approximately four years.
What do I think about the stability of the solution?
The solution is stable if you have set it up correctly.
What do I think about the scalability of the solution?
Apache Kafka is a scalable solution.
How are customer service and support?
I have not escalated any questions to technical support because Apache Kafka is an open-source system. However, Confluent and other companies sell support and enterprise solutions to make it more convenient and streamline the work. They offer tools, such as a monitoring tool with a visual interface, which provides a lot of information and buttons to press for corrections or changes without touching the code. Each of those buttons could hypothetically help in a given situation, but it is not always clear exactly what they do, so it is best to call the data center and ask. If you buy their service, you have access to all the enterprise comforts.
How was the initial setup?
Setting up Apache Kafka is not an easy task, especially when trying to containerize it and make it controllable. This is because Apache Kafka has its own distributed mechanisms for staying alive, checking readiness, replicating, and scaling. Ensuring that these comply with the Kubernetes or OpenShift orchestrator requires careful attention, as there is a risk of two masters attempting to perform the same task and ultimately undoing each other's work.
In comparison to Kubernetes, OpenShift is a highly skilled and advanced implementation infrastructure that automatically manages and orchestrates all the steps required for an application setup. It operates at a higher level of abstraction and eliminates the need for manual operations that are required with Kubernetes. While Kubernetes can run an application with some pipeline and configuration, OpenShift takes care of everything from finding the required images to creating ports and connecting databases. Although manual changes can be made, it's not necessary, as OpenShift offers a much more coarse-grained management approach.
What about the implementation team?
One skillful DevOps engineer can implement the solution.
What's my experience with pricing, setup cost, and licensing?
Apache Kafka is an open-source solution.
What other advice do I have?
The maintenance of Apache Kafka is crucial due to the complexity of the system with numerous microservices and systems communicating through Apache Kafka, requiring proper integration and configuration to prevent overloading and ensure a healthy cluster. The task is not easy and requires knowledge of the various adjustable parameters, as misadjusting even one of them can greatly slow down the cluster. For example, if the consumer group changes frequently, the messages must be regrouped and reassigned, causing significant delays. Therefore, configuring Apache Kafka correctly is essential to avoid high latency issues.
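The rebalance cost mentioned above follows from how partitions are assigned: every change in consumer-group membership forces a reassignment, during which consumption pauses. A simplified sketch of range-style assignment is below; Kafka's actual assignors also handle multiple topics and sticky placement, so treat this as an illustration only.

```python
def range_assign(partitions, consumers):
    """Range-style partition assignment: contiguous chunks per consumer,
    with earlier consumers absorbing any remainder."""
    consumers = sorted(consumers)
    n, k = len(partitions), len(consumers)
    per, extra = divmod(n, k)
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        assignment[c] = partitions[start:start + count]
        start += count
    return assignment

parts = list(range(6))
print(range_assign(parts, ["c1", "c2"]))
# {'c1': [0, 1, 2], 'c2': [3, 4, 5]}

# A new member joining forces a full reassignment (the rebalance):
print(range_assign(parts, ["c1", "c2", "c3"]))
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4, 5]}
```

Comparing the two outputs shows why frequent membership churn is expensive: most consumers end up with a different partition set each time, and their in-flight work has to be regrouped.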
I would strongly suggest others give Apache Kafka a chance and explore the various advantages that it can offer, especially since it should not be perceived as a message bus or broker but rather an enterprise bus designed for data manipulation. It has the ability to transform data, store and reject it, and even maintain different versions of the same data simultaneously. Moreover, it operates on a pull mechanism rather than a push mechanism, which takes away the risk of losing data and places the responsibility for data loss on the consumer. On the other hand, it also ensures that the data is always available within the specified window and allows for easy replication of the past, which is extremely helpful in situations such as those involving a hacked bank database. With Apache Kafka, you can efficiently go back in time, obtain the required status and events, and make changes accordingly, without the need to go through each transaction separately. Thus, using this solution can make data management much more efficient and convenient.
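The "go back in time" capability can be sketched as a seek-by-timestamp over a retained log. The `ReplayableLog` class is a toy stand-in; the real mechanism is the consumer's offsets-for-timestamps lookup plus a seek, valid anywhere inside the retention window.

```python
import bisect

class ReplayableLog:
    """Toy retained log: a consumer can seek back to any timestamp within
    the retention window and re-read every event from that point on."""
    def __init__(self):
        self.ts, self.events = [], []

    def append(self, ts, event):
        # Timestamps arrive in order, so the lists stay sorted.
        self.ts.append(ts)
        self.events.append(event)

    def replay_from(self, ts):
        # First offset whose timestamp is >= ts.
        i = bisect.bisect_left(self.ts, ts)
        return self.events[i:]

log = ReplayableLog()
for t, e in [(1, "deposit"), (2, "withdraw"), (3, "breach-detected"), (4, "freeze")]:
    log.append(t, e)

# Rewind to just before the incident and re-read everything since.
print(log.replay_from(3))  # ['breach-detected', 'freeze']
```

This is the property that makes the hacked-database scenario tractable: instead of inspecting each transaction separately, you replay the stream from the relevant moment.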
I rate Apache Kafka an eight out of ten.
In order to improve its user-friendliness, engineer-friendliness, and DevOps-friendliness, the system must undertake various tasks, such as enhancing the overall operation and configuration, ensuring seamless integration with other systems, and adapting to security layers in a more comprehensive and generic manner. This will require significant efforts to make the system more functional, secure, and efficient.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Feb 11, 2023
Lead Architect at a financial services firm with 1,001-5,000 employees
Good partition tolerance, message reliability, and API integration
Pros and Cons
- "The main advantage is increased reliability, particularly with regard to data and the speed with which messages are published to the other side."
- "One of the things I am mostly looking for is that once the message is picked up from Kafka, it should not be visible or able to be consumed by other applications, or something along those lines. That feature is not present, but it is not a limitation or anything of the sort; rather, it is a desirable feature. The next release should include a feature that prevents messages from being consumed by other applications once they are picked up by Kafka."
What is our primary use case?
We use it extensively for pushing data, for analytics and similar workloads, rather than on a real-time payments basis. We use it for offline messages, pushing them for processing, and for very heavy usage, rather than using it extensively for financial data.
What is most valuable?
The main advantage is increased reliability, particularly with regard to data and the speed with which messages are published to the other side.
The connectivity from the application is straightforward, as is the API integration.
These are some of the most valuable features of this solution.
In terms of partition tolerance, message reliability is also present, which is a very good feature from the customer's perspective.
What needs improvement?
It is difficult to name an area for improvement in Kafka because it's a solid product that works well in its intended applications. That said, we are looking for something that can be used as part of financial implementations, where we don't want messages to be delivered to the other side more than once; that is one of the areas I am looking at as well.
One of the things I am mostly looking for is that once the message is picked up from Kafka, it should not be visible or able to be consumed by other applications, or something along those lines. That feature is not present, but it is not a limitation or anything of the sort; rather, it is a desirable feature.
The next release should include a feature that prevents messages from being consumed by other applications once they are picked up by Kafka.
Then there is message dependability, because a message is of no use if it cannot be consumed. Alternatively, if a message is consumed but not committed, it should not be recorded as consumed in the Kafka queues. Traditional MQs consistently provide this: if a message is not committed, it is returned to the queue.
I have not seen that in Kafka.
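The commit-and-redeliver behavior the reviewer describes can be sketched in plain Python. This is a simulation, not the Kafka client API (in real deployments, a comparable at-least-once pattern is typically achieved by disabling auto-commit and committing offsets manually after processing):

```python
def consume(log, committed_offset, process):
    """Return the new committed offset; a failure leaves the offset unchanged."""
    for offset in range(committed_offset, len(log)):
        try:
            process(log[offset])
        except Exception:
            return offset  # not committed: redelivered on the next poll
        committed_offset = offset + 1
    return committed_offset

log = ["m1", "m2", "m3"]
seen = []

def flaky(msg):
    # Fail exactly once on m2 to simulate a transient processing error.
    if msg == "m2" and "m2" not in seen:
        seen.append(msg)  # record the failed attempt
        raise RuntimeError("transient failure")
    seen.append(msg)

offset = consume(log, 0, flaky)       # stops at m2's failure; m2 uncommitted
offset = consume(log, offset, flaky)  # m2 is redelivered, then m3 succeeds
print(offset)  # 3
```

Because the offset only advances after successful processing, the failed message is seen again on the next poll rather than being lost.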
For how long have I used the solution?
We have been using Apache Kafka for approximately three years in the organization.
I believe we are working with version 10. Confluent Kafka is what we are using.
What do I think about the stability of the solution?
It's a stable solution. Once completed, it is a very stable solution.
What do I think about the scalability of the solution?
The scalability is very good. It is scalable horizontally rather than vertically.
It can scale up to any level horizontally. However, once it has been scaled out, it cannot easily be shrunk back when the requirement is reduced; some manual process is required. That is one thing that is lacking.
I believe there are approximately 10 to 15 people who use it.
This is being used by the data migration, data team, data analytical team, and data engineer. It's being used by all application architects who are just looking into it, as well as middleware integrators and middleware application integrators.
We have big plans to increase the use of various other innovations and stuff like that. We are using it in relation to data activities.
Also, on the financial side, we are only planning to use it for publishing and subscribing, a pub-sub model, for various use cases.
How are customer service and support?
Apache software usually comes with community support. If you use Apache Kafka or any other open-source software, you will usually receive community support. Otherwise, some companies take it and productize it; for Kafka, there is the Confluent version, which they sell and support, or what we use, the Oracle Big Data platform.
It is included alongside Hadoop, Spark, and other similar technologies, coming as one of the software packages that are part of that offering, and it is supported by Oracle. Depending on the type of open source, various types of support are available. Beyond the community, we will not receive assistance unless we take an enterprise offering from Confluent or another vendor with a similar product.
Which solution did I use previously and why did I switch?
Prior to implementing this solution, we were not using another solution; we have been using Kafka from the beginning for these use cases. However, we do use other queuing solutions, such as ActiveMQ and IBM MQ, for different use cases. We chose Kafka primarily for its large volume handling, faster processing, and other benefits.
How was the initial setup?
It is not deployed on-premises.
We use Kafka as part of the OCI Oracle Cloud platform and the Oracle Big Data platform because Kafka is included.
The Apache Kafka setup takes some time because it is not simple, and there are many other components to install. That's fine because we needed all the plugins and other pieces for our implementation, and the container-based implementation is simple. The main issue is ZooKeeper: there are a lot of supporting applications running alongside Kafka, and Kafka itself runs on top of ZooKeeper for its manageability. It would be preferable if that dependency were removed and the implementation simplified; I believe Confluent is working on this, but we have not yet adopted it.
Deployment and configuration take about one hour to complete. However, it also depends on how many configurations you require, and we require a large number.
What about the implementation team?
The deployment was completed in-house.
Currently, there is a team of three to maintain this solution. There are application support personnel in charge of access control.
What's my experience with pricing, setup cost, and licensing?
It will be included in the Oracle-specific platform. It is approximately $600,000 USD.
What other advice do I have?
When it comes to Apache Kafka, you must understand how it works and what its internals are. There can be numerous challenges across the product's entire life cycle, so you will need a good understanding of the configuration. Having a technical person who is knowledgeable in Kafka will be an advantage on an ongoing basis.
It's a very good solution. I would rate Apache Kafka a nine out of ten.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Sep 26, 2022
Software Engineer at a financial services firm with 10,001+ employees
Open-source, stable, and scalable
Pros and Cons
- "The use of Kafka's logging mechanism has been extremely beneficial for us, as it allows us to sequence messages, track pointers, and manage memory without having to create multiple copies."
- "There is a lot of information available for the solution and it can be overwhelming to sort through."
What is our primary use case?
We have multiple use cases for our Kafka system. One is Kafka Connect, which is used to facilitate communication between different regions with GridGain. Another is to distribute events and projects to multiple downstream systems: we publish all the messages to Kafka, and other listeners subscribe and write them to different MQs. Lastly, Kafka Connect is used especially for inter-application communication.
How has it helped my organization?
We had been using a lot of expensive licenses earlier, such as SOLEIL, as well as some legacy versions, which were not only costly but also caused memory issues and required highly technical personnel to manage. This posed a huge challenge in terms of resourcing and cost, and it simply wasn't worth investing more in. However, Kafka was comparatively free as it was open source, and we were able to build our own monitoring system on top of it. Kafka is an open-source platform that allows us to develop modern solutions with relative ease. Additionally, there are many resources available in the market to quickly train personnel to work with this platform. Kafka is user-friendly and does not require an extensive learning curve, unlike other tools. Furthermore, the configuration is straightforward. All in all, Kafka provides us with a great platform to build upon with minimal effort.
What is most valuable?
The use of Kafka's logging mechanism has been extremely beneficial for us, as it allows us to sequence messages, track pointers, and manage memory without having to create multiple copies. We are currently on a legacy version and have found that the latest version of Kafka has solved many of the issues we were facing, such as sequencing, memory management, and more. Additionally, the fact that it is open source is a major benefit.
What needs improvement?
Multiple vendors have built successful commercial solutions on top of the open-source platform. Unfortunately, open source does not include the monitoring capabilities these solutions offer, so organizations must create their own. Investing in these capabilities would be beneficial for the many companies that prefer to use open-source options.
There is a lot of information available for the solution and it can be overwhelming to sort through. The solution can improve by including user-friendly documentation.
For how long have I used the solution?
I have been using the solution for four years.
What do I think about the stability of the solution?
We have not experienced any issues with the stability of the solution. We had some issues with GridGain and Kafka Connect, but we believe it was more of an issue on GridGain's side, since they informed us of a bug. Overall, we have not encountered many issues on the Kafka side.
What do I think about the scalability of the solution?
We use the solution in the distributed mode in multiple regions – the US, London, and Hong Kong. We have increased the number of nodes to ensure it is available to us at all times.
I give the scalability an eight out of ten.
We have around 600 people within my team using the solution.
How was the initial setup?
The initial setup was relatively easy for us since we already had Zookeeper and the necessary setup in place. We also had good knowledge of Kafka. Therefore, it was not a difficult challenge. In general, I believe that it is manageable. There are benefits and the setup is not overly complex.
Our company has implemented Ship, making our lives easier when it comes to changes or version updates. We can package everything in one place, deploy it with Ship, and roll out a new version with minimal changes.
Deployment time depends on our location and the task at hand. Initially, there is a lot of setup and configuration that must be done, but this can become easier with experience. Nowadays, the process is not too difficult, as all the version numbers and conflict files are already in place. However, if this is a new task for us, it may take some time to figure out all the configurations.
One person was dedicated to deploying Kafka. This person got help from our release team, who had already set up Zookeeper and other necessary components.
What about the implementation team?
The implementation was completed in-house.
What other advice do I have?
I give the solution an eight out of ten.
Maintaining Kafka, the open source, can be difficult without the proper version purchased or the right infrastructure in place. However, once the initial setup is complete, it is relatively simple to maintain. The open-source version of Kafka is not a complete package, so additional maintenance may be required.
I strongly recommend reading the documentation for any issues because it is likely to contain the answer we are looking for. There is a lot of information provided that may not be immediately obvious, so take the time to explore thoroughly.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Feb 2, 2023
Principal Technology Architect at a computer software company with 5,001-10,000 employees
Events and streaming are persistent, and multiple subscribers can consume the data
Pros and Cons
- "With Kafka, events and streaming are persistent, and multiple subscribers can consume the data. This is an advantage of Kafka compared to simple queue-based solutions."
- "Kafka's interface could also use some work. Some of our products are in C, and we don't have any Kafka libraries to use with C. From an interface perspective, we had a library for Redis, and we are streaming some of the products we built to Redis. That is one of the requirements. It would be good to have those libraries available in a future release for our C++ clients, or as public libraries, so we can include them in our product and build on that."
What is our primary use case?
It's a combination of an on-premise and cloud deployment. We use AWS, and we have our offshore deployment that's on-premise for OpenShift, Red Hat, and Kafka. Red Hat provides managed services and everything. We use Kafka and a specific deployment where we deploy on our basic VMs and consume Kafka as well.
We publish or stream all our business events as well as some of the technical events. You stream it out to Kafka, and multiple consumers develop a different set of solutions. It could be reporting, analytics, or even some data persistence. Later, we used it to build a data lake solution. They all would be consuming the data or events we are streaming into Kafka.
What is most valuable?
With Kafka, events and streaming are persistent, and multiple subscribers can consume the data. This is an advantage of Kafka compared to simple queue-based solutions.
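This fan-out property can be sketched in plain Python. The simulation below (illustrative names, not the Kafka API) shows the key difference from a simple queue: records are retained, and each consumer group keeps its own read position, so every subscriber sees the full stream:

```python
topic = ["order-1", "order-2", "order-3"]       # retained records
group_offsets = {"billing": 0, "analytics": 0}  # per-group read positions

def poll(group):
    """Each group reads from its own offset; records are never removed."""
    start = group_offsets[group]
    batch = topic[start:]
    group_offsets[group] = len(topic)
    return batch

billing = poll("billing")
analytics = poll("analytics")
# Both subscribers independently receive the entire stream.
print(billing == analytics == ["order-1", "order-2", "order-3"])  # True
```

In a queue-based system, the first consumer to take a message removes it; here, consuming only moves a group's offset forward, so reporting, analytics, and persistence consumers can all work from the same events.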
What needs improvement?
We are still on the production aspect, with our service provider or hyper-scalers providing the solutions. I would like to see some improvement on the HA and DR solutions, where everything is happening in real-time.
Kafka's interface could also use some work. Some of our products are in C, and we don't have any Kafka libraries to use with C. From an interface perspective, we had a library for Redis, and we are streaming some of the products we built to Redis. That is one of the requirements. It would be good to have those libraries available in a future release for our C++ clients, or as public libraries, so we can include them in our product and build on that.
For how long have I used the solution?
We've been using Apache Kafka for the past two to three years.
What do I think about the stability of the solution?
Kafka is stable. It's a great product.
What do I think about the scalability of the solution?
We did some benchmarking, but we are still looking to scale up some of the benchmarking and performance testing. So far, it meets all our business requirements. We are developers, so everything goes to the clients, who deploy it at their scale for their end customers. So we are looking at it from a developer's perspective; those who are developing the products are working on this.
How are customer service and support?
We haven't really contacted technical support, but some of our clients have subscribed to support from the vendors. We generally look for open-source solutions. From there, we try to figure out if there are any issues. There's a good online community where you can ask questions.
How was the initial setup?
We were able to deploy and use it with no problems for our use case. We didn't find it so complex. We work with so many applications, databases, Postgres, and so many other things, so we could manage it easily. We deployed Kafka in a few hours. We have an infrastructure team and DevOps. Those teams are pretty capable, and they've completely automated the whole deployment. It always takes time the first time you upgrade any application, not just Kafka. We might discover some issues, such as configuration, parameters, compatibility, etc. Once that becomes standard, it is stable, and then they only need to replicate it to the different environments or different developers groups. We have a sophisticated process.
What other advice do I have?
I rate Apache Kafka eight out of 10. There are so many products on the market, so my advice is to consider if Kafka suits your business requirements first. If it's suitable, the next step is to check whether all the technical requirements are met. If everything checks out, I would say that Kafka is a relatively stable, sound, and scalable product, so they can try it out.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Sr Technical Consultant at a tech services company with 1,001-5,000 employees
Effective stream API, useful consumer groups, and highly scalable
Pros and Cons
- "The most valuable features are the stream API, consumer groups, and the way that the scaling takes place."
- "I would like to see real-time event-based consumption of messages rather than the traditional way through a loop. The traditional messaging system works by listing and looping with a small wait to check to see what the messages are. A push system is where you have something that is ready to receive a message and when the message comes in and hits the partition, it goes straight to the consumer versus the consumer having to pull. I believe this consumer approach is something they are working on and may come in an upcoming release. However, that is message consumption versus message listening."
What is our primary use case?
One of our clients needed to take events out of SAP to stream them through Apache Kafka while applying data enrichment before reaching the consumers.
How has it helped my organization?
The solution can handle more speed and offers horizontal scalability for messaging and, more specifically, stream processing and data enrichment. Using it can reduce the number of components required in the tech stack. For example, we were taking data events out of SAP and sending them to consumers without having to go through multiple processors outside of the Kafka space. Additionally, we are using Kafka with GoldenGate to propagate database updates in real time.
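The enrichment step described above can be sketched in plain Python. This is a simulation of the stream-table join idea, not the Kafka Streams API; the field names and lookup data are hypothetical:

```python
# Assumed lookup data, e.g. a customer table replicated into the pipeline.
customer_table = {"C1": "ACME Corp", "C2": "Globex"}

def enrich(event):
    """Attach the customer name to each raw event (a stream-table join)."""
    name = customer_table.get(event["customer_id"], "unknown")
    return {**event, "customer_name": name}

# Raw events as they might arrive from the source system.
raw_events = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C2"},
]
enriched = [enrich(e) for e in raw_events]
print(enriched[0]["customer_name"])  # ACME Corp
```

Doing the join inside the streaming layer is what removes the extra processors from the stack: consumers receive ready-to-use records instead of each performing its own lookup.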
What is most valuable?
The most valuable features are the stream API, consumer groups, and the way that the scaling takes place.
What needs improvement?
I would like to see real-time event-based consumption of messages rather than the traditional way through a loop. The traditional messaging system works by listing and looping with a small wait to check to see what the messages are. A push system is where you have something that is ready to receive a message and when the message comes in and hits the partition, it goes straight to the consumer versus the consumer having to pull. I believe this consumer approach is something they are working on and may come in an upcoming release. However, that is message consumption versus message listening.
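The pull loop described above can be sketched as a plain-Python simulation (illustrative names, not the Kafka consumer API): the consumer repeatedly checks for messages with a small wait between empty polls, rather than having messages pushed to it.

```python
import time

class Source:
    """A stand-in message source the loop polls against."""
    def __init__(self, messages):
        self.messages = list(messages)
    def pop_all(self):
        batch, self.messages = self.messages, []
        return batch

def poll_loop(source, max_polls=5, wait=0.01):
    """Poll repeatedly; sleep briefly when nothing is available."""
    received = []
    for _ in range(max_polls):
        batch = source.pop_all()
        if batch:
            received.extend(batch)
        else:
            time.sleep(wait)  # the 'small wait' between empty checks
    return received

print(poll_loop(Source(["a", "b"])))  # ['a', 'b']
```

A push model would invert this: the source would invoke a consumer callback the moment a record hits the partition, with no polling interval at all.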
Confluent created the KSQL language, but they gave it to the open-source community. I would like to see KSQL be able to be used on raw data versus structured and semi-structured data.
For how long have I used the solution?
I have been using this solution for approximately one year.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
I have found Apache Kafka to be highly scalable.
How are customer service and technical support?
The project we were working on was open-source, we were using Confluent as support and they were great.
How was the initial setup?
Apache Kafka on AWS is a bit complex. There is a third-party company called Confluent, and they provide support that makes the installation much easier, especially for on-premise deployments. If you install Apache Kafka alone, it can be a little complex compared to other queuing and messaging solutions.
The on-premise deployment takes approximately a few days. The cloud or hybrid deployments including all the permissions, typologies, firewalls, and networking configuration can take weeks for all the accessibility issues to be resolved. However, the delay could have been client-related and not necessarily the solution.
What about the implementation team?
We provide the implementation service.
What's my experience with pricing, setup cost, and licensing?
Apache Kafka is free. My clients were using Confluent which provides high-quality support and services, and it was relatively expensive for our client. There was a lot of back and forth on negotiating the price.
Confluent has an offering that has Cloud-Based pricing. There are different packages, prices, and capabilities. The highest level being the most expensive. AWS provides services to their market, for example, to have Kafka running. I do not know what the pricing is and I am fairly confident, Azure and GCP provide similar services.
What other advice do I have?
My advice to others wanting to implement this solution is to start with data streaming projects, not simple messaging projects because while it is very good at general-purpose messaging, it is more suited and geared for when you are using it as a streaming solution.
I rate Apache Kafka an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
CTO at Estrada & Consultores
Great scalability with a high throughput and a helpful online community
Pros and Cons
- "The solution is very easy to set up."
- "While the solution scales well and easily, you need to understand your future needs and prep for the peaks."
What is our primary use case?
We primarily use the solution for upstreaming messages with different payloads for our applications, ranging from IoT and food delivery to patient monitoring.
For example, one solution does real-time location finding, whereby a customer of the food delivery solution wants to know where his or her order is on a map. The delivery person's mobile phone publishes its location to Kafka, which processes it and publishes it to subscribers, in this case the customer. It allows them to see information in real time almost instantly.
How has it helped my organization?
Apache Kafka has become the main component in almost all our distributed solutions. It has helped us distribute messages quickly to our customers' applications.
What is most valuable?
The solution is good for publishing transactions for commercial solutions whereby a duplicate will not affect any part of the system.
The solution is very easy to set up.
The stability is very good.
There's an online community available that can help answer questions or troubleshoot problems.
The scalability of Kafka is very good.
It provides high throughput.
What needs improvement?
Kafka can allow for duplicates, which isn't helpful in some of our scenarios. They need to work on their duplicate management capabilities, but for now, developers should ensure idempotent operations in such scenarios.
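The idempotency workaround mentioned above can be sketched in plain Python: deduplicate on a unique message key so that reprocessing a duplicate delivery has no effect. (The id field and sample amounts are illustrative.)

```python
processed_ids = set()        # keys of messages already applied
balance = {"total": 0}       # state the messages mutate

def handle(msg):
    """Apply the message at most once, keyed on its unique id."""
    if msg["id"] in processed_ids:
        return  # duplicate delivery: safely ignored
    processed_ids.add(msg["id"])
    balance["total"] += msg["amount"]

for msg in [{"id": "t1", "amount": 50},
            {"id": "t1", "amount": 50},   # duplicate of t1
            {"id": "t2", "amount": 25}]:
    handle(msg)
print(balance["total"])  # 75
```

With this pattern, at-least-once delivery from the broker still yields exactly-once effects in the application, which is why duplicates are tolerable in the non-financial scenarios the review describes.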
While the solution scales well and easily, you need to understand your future needs and prep for the peaks.
For how long have I used the solution?
I've been using the solution for four years so far.
What do I think about the stability of the solution?
The stability is excellent. There are no bugs or glitches. It doesn't crash or freeze. It's reliable.
What do I think about the scalability of the solution?
Scaling is not really a problem with Kafka. We have used Kubernetes clusters, and it is working very well. It scales up and down almost automatically, almost unnoticeably to the consumers, based upon our configuration. Kafka is just one pod inside our cluster that scales horizontally.
We have a couple of customers that also have vertical scaling, meaning that, there's more CPU, more memory available to the Kafka pod.
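The horizontal scaling mentioned above rests on partitioning: records are spread across partitions by key, and partitions can live on different nodes. A rough plain-Python illustration (Kafka's real default partitioner hashes keys with murmur2; this sketch just uses Python's built-in hash):

```python
def partition_for(key, num_partitions):
    """Deterministically map a record key to a partition."""
    return hash(key) % num_partitions

num_partitions = 4
assignments = {}
for key in ["user-1", "user-2", "user-3", "user-1"]:
    partition = partition_for(key, num_partitions)
    assignments.setdefault(partition, []).append(key)

# The same key always lands on the same partition, preserving per-key order
# while letting different partitions be served by different brokers or pods.
print(partition_for("user-1", num_partitions) ==
      partition_for("user-1", num_partitions))  # True
```

Adding consumers up to the partition count spreads the load, which is why scaling out is straightforward while scaling a single node up buys comparatively little.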
How are customer service and technical support?
For Kafka, we don't actually require support from the company. We usually have people experienced in-house and sometimes we just ask in the community.
How was the initial setup?
The initial setup is easy. The majority of the tools today are really very easy to configure and setup. Docker Containers and Kubernetes, actually, have made life easier for architects as well as developers.
Nowadays, you just install the container, and then you don't have to really manage the internals at libraries, OS levels, et cetera. You just run the container. Everything is containerized.
What's my experience with pricing, setup cost, and licensing?
Apache Kafka is open source; you can set it up in your own Kubernetes cluster or subscribe to an online Kafka provider as a service.
What other advice do I have?
New users should understand the product's capabilities. Often, people start putting their hands on new products without knowing their capabilities and their disadvantages in specific scenarios. In our case, for example, we haven't used Kafka for financial transaction processing, for which we still use IBM MQ. It really depends upon your knowledge and experience with the product. My advice is to understand the product very well, its pros and cons, and work from there.
Finally, I'd rate the solution at a nine out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
CEO at a comms service provider with 11-50 employees
Reliable for working with a huge amount of data and has many options for building applications on top of it
Pros and Cons
- "The high availability is valuable. It is robust, and we can rely on it for a huge amount of data."
- "The price for the enterprise version is quite high. It would be better to have a lower price."
What is our primary use case?
We deploy it for our customers. The main use case is related to log management and metrics because we are a partner of Elastic Stack, and we usually collect information through Kafka.
What is most valuable?
The high availability is valuable. It is robust, and we can rely on it for a huge amount of data.
The Kafka Streams capability is also valuable. We get many options to build applications on top of Kafka.
What needs improvement?
The price for the enterprise version is quite high. It would be better to have a lower price.
For how long have I used the solution?
I have been working with this solution for four or five years.
What do I think about the stability of the solution?
It is absolutely stable.
What do I think about the scalability of the solution?
It is very scalable. It is easy to scale it.
It doesn't matter how many users are using it. The licenses are calculated based on the number of nodes. It is not based on the number of users who are using it. We have between 10 to 20 nodes on average in an organization.
How are customer service and support?
It is quite good, but they don't speak Italian. In Italy, we have to provide support in the Italian language. It is a problem for customers to have support in English. This is the reason why we provide direct support to customers.
How was the initial setup?
I am into pre-sales and project management. I don't usually install Apache Kafka, but its basic installation seems quite simple.
Its deployment is usually quite short. Usually, we are able to deploy it in a few days, but data management and application development can take a few months.
What about the implementation team?
We have our own team to deploy it. We also take care of its maintenance. We have a team of five or six employees to provide 24/7 support to our customers.
What was our ROI?
It depends on the project. For log management projects, the ROI is not very quick, but we have other projects where we used Kafka for high-value applications, and the ROI was very quick. We got an ROI in a few months.
What's my experience with pricing, setup cost, and licensing?
The price for the enterprise version is quite high.
For on-premise, there is an annual fee, which starts at 60,000 euros, but it is usually higher than 100,000 euros. The cost for a project including the subscription is usually between 100,000 to 200,000 euros. The cost also depends on the level of support. There are two different levels of support.
What other advice do I have?
Kafka is a really good product. To be able to keep it running in the long term, you need to know very well how it works. You should have good knowledge about it. It isn't about just knowing how to install it because it is quite simple to install it. It is important to have the right knowledge and experience to do a good installation and let it run for a long period. You can also go for someone who has the right experience and knowledge.
We are very satisfied with Kafka. I would rate it an eight out of 10. It is not perfect, but it is a really good product.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
