We performed a comparison between Cloudera DataFlow and Databricks based on real PeerSpot user reviews.
Find out in this report how the two Streaming Analytics solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"DataFlow's performance is okay."
"This solution is very scalable and robust."
"The initial setup was not so difficult"
"The solution is built from Spark and has integration with MLflow, which is important for our use case."
"We like that this solution can handle a wide variety and velocity of data engineering, either in batch mode or real-time."
"Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great."
"This solution offers a lake house data concept that we have found exciting. We are able to have a large amount of data in a data lake and can manage all relational activities."
"Imageflow is a visual tool that helps make it easier for business people to understand complex workflows."
"We can scale the product."
"Databricks is hosted on the cloud. It is very easy to collaborate with other team members who are working on it. It is production-ready code, and scheduling the jobs is easy."
"There are good features for turning off clusters."
"Although their workflow is pretty neat, it still requires a lot of transformation coding; especially when it comes to Python and other demanding programming languages."
"It is not easy to use the R language. Though I don't know if it's possible, I believe it is possible, but it is not the best language for machine learning."
"It's an outdated legacy product that doesn't meet the needs of modern data analysts and scientists."
"I'm not the guy that I'm working with Databricks on a daily basis. I'm on the management team. However, my team tells me there are limitations with streaming events. The connectors work with a small set of platforms. For example, we can work with Kafka, but if we want to move to an event-driven solution from AWS, we cannot do it. We cannot connect to all the streaming analytics platforms, so we are limited in choosing the best one."
"In the next release, I would like to see more optimization features."
"The ability to customize our own pipelines would enhance the product, similar to what's possible using ML files in Microsoft Azure DevOps."
"Databricks doesn't offer the use of Python scripts by itself and is not connected to GitHub repositories or anything similar. This is something that is missing. if they could integrate with Git tools it would be an advantage."
"This solution only supports queries in SQL and Python, which is a bit limiting."
"In the future, I would like to see Data Lake support. That is something that I'm looking forward to."
"The solution could be improved by integrating it with data packets. Right now, the load tables provide a function, like team collaboration. Still, it's unclear as to if there's a function to create different branches and/or more branches. Our team had used data packets before, however, I feel it's difficult to integrate the current with the previous data packets."
"The product cannot be integrated with a popular coding IDE."
Cloudera DataFlow is ranked 13th in Streaming Analytics with 3 reviews while Databricks is ranked 1st in Streaming Analytics with 78 reviews. Cloudera DataFlow is rated 6.6, while Databricks is rated 8.2. The top reviewer of Cloudera DataFlow writes "A scalable and robust platform for analyzing data". On the other hand, the top reviewer of Databricks writes "A nice interface with good features for turning off clusters to save on computing". Cloudera DataFlow is most compared with Confluent, Amazon MSK, Spring Cloud Data Flow, Informatica Data Engineering Streaming and Hortonworks Data Platform, whereas Databricks is most compared with Amazon SageMaker, Informatica PowerCenter, Dataiku Data Science Studio, Microsoft Azure Machine Learning Studio and Dremio. See our Cloudera DataFlow vs. Databricks report.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.