Senior Data Engineer at a computer software company with 1,001-5,000 employees
Enhancing data integration and processing across cloud services with seamless transformations
Pros and Cons
- "It helps integrate data science and machine learning capabilities."
- "Performance could be improved."
What is our primary use case?
I work on a project where I build data pipelines using Azure Data Factory. I ingest data from on-premises systems into Azure Data Lake. After that, I perform transformations using Databricks notebooks and Spark, building the bronze, silver, and gold layers in Databricks. We export reports from the gold layer.
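The bronze, silver, and gold layering described in this review can be sketched in plain Python. This is only an illustration, not the reviewer's pipeline: the record fields (`order_id`, `region`, `amount`), the function names, and the cleaning rules are all assumptions, and lists of dictionaries stand in for the Spark DataFrames and Delta tables a real Databricks notebook would use.

```python
from collections import defaultdict


def to_bronze(raw_rows):
    """Bronze: land records unchanged, only tagging where they came from."""
    return [dict(row, _source="on_prem") for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: drop incomplete rows, deduplicate on order_id, normalise types."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if row.get("order_id") is None or row.get("amount") is None:
            continue  # incomplete record, not fit for reporting
        if row["order_id"] in seen:
            continue  # duplicate delivery of the same order
        seen.add(row["order_id"])
        silver.append(
            {
                "order_id": row["order_id"],
                "region": row["region"].strip().upper(),
                "amount": float(row["amount"]),
            }
        )
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to the shape a report consumes (total amount per region)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Hypothetical sample input, as it might arrive from an on-premises source.
raw = [
    {"order_id": 1, "region": " emea ", "amount": "10.5"},
    {"order_id": 1, "region": " emea ", "amount": "10.5"},  # duplicate
    {"order_id": 2, "region": "apac", "amount": "4"},
    {"order_id": None, "region": "emea", "amount": "99"},  # missing key
    {"order_id": 3, "region": "EMEA", "amount": "2.5"},
]

silver = to_silver(to_bronze(raw))
gold = to_gold(silver)
print(gold)
```

Each layer only reads from the one before it, so a report built on gold never touches raw or partially cleaned data.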
How has it helped my organization?
Recently, we started using Databricks in our organization. It helps integrate data science and machine learning capabilities.
What is most valuable?
Unity Catalog provides central governance for data across workspaces, and Databricks integrates well with cloud services such as Azure Event Hubs and Azure Data Factory. It is user-friendly for data processing, and Spark is a strong engine for big data processing.
What needs improvement?
Performance could be improved. To get good performance, it is crucial to review the code, configure Spark correctly, implement caching, and monitor performance metrics.
Buyer's Guide
Databricks
June 2025

Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: June 2025.
856,873 professionals have used our research since 2012.
For how long have I used the solution?
I have used Databricks for over two years.
What do I think about the stability of the solution?
I would rate stability as eight out of ten. It is quite stable.
What do I think about the scalability of the solution?
Databricks is perfect for scalability. It is easy to scale clusters.
How are customer service and support?
I haven't faced any issues requiring customer support, so I don't have experience with their customer support.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We used Informatica before, which is perfect for data management solutions. We started using Databricks for its capabilities in data science and machine learning.
How was the initial setup?
I would rate the initial setup as nine out of ten. It is quite easy for someone experienced with Spark.
What's my experience with pricing, setup cost, and licensing?
For my company, it's okay to upgrade to Databricks because it's comparable in price to Informatica. It is not considered expensive for the company.
Which other solutions did I evaluate?
For machine learning, I used Python and its libraries manually. Prior to Databricks, there was no special tool used for these purposes.
What other advice do I have?
If a company focuses on data science and machine learning, I recommend using Databricks. It's a great solution in this field. For data management needs, Informatica is advantageous due to its comprehensive tools.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Last updated: Nov 6, 2024
Experiencing smooth performance and cost advantages over previous tools
Pros and Cons
- "Databricks is definitely a very stable product and reliable."
- "My experience with the pricing and licensing model is that it remains relatively expensive. Though it's less expensive than AWS, we still need a more cost-effective solution."
What is our primary use case?
Our use case for Databricks is large-scale big data processing on its clusters.
What is most valuable?
I think it is difficult to determine which feature of Databricks I enjoy the most since there are many valuable features.
What's valuable about Databricks to my organization is that it is more cost-effective and provides better performance than the current AWS tools and services they offer.
What needs improvement?
I am uncertain about specific improvements for Databricks.
It would be beneficial to make Databricks even more cost-effective.
For how long have I used the solution?
I have been using Databricks for two years.
What do I think about the stability of the solution?
My experience with Databricks has been smooth, and I haven't encountered any issues.
Databricks is definitely a very stable product and reliable.
How are customer service and support?
I have not used Databricks customer service or support.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Before Databricks, I used Batch processing, Fargate, and possibly Kubernetes.
I switched from my previous solutions because they were either too expensive or too difficult to configure.
Which other solutions did I evaluate?
I have considered other solutions besides Databricks, such as Snowflake, but we haven't explored it extensively yet.
We are still early in our Snowflake experience, so we don't know the pros and cons compared to Databricks.
What other advice do I have?
My deployment model for Databricks is limited as I'm not a heavy user.
I am not the person who purchased Databricks, but it was possibly acquired through the AWS Marketplace.
I may not have utilized Databricks machine learning capabilities.
My experience with the pricing and licensing model is that it remains relatively expensive. Though it's less expensive than AWS, we still need a more cost-effective solution.
I would rate Databricks overall a nine out of ten.
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Last updated: May 28, 2025
Data Analyst at Allianz
An easy to setup tool that provides its users with an insight into the metadata of the data they process
Pros and Cons
- "The initial setup phase of Databricks was good."
- "Scalability is an area with certain shortcomings. The solution's scalability needs improvement."
What is our primary use case?
My company uses Databricks to process real-time and batch data, using its streaming analytics capabilities. We use Databricks' Unified Data Analytics Platform on Azure, which gives us a unified architecture to handle the streaming load for our platform.
What is most valuable?
The most valuable feature of the solution is its speed, especially in computation and in the atomicity of data reads. We have a storage account and can read the data on the go. We also now have Unity Catalog in Databricks, which gives good insight into the metadata of the data you are going to process. There are a lot of things that are quite nice with Databricks.
What needs improvement?
Scalability is an area with certain shortcomings. The solution's scalability needs improvement.
For how long have I used the solution?
I have been using Databricks for a few years. I use the solution's latest version. Though currently my company is a user of the solution, we are planning to enter into a partnership with Databricks.
What do I think about the stability of the solution?
It is a stable solution. Stability-wise, I rate the solution an eight to nine out of ten.
What do I think about the scalability of the solution?
It is a scalable solution. Scalability-wise, I rate the solution an eight to nine out of ten.
My company has a team of 50 to 60 people who use the solution.
How are customer service and support?
Sometimes, my company does need support from the technical team of Databricks. The technical team of Databricks has been good and helpful. I rate the technical support an eight out of ten.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup phase of Databricks was good. You can spin up clusters and integrate those with DevOps as well. Databricks is quite nice owing to its user-friendly UI, DPP, and workspaces.
The solution is deployed on the cloud.
The time taken for the deployment depends on the workload.
What's my experience with pricing, setup cost, and licensing?
I cannot judge whether the product is expensive or cheap since I am unaware of the prices of competing products. The licensing costs of Databricks depend on how many licenses we need, and Databricks provides a lot of discounts depending on that number.
What other advice do I have?
It is a state-of-the-art product revolutionizing data analytics and machine learning workspaces. Databricks is a complete solution when it comes to working with data.
I rate the overall product an eight out of ten.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Chief Executive Officer at dotFIT, LLC
A powerful solution that is easily integrated into a variety of platforms
Pros and Cons
- "It's very simple to use Databricks Apache Spark."
- "I would like more integration with SQL for using data in different workspaces."
What is our primary use case?
I am a Databricks service partner, and my customers use Azure Databricks and Data Factory.
What is most valuable?
It's very simple to use Databricks Apache Spark. It's really good for parallel execution to scale up the workload. In this context, the usage is more about virtual machines.
Using meta-stores like Hive was optional, and the solution is good for data science use cases. With the Authenticator Log, Databricks is good for data transformation and BI usage. We have a platform.
What needs improvement?
I would like more integration with SQL for using data in different workspaces. We use the user interface for some functionalities, while for others, we have to use SQL to create data sets and grant permissions. For example, when creating a cluster, we have to create it with some API or user interface. Creating a cluster with some properties using SQL grants the possibility of using SQL syntax. Integration with SQL will make Databricks easier to use by people who have experience with databases like Lakehouse, and they would be able to use the data lake and BI. More integration will help have one point of view for everyone using SQL syntax.
Integration with Kubernetes could also be good for minimizing the price because you can use Kubernetes instead of virtual machines. But that won't be easy.
For how long have I used the solution?
I have worked with the solution for four or five years, with some experience since 2016.
What do I think about the stability of the solution?
The solution is stable. The only problem with stability would be that people are not using it efficiently.
What do I think about the scalability of the solution?
The solution is good for scalability.
How was the initial setup?
When we have administration experience, the solution is not difficult to deploy. Technically, however, it's difficult because governance is more complex. For example, I have two warehouses on Databricks, which are clusters in this workspace, and we have to switch from workspace to workspace to have all this information. There is a system table that has all this, but I don't know if everyone can use these tables.
What's my experience with pricing, setup cost, and licensing?
Databricks is not costly when compared with other solutions' prices.
Which other solutions did I evaluate?
Databricks's functionalities are as good as solutions like Snowflake, BigQuery, and Redshift.
What other advice do I have?
People sometimes do not use the solution efficiently. They misunderstand databases, the usage of tables, and the performance. Many data engineers are very junior and don't have skills in that. Stability is more a customer problem than a problem with the product itself. One possible problem with the product is that there's no method to pause the usage of something. For example, we have to use the meta server or the data catalog in Synapse. But in Databricks, we have a choice to use a catalog or not, or Hive, which is always integrated, but we have to choose whether to use it or not. Many customers directly use the passes on Databricks, which causes performance and governance problems.
I can offer a lot of advice on Databricks, and one piece is to use a metastore such as Unity Catalog or Hive Metastore. For new use cases, it's better to use Unity Catalog.
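As a rough illustration of what governing data through Unity Catalog looks like in SQL, the sketch below builds the grant statements that let a group read one table. Unity Catalog addresses tables with a three-level `catalog.schema.table` name, and reading a table also needs `USE CATALOG` and `USE SCHEMA` on its parents. The catalog, schema, table, and group names are hypothetical, and the helper functions are illustrative rather than part of any Databricks library; the generated statements would be run from a notebook or SQL editor.

```python
def qualified_name(*parts):
    """Join name parts into a dotted Unity Catalog name, rejecting unsafe identifiers."""
    for part in parts:
        if not part.isidentifier():
            raise ValueError(f"not a simple identifier: {part!r}")
    return ".".join(parts)


def read_grants(catalog, schema, table, principal):
    """Return the SQL statements that let `principal` read one table."""
    if "`" in principal:
        raise ValueError("principal must not contain a backtick")
    who = f"`{principal}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {qualified_name(catalog)} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {qualified_name(catalog, schema)} TO {who}",
        f"GRANT SELECT ON TABLE {qualified_name(catalog, schema, table)} TO {who}",
    ]


# Hypothetical names, for illustration only.
statements = read_grants("main", "sales", "orders", "analysts")
for statement in statements:
    print(statement)
```

Because the permissions live in the catalog rather than on storage paths, the same grants apply in every workspace attached to that metastore.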
I rate Databricks a nine out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Consulting Architect at a computer software company with 10,001+ employees
Ahead of the competition in building data ecosystems, but needs to improve ease-of-use
Pros and Cons
- "A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem."
- "Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster."
What is our primary use case?
I worked with Databricks pretty recently. The particular design processes involved in Databricks were also a part of that specific design/architectural process.
We have used the solution for the overall data foundation ecosystem for processing and storage on a Delta format. We have also seen use cases where we were trying to establish advanced analytics models and data sharing where we leverage the Delta Sharing capabilities from Databricks.
What is most valuable?
A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem.
What needs improvement?
There are some aspects of Databricks, like generative AI, where they are positioning things like DALL-E. They're a little bit late to the game, but I think there are some things that they are working on. Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster, and even though they are fast, I'm not sure how they'll catch up and get adopted because there are strong players in the market.
Databricks is coming up with a few good things in terms of integration. But one point covers multiple aspects: ease of use for the end user operating the tool. For example, a tool like ADS gives you GUI-based development, which is good for the end user who does development or maintenance. Given the complexities of data integration, a GUI might not be easy, but Databricks should embrace something on the graphical development front because it is currently notebook-driven.
In terms of data access for the end user, Databricks has an SQL interface, similar to earlier tools like SQL Server Management Studio. Since many people are already comfortable with SSMS, Databricks could build integrations with known tools for data access, which would help alongside what they're already doing. I would like to see improvements in user enablement, which is a good part of enterprise strategy.
I would also like to see integration with a broader ecosystem of products. If you have to do data governance in tools like Microsoft Purview, it's manual and difficult. I'm unsure whether that momentum must come from Databricks or Microsoft, but it would be good if Databricks had some open interfaces to share metadata, which could then be viewed in data governance tools like Collibra, Purview, or Informatica. In short, the improvement has to do with user and metadata integration for tools.
For how long have I used the solution?
I've worked with Databricks for over five or six years, but it's been on and off.
What do I think about the scalability of the solution?
The solution is scalable. In this particular ecosystem, there is no one else who can catch up with Databricks for now.
How are customer service and support?
Databricks' customer support is very good. They have a lot of ways in which they interact with vendors and service partners across the globe. They have periodic touch-up sessions with vendors, where their engineers answer your questions.
How was the initial setup?
The implementation is not challenging because the solution integrates well with the platforms on which they are established, whether it's Azure, AWS, or GCP. The solution is not difficult to set up, but you'd probably need a technical user to operate it.
It's the same story with maintenance, where you'd need a technically proficient person with programming knowledge to maintain it.
What other advice do I have?
Databricks integrates many enterprise processes because data processing and AIML are a small part of a larger ecosystem. Databricks has been a part of other platforms, and they are trying to establish their platform, which is a good direction.
Most of the capabilities of the underlying platform can be leveraged there. But the setup isn't difficult if the database lacks some capability, you can't find it in the database, or you're not comfortable with a certain feature in the database. It integrates well with the underlying platform. For example, with scheduling, let's say you are uncomfortable with workflow management. You can utilize integrations with EDA for any other tool and probably perform scheduling. Even if what you're trying to do is not easy, it is enabled with integration. Either they build a required feature in their tool later on, like a GUI, or you perform integrations to make the features possible.
We did evaluate licensing costs, but it had more to do with the Azure ecosystem pricing since whatever we are doing has more to do with Azure Databricks. Many optimizations are recommended, but we haven't exercised those for now. But considering that the processing is a bit more efficient, the overall price won't be much different from what it could be for any other similar component or technology. We haven't had specific discussions with Databricks' folks on pricing.
My advice to users who would like to start working with Databricks is that it is a good solution to work with for data integration and machine learning. Databricks is maturing for other use cases, so there are two points to be considered. One is that you need to evaluate how they will mature, which will be on a case-to-case basis. Second, how will it align with the overall platform story? There will be many overlapping aspects over there as Databricks expands its capabilities. In that case, it must be considered that if those capabilities overlap, how will the underlying platform vendors handle it? How would that interplay happen if many of Databricks' new capabilities align with Microsoft Fabric? That has to be very carefully considered. Otherwise, if you utilize those new capabilities, there might be a discontinuity where you cannot use Databricks because the platform does not support that.
If I specifically talk about Spark-based processing transformations, the data integration story, and advanced stability, I would rate Databricks around eight out of ten. However, with respect to new capabilities like cataloging, data governance, and security integration, I rate Databricks around five because it has to establish these features. And since Databricks integrates with platforms, we must see the interplay with the platforms' capabilities.
I overall rate Databricks a seven out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Solution Architect at an insurance company with 10,001+ employees
A nice interface with good features for turning off clusters to save on computing
Pros and Cons
- "There are good features for turning off clusters."
- "It would be nice to have more guidance on integrations with ETLs and other data quality tools."
What is our primary use case?
Our company uses the solution for big data and as an interface for analytics.
We also create custom APIs to get data and provide SQL endpoints so users can access it over traditional tools like JDBC or ODBC.
We use the solution on AWS and Azure. The data lake is wide open for departmental use. We have ten departments and two or three people from each department access the solution.
How has it helped my organization?
The platform as a service allows us to ramp up a new database pretty fast. We deploy some of the infrastructure as a code. End users can access data immediately and connect with Power BI for reporting.
What is most valuable?
There are good features for turning off clusters. Basically, if we aren't using it, then it is turned off. When a user starts accessing, it starts up so we save on computing.
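The idle shutdown described above is typically a per-cluster setting. A minimal sketch, assuming the cluster is defined through the Databricks Clusters API, where `autotermination_minutes` sets the idle timeout and `autoscale` bounds the worker count. The cluster name, the placeholder runtime and node type strings, and the helper function itself are hypothetical; the range check reflects the documented limits for that field (0 disables auto-termination, otherwise 10 to 10000 minutes).

```python
import json


def build_cluster_spec(name, min_workers, max_workers, idle_minutes):
    """Build a cluster definition that shuts itself down after a period of inactivity."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    if idle_minutes != 0 and not 10 <= idle_minutes <= 10000:
        raise ValueError("idle_minutes must be 0 (disabled) or between 10 and 10000")
    return {
        "cluster_name": name,
        "spark_version": "<runtime-version>",  # placeholder, pick a real runtime
        "node_type_id": "<node-type>",  # placeholder, depends on the cloud provider
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": idle_minutes,
    }


# Hypothetical departmental cluster that stops after 30 idle minutes.
spec = build_cluster_spec("departmental-analytics", 1, 4, 30)
print(json.dumps(spec, indent=2))
```

A shorter timeout saves more compute but means users wait for a cold start more often, so the value is a trade-off between cost and responsiveness.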
Our data lake team likes the interface very much because it is straightforward. Of course, you need to understand the different clusters when they are started.
There are nice features for machine learning and analytics.
The security features allow us to integrate with the active directory and assign different people to different databases.
The solution has a good interface with Python.
There is good integration with Azure so we can access the solution over the standard Azure interface and use the storage pro measure.
What needs improvement?
It would be nice to have more guidance on integrations with ETLs and other data quality tools. The solution is not really a product for ETL or data quality so we use other DBT tools.
For how long have I used the solution?
I have been using the solution for four months but my company has been using it for one year.
What do I think about the stability of the solution?
The solution is very stable with no issues so I rate stability a ten out of ten.
What do I think about the scalability of the solution?
The solution is scalable to the cluster size and Azure storage.
Scalability is rated an eight out of ten.
How are customer service and support?
I have not used technical support.
The company has regular calls with Databricks and they are pretty good but are more on the technical presale side.
Which solution did I use previously and why did I switch?
We previously used Azure's data lake product and possibly some Hortonworks.
How was the initial setup?
The setup is not easy but also is not too complicated. An infrastructure needs to be set up first. We use Azure storage or SQL S3 and create private endpoints.
This is maybe a little more complex or a bit different than other databases in the cloud. For a traditional setup, you need to also think about file systems and disks. Here, you just transform it into the storage and private endpoint.
The first setup might be a bit of a struggle until you learn and understand what is necessary.
What about the implementation team?
We implemented the solution in-house with support from Databricks. Two team members were involved in the implementation.
Three team members handle ongoing development and maintenance.
What's my experience with pricing, setup cost, and licensing?
The solution is affordable.
What other advice do I have?
The solution is pretty good because it uses Azure's data lake storage. It is basically the tool on top that provides the SQL interface and APIs for Python. I like the solution because it enables people to work with it.
I rate the solution a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Data Architect at Three Ireland (Hutchison) - Infrastructure
Processes large data for data science and data analytics purposes
Pros and Cons
- "Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours."
- "There is room for improvement in visualization."
What is our primary use case?
It's mainly used for data science, data analytics, visualization, and industrial analytics.
What is most valuable?
Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours.
So that's why it's quite convenient to use for data science, for training machine learning models. By using more computing power, you can make it even faster.
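To put the reviewer's Teradata comparison into numbers, the arithmetic can be written out as below. The five-hour baseline and the three to three-and-a-half-hour range come from the review; the function names are just for illustration.

```python
def runtime_reduction(baseline_hours, new_hours):
    """Fraction of the baseline runtime that is saved."""
    return (baseline_hours - new_hours) / baseline_hours


def speedup(baseline_hours, new_hours):
    """How many times faster the new runtime is than the baseline."""
    return baseline_hours / new_hours


baseline = 5.0  # hours on Teradata, as quoted in the review
for databricks_hours in (3.5, 3.0):  # the quoted Databricks range
    saved = runtime_reduction(baseline, databricks_hours)
    factor = speedup(baseline, databricks_hours)
    print(f"{databricks_hours} h: {saved:.0%} less time, {factor:.2f}x faster")
```

On the quoted figures, that is a 30% to 40% shorter runtime, or roughly a 1.4x to 1.7x speedup.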
What needs improvement?
There is room for improvement in visualization.
For how long have I used the solution?
I used it for two years. I worked with the latest update.
What do I think about the stability of the solution?
I would rate the stability a nine out of ten. I didn't face performance drops.
What do I think about the scalability of the solution?
I would rate the scalability an eight out of ten.
How are customer service and support?
Databricks' support is great. If we need any support, they are very quick with it. And they genuinely want you to use Databricks. So, whatever we ask them, they come up with multiple solutions to problem statements. That's really good.
Overall, the customer service and support are very good.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
I personally prefer using Databricks. However, we also considered using Snowflake, but the pricing was different. It's price per query.
So, as per your storage, a data scientist or a data analytics team needs to query again and again, which does not suit a data-heavy organization.
What was our ROI?
It's a good return on investment for Databricks from a delivery perspective. We delivered multiple dashboards, so it's quite a good return on investment. And being a small organization, everyone can use Databricks, and cost-wise, it's also good for small organizations.
Which other solutions did I evaluate?
If the company is a startup, Databricks might be suitable. If a big company needs a lot of storage, Teradata might be best for them. It depends on the situation.
What other advice do I have?
Overall, I would rate the solution an eight out of ten. I would definitely recommend this solution for small organizations.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Business Architect at YASH Technologies
Very quick run time but there are some limitations for legacy integrations
Pros and Cons
- "The solution is an impressive tool for data migration and integration."
- "The solution has some scalability and integration limitations when consolidating legacy systems."
What is our primary use case?
Our company uses the solution for series-based and panel-based migrations. We collect and store user requirements, use apps to fetch data, and provide customers with better data for business reports. There are 30 to 40 users in our company.
What is most valuable?
The solution is an impressive tool for data migration and integration.
The run time is very quick.
What needs improvement?
The solution has some scalability and integration limitations when consolidating legacy systems.
For how long have I used the solution?
I have been using the solution for two years.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
It is not really scalability but more about the combination of the structure, consolidation, and different formats we can split and merge. We do a lot of things while storing the target operational model. Snowflake is more flexible and scalable in that regard.
How are customer service and support?
We have contacted technical support a lot about replicating values in PDF files. So far, they have not been able to provide a viable solution.
How was the initial setup?
The setup is of average difficulty but tougher than Snowflake.
Deployment is easy and run time is quick.
What about the implementation team?
We implemented the solution in-house.
One resource manages services for end-to-end monitoring and maintenance activities.
What's my experience with pricing, setup cost, and licensing?
The solution is based on a licensing model. Updates occur automatically by the task base.
Which other solutions did I evaluate?
Snowflake is quite impressive in comparison to the solution because there is flexibility in the way you consolidate. In contrast, the solution has some scalability and integration limitations when consolidating legacy systems. Tool wise, Snowflake is easy from the technical perspective because connectors are included.
We are evaluating options for one particular use case. The customer wants to replicate values from PDFs and enter them in the data model. We contacted the solution's technical support but do not yet have a viable answer. There are gaps in what we do and how we capture. The only option right now is for the customer to manually upload values that we integrate using Synapse to consolidate report data. We haven't yet found another tool that maps to meet our customer's requirement.
What other advice do I have?
I rate the solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
