Databricks offers a scalable, versatile platform that integrates seamlessly with Spark and multiple languages, supporting data engineering, machine learning, and analytics in a unified environment.



| Product | Mindshare (%) |
|---|---|
| Databricks | 9.7% |
| Snowflake | 15.1% |
| Teradata | 8.8% |
| Other | 66.4% |

| Title | Rating | Mindshare | Recommending | Interviews |
|---|---|---|---|---|
| Informatica Intelligent Data Management Cloud (IDMC) | 4.0 | N/A | 92% | 215 |
| Teradata | 4.1 | 8.8% | 88% | 83 |

| Company Size | Count |
|---|---|
| Small Business | 26 |
| Midsize Enterprise | 12 |
| Large Enterprise | 50 |

| Company Size | Count |
|---|---|
| Small Business | 681 |
| Midsize Enterprise | 382 |
| Large Enterprise | 1924 |

Databricks stands out for its scalability, ease of use, and powerful integration with Spark, multiple languages, and leading cloud services like Azure and AWS. It provides tools such as the Notebook for collaboration, Delta Lake for efficient data management, and Unity Catalog for data governance. While enhancing data engineering and machine learning workflows, it faces challenges in visualization and third-party integration, with pricing and user interface navigation being common concerns. Despite needing improvements in connectivity and documentation, it remains popular for tasks like real-time processing and data pipeline management.
In the tech industry, Databricks empowers teams to perform comprehensive data analytics, enabling them to conduct extensive ETL operations, run predictive modeling, and prepare data for SparkML. In retail, it supports real-time data processing and batch streaming, aiding in better decision-making. Enterprises across sectors leverage its capabilities for creating secure APIs and managing data lakes effectively.
Databricks was previously known as Databricks Unified Analytics, Databricks Unified Analytics Platform, and Redash.
Elsevier, MyFitnessPal, Sharethrough, Automatic Labs, Celtra, Radius Intelligence, Yesware
| Author info | Rating | Review Summary |
|---|---|---|
| Governance And Engagement Lead | 3.5 | I value Databricks' Unity Catalog and Python for data export. However, our implementation is expensive and inefficient due to beginner mistakes, lack of experienced staff, and poor cloud integration. We haven't seen ROI yet despite the platform's potential. |
| Consultant at Nice Software Solutions | 4.5 | I found Databricks excellent for ETL pipelines and AI integration, drastically reducing processing time and costs with features like Unity Catalog. I recommend it, though more free learning resources for beginners would be beneficial. |
| Data Engineer at an engineering company with 1,001-5,000 employees | 4.0 | As a data engineer, I've used Databricks on Azure for three years, valuing its multilingual notebooks and parallel processing. However, frequent cluster failures are a significant weakness I've encountered, despite its cost-saving benefits. |
| Data Platform Architect at KELLANOVA | 3.5 | I use Databricks on AWS for AI/ML, finding its Unity Catalog, serverless computing, and scalability valuable. While initial setup was complex and pricing is high, it's an advanced platform I recommend for analytics, rating it 7-8/10. |
| Head CEO at bizmetric | 4.5 | I use Databricks as an excellent, all-encompassing platform for end-to-end ML and data engineering, leveraging features like Unity Catalog and MLflow. While it offers great scalability and cost-effectiveness, DBFS file path handling could be improved with better utilities. |
| Senior Data Engineer at a logistics company with 51-200 employees | 3.5 | I find Databricks a solid, all-in-one solution with good support, but managing costs with job clusters and dealing with disruptive patches and challenging migrations are significant issues, affecting operational stability. |
| Solution Architect at Mercedes-Benz AG | 4.5 | I find Databricks valuable for its collaborative notebooks and cost-saving features, especially for ETL and analytics. While API and model deployment could improve, particularly for LLMs, I rate it highly and haven't faced scalability issues. |
| Analista | 5.0 | I use Databricks primarily for SQL queries, finding its web-based, secure nature highly efficient and beneficial for collaboration. While I'd like it to be faster and support direct server queries, I appreciate its stability, scalability, and excellent support, leading to great ROI by saving time. |
| Data Engineer at a tech vendor with 1,001-5,000 employees | 4.5 | I use Databricks for stable, large-scale big data processing, finding it more cost-effective and performant than AWS alternatives. Though very reliable and rated 9/10, I still hope for further cost reductions. |
| Data Engineer at CRAFT Tech | 4.0 | I highly recommend Databricks, finding it excellent for data lakehouses and AI/ML, especially with Delta Lake and Unity Catalog. Its stability, scalability, and easy setup are great, though dashboards need improvement. Overall, it's a top solution. |

As an end user, I mostly use Databricks to export data. This typically means writing code, probably Python, directly in the web interface to get the data out.
The most significant benefit Databricks has brought to my company is the Unity Catalog. Previously, with our data warehouse, we weren't able to track where sensitive data was. The Unity Catalog has been a big improvement, even though we haven't gotten the rest right.
The user interface is very useful, especially for writing code directly in the web interface.
From my perspective, the ability to export data effectively and use Python within Databricks are key valuable features.
I believe we could improve Databricks integration with cloud service providers. The impact of our current integration has not been particularly good, and it's becoming very expensive for us. The inefficiencies in our implementation, such as not shutting down warehouses when they're not in use or reserving the right number of credits, have led to increased costs.
We made several beginner mistakes, such as not taking advantage of incremental loading and running overly complicated queries all the time. We should be using ETL tools to help us instead of doing it directly in Databricks. We need more experienced professionals to manage Databricks effectively, as it's not as forgiving as other platforms such as Snowflake.
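To illustrate the incremental loading mentioned above, here is a minimal, library-free sketch of a watermark-based load. The row layout, the `updated_at` field, and the function name `incremental_batch` are assumptions made for illustration; they are not taken from this implementation or from any Databricks API.

```python
from datetime import datetime


def incremental_batch(rows, last_watermark):
    """Return only the rows changed after last_watermark, plus the new watermark.

    Hypothetical helper: a real pipeline would read the rows from a source
    table and persist the watermark between runs.
    """
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark


# Illustrative source data (made up for this sketch).
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "updated_at": datetime(2024, 1, 3)},
]

# Only rows newer than the previous run's watermark are processed.
batch, watermark = incremental_batch(source, datetime(2024, 1, 1))
```

Each run then touches only new or changed rows instead of reprocessing the full table, which is the kind of saving the review refers to.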
I think introducing customer reps would make implementation with Databricks easier.
I have been working with Databricks for the last six months.
As a platform, Databricks is fine. However, our implementation isn't particularly reliable.
We've suffered from the lack of professionals with previous experience, which makes it difficult to dig ourselves out of the situation we've found ourselves in.
The scalability level of Databricks at the moment exceeds our needs. It's not a problem for us.
The sky's the limit with Databricks.
I have addressed technical support about our issue with Databricks. It was the team that engaged with them, and I believe our development teams also reach out for support, though I'm not sure what level of support they get.
Previously, when using Snowflake, we had customer reps who were really knowledgeable and helped us to avoid beginner mistakes. With Databricks, it seems we could have benefited from similar support. Our implementation team had no experience and made obvious mistakes. It may be that we opted not to have that support, but I believe we should have.
Before Databricks, I used SQL Server.
The big decision to switch from SQL Server to Databricks was motivated by the lack of auditing, lineage, and tracking sensitive data in SQL Server, along with a need for more flexibility.
I did not participate in the initial setup of Databricks.
We use a consultancy, Avanade, for our Databricks implementation. They had previously done a Databricks implementation for another part of our organization. Our implementation team lacked experience which resulted in several beginner mistakes.
So far, we're not measuring any return on investment, such as saving time, money, or resources with Databricks. We're still in the phase where our old system and the new system are running simultaneously, so everything is twice as expensive and much effort is doubled. We haven't progressed far enough yet to realize any ROI.
I believe that in terms of credits for Databricks, we're spending between £15,000 and £20,000 a month.
I think Databricks is priced correctly. If we managed our resources better, we wouldn't be paying anywhere near that amount. The issue is with our management of resources.
No other options were considered because we used the consultancy Avanade, who had done a previous Databricks implementation for another part of our organization. We used them to recreate our implementation.
I'm probably not the best person to discuss certain aspects of Databricks since I haven't explored it deeply and am not part of the team developing it.
We haven't utilized Databricks' machine learning capabilities.
From my company, data ingestion and transformation are done with Databricks, though I don't do it directly.
I don't use Databricks' features for managing data, such as data lake and warehouse operations.
Most of our current work with Databricks isn't really live yet, so measuring savings in time and money or identifying any return on investment isn't applicable right now.
I would rate Databricks a 7 out of 10 overall.

My main use case for Databricks involves the pipelines and ETL processes that we are implementing. Following the Medallion architecture with Gold, Silver, and Bronze layers, we filter the data, perform transformations, and integrate AI. Databricks has made this process significantly easier.
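The Bronze, Silver, and Gold flow described above can be sketched in plain Python. This is only an illustration of the layering idea, assuming made-up booking records and hypothetical function names (`to_silver`, `to_gold`); a real pipeline on Databricks would typically use Spark DataFrames and Delta tables instead of lists and dicts.

```python
def to_silver(bronze_rows):
    """Filter out invalid raw records and normalise field formats."""
    silver = []
    for row in bronze_rows:
        if not row.get("booking_id"):
            continue  # drop records that lack a key
        silver.append({
            "booking_id": row["booking_id"],
            "route": row["route"].strip().upper(),
            "fare": float(row["fare"]),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned records into a reporting-ready view: revenue per route."""
    totals = {}
    for row in silver_rows:
        totals[row["route"]] = totals.get(row["route"], 0.0) + row["fare"]
    return totals


# Bronze: raw data as ingested, including a bad record (made-up values).
bronze = [
    {"booking_id": "B1", "route": " lhr-jfk ", "fare": "450.0"},
    {"booking_id": None, "route": "lhr-jfk", "fare": "10.0"},
    {"booking_id": "B2", "route": "LHR-JFK", "fare": "550.0"},
    {"booking_id": "B3", "route": "cdg-dxb", "fare": "300.0"},
]

gold = to_gold(to_silver(bronze))
```

Keeping each layer as its own step makes it possible to rerun or audit one stage without touching the raw data.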
I worked for an airline company that experienced substantial delays in data processing. When a passenger booked a ticket, it took 20 to 25 minutes for the transaction to be reflected in the system. Using Databricks, we initially cut that time to between 6 and 10 minutes and eventually reduced it to just a few seconds. After we set up all the pipelines and used Databricks features to accelerate the process, the project had a real impact: processing time fell, and profit for the airline ultimately increased.
The best features Databricks offers are Unity Catalog, Databricks Workflow, Databricks AI, Agentic AI, and the automated pipelines that utilize AI. The AI models are very easy to create and deploy in just a few seconds. These are helpful and user-friendly tools.
I find myself using Unity Catalog most frequently because it provides a unified governance solution for all data and AI needs on Databricks, offering centralized access control, auditing, lineage, and data discovery capabilities across the platform. The main features include access control, security compliance standard models, built-in auditing, and lineage tracking. Most of my projects have involved integrating Unity Catalog into systems and providing overall security, including a migration project to transition to Unity Catalog.
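As a small illustration of the centralized access control described above, the sketch below builds a SQL `GRANT` statement against Unity Catalog's three-level `catalog.schema.table` namespace. The helper name `grant_statement` and the catalog, schema, table, and group names are hypothetical; on Databricks, the resulting string would be run as a SQL statement by a user with sufficient privileges.

```python
def grant_statement(privilege, catalog, schema, table, principal):
    """Build a GRANT statement for a table in a three-level namespace.

    Hypothetical helper for illustration; it only formats a string and
    does not validate the privilege or contact any workspace.
    """
    return f"GRANT {privilege} ON TABLE {catalog}.{schema}.{table} TO `{principal}`"


# Example names are made up for this sketch.
stmt = grant_statement("SELECT", "main", "bookings", "tickets", "analysts")
print(stmt)
```

Granting to a group rather than to individual users keeps permissions auditable in one place.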
The platform's unified data intelligence capabilities allow teams to analyze, manage, and activate data at scale, leading to faster time to insights, cleaner data pipelines, and significant savings on infrastructure and engineering efforts. Databricks eliminates data silos, accelerates the time to insight, empowers all data personnel, and provides built-in governance and security. It also supports AI and ML, which is an added advantage in today's AI-driven field.
Databricks already provides monthly updates and continuously works on delivering new features while enhancing existing ones. However, the platform could become easier to use. While instruction-led workshops are available, offering more free instructional workshops would allow a wider audience to access and learn about Databricks. Additionally, providing use cases would help beginners gain more knowledge and hands-on experience.
Regarding my experience, I was initially unfamiliar with the platform and had to conduct research and learn through various videos. I did find some instruction-led classes, but several of those required payment. The platform should provide more free resources to enable a broader audience to access and learn about Databricks. The platform itself is user-friendly and easy to use without complex issues, so I believe it does not need improvement in its core functionality. Rather, supporting aspects can be enhanced.
I have been working as a data engineer for four years. Initially, I was a software engineer, but my career has progressed as a data engineer over this four-year period.
The airline project I mentioned was definitely impactful: costs were reduced by 60 to 70 percent. The company was initially using Azure Blob storage, and the Databricks cluster and associated infrastructure were cheaper than on other platforms. This reduction in both time and money resulted in real-time impact and significant cost savings.
For advice for others considering Databricks, it is important to start by understanding its place in the data ecosystem and how it fits into your specific needs. Key points to consider include familiarizing yourself with Databricks, learning the basics, starting with data engineering, and incorporating ETL processes. You can then dive deeper into Databricks features such as notebooks, clusters, and jobs. Achieving certification enhances your skills validation. For best practices, it is critical to optimize performance and minimize complexity while continuously learning to stay competitive in the data field. Following these steps will be very beneficial for anyone pursuing a career as a data engineer and Databricks engineer.
Databricks is a truly essential platform for data engineering needs, and I recommend it to anyone looking to advance in the data engineering field. It is a very important platform and tool for every data engineer. I encourage everyone to learn and explore this product and to maximize its potential. I rate this product a 9 out of 10.

I work as a data engineer at Fractal. On a daily basis, I work on Azure Cloud, and I use Databricks frequently. We have ADF pipelines and use Synapse for our daily tasks.
Databricks supports various languages that I can use, whether it's PySpark, Scala, or R. I can use all of these languages in a single notebook, which is beneficial for clients as they can access various tools in one place whenever needed. This is quite significant.
I usually work with PySpark based on client requirements. After coding, I feed the Databricks notebooks into the ADF pipeline for updates. Databricks' capability to process data in parallel enhances data processing speed. Furthermore, I can connect our Databricks notebook directly with Power BI and other visualization tools like Qlik. Once we develop code, it allows us to transform raw data into visualizations for clients using analysis diagrams, which is very helpful.
As a data engineer, I see cluster failure as a major issue in our Databricks environment. Our flow, which typically involves three to four notebooks, sometimes leads to cluster failure, and I am unsure why. Despite attempts to identify the problem, there are times when the reason remains unclear. Adjusting settings such as the number of worker nodes and node utilization during cluster creation could mitigate these failures.
I have been using the solution for three years now.
Cluster failure is one of the biggest weaknesses I notice in our Databricks.
Databricks is beneficial for cost-saving since clients I work for transitioned from AWS Cloud to Azure Cloud for this reason.
The initial setup is very straightforward for us.
I am not very aware of the pricing. We use three to four clusters in our project. Increasing the number or size of clusters, such as adding more workers, would result in higher costs. That's why we limit ourselves to four clusters for our business.
In terms of cost efficiency, it's very useful because our clients switched from AWS Cloud to Azure Databricks to save costs.
I would rate the overall product eight out of ten.
Everything is probably good as far as I have used it, but there's room for improvement in cluster integration. Enhancing cluster capabilities while keeping costs lower would be beneficial.

I use Databricks for various purposes, including data engineering, MLOps, machine learning training and deployment, the entire ML cycle, and dashboards. It serves different purposes for different projects.
Unity Catalog is a feature I am currently using extensively. I am migrating many projects to Unity Catalog. MLflow, which I use for model registering and creating the lineage of models, is also valuable.
Additionally, Databricks serves as a single platform for conducting the entire end-to-end lifecycle of machine learning models or AI ops. I don't need to switch between various tools, making it an all-encompassing solution for development and research. I use the lake house and utilize features effectively.
There has been a significant evolution in databases. One area of improvement is the Databricks File System (DBFS), where command-line challenges arise when accessing files. Standardization of file paths on the system could help, as engineers sometimes struggle.
It would be beneficial to have utilities where code snippets are readily available. This would allow engineers to easily click a snippet and import it into the notebook, enabling quick modifications to variables or paths for fetching files, such as reading data from DBFS files. If I could right-click to copy absolute paths or to read files directly into a data frame, it would standardize and simplify the process.
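One source of the path confusion described above is that the same DBFS file is usually addressed as `dbfs:/...` by Spark APIs and as `/dbfs/...` by local file APIs, where that mount is available on the cluster. The helpers below are a minimal sketch of the kind of utility suggested; the function names are hypothetical and not part of any Databricks library.

```python
def to_spark_path(path):
    """Convert a local-style '/dbfs/...' path to the 'dbfs:/...' form."""
    if path.startswith("dbfs:/"):
        return path
    if path.startswith("/dbfs/"):
        return "dbfs:/" + path[len("/dbfs/"):]
    raise ValueError(f"Not a DBFS path: {path}")


def to_local_path(path):
    """Convert a 'dbfs:/...' path to the local-style '/dbfs/...' form."""
    if path.startswith("/dbfs/"):
        return path
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    raise ValueError(f"Not a DBFS path: {path}")


# Example path is made up for this sketch.
spark_path = to_spark_path("/dbfs/mnt/raw/file.csv")
local_path = to_local_path("dbfs:/mnt/raw/file.csv")
```

Normalising paths in one place means notebook code does not need to remember which form each API expects.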
I have used the solution for five years plus.
I would rate stability seven to eight out of ten.
I would rate scalability seven to eight out of ten.
I do not have any issues that require support. Many resources are available online.
I use infrastructure as code on the cloud to deploy the infrastructure. I have all the Git repositories and code repositories for deploying the code and models in the workspace. My setup includes a shared workspace, shared clusters, and integration with Unity Catalog.
I have a team of 100 engineers working with me, and I head the Center of Excellence (COE).
I believe it is competitive across clouds. When it comes to big data processing, I prefer Databricks over other solutions. Cost-wise, it is very competitive. The setup process is straightforward, thanks to the use of Spark clusters. This allows for faster turnaround times with Databricks.
The product rating is nine out of ten.
Databricks serves as a single platform that can handle numerous end-to-end machine learning tasks. The configuration is simple, scalability is excellent, and monitoring cluster utilization facilitates informed business decisions.
It's easy to schedule jobs, pipelines, and handle multiple use cases in parallel, providing countless benefits.

I usually handle data ingestion and create warehouses. I also assist other teams, such as analytics, to create reports or perform other tasks.
Having one solution for everything, from data engineering to machine learning, is beneficial since everything comes under one hood.
We often use a single cluster to ingest data, which Databricks doesn't recommend. They suggest using job clusters instead. This can be overwhelming for us because we started smaller.
We prefer using a small to mid-sized cluster for many jobs to keep costs low, but this sometimes doesn't support our operations properly. We need to stay in sync with the DBR (Databricks Runtime) versions, and migrations can pose challenges. For example, issues arose when we moved a cluster from a previous version to the latest one. We could use their job clusters, but that increases costs, which is challenging for us as a startup. Maintaining this infrastructure can be a headache.
I have worked at a couple of companies, not just the current one, and I have about 20 to 25 months of experience with Databricks.
They release patches that sometimes break our code. These patches are supposed to fix issues, but sometimes they cause disruptions.
The patches have sometimes caused issues that paused our jobs for about six hours. Fortunately, nothing important is currently running on Databricks; if there were, it would be a significant issue.
They are good. My company has a contract with them that includes good support. Whenever we reach out, they respond promptly.
With the benefits we receive, the price is reasonable. However, it's important to have good use cases. If it's just for data ingestion, it might not be the best solution price-wise. For a lot of different tasks, including machine learning, it is a nice solution.
I would rate the solution seven out of ten. That rating also depends on how we have the contract with Databricks.
It's still a solid and good rating. I work as a data engineer and Databricks engineer.

We work on three platforms. Databricks is hosted on Azure for us, so we work with ADFS, Azure Data Factory, and also the AWS Cloud. We work for some customers.
The notebooks and the ability to share them with collaborators are valuable, as multiple developers can use a single cluster. This reduces costs. The scheduling part is managed by Databricks itself, for example, when it is idle, it will automatically turn off. All these features are handled by Databricks, reducing costs. We do not need to schedule separately.
For example, on AWS EC2, we have to create a Lambda function or use System Manager templates to schedule EC2 and EMRs. Here, it is taken care of, saving significant resources.
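The idle shutdown described above corresponds to the auto-termination setting on a cluster. As a hedged sketch, the snippet below builds a cluster definition as a Python dict with an `autotermination_minutes` value, which is the field name the Databricks Clusters API uses for this setting. The helper name `cluster_spec`, the cluster name, and the placeholder runtime and node type values are assumptions for illustration only.

```python
import json


def cluster_spec(name, idle_minutes):
    """Build a cluster definition that shuts down after idle_minutes of inactivity.

    Hypothetical helper: the runtime version and node type are placeholders
    that would need real values for a given cloud and workspace.
    """
    return {
        "cluster_name": name,
        "spark_version": "<runtime-version>",  # placeholder, not a real version
        "node_type_id": "<node-type>",         # placeholder, cloud-specific
        "num_workers": 2,
        "autotermination_minutes": idle_minutes,
    }


spec = cluster_spec("shared-dev", 30)
payload = json.dumps(spec, indent=2)
print(payload)
```

With this setting in the cluster definition, no separate scheduler is needed to stop an idle cluster.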
Additionally, notebooks can be shared within the development team which saves effort. Developers can share their notebooks. Git and Azure DevOps integration on the Databricks side is also very helpful.
The API deployment and model deployment are not easy on the Databricks side. We use MLflow for managing MLOps, however, further improvement would be beneficial, especially for large language models and related tools. Moreover, the API deployment should be simplified for ease of deployment and consumption.
I have been using Databricks for approximately two and a half to three years.
We have not faced any shortages so far. The clusters are available on demand, thus we have not encountered any scalability issues.
We mostly had limited data support required from Databricks. Whenever we did need support, within two or three days the problem was solved. I would rate them ten out of ten.
We bought it as a service, which is why we never implemented it ourselves. We do not have any implementation team.
For companies focused solely on data transformation, transferring data between databases, and not tackling machine learning or deep learning problems, I recommend ADF. It would be sufficient and cost-saving compared to a full-fledged solution like Databricks. However, for data analytics and solving ETL problems, one should consider Databricks.
I would rate it nine out of ten.

My main use case for Databricks is running SQL queries. In my day-to-day work, I write SQL queries directly in Databricks and use the Genius platform to correct them, rather than running queries on another platform.
The best features offered by Databricks include the fact that it is on the web, that it does not depend on installing any software, and most importantly, the security that prevents connection to anyone else who is not logged in.
Regarding the security and web access I mentioned, I have noticed concrete benefits related to collaboration and data protection within my team, such as it being very secure and the fact that every time we enter the platform, it does the same credential verification.
The features of Databricks have had a positive impact on my organization. Since we switched from several platforms to this one, my team has become more efficient.
I think the aspects of Databricks that should be improved are that it could be faster and that I would like to be able to run direct queries from the server. I have not seen any other improvements that I think are needed in Databricks.
Databricks is stable.
I rate the scalability of Databricks as excellent.
Databricks customer support is very good. I would give Databricks customer support a rating of ten.
I did not use any other solution before Databricks.
I have seen a return on investment, as time is greatly saved and processes are faster.
My experience with pricing, implementation costs, and licensing is that it is very efficient and very fast.
My advice to others considering Databricks is that it is the best platform with artificial intelligence. I give Databricks an overall rating of ten out of ten.

Our use case for Databricks is large-scale big data processing on clusters.
I think it is difficult to determine which feature of Databricks I enjoy the most since there are many valuable features.
What's valuable about Databricks to my organization is that it is more cost-effective and provides better performance than the current AWS tools and services they offer.
I am uncertain about specific improvements for Databricks.
It would be beneficial to make Databricks even more cost-effective.
I have been using Databricks for two years.
My experience with Databricks has been smooth, and I haven't encountered any issues.
Databricks is definitely a very stable product and reliable.
I have not used Databricks customer service or support.
Before Databricks, I used Batch processing, Fargate, and possibly Kubernetes.
I switched from my previous solutions because they were either too expensive or too difficult to configure.
I have considered other solutions besides Databricks, such as Snowflake, but we haven't explored it extensively yet.
We are still early in our Snowflake experience, so we don't know the pros and cons compared to Databricks.
My deployment model for Databricks is limited as I'm not a heavy user.
I am not the person who purchased Databricks, but it was possibly acquired through the AWS Marketplace.
I may not have utilized Databricks machine learning capabilities.
My experience with the pricing and licensing model is that it remains relatively expensive. Though it's less expensive than AWS, we still need a more cost-effective solution.
I would rate Databricks overall a nine out of ten.

A typical use case for the solution is to build the data lakehouse for the client because they have a variety of source systems, and they want to unify that data into the lakehouse platform, where they want to use the data for analytical purposes and insights.
The most valuable features of Databricks are especially the Delta Lake and the Unity Catalog; those are the main features. The Unity Catalog is for data governance, and the Delta Lake is to build the lakehouse. Currently, they're coming up with workflow jobs, along with other supporting elements to create an end-to-end solution.
In my opinion, the area of Databricks with room for improvement is the dashboards. Until recently, everyone used third-party systems such as Power BI to connect to Databricks for dashboards and reports, but they're now coming up with their AI/BI dashboards, and I think they're on the right track to improve that even further.
I have approximately four years of experience working with Databricks.
I would rate the stability of Databricks as highly stable, around nine out of ten.
I would rate the scalability of this solution as very high, about nine out of ten.
I rate the technical support as fine because they have levels of technical support available, especially partners who get really good support from Databricks on new features. For us, it's so far so good with no problems, and I would rate the support quality as eight out of ten.
The initial setup of Databricks is reasonably easy. Implementing the solution doesn't cause any trouble, and I think it's fairly easy to set up and work on Databricks.
I can't say whether we've seen an ROI from the solution because I do not have exposure in that area, although I think the people who decided to implement Databricks might have done all this analysis and POCs.
My relationship with the vendor is that I'm not a partner of Databricks; I work for a client where we use the Databricks software for implementing the solutions.
My clients are usually enterprise-level organizations, but the area where they're implementing is medium level here, although it might go into enterprise level in the future.
Regarding the price of Databricks, I don't involve myself in those decisions.
I think Databricks is very good at facilitating AI and machine learning projects; they implement AI and machine learning models very well, and clients can run their models on Databricks. I believe they are in a better place compared to competitors such as Snowflake, and they are tying up with important companies such as SAP and Palantir.
Based on my experience, I would recommend Databricks to other people. Overall, I would rate this solution as one of the best, about eight out of ten, although I might not know some of the pitfalls; it's based on use case to use case, but for us, it's working well.