reviewer2041779 - PeerSpot reviewer
Principal at a computer software company with 5,001-10,000 employees
Real User
Jan 1, 2023
Has advanced modeling and machine-learning features; highly scalable, with no stability issues
Pros and Cons
  • "What I like about Databricks is that it's one of the most popular platforms that give access to folks who are trying not just to do exploratory work on the data but also go ahead and build advanced modeling and machine learning on top of that."
  • "I have had some issues with some of the Spark clusters running on Databricks, where the Spark runtime and clusters go up and down, which is an area for improvement."

What is our primary use case?

I've worked with Databricks primarily in the pharmaceuticals and life sciences space, which means a lot of work on patient-level data and the predictive analytics around that.

Another use case for Databricks is in the manufacturing industry. I'm a consultant, so the use cases for the product vary, but my primary use case for it is in the pharma space.

What is most valuable?

From a data science and applied analytics perspective, what I like about Databricks is that it's probably one of the most popular platforms for people who want not just to do exploratory work on the data but also to build advanced modeling and machine learning on top of it, and then make the results available for disseminating insights. For example, you can save all data and build out endpoints, so business analysts and users can access that data through a dashboard.

During the process, I also like that Databricks allows you to do version control to keep track of your operations on the data and maintain that lineage to create reproducible results.

The most significant Databricks advantage is that you can do everything within the platform. You don't need to exit the platform because it's a one-stop shop that can help you do all processes.

The solution is top-notch from a data science, applied ML, or advanced analytics perspective.

What needs improvement?

I have had some issues with some of the Spark clusters running on Databricks, where the Spark runtime and clusters go up and down, which is an area for improvement. Still, I am generally unaware of any super-critical issues.

For how long have I used the solution?

My experience with Databricks is two and a half years.

Buyer's Guide
Databricks
January 2026
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: January 2026.
879,477 professionals have used our research since 2012.

What do I think about the stability of the solution?

Databricks stability is an eight out of ten because I never had issues with its stability.

What do I think about the scalability of the solution?

Databricks has high scalability. Most of my work on the solution has been in the pharma space, which has massive data sets, so it's a nine out of ten, scalability-wise.

How are customer service and support?

I've never dealt with the Databricks technical support team.

How was the initial setup?

I don't have experience setting up Databricks because that's generally taken care of by the IT, data, or software engineering team before the data science team comes in and starts leveraging the platform. I have set up clusters, which was pretty straightforward, but I have yet to set up the overall Databricks environment in an enterprise-wide system.

What's my experience with pricing, setup cost, and licensing?

The cost for Databricks depends on the use case. I work on it as a consultant, so I'm using the client's Databricks, so it depends on how big the client is. If it's a global organization, that cost varies versus a smaller organization that has just adopted the platform and is trying to onboard a small team of five people. It depends.

What other advice do I have?

I'm a data scientist, so I frequently use Databricks and Domino Data Science Platform.

I'm a consultant, so every client has a different version or a different runtime in Databricks, so the versions used would vary per client.

The deployment for the solution is on the cloud, predominantly on AWS or Azure.

My clients adopted Databricks as the platform of choice, and with different use cases and more teams coming on board, the usage of Databricks will increase. I don't see that going down. It can only go up.

My advice to anyone looking into implementing Databricks is that it should be one of your top choices, especially if you're looking to focus on data processing, standard ETL operations, advanced analytics, or the ML type of work.

I'd rate the solution as nine out of ten. It checks almost all the boxes that modern applications need to have.

My organization is an active partner and implementer of Databricks, but it doesn't resell the solution.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
Nabil Fegaiere1 - PeerSpot reviewer
Chief Executive Officer at a wellness & fitness company with 11-50 employees
Real User
Sep 7, 2023
A powerful solution that is easily integrated into a variety of platforms
Pros and Cons
  • "It's very simple to use Databricks Apache Spark."
  • "I would like more integration with SQL for using data in different workspaces."

What is our primary use case?

I am a Databricks service partner, and my customers use Azure Databricks and Data Factory.

What is most valuable?

It's very simple to use Databricks Apache Spark. It's really good for parallel execution to scale up the workload. In this context, the usage is more about virtual machines.

Using meta-stores like Hive was optional, and the solution is good for data science use cases. With the Authenticator Log, Databricks is good for data transformation and BI usage. We have a platform.

What needs improvement?

I would like more integration with SQL for using data in different workspaces. We use the user interface for some functionalities, while for others, we have to use SQL to create data sets and grant permissions. For example, when creating a cluster, we have to create it with some API or user interface. Creating a cluster with some properties using SQL grants the possibility of using SQL syntax. Integration with SQL will make Databricks easier to use by people who have experience with databases like Lakehouse, and they would be able to use the data lake and BI. More integration will help have one point of view for everyone using SQL syntax.

Integration with Kubernetes could also be good for minimizing the price because you can use Kubernetes instead of virtual machines. But that won't be easy.

For how long have I used the solution?

I have worked with the solution for four or five years, with some experience since 2016.

What do I think about the stability of the solution?

The solution is stable. The only problem with stability would be that people are not using it efficiently.

What do I think about the scalability of the solution?

The solution is good for scalability.

How was the initial setup?

When we have administration experience, the solution is not difficult to deploy. Technically, however, it's difficult because governance is more complex. For example, I have two warehouses on Databricks, which are clusters in this workspace, and we have to switch from workspace to workspace to have all this information. There is a system table that has all this, but I don't know if everyone can use these tables.

What's my experience with pricing, setup cost, and licensing?

Databricks is not costly compared with other solutions.

Which other solutions did I evaluate?

Databricks' functionality is as good as that of solutions like Snowflake, BigQuery, and Redshift.

What other advice do I have?

People sometimes do not use the solution efficiently. They misunderstand databases, the usage of tables, and the performance. Many data engineers are very junior and don't have skills in that. Stability is more a customer problem than a problem with the product itself. One possible problem with the product is that there's no method to impose the usage of something. For example, in Synapse we have to use the metastore or the data catalog. But in Databricks, we can choose whether or not to use a catalog, or Hive, which is always integrated. Many customers directly use the paths on Databricks, which causes performance and governance problems.

I can offer a lot of advice on Databricks, and one piece is to use a metastore like Unity Catalog or Hive Metastore. For new use cases, it's better to use Unity Catalog.

I rate Databricks a nine out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
Avadhut Sawant - PeerSpot reviewer
Consulting Architect at a computer software company with 10,001+ employees
Vendor
Sep 1, 2023
Ahead of the competition in building data ecosystems, but needs to improve ease-of-use
Pros and Cons
  • "A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem."
  • "Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster."

What is our primary use case?

I worked with Databricks pretty recently. The particular design processes involved in Databricks were also a part of that specific design/architectural process.

We have used the solution for the overall data foundation ecosystem for processing and storage on a Delta format. We have also seen use cases where we were trying to establish advanced analytics models and data sharing where we leverage the Delta Sharing capabilities from Databricks.

What is most valuable?

A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem.

What needs improvement?

There are some aspects of Databricks, like generative AI, where they are positioning things like Dolly. They're a little bit late to the game, but I think there are some things that they are working on. Generative AI is catching up in areas like data governance and enterprise flavor. Hence, these are places where Databricks has to be faster, and even though they are fast, I'm not sure how they'll catch up and get adopted because there are strong players in the market.

Databricks is coming up with a few good things in terms of integration. But I have to put one point forward that covers multiple aspects, which is the ease of use for the end user while operating this particular tool. For example, a tool like ADS gives you GUI-based development, which is good for the end user who does development or maintenance. Given the complexities of data integration, a GUI might not be easy to build, but Databricks should embrace something on the graphical development front because it is currently notebook-driven.

Also, in terms of accessing the data for the end user, Databricks has an SQL interface, similar to earlier tools like SQL Server Management Studio. Since many people are already comfortable with SSMS, Databricks could build integrations with known tools for data access, which would also help, apart from what they're doing. I would like to see improvements with respect to user enablement, which is a good part of enterprise strategy.

I would like to see their integration with a broader ecosystem of products. If you have to do data governance in tools like Microsoft Purview, it's manual and difficult. Now, I'm unsure if that momentum must come from Databricks or Microsoft. But it would be good if Databricks had some open interfaces to share metadata, which could be viewed in tools enabling data governance like Collibra, Purview, or Informatica. The improvement has to do with user and metadata integration for tools.

For how long have I used the solution?

I've worked with Databricks for over five or six years, but it's been on and off.

What do I think about the scalability of the solution?

The solution is scalable. In this particular ecosystem, there is no one else who can catch up with Databricks for now.

How are customer service and support?

Databricks' customer support is very good. They have a lot of ways in which they interact with vendors and service partners across the globe. They have periodic touch-up sessions with vendors, where their engineers answer your questions.

How was the initial setup?

The implementation is not challenging because the solution integrates well with the platforms on which they are established, whether it's Azure, AWS, or GCP. The solution is not difficult to set up, but you'd probably need a technical user to operate it.

It's the same story with maintenance, where you'd need a technically proficient person with programming knowledge to maintain it.

What other advice do I have?

Databricks integrates many enterprise processes because data processing and AI/ML are a small part of a larger ecosystem. Databricks has been a part of other platforms, and they are trying to establish their platform, which is a good direction.

Most of the capabilities of the underlying platform can be leveraged there. The setup isn't difficult even if Databricks lacks some capability or you're not comfortable with a certain feature in it, because it integrates well with the underlying platform. For example, with scheduling, let's say you are uncomfortable with workflow management. You can utilize integrations with EDA or any other tool and perform scheduling there. Even if what you're trying to do is not easy, it is enabled with integration. Either they build a required feature in their tool later on, like a GUI, or you perform integrations to make the features possible.

We did evaluate licensing costs, but it had more to do with the Azure ecosystem pricing since whatever we are doing has more to do with Azure Databricks. Many optimizations are recommended, but we haven't exercised those for now. But considering that the processing is a bit more efficient, the overall price won't be much different from what it could be for any other similar component or technology. We haven't had specific discussions with Databricks' folks on pricing.

My advice to users who would like to start working with Databricks is that it is a good solution to work with for data integration and machine learning. Databricks is maturing for other use cases, so there are two points to be considered. One is that you need to evaluate how they will mature, which will be on a case-to-case basis. Second, how will it align with the overall platform story? There will be many overlapping aspects over there as Databricks expands its capabilities. In that case, it must be considered that if those capabilities overlap, how will the underlying platform vendors handle it? How would that interplay happen if many of Databricks' new capabilities align with Microsoft Fabric? That has to be very carefully considered. Otherwise, if you utilize those new capabilities, there might be a discontinuity where you cannot use Databricks because the platform does not support that.

If I specifically talk about Spark-based processing transformations, the data integration story, and advanced stability, I would rate Databricks around eight out of ten. However, with respect to new capabilities like cataloging, data governance, and security integration, I rate Databricks around five because it has to establish these features. And since Databricks integrates with platforms, we must see the interplay with the platforms' capabilities.

Overall, I rate Databricks a seven out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
Rupal Sharma - PeerSpot reviewer
Data Architect at a comms service provider with 1,001-5,000 employees
Real User
Aug 24, 2023
Processes large data for data science and data analytics purposes
Pros and Cons
  • "Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours."
  • "There is room for improvement in visualization."

What is our primary use case?

It's mainly used for data science, data analytics, visualization, and industrial analytics.

What is most valuable?

Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours.

So that's why it's quite convenient to use for data science, for training machine learning models. By using more computing power, you can make it even faster.

What needs improvement?

There is room for improvement in visualization.

For how long have I used the solution?

I used it for two years. I worked with the latest update. 

What do I think about the stability of the solution?

I would rate the stability a nine out of ten. I didn't face performance drops.

What do I think about the scalability of the solution?

I would rate the scalability an eight out of ten.

How are customer service and support?

Databricks' support is great. If we need any support, they are very quick with it. And they genuinely want you to use Databricks. So, whatever we ask them, they come up with multiple solutions to problem statements. That's really good.

Overall, the customer service and support are very good.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I personally prefer using Databricks. We also considered Snowflake, but its pricing model was different: it's priced per query.

A data science or data analytics team needs to query the stored data again and again, so that model does not suit a data-heavy organization.

What was our ROI?

Databricks gives a good return on investment from a delivery perspective. We delivered multiple dashboards, so it's quite a good return on investment. And in a small organization, everyone can use Databricks, and it's also good cost-wise.

Which other solutions did I evaluate?

If the company is a startup, Databricks might be suitable. If a big company needs a lot of storage, Teradata might be best for them. It depends on the situation.

What other advice do I have?

Overall, I would rate the solution an eight out of ten. I would definitely recommend this solution for small organizations.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Sahil Taneja - PeerSpot reviewer
Principal Consultant/Manager at a consultancy with 51-200 employees
Real User
May 8, 2023
Processes tremendous data easily
Pros and Cons
  • "The processing capacity is tremendous in the database."
  • "There is room for improvement in the documentation of processes and how it works."

What is our primary use case?

In our project, we are dealing with geospatial data, which needs a lot of computing resources. A traditional warehouse cannot handle the amount of data we are using, and this is where Databricks comes into the picture.

What is most valuable?

The processing capacity is tremendous in the database. We use Azure for storage and have not faced any challenges. The connectors to different data sources are also valuable. Moreover, it is not a language-dependent tool, so development is faster. It is one of the best features of Databricks.

What needs improvement?

There is room for improvement in the documentation of processes and how it works. I was trying to get one of the certifications, so I saw an area of improvement there. 

For how long have I used the solution?

I have been using Databricks for eight to nine months.

What do I think about the stability of the solution?

It is a stable product for us. We didn't see any challenges. 

What do I think about the scalability of the solution?

There are around 30 to 35 users in our organization. 

How was the initial setup?

The initial setup was easy because the third-party team made the clusters for us. 

What about the implementation team?

A third-party team enabled the cluster to make the setup easy for us. 

What other advice do I have?

I would advise using it based on the use case because it easily handles big data. It is your go-to tool if you are dealing with massive data. 

Overall, I would rate the solution a nine out of ten. The tool performs well in various use cases, documentation is available online, and it is compatible with cloud platforms like GCP, Azure, or AWS.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Tajinder_Singh - PeerSpot reviewer
Senior Software Engineer at a computer software company with 201-500 employees
Real User
Leaderboard
Jan 1, 2023
Valuable data analysis and engineering features with an easy setup
Pros and Cons
  • "The setup is quite easy."
  • "Can be improved by including drag-and-drop features."

What is our primary use case?

Our primary use case for the solution is data analysis. It provides a Spark cluster environment with a driver to analyze huge amounts of data, gigabytes of it, and we can create Notebooks in Databricks. We can write SQL commands, Python code, Scala, or Spark with Python. With Databricks, we get a cluster hosted in the public cloud and we adjust it based on how much we use it.

What is most valuable?

The most valuable features are data engineering and data science because we can create Notebooks on them. We can use any Python library to build data science models, or we can use libraries like Seaborn or Matplotlib to create charts based on data for data analysis. It is a really valuable capability.

What needs improvement?

Microsoft Azure has its learning environment on the Microsoft website, where we can complete certifications, but the Databricks certification is more expensive than Microsoft's. It costs between $2,000 and $2,500, and the knowledge is linked. Users are also charged even when they don't want to analyze large amounts of data. Hence, we would like free capacity for student users so that people can learn and build their professional skills.

For how long have I used the solution?

We have been using the solution for approximately one year.

What do I think about the stability of the solution?

The solution is stable. Microsoft offers a public service, and we can get it from the Databricks website. Additionally, many companies use it to analyze their data or create a Spark cluster to run Python or SQL scripts based on their data. I rate the stability a nine out of ten.

How was the initial setup?

The setup is quite easy, and Databricks has also partnered with Microsoft, so we get this service on Microsoft Azure.

What was our ROI?

We have seen a return on investment.

What's my experience with pricing, setup cost, and licensing?

We have a pay-as-you-go subscription and pay for it based on our usage.

Which other solutions did I evaluate?

We chose this solution because my company uses Microsoft Azure for a project, and my role as a data engineer primarily focuses on data-related services. For storing data, we use Data Lake; similarly, for the data processing engine, we use Spark, which Databricks provides.

What other advice do I have?

I rate the solution an eight out of ten. The solution is good but can be improved by including drag-and-drop features because it can be helpful for users who are unfamiliar with coding. I advise new users to have prior experience with Python or SQL before utilizing this solution if they use it for data science or model building. 

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
AbhishekGupta - PeerSpot reviewer
Engineering Leader at a retailer with 10,001+ employees
Real User
Oct 25, 2022
Fantastic features such as interactive clusters that perform at top speed
Pros and Cons
  • "The solution's features are fantastic and include interactive clusters that perform at top speed when compared to other solutions."
  • "CI/CD needs additional leverage and support."

What is our primary use case?

Our company uses the solution's Spark module for big data analytics as a processing engine.

We do not use the module as a streaming engine. The historic perception is that Spark is for batches, machine learning, analytics, and big data processing but not for streaming and that is exactly how we use it. 

What is most valuable?

The solution's features are fantastic and include interactive clusters that perform at top speed when compared to other solutions.

The ATC monitoring experience and the maturity of the APIs are very good. 

What needs improvement?

CI/CD needs additional leverage and support. Community forums are helpful for gaining knowledge but the solution should provide specific documentation.

Streaming services such as Flink should be amplified and better supported. 

There are not many connectors to MapReduce.

For how long have I used the solution?

I have been using the solution for seven years. 

What do I think about the stability of the solution?

The solution is mature and stable compared to other products. 

What do I think about the scalability of the solution?

The solution is scalable with no issues from a compute perspective.

How are customer service and support?

I received support for initial challenges and it was very good. The support team was very professional and provided the answers I needed. 

I rate support an eight out of ten. 

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I previously used Cloud-Bricks. 

How was the initial setup?

The initial setup is easy for me because I access the solution on a web browser. 

What about the implementation team?

Unilever had a specific team for implementing and managing the solution.

Walmart had a team of ten engineers for implementation and a couple of engineers for management. 

What was our ROI?

We receive an ROI for our batch constructs. 

What's my experience with pricing, setup cost, and licensing?

The solution is a good value for batch processing and huge workloads. 

The price might be high for use cases that are for streaming or strictly data science. 

Which other solutions did I evaluate?

I have evaluated multiple options including Cloud-Brick and Dataproc for price versus performance, technical support, and CI/CD approach.

I started as a consumer and used the solution for on-premises deployment with Unilever from a data science perspective. At that time, the solution was in its beta stage but viewed as good, far ahead of its competition, and expensive. The key comparison used to be HDInsight or Adobe Cluster for cloud data and the solution was thought of as a cluster service rather than for unified analytics.

I moved along on my journey to Walmart where I was building their platform and compared it to the solution from a cloud perspective and a cluster service with notebooks. Consumers at the time were using Project Lightspeed and ATC for streaming. Spark was used as a micro-batching engine for machine learning, analytics, and big data processing. At some point, the solution became preferred and more than 100 staff members were leveraging its use.

I found that the solution had interesting features that I liked such as its notebook, interactive clusters with fast speed, and the ATC monitoring experience. I did not like the solution from a CI/CD perspective because it had a rigidity in terms of the approval process.

The solution grew from that original space and, by the time I had moved to Microsoft, was partnered with Microsoft Azure. An integration with ADF and other products solved the CI/CD issues for me.

I am now leading streaming platforms for Walmart, so my interest is in the solution's streaming capabilities. I began building a streaming platform using Spark PM in Microsoft, so the solution was its key competitor. Then the solution launched Photon, a vectorized engine for Spark. Its performance was a key factor in moving from Microsoft because it performed much better than other products, including open-source Spark, Microsoft Synapse Spark, and Dataproc.

What other advice do I have?

It is important to do POCs and run tests to control the meter that also controls the price. The meter can go really high from a computing perspective if POCs and settings are not streamlined. 

I rate the solution an eight out of ten. 

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Sudhendra Umarji - PeerSpot reviewer
Technical Architect at a tech vendor with 10,001+ employees
MSP
Jun 14, 2022
Enables us to find anomalies and apply rules to the streaming data
Pros and Cons
  • "The ability to stream data and the windowing feature are valuable."
  • "Support for Microsoft technology and the compatibility with the .NET framework is somewhat missing."

What is our primary use case?

We use this solution for finding anomalies and applying the rules to the streaming data.

There are around 50 people using this solution in my organization, including data scientists.

What is most valuable?

The ability to stream data and the windowing feature are valuable. Databricks has a number of targeted integration points, which is a difference from Stream Analytics: the input and output integrations are better in Databricks. I can use any Python or even Java library; I can take a third-party component, deploy it, and use it.
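
As a rough illustration of what windowing plus a rule means in this kind of use case, here is a minimal pure-Python sketch. It is not the reviewer's code and it does not use the Spark or Databricks APIs; the `Reading` type, the function names, the 60-second window, and the threshold of 50 are all hypothetical choices made for the example. It groups device readings into fixed, non-overlapping (tumbling) windows, averages each window, and flags windows whose mean breaks a simple threshold rule.

```python
from collections import defaultdict
from typing import NamedTuple


class Reading(NamedTuple):
    """One hypothetical device reading."""
    device_id: str
    timestamp: float  # seconds since an arbitrary start
    value: float


def tumbling_window_means(readings, window_seconds):
    """Average readings per device over fixed, non-overlapping windows."""
    sums = defaultdict(lambda: [0.0, 0])  # (device, window start) -> [total, count]
    for r in readings:
        window_start = int(r.timestamp // window_seconds) * window_seconds
        bucket = sums[(r.device_id, window_start)]
        bucket[0] += r.value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def flag_anomalies(window_means, threshold):
    """Rule: a window is anomalous when its mean exceeds the threshold."""
    return sorted(key for key, mean in window_means.items() if mean > threshold)


readings = [
    Reading("sensor-a", 0.0, 20.0),
    Reading("sensor-a", 30.0, 22.0),
    Reading("sensor-a", 65.0, 95.0),
    Reading("sensor-b", 10.0, 18.0),
]
means = tumbling_window_means(readings, window_seconds=60)
print(flag_anomalies(means, threshold=50.0))
```

On a real stream the same idea would run continuously over arriving data rather than over an in-memory list, and the rule would usually be richer than a single threshold.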

What needs improvement?

Support for Microsoft technology and the compatibility with the .NET framework is somewhat missing. There should be reliability between these two. Databricks is based on open sources. If it's more synchronous between the Microsoft technology and the programming languages, it'll be better. Python has better languages, but compatibility would be a great help.

I would like to have better support for Microsoft technology and better language components.

With Azure or Cosmos DB, I can store other data links or time series data tables. That would be a great help for analytics in real time.

For how long have I used the solution?

I have been using Databricks for eight months.

What do I think about the scalability of the solution?

The scalability is fine. We had thousands of devices and were sending data infrequently, so that worked for us. If the amount increases, the windowing function and job schedule may not perform as expected.

How are customer service and support?

I would rate technical support 4 out of 5. We had some issues with setup, and they were finally solved but it was after following up a few times.

Which solution did I use previously and why did I switch?

Azure Stream Analytics is easy to use and easy to deploy. It's a little bit better. Databricks is still having some stability issues. Azure Stream Analytics has a few input and output sources, and it's scalable to all types of third party or interfaces.

How was the initial setup?

Setup was complex. There were some issues with setting up a database and installing the third party component on top of services. I would rate the setup 3 out of 5.

What about the implementation team?

Implementation was done in-house.

What's my experience with pricing, setup cost, and licensing?

The cost is around $600,000 for 50 users.

I would rate the price 2 out of 5.

What other advice do I have?

I would rate this solution 8 out of 10.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user