Head of Business Integration and Architecture at Jakala
Real User
Top 5
Highly scalable data platform that offers exceptional performance and valuable data types unique to this solution
Pros and Cons
  • "The Delta Lake data type has been the most useful part of this solution. Delta Lake is an open-source data type that was invented and implemented by Databricks."
  • "The data visualization for this solution could be improved. They have started to roll out a data visualization tool inside Databricks but it is in the early stages. It's not comparable to a solution like Power BI, Looker, or Tableau."

What is our primary use case?

We use this solution for the Customer Data Platform (CDP). My company works in the MarTech space, and we usually implement custom CDPs.

What is most valuable?

The Delta Lake data type has been the most useful part of this solution. Delta Lake is an open-source data type that was invented and implemented by Databricks. It is the most important element of the solution. Databricks also offers exceptional performance and scalability.

What needs improvement?

The data visualization for this solution could be improved. They have started to roll out a data visualization tool inside Databricks but it is in the early stages. It's not comparable to a solution like Power BI, Looker, or Tableau.

In a future release, we would like to have a better ETL designer tool to assist in the way we move data from one place to another.

For how long have I used the solution?

We have been using this solution for four years. 

Buyer's Guide
Databricks
May 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
769,662 professionals have used our research since 2012.

What do I think about the stability of the solution?

This is a stable solution. 

What do I think about the scalability of the solution?

This is a scalable solution. 

How was the initial setup?

The initial setup is very easy. It is a managed solution inside Azure so you just need to search for Databricks. There are a couple of pages to follow in the setup wizard and Databricks is up and running.

What's my experience with pricing, setup cost, and licensing?

We implement this solution on behalf of our customers who have their own Azure subscription and they pay for Databricks themselves. The pricing is more expensive if you have large volumes of data. 

Which other solutions did I evaluate?

When we first started using Databricks in 2018, there were not many comparable solutions to consider. Right now there are many solutions to consider, including Snowflake, Azure Synapse, Redshift and BigQuery.

Databricks continues to be our solution of choice, but Snowflake does have a better user interface and is easier to work with for data pipelines.

What other advice do I have?

I would advise others to first define a strong data strategy and then choose which data platform suits your needs. 

I would rate this solution a nine out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Head of Referential and Big Data at a financial services firm with 5,001-10,000 employees
Real User
A highly scalable unified data platform that provides data access to any type of user
Pros and Cons
  • "I like cloud scalability and data access for any type of user."
  • "It would be better if it were faster. It can be slow, and it can be super fast for big data. But for small data, sometimes there is a sub-second response, which can be considered slow. In the next release, I would like to have automatic creation of APIs because they don't have it at the moment, and I spend a lot of time building them."

What is our primary use case?

We use Databricks to define tool data and have many use cases to analyze and distribute the data.

How has it helped my organization?

Data is open to everyone; they can access it through many channels, including notebooks or SQL. That on its own democratizes the data.

What is most valuable?

I like cloud scalability and data access for any type of user.

What needs improvement?

It would be better if it were faster. It can be slow, and it can be super fast for big data. But for small data, sometimes there is a sub-second response, which can be considered slow.

In the next release, I would like to have automatic creation of APIs because they don't have it at the moment, and I spend a lot of time building them.

For how long have I used the solution?

I have been using Databricks for roughly one and a half years.

What do I think about the stability of the solution?

Stability is excellent.

What do I think about the scalability of the solution?

Databricks is scalable. You can use the power of the cloud to scale your cluster size, either CPU or memory. The data doesn't work like a standard database, so you don't have it based on files, and you don't copy the data. It's super scalable. It's only the computing that you have to scale with the data.

We probably have 40 users with roles like developers, business analysts, and data scientists. We have big plans to increase the usage and have more departments using it.

How are customer service and support?

Technical support has helped us.

On a scale from one to ten, I would give technical support a five.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used Cloudera before switching to Databricks.

How was the initial setup?

The initial setup was fairly okay. It takes about two minutes to deploy this solution. It's all code, so we click a button, and then it's done.

On a scale from one to five, I would give the initial setup a four.

What about the implementation team?

We set up and deployed this solution.

What was our ROI?

On a scale from one to five, I would give our ROI a three.

What's my experience with pricing, setup cost, and licensing?

We only pay for the Azure compute behind the solution. If you want to compute, you have to have a database layer and Azure below.

On a scale from one to five, I would give their pricing a two.

Which other solutions did I evaluate?

We looked at other options such as Snowflake and Cloudera on the cloud.

What other advice do I have?

I would tell potential users that they need proper cloud engineers and a cloud infrastructure team to use this solution.

On a scale from one to ten, I would give Databricks a nine.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Head of Credit Risk and Data at Cegid Invoice and Financing
Vendor
It's a reasonably priced all-in-one platform that enables us to build a lakehouse framework
Pros and Cons
  • "Databricks gives us the ability to build a lakehouse framework and do everything implicit to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform."
  • "I'm not the one working with Databricks on a daily basis. I'm on the management team. However, my team tells me there are limitations with streaming events. The connectors work with a small set of platforms. For example, we can work with Kafka, but if we want to move to an event-driven solution from AWS, we cannot do it. We cannot connect to all the streaming analytics platforms, so we are limited in choosing the best one."

What is our primary use case?

We primarily use Databricks for reporting and machine learning.

What is most valuable?

Databricks gives us the ability to build a lakehouse framework and do everything implicit to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform.

What needs improvement?

I'm not the one working with Databricks on a daily basis. I'm on the management team. However, my team tells me there are limitations with streaming events. The connectors work with a small set of platforms. For example, we can work with Kafka, but if we want to move to an event-driven solution from AWS, we cannot do it. We cannot connect to all the streaming analytics platforms, so we are limited in choosing the best one.

Also, this is an all-in-one platform, but it might be preferable if there were an a la carte model where we could select the best tool in each class for reporting, machine learning, etc. I'm not yet sure if this strategy is the best one. 

For how long have I used the solution?

We've been using Databricks since the start of the year.

What do I think about the stability of the solution?

Databricks is quite stable. We haven't had any issues with stability. It's always working perfectly with no downtime.

What do I think about the scalability of the solution?

Databricks is based on Spark, which is based on Scala. These languages aren't easy to handle, and it's challenging to find people who know them well. At the same time, a couple of other vendors that work on top of Databricks are low-code platforms. We have to work around Databricks' lack of scalability by using low-code platforms that work on top of Databricks to give us scalability.

How are customer service and support?

I'll give Databricks support 10 out of 10. They are always prompt even though we didn't buy a support package. They have done an excellent job.

How would you rate customer service and support?

Positive

How was the initial setup?

Setting up Databricks is a bit complex, and the initial deployment took a few days—closer to a week. Of course, not everyone is working full-time on this. There are intervals when people are doing other stuff. 

What was our ROI?

It's too soon to tell what kind of return we're getting because we just started using it, and we're still migrating.

What's my experience with pricing, setup cost, and licensing?

The cost of Databricks is in the lower range compared to other solutions. That was one of the main reasons we chose Databricks over other vendors and platforms.  

We pay as we go, so there isn't a fixed price. It's charged by the unit. I don't have any details about how they measure this, but it should be a mix between processing and quantity of data handled. We ran a simulation based on our use cases, which gave us an estimate. We've been monitoring this, and the costs have met our expectations.
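To illustrate the kind of usage-based estimate described above, here is a minimal sketch in Python. It is not the reviewer's actual simulation and not Databricks' pricing formula: the workload names, the units consumed per hour, and the price per unit are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    units_per_hour: float   # hypothetical billing units consumed per cluster-hour
    hours_per_month: float  # expected runtime per month


def estimate_monthly_cost(workloads, price_per_unit):
    """Return (cost per workload, total cost) for a simple pay-per-unit model."""
    costs = {
        w.name: w.units_per_hour * w.hours_per_month * price_per_unit
        for w in workloads
    }
    return costs, sum(costs.values())


# Hypothetical use cases; real figures would come from monitoring actual usage.
workloads = [
    Workload("nightly_reporting", units_per_hour=4.0, hours_per_month=60.0),
    Workload("model_training", units_per_hour=8.0, hours_per_month=20.0),
]

costs, total = estimate_monthly_cost(workloads, price_per_unit=0.5)
for name, cost in costs.items():
    print(f"{name}: {cost:.2f}")
print(f"total: {total:.2f}")
```

Comparing an estimate like this against the billed amount each month is one simple way to check whether costs are meeting expectations.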

What other advice do I have?

I give Databricks nine out of 10. The solution has met all our expectations. I'd recommend it to a friend. It's a reasonably priced all-in-one solution that gives us data lake and lakehouse capabilities. Those were the primary reasons we chose Databricks.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Anirban Bhattacharya - PeerSpot reviewer
Practice Head, Data & Analytics at a tech vendor with 10,001+ employees
Real User
Top 5
Key feature is ability to make changes in structure or data size and align for subsequent consumption
Pros and Cons
  • "Can cut across the entire ecosystem of open source technology to give an extra level of getting the transformatory process of the data."
  • "Implementation of Databricks is still very code heavy."

What is our primary use case?

We have a team that works on Databricks for our clients. We are customers of Databricks. 

What is most valuable?

Databricks can cut across the entire ecosystem of open source technology which gives an extra level in terms of getting the transformatory process of the data. The solution is primarily open source and they have bolstered its components to make it more fit for purpose for a complete Azure Data platform. The solution is responsible for the core transformatory activities. While Azure Data Factory is very good for pulling in the data, doing the basic standardization and profiling, Databricks is more about making fundamental changes in structure or in size of the data and aligning it for subsequent consumption, or for the final layer on Synapse. It also has the power to complement and work with Spark and elements related to Python. 

What needs improvement?

In my view, the fundamental approach of implementing Databricks is still very code heavy, more than you find in Azure Data Factory and other technologies like Informatica or SQL Server Integration Services. From my perspective, that could be improved. I'd also like to have the ability to facilitate predictive analytics within the solution.

For how long have I used the solution?

I've been using the solution for a year and a half. 

What do I think about the stability of the solution?

Stability of the product is good, whether it's handling large volumes, diverse elements of data or processing data at speed. It has stood the test of time. It's a solution that really lends itself to that higher level of stability, versatility and diversity in terms of its capability to process different forms of data.

What's my experience with pricing, setup cost, and licensing?

The cost of the solution is slightly on the high side so it's important to use it efficiently.

What other advice do I have?

Use the solution wisely and in tandem with Azure Data Factory. Apply that prism in the overall design of your pipelines and flows so that you utilize it to its potential. Databricks adds significant transformatory and data tranching capabilities, across a diverse variety of data, to the Azure data stack. In terms of the license, ensure that the customer is getting what they paid for so that the value for money is realized.

I rate the solution eight out of 10. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Engineering Manager at a pharma/biotech company with 10,001+ employees
Real User
A great and easy-to-use platform for data engineers and data scientists who rely on a large dataset to do advanced analytics reporting
Pros and Cons
  • "The most valuable feature is the Spark cluster which is very fast for heavy loads, big data processing and PySpark."
  • "It would be great if Databricks could integrate all the cloud platforms."

What is our primary use case?

We use Databricks for data science work in projects that create data pipelines, pre-processing, data wrangling, big data cluster management, and machine learning (ML) and deep learning tasks.

How has it helped my organization?

Databricks collaborates very well with the Azure platform and with Dataiku, an enterprise AI tool. Databricks is a new connection to pull the data or connect to the Spark cluster. It is helpful for us to instance it or distribute the load through the Spark cluster, and it is very user-friendly.

What is most valuable?

The most valuable feature is the Spark cluster which is very fast for heavy loads, big data processing and PySpark.

What needs improvement?

Databricks as a solution is integrated with Azure, but Google Cloud has some restrictions. I'm not sure about AWS Cloud, but it would be great if Databricks could integrate all the cloud platforms. Regarding additional features, we would like to see them mostly on the data engineering side, where we have a Spark cluster and some inbuilt ML. In addition, pre-processing steps will be useful.

For how long have I used the solution?

We have been using this solution for two years and are using the latest update.

What do I think about the stability of the solution?

It is a stable solution as long as the Microsoft Azure Platform is stable too.

What do I think about the scalability of the solution?

It is a scalable solution, both vertically and horizontally, which is good. My organization is big, and we have a lot of users. In my department, we have about 15 people using Databricks.

How are customer service and support?

We have not escalated any issues to technical support. We initially struggled with the configuration and settings of the Hive metastore, but we resolved it. I rate the technical support a nine out of ten.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We were using EMR (Elastic MapReduce) from AWS before using Databricks. We switched to Databricks because the whole platform changed from AWS to the Azure platform, and Databricks comes as a package.

How was the initial setup?

The initial setup was easy to complete and not complex. It may initially be challenging for a new user, but it improves over time. The CI/CD pipeline works well with the Microsoft Azure platform because the continuous integration, development and deployment come with the Git integration. It makes it easier for Databricks and the CI/CD. The deployment should be improved from the perspective of AutoML functionality, as it doesn't have intensive automated learning capability.

We don't use Databricks directly because we work on a data science project. It requires AutoML and inbuilt machine learning capability. We found capabilities like the large language model using NLP and other deep learning models that are not that intensive. It is meant for data engineering purposes rather than data science purposes. It would be great if Databricks could be intensive for data science.

We used a third-party, Dataiku platform for the deployment, where we connected to Databricks and completed the ML ops. We required about three people for deployment, and it is easy to maintain the solution.

What was our ROI?

We have seen an ROI but cannot differentiate because it also comes with the Azure platform.

What's my experience with pricing, setup cost, and licensing?

I do not have details about the pricing.

What other advice do I have?

I rate this solution a nine out of ten. Regarding advice, Databricks is a very good platform, popular and easy to use daily for data engineers and data scientists who rely on a large dataset to do advanced analytics reporting. It's a very good tool.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Machine Learning Engineer at a mining and metals company with 10,001+ employees
Real User
Highly scalable, stable and good technical support
Pros and Cons
  • "Databricks is a scalable solution. It is the largest advantage of the solution."
  • "The interface of Databricks could be easier to use when compared to other solutions. It is not easy for non-data scientists. The user interface is important: before, we had to write code manually, and as solutions move to "No code AI," it is critical that the interface is very good."

What is our primary use case?

We were using Databricks to build an AI solution. We were only evaluating it, and approximately three people tried it out. Later we chose another solution, so we did not fully deploy Databricks.

How has it helped my organization?

Before I used Databricks it took me a long time to do some functions and now with Databricks I can do them much quicker. It scales very well.

What needs improvement?

The interface of Databricks could be easier to use when compared to other solutions. It is not easy for non-data scientists. The user interface is important: before, we had to write code manually, and as solutions move to "No code AI," it is critical that the interface is very good.

For how long have I used the solution?

I have used Databricks within the last 12 months.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

Databricks is a scalable solution. It is the largest advantage of the solution.

How are customer service and support?

We have been in contact with the technical support of Databricks, and they were good.

Which solution did I use previously and why did I switch?

We have used a lot of different solutions, such as Watson and DataIQ.

How was the initial setup?

The initial setup is easy. However, I do not know much about the implementation because the company does it.

What about the implementation team?

We did the implementation of the solution.

What other advice do I have?

If companies want scalability, they should choose Databricks.

I rate Databricks a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Tajinder_Singh - PeerSpot reviewer
Senior Software Engineer at a computer software company with 201-500 employees
Real User
Top 5
Leaderboard
Valuable data analysis and engineering features with an easy setup
Pros and Cons
  • "The setup is quite easy."
  • "Can be improved by including drag-and-drop features."

What is our primary use case?

Our primary use case for the solution is data analysis. Databricks provides a Spark cluster environment with a driver to analyze huge amounts of data, in the gigabytes, and we can create notebooks in it. We can write SQL commands, Python code, Scala, or Spark with Python. With Databricks, we get a cluster hosted in the public cloud and we adjust it based on how much we use it.

What is most valuable?

The most valuable features are data engineering and data science because we can create Notebooks on them. We can use any Python library to build data science models, or we can use libraries like Seaborn or Matplotlib to create charts based on data for data analysis. It is a really valuable capability.

What needs improvement?

Microsoft Azure has its learning environment on the Microsoft website. We can complete certifications, but the Databricks certification is more expensive than Microsoft's. It costs between $2,000 and $2,500, and the knowledge is linked. Users are also charged even if they don't want to analyze large amounts of data. Hence, we would like free capacity for student users so that people can learn and build their professional skills.

For how long have I used the solution?

We have been using the solution for approximately one year.

What do I think about the stability of the solution?

The solution is stable. Microsoft offers a public service, and we can get it from the Databricks website. Additionally, many companies use it to analyze their data or create a Spark cluster to run Python or SQL scripts based on their data. I rate the stability a nine out of ten.

How was the initial setup?

The setup is quite easy, and Databricks has also partnered with Microsoft, so we get this service on Microsoft Azure.

What was our ROI?

We have seen a return on investment.

What's my experience with pricing, setup cost, and licensing?

We have a pay-as-you-go subscription and pay for it based on our usage.

Which other solutions did I evaluate?

We chose this solution because my company uses Microsoft Azure for a project, and my role as a data engineer primarily focuses on data-related services. For storing data, we use Data Lake; similarly, for the data processing engine, we use Spark, which Databricks provides.

What other advice do I have?

I rate the solution an eight out of ten. The solution is good but can be improved by including drag-and-drop features because it can be helpful for users who are unfamiliar with coding. I advise new users to have prior experience with Python or SQL before utilizing this solution if they use it for data science or model building. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
MahalaxmanraoChappedi - PeerSpot reviewer
Associate Principal - Data Engineering at a tech services company with 10,001+ employees
Real User
Top 20
It's a unified platform that lets you do streaming and batch processing in the same place
Pros and Cons
  • "I like that Databricks is a unified platform that lets you do streaming and batch processing in the same place. You can do analytics, too. They have added something called Databricks SQL Analytics, allowing users to connect to the data lake to perform analytics. Databricks also enables you to share your data securely. It integrates with your reporting system as well."
  • "Databricks may not be as easy to use as other tools, but if you simplify a tool too much, it won't have the flexibility to go in-depth. Databricks is completely in the programmer's hands. I prefer flexibility rather than simplicity."

What is our primary use case?

We build data solutions for the banking industry. Previously, we worked with AWS, but now we are on Azure. My role is to assess the current legacy applications and provide cloud alternatives based on the customers' requirements and expectations.

Databricks is a unified platform that provides features like streaming and batch processing. All the data scientists, analysts, and engineers can collaborate on a single platform. It has all the features you need, so you don't need to go for any other tool.

What is most valuable?

I like that Databricks is a unified platform that lets you do streaming and batch processing in the same place. You can do analytics, too. They have added something called Databricks SQL Analytics, allowing users to connect to the data lake to perform analytics. Databricks also enables you to share your data securely. It integrates with your reporting system as well.
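As a rough sketch of what "streaming and batch processing in the same place" means in practice, the plain-Python example below applies one aggregation function both to a full batch and to a sequence of small micro-batches, and checks that the results agree. It deliberately uses no Spark or Databricks APIs; the event fields (`account`, `amount`) and the helper names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Optional


def total_by_account(events: Iterable[dict], totals: Optional[Counter] = None) -> Counter:
    """Add each event's amount to a running total per account."""
    totals = Counter() if totals is None else totals
    for event in events:
        totals[event["account"]] += event["amount"]
    return totals


def micro_batches(events: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield the events in small chunks, imitating data arriving over time."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


# Hypothetical sample events.
events = [
    {"account": "A", "amount": 10},
    {"account": "B", "amount": 5},
    {"account": "A", "amount": 7},
    {"account": "C", "amount": 3},
    {"account": "B", "amount": 1},
]

# Batch: process everything at once.
batch_result = total_by_account(events)

# Streaming-style: process chunk by chunk, carrying state between chunks.
stream_result = Counter()
for chunk in micro_batches(events, size=2):
    total_by_account(chunk, stream_result)

print(batch_result == stream_result)
```

The point of the sketch is that the transformation logic is written once and reused for both modes, which is the benefit the reviewer attributes to a unified platform.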

The Unity Catalog provides you with data lineage and metadata capabilities. These are some of the unique features that fulfill all the requirements of the banking domain.

What needs improvement?

Every tool has room for improvement. What normally happens is that a solution claims it can do ETL and everything else, but you encounter some limitations when you actually start. Then you keep interacting with the vendor, and they continue to upgrade it. For example, we haven't fully implemented Databricks Unity Catalog, a newly introduced feature. We need to check how it works, and there may be room for improvement there as well.

Databricks may not be as easy to use as other tools, but if you simplify a tool too much, it won't have the flexibility to go in-depth. Databricks is completely in the programmer's hands. I prefer flexibility rather than simplicity.

For how long have I used the solution?

I have been using Databricks for a year.

What do I think about the scalability of the solution?

Databricks relies on scalability and performance. Every cloud vendor prioritizes scalability, high availability, performance, and security. These are the most important reasons to move to the cloud.

How was the initial setup?

Deploying Databricks on the cloud is straightforward. It's not like an on-premise solution, where you must create a cluster and all those other prerequisites for big data. 

I don't think it's challenging to maintain, but you need an expert programmer because Databricks isn't GUI-based. With GUI-based tools, building ETLs is drag-and-drop. Databricks entirely relies on coding, so you need skilled programmers to build your code, ETLs, etc.

What's my experience with pricing, setup cost, and licensing?

The price of Databricks is based on the computing volume. You also need to pay storage costs for the cloud where you're hosting Databricks, whether it is AWS, Azure, or Google. 

What other advice do I have?

I rate Databricks nine out of 10. Databricks is one of the best tools on the market.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
PeerSpot user
Buyer's Guide
Download our free Databricks Report and get advice and tips from experienced pros sharing their opinions.
Updated: May 2024