Lead Analytics at a manufacturing company with 10,001+ employees
Real User
Useful machine learning and easy to scale
Pros and Cons
  • "In the manufacturing industry, Databricks can be beneficial to use because of machine learning. It is useful for tasks, such as product analysis or predictive maintenance."
  • "The stability of the Databricks clusters or instances could be better; we would like a much more stable environment. We've had issues with crashes."

What is our primary use case?

Our team currently uses machine learning for various applications, and a few members are also exploring the use of Databricks for ML operations.

What is most valuable?

In the manufacturing industry, Databricks can be beneficial to use because of machine learning. It is useful for tasks, such as product analysis or predictive maintenance.

For how long have I used the solution?

I have been using Databricks for approximately six months.

What do I think about the stability of the solution?

The stability of the Databricks clusters or instances could be better; we would like a much more stable environment. We've had issues with crashes.

Buyer's Guide
Databricks
May 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
769,662 professionals have used our research since 2012.

What do I think about the scalability of the solution?

The scalability of Databricks is good as long as you have a data lake, and it's easy to scale.

We have approximately 50 users using this solution in my company.

How are customer service and support?

We have a different team who handles the support. I do not have contact with Databricks support.

Which solution did I use previously and why did I switch?

I have not used a similar solution to Databricks.

What was our ROI?

I have seen an ROI using Databricks.

What's my experience with pricing, setup cost, and licensing?

I rate the price of Databricks as eight out of ten.

What other advice do I have?

Having a good understanding of physical security in relation to cybersecurity in an OT (Operational Technology) environment would be beneficial, and utilizing an existing data lake prior to implementing a Databricks initiative would greatly aid in its success.

I rate Databricks an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Lead Architect at Birlasoft India Ltd.
Real User
Data analytics platform that supports large volumes of data and related activities
Pros and Cons
  • "This solution offers a lakehouse data concept that we have found exciting. We are able to have a large amount of data in a data lake and can manage all relational activities."
  • "The connectivity with various BI tools could be improved, specifically the performance and real time integration."

What is most valuable?

This solution offers a lakehouse data concept that we have found exciting. We are able to have a large amount of data in a data lake and can manage all relational activities. All ACID compliance properties are available, and this is very useful for ensuring the quality of all data.

What needs improvement?

The connectivity with various BI tools could be improved, specifically the performance and real-time integration. Some improvement is also required in the semantic layers for managing the data marts, as well as in the data warehouse features.

In a future release, we would like to have features to better manage all ML development activities.

For how long have I used the solution?

I have been using this solution for three years. 

What do I think about the stability of the solution?

This is a stable solution, especially compared to other technology on the market.

What do I think about the scalability of the solution?

It is a scalable solution but this depends on the platform that is being used. If you use a cloud platform such as Azure, it offers scalability. However, some platforms will not support scalability using Databricks.

We have around 20 users in our development team using Databricks. 

How are customer service and support?

The customer service and support for this solution is good. 

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup is pretty simple and requires minimal configuration compared to other technology.

What's my experience with pricing, setup cost, and licensing?

I would rate the pricing for this solution a four out of five. This does depend on the environment or the infrastructure that one is using. There is a difference in pricing between using Azure or being on-premises. 

Which other solutions did I evaluate?

Azure Synapse is a competitor that we evaluated, but it is not mature enough to provide better performance than Databricks. We chose Databricks for its ability to hold a lot of data in data lakes and the data warehouse. We are also able to run data science activities using MLflow.

What other advice do I have?

If you are looking for custom model development and a lot of data management in a cloud agnostic manner, then Databricks is a good solution.

I would rate this solution an eight out of ten. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Jorge Alvarado - PeerSpot reviewer
Owner at a marketing services firm with 1-10 employees
Real User
Data governance has been very efficient compared with other kinds of solutions
Pros and Cons
  • "Databricks' Lakehouse architecture has been most useful for us. Data governance has been very efficient compared with other kinds of solutions."
  • "I would like it if Databricks made it easier to set up a project."

What is our primary use case?

We use Databricks for video streaming and security purposes.

What is most valuable?

Databricks' Lakehouse architecture has been most useful for us. Data governance has been very efficient compared with other kinds of solutions.

What needs improvement?

I would like it if Databricks made it easier to set up a project. The use case determines which services we are going to use. You have the application engine, and you generate a potential budget for your workloads, so you can understand what you are going to do, what you are going to use, and what you will invest in.

Because I'm deploying on the Google Cloud Platform, measuring the investment, value, and use case is extremely difficult. So I leave it and move on without the risk. It would be easier if I had one page where you can see three columns: one for the use cases of a specific architecture, a second one for the prices based on the volume of data or machine time, and the third column for the budget. That would make it easier to know if I am using the appropriate architecture for the right solution.

I have seen something like that in Microsoft Azure, but obviously Microsoft Azure costs a lot of money. Amazon has something like that, but it's very complicated to use.

For how long have I used the solution?

We've been using Databricks for about five years.

What do I think about the stability of the solution?

Databricks is very stable and powerful.

What do I think about the scalability of the solution?

It was simple to make Databricks scalable. We found that we could set up an alert to tell us if we needed more resources, money, or time from our team. We're alerted when the system detects some trigger for any use of the instance. If you have another alert from your side, that would be extremely useful because it takes a lot of time to develop that kind of trigger. 

How are customer service and support?

Databricks technical support was lovely. We don't need it so much, but the few questions we had were answered immediately.

How was the initial setup?

I am not a data engineer because I just started data science at the company, but it was straightforward and clear for the architect to set up. He provided me with that idea because he realized it would take time if we had use cases. You can select and change the data or add some modules or products. You have all the technology to do so.

What other advice do I have?

I rate Databricks eight out of 10. I like to move my customers into Databricks, but I take care of the internal system infrastructure so they can continue to use familiar software or operating systems and databases. They have a lot of doubts because they don't know the solution. We need to train them, explain things, and show the solution's potential value. 

Generally, companies try to keep the same flavor when they migrate. For example, if they are using many Microsoft products, they want to work with Azure. If they are open to other options, they go with GCP or AWS. However, Databricks doesn't have enough customers here in my market because it's not a visible brand. Azure, GCP, and AWS are highly visible here, so the local teams are friendly with the three brands.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Coordenador Financeiro at Icatu
Real User
Good technical support, but is difficult to set up and integrate
Pros and Cons
  • "The technical support is good."
  • "The initial setup is difficult."

What is our primary use case?

I believe we are using the new version.

Our company makes comprehensive use of the solution to consolidate data and do a certain amount of reporting and analytics. All the data consumers use Databricks to develop the information.

What needs improvement?

Data governance should be addressed. We have some trouble connecting all the governance solutions with Databricks. This means the integrative capabilities are problematic. 

The initial setup is difficult. 

For how long have I used the solution?

We have been using Databricks for a year and a half.

What do I think about the stability of the solution?

The solution is stable. 

What do I think about the scalability of the solution?

The solution is scalable. 

How are customer service and support?

The technical support is good. 

Which solution did I use previously and why did I switch?

As we are talking about a corporate solution, the deployment of Databricks lasted longer than the one day it took for Alteryx. 

We used Alteryx prior to Databricks and continue to do so, it being the only other solution we have employed. We use the two with different software. 

How was the initial setup?

The initial setup is difficult. 

While I don't know exactly how long the deployment took, I do know that it lasted longer than the one day needed for Alteryx. 

What about the implementation team?

I believe we used a partner for the deployment, although I cannot say for certain, as this is not within my purview. 

I don't know how many people are needed for maintenance and deployment. 

What's my experience with pricing, setup cost, and licensing?

As the licensing is not within my purview, I am not in a position to comment on this. 

What other advice do I have?

My company makes use of the solution. It is employed by my data team and the technology one. I do not have personal experience using the solution. 

The solution is deployed on base, on data. 

I am not aware of how many people make use of it. 

I rate Databricks as a seven out of ten. 

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Tristan Bergh - PeerSpot reviewer
Data Scientist at a computer software company with 501-1,000 employees
Real User
Top 10
Good built-in optimization, easy to use with a great user interface
Pros and Cons
  • "The built-in optimization recommendations halved query run times and allowed us to reach decision points and deliver insights very quickly."
  • "The product could be improved by offering an expansion of their visualization capabilities, which currently assist development in their notebook environment."

What is our primary use case?

We are using this solution to run large analytics queries and prepare datasets for SparkML and ML using PySpark.

We ran on multiple clusters set up for a minimum of three and a maximum of nine nodes having 16GB RAM each.

For one ad hoc requirement, a 32-node cluster was required.

Databricks clusters were set for autoscaling and to time out after forty minutes of inactivity. Multiple users attached their notebooks to a cluster. When some workloads required different libraries, a dedicated cluster was spun up for that user.
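The cluster settings described above can be sketched as a cluster definition. This is a minimal illustration, not the reviewer's actual configuration: the field names (`cluster_name`, `spark_version`, `node_type_id`, `autoscale`, `autotermination_minutes`) follow the Databricks Clusters API request body, while the helper `build_cluster_spec`, the cluster name, and the placeholder runtime and node type values are assumptions made for the example. It also assumes the "three to nine nodes" refers to worker nodes.

```python
import json


def build_cluster_spec(name, min_workers=3, max_workers=9, idle_minutes=40):
    """Return a dict shaped like a Databricks Clusters API request body.

    Hypothetical helper for illustration; defaults mirror the figures
    quoted in the review (3 to 9 nodes, 40-minute inactivity timeout).
    """
    if not 0 < min_workers <= max_workers:
        raise ValueError("min_workers must be positive and not exceed max_workers")
    return {
        "cluster_name": name,
        # Placeholders: choose a runtime and a 16 GB node type offered in your workspace.
        "spark_version": "<runtime-version>",
        "node_type_id": "<node-type-with-16GB-RAM>",
        # Autoscaling bounds for the worker count.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Terminate the cluster after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("shared-analytics")
print(json.dumps(spec, indent=2))
```

A dedicated cluster for a user who needs different libraries would reuse the same helper with another name, and the one-off 32-node requirement could be expressed by raising `max_workers`.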

How has it helped my organization?

Databricks took care of all the underlying cluster management seamlessly. We could configure our clusters to run and deliver results without any delays due to hardware configuration or installation issues.

Databricks allowed us to go from non-existent insights (because the datasets were just too large) to immediate and rich insights once the datasets were ingested into our PySpark notebooks.

What is most valuable?

Immense ease in running very large scale analytics, with a convenient and slick UI. This saved us from having to tweak, tune, dive into deeper abstractions, get involved in procurement, and also having to wait for other workloads to run.

The built-in optimization recommendations halved query run times and allowed us to reach decision points and deliver insights very quickly.

The Delta data format proved excellent. Databricks had already done the heavy lifting and optimized the format for large scale interactive querying. They saved us a lot of time.

What needs improvement?

The product could be improved by offering an expansion of their visualization capabilities, which currently assist development in their notebook environment. Perhaps a few connectors that auto-deploy to a reporting server?

More parallelized Machine Learning libraries would be excellent for predictive analytics algorithms.

For how long have I used the solution?

I have been using this solution for three years.

What do I think about the stability of the solution?

This solution is stable and proved very robust. When very obvious programmatic recommendations were not followed, causing memory overruns on a driver, the clusters required restarting.

What do I think about the scalability of the solution?

Absolutely, seamlessly, and massively scalable, limited only by budget. Also, the product itself offers real-time efficiency and optimization recommendations.

How are customer service and support?

So brilliant, it was never required. Their documentation is comprehensive, clear, simple, and thorough. 

Which solution did I use previously and why did I switch?

Previously I used Hive and Livy in Zeppelin on an in-house Hadoop installation. The queries constantly threw exceptions and timeouts and the necessary configuration changes proved time-consuming and problematic. Databricks, on the other hand, simply made all those problems vanish. 

How was the initial setup?

Setup and Support are single-click.

What about the implementation team?

We used an in-house team for implementation.

What was our ROI?

Our ROI was of the order of USD $75k per year for one deployment. We were able to switch our workloads from an onsite Hadoop cluster, billed to our department for more than USD $100k per year, to a Databricks workspace in the cloud for a quarter of that expenditure. 
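The saving quoted above follows from simple arithmetic, sketched here. The review gives the Hadoop cost only as "more than USD $100k", so the 100,000 figure is a lower bound used as an assumption, and the function name `annual_saving` is hypothetical.

```python
def annual_saving(previous_cost, cost_fraction):
    """Return (saving, new_cost) when the new platform costs a fraction of the old one."""
    new_cost = previous_cost * cost_fraction
    return previous_cost - new_cost, new_cost


# Assumed inputs: USD 100k per year on-site, Databricks at a quarter of that.
saving, new_cost = annual_saving(100_000, 0.25)
print(f"new annual cost: ${new_cost:,.0f}, annual saving: ${saving:,.0f}")
```

With these inputs the new cost is USD 25k per year and the saving is USD 75k per year, which matches the order of magnitude the reviewer reports.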

Further, we were able to transparently and efficiently scale our queries to run in under fifteen minutes per major analytics use case, whereas the in-house Hadoop cluster had subjected us to unstable queries and highly brittle data flows.

We are further reducing spending on our traditional RDBMS solution by offloading reporting workloads to the Databricks PySpark notebooks, which is reducing our expensive datacenter resources and freeing up RDBMS resources for OLTP loads. 

What's my experience with pricing, setup cost, and licensing?

Set up a cluster in your cloud of choice, but Databricks' service might also be very competitive as their pricing units will be built in. 

Licensing on site I would counsel against, as on-site hardware issues tend to really delay and slow down delivery.

Which other solutions did I evaluate?

I evaluated Hortonworks, Livy, and Zeppelin. These were unsuitable due to the unavailability of sufficiently skilled personnel.

What other advice do I have?

By investing in people skilled in data querying, Python coding, and even basic Data Science, a Databricks setup will reward the business. 

Once the Databricks data flows are established, it is a matter of a few incremental steps to opening up streaming and running up-to-the-minute queries, allowing the business to build its data-driven processes. 

Databricks continues to advance the state-of-the-art and will be my go-to choice for mission-critical PySpark and ML workflows. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Real User
Top 5
Leaderboard
Saves time and effort; thousands of applicable use cases
Pros and Cons
  • "Databricks has improved my organization by allowing us to transform data from sources to a different format and feed that to the analytics, business intelligence, and reporting teams. This tool makes it easy to do those kinds of things."
  • "In the next release, I would like to see more optimization features."

What is our primary use case?

Databricks is very useful and can handle thousands of different use cases. The use cases are all over the place.

How has it helped my organization?

Databricks has improved my organization by allowing us to transform data from sources to a different format and feed that to the analytics, business intelligence, and reporting teams. This tool makes it easy to do those kinds of things.

What is most valuable?

The most valuable Databricks feature for us is that it does not require us to configure clusters. It automatically configures the clusters to the right size, the right number of clusters, the right number of nodes per cluster, et cetera.

What needs improvement?

The area in which this product can be improved is optimization. In the next release, I would like to see more optimization features.

For how long have I used the solution?

I have been using Databricks for a couple of years.

What was our ROI?

I would say the ROI for this solution is expressed mainly in terms of effort and time.

What's my experience with pricing, setup cost, and licensing?

I would advise new users to train themselves before using Databricks. They should figure out which advantages Databricks has over plain Spark and use them as fully as they can.

What other advice do I have?

I am currently implementing the latest version of Databricks.

The Databricks solution is deployed in the cloud.

I would rate the Databricks solution a nine.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Sarbani Maiti - PeerSpot reviewer
Vice President at a tech services company with 51-200 employees
Real User
Top 20
Very easy to use and requires minimal coding and customizations
Pros and Cons
  • "Easy to use and requires minimal coding and customizations."
  • "Doesn't provide a lot of credits or trial options."

What is our primary use case?

Our primary use case of this product is for our customers who are running large systems and looking for an API -- a quick, easy integration with their own system. We use Databricks to create a secure API interface. I'm vice president of data science and we are customers of Databricks. 

What is most valuable?

Databricks is quite easy to use and requires less coding and customization than a solution like AWS SageMaker, which I'd previously used on a lot of projects. Databricks enables more people to efficiently build and host their ML code. Another great aspect is that MLflow is already integrated with Databricks, which makes a big difference. It enables us to track and monitor all our different experiments. We have mostly used MLflow and generic notebooks for building machine learning models, as well as PyTorch for some of our medical imaging. We were able to quickly deploy both these features without requiring anything extra.

What needs improvement?

I'm struggling a little because I wanted to do some POC solutions. I present a lot of projects in various forums and seminars and there aren't a lot of credits and trial options with Databricks. Even if we want to explore, we're not able to and that's a challenge. The solution is quite expensive.

For how long have I used the solution?

I've been using this solution for a year. 

What do I think about the stability of the solution?

It's currently stable although we have not yet tested it with a huge volume of data. We'll focus on the performance and model serving capability in the near future. We're still carrying out performance testing, developing the models and figuring out the infrastructure.

What do I think about the scalability of the solution?

Scalability is quite good because we just used 128 GB of resources. It's quite easy to scale.

How was the initial setup?

It was relatively simple, we didn't face any challenges. Deployment takes around two days. 

Which other solutions did I evaluate?

We did a POC in Azure ML Studio, which is quite a good solution that is easy to deploy and use. It's almost a no-code platform. We've also found Azure ML Studio to be quite cost-effective.

What other advice do I have?

I would recommend trying Databricks because it's cloud agnostic. A lot of customers currently use Azure but want to build something on their own down the track. Databricks makes that easy with its integration with other cloud providers. If somebody wants to build something on their infrastructure or their own virtual cloud, this is a good platform.

I rate the solution eight out of 10 because of the issue I'm having with a lack of trial options.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Enterprise Data Architect at a financial services firm with 51-200 employees
Real User
Assists with quickly computing a considerable amount of historical data and helps us with data ingestion
Pros and Cons
  • "Its lightweight and fast processing are valuable."
  • "The Databricks cluster can be improved."

What is our primary use case?

Our primary use case for this solution is data ingestion and the data quality (DQ) rules we are implementing. We deploy the solution on Azure cloud.

How has it helped my organization?

Whenever we send data to downstream applications for creating a file, multiple business rules are involved, and this solution assists with quickly computing a considerable amount of historical data.

What is most valuable?

Its lightweight and fast processing are valuable.

What needs improvement?

The product could include some UI features to improve the ease of use, like drag and drop for a few aggregated functions. Additionally, the Databricks cluster can be improved.

For how long have I used the solution?

We have been using Databricks for approximately two years and are currently using the latest version.

What do I think about the stability of the solution?

The solution is very stable. However, sometimes it intermittently restarts. I rate the stability an eight out of ten.

What do I think about the scalability of the solution?

The solution is scalable, and we are trying to implement more use cases with Databricks in our organization as we advance. I rate the scalability an eight out of ten.

How are customer service and support?

I rate customer service and support a nine out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup was not very complex. We deploy the solution manually and the time required depends on the complexity of the business logic. I rate it an eight out of ten.

What about the implementation team?

We implemented the solution through an in-house team.

What other advice do I have?

I rate the solution an eight out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer: Gold Partners
PeerSpot user
Buyer's Guide
Download our free Databricks Report and get advice and tips from experienced pros sharing their opinions.
Updated: May 2024