Data Science Developer at a tech services company with 501-1,000 employees
Real User
Good performance and support for big data, built-in machine learning libraries are powerful
Pros and Cons
  • "Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great."
  • "It should have more compatible and more advanced visualization and machine learning libraries."

What is our primary use case?

We use this solution for streaming analytics. We use machine learning functions that output to the API and work directly with the database.

How has it helped my organization?

Prior to using Azure Databricks in the cloud, we had Databricks installed in clusters. Since our implementation, the performance has increased and our cost has been reduced.

What is most valuable?

Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great.

This solution has very good machine learning libraries built-in.

The support for big data is good.

What needs improvement?

Databricks should have more libraries for predictive analysis and machine learning.

It should have more compatible and more advanced visualization and machine learning libraries. As it is now, I have to try a custom algorithm in order for things to be compatible.

I would like to see more deep learning analytics.

Buyer's Guide
Databricks
May 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
769,662 professionals have used our research since 2012.

For how long have I used the solution?

I have been using Databricks for about one year.

What do I think about the stability of the solution?

This is a cluster-based solution, so it is stable.

What do I think about the scalability of the solution?

We started using Databricks with a small PoC application, and then we developed it into a larger one. It's scalable, and it's a simple process to scale.

We have eight people in our team who are using this solution. We do not plan to increase usage at this time.

How are customer service and support?

I did not contact technical support myself, but when one of our team members contacted them they were given good answers. I would say that the support is good.

How was the initial setup?

It is not difficult to deploy this solution because it is well documented. We followed the normal steps that included all of the APIs.

What's my experience with pricing, setup cost, and licensing?

I do not know the exact costs, but one of our clients pays between $100 and $200 USD monthly.

What other advice do I have?

Databricks has been good and I like it. However, it would be improved with the enhancement of the machine learning libraries, and with the inclusion of visualization libraries.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Advanced Analytics Lead at a pharma/biotech company with 1,001-5,000 employees
Real User
Better tailored code and automation capabilities needed, but easy to use
Pros and Cons
  • "The solution is easy to use and has a quick start-up time due to being on the cloud."
  • "The solution could improve by providing better automation capabilities. For example, working together with more of a DevOps approach, such as continuous integration."

What is our primary use case?

Databricks can be used for large-scale data pre-processing and data transformations.

What is most valuable?

The solution is easy to use and has a quick start-up time due to being on the cloud.

What needs improvement?

The solution could improve by providing better automation capabilities. For example, working together with more of a DevOps approach, such as continuous integration. There is a lot of code from places, such as GitHub, but it is not tailored for Databricks. It requires a lot of effort to bring the code to a level where it can be used with Databricks capabilities.

For how long have I used the solution?

I have been using Databricks for two months.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

Databricks is scalable.

How are customer service and technical support?

We did not have a need to use technical support.

How was the initial setup?

The installation is straightforward, and it took approximately one hour.

What about the implementation team?

We did the implementation and maintenance of the solution ourselves using approximately three engineers.

What's my experience with pricing, setup cost, and licensing?

The solution requires a subscription.

Which other solutions did I evaluate?

We are evaluating other solutions.

What other advice do I have?

I would recommend this solution for those wanting to process large data sets, but if it is to be used for smaller data sets, I would not recommend it.

I rate Databricks a five out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Business Intelligence and Analytics Consultant at a tech services company with 201-500 employees
Consultant
Easy to switch loads between clusters and automation is easy using the API
Pros and Cons
  • "Automation with Databricks is very easy when using the API."
  • "Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems."

What is our primary use case?

I am a developer and I do a lot of consulting using Databricks.

We have been primarily using this solution for ETL purposes. We also do some migration of on-premises data to the cloud.

What is most valuable?

The most valuable feature is the ability to switch loads between multiple clusters.

Automation with Databricks is very easy when using the API.

The ability to write code and SQL in the same interface is useful.

It is easy to connect notebooks to a cluster.

There are a large number of inbuilt functions that help to make things easier.
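
As a rough illustration of the kind of API automation mentioned above, here is a minimal sketch that builds a request to trigger an existing job through the Databricks Jobs REST API (`/api/2.1/jobs/run-now`). The workspace URL, token, and job ID are placeholders chosen for this example, not values from this review, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request that triggers an existing job."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # personal access token (placeholder)
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace with a real workspace URL, token, and job ID.
request = build_run_now_request(
    "https://example.cloud.databricks.com", "dapi-EXAMPLE", 123
)
# To actually trigger the job, pass the request to urllib.request.urlopen(request).
```

Keeping the request construction separate from the network call makes it simple to check the payload in a scheduler or CI script before anything runs against a real workspace.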

What needs improvement?

Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems. As it is now, we have to go into the driver logs to identify the error messages properly. 

There is not much information about Databricks available online, such as cost. Whenever we want to find the actual costing, we have to send an email to Databricks, so having the information available on the internet would be helpful.

I would like to see integration with Power BI or Tableau for the business users. They may use Databricks to check on things, but it will be a little bit complicated for them. The GUI interfaces for Tableau and Power BI are ones that they are used to, so the integration would help.

For how long have I used the solution?

I have been using Databricks for about five and a half years.

What do I think about the stability of the solution?

We have found that in the development environment, Databricks is pretty stable. We have had problems where something works in development but does not work in production, and this can happen when the version is updated and certain features have been deprecated. This means that more testing is required before moving to production, but this is the only drawback that we have seen.

Basically, when we move across versions we have found issues, but otherwise, it's pretty stable.

What do I think about the scalability of the solution?

Scalability is one of the main features of Databricks. We have used datasets that are one hundred megabytes in size up to one terabyte, and we can manage, so it's easily scalable.

We have a large company with between 400 and 500 people using this solution.

How are customer service and technical support?

We have not reached out for technical support on Databricks.

How was the initial setup?

I found the initial setup easy because I had previously worked on Spark.

If somebody goes through the training, which is available on the website, then it should be straightforward. I don't think that it is very hard.

When it comes to developing things based on use cases, it can take between three days and two weeks, plus two to three days for testing and deploying it. I would say that for an entire use case, it will take a maximum of three weeks.

What other advice do I have?

My advice for developers who are interested in working with this solution is to first go through the Spark architecture.

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Architect at a tech services company with 201-500 employees
Real User
A reliable solution for processing and transforming data
Pros and Cons
  • "The fast data loading process and data storage capabilities are great."
  • "There are no direct connectors — they are very limited."

What is our primary use case?

We specialize in project consulting for our clients. Whenever we get the opportunity, we recommend Databricks to them.

What is most valuable?

The fast data loading process and data storage capabilities are great.

Based on the data loads and the performance, you can easily scale up the clusters.

What needs improvement?

Sometimes we experience issues connecting our database to Databricks. There are no direct connectors — they are very limited. This should be addressed and corrected in the next release.  

Reading past data can also be tricky as there is no data spectrum like you would find with Snowflake and other solutions. 

For how long have I used the solution?

We have been using Databricks for one and a half years.

What do I think about the scalability of the solution?

Both the scalability and the stability of Databricks are good.

How are customer service and technical support?

Technical support is good but I have not interacted with them directly. We have a point of contact. We used to interact with tech support on a regular basis and they would respond quickly. We would get a response on the same day based on the priority level. Keep in mind, my company is in a partnership with them which could be a factor in their quick response time.


How was the initial setup?

The initial setup was not very complex. We had it up and running in no time; it's a quick process.

What about the implementation team?

We have just one solution architect and one data architect who handle all maintenance-related issues. 

What other advice do I have?

I would recommend purchasing a package that includes technical support. Compared to other companies, they offer great support to their clients.

On a scale from one to ten, I would give Databricks a rating of eight.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Director of Data (Engineering & Science) at a tech services company with 11-50 employees
Real User
Top 20
An easy-to-use solution useful to run batch jobs
Pros and Cons
  • "The ease of use and its accessibility are valuable."
  • "The integration and query capabilities can be improved."

What is our primary use case?

Our primary use case for the solution is to run batch jobs.

What is most valuable?

The ease of use and its accessibility are valuable.

What needs improvement?

The solution can be improved by expanding its integration capabilities and providing the ability to query external vendors directly.

For how long have I used the solution?

We have been using the solution for a little less than a year, and we deploy it on the Amazon cloud.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

The solution is scalable, and there are approximately seven developers and two DevOps employees utilizing the solution.

How are customer service and support?

We have had a good experience with customer service and support. I rate them a nine out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup for the solution is a bit complex.

What's my experience with pricing, setup cost, and licensing?

I wouldn't consider it a costly solution. Like all other solutions, it depends on how you use it. If you provision Spark clusters much larger than what you need, it becomes costly. For example, it is not more costly than EMR, the AWS equivalent, and from my perspective, it is much better.

What other advice do I have?

I rate the solution a nine out of ten. The solution is good, but the integration and query capabilities can be improved.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Big Data and Cloud Architect at a computer software company with 201-500 employees
Real User
Top 20
Excellent workspace and notebooks
Pros and Cons
  • "Databricks' most valuable features are the workspace and notebooks. Its integration, interface, and documentation are also good."
  • "Databricks' technical support takes a while to respond and could be improved."

What is our primary use case?

I primarily use Databricks for data pipelines.

What is most valuable?

Databricks' most valuable features are the workspace and notebooks. Its integration, interface, and documentation are also good.

For how long have I used the solution?

I've been working with Databricks for around five years.

What do I think about the stability of the solution?

Databricks is stable.

What do I think about the scalability of the solution?

Databricks is scalable.

How are customer service and support?

Databricks' technical support takes a while to respond and could be improved.

How was the initial setup?

The initial setup was easy.

What's my experience with pricing, setup cost, and licensing?

Databricks' cost could be improved.

What other advice do I have?

I would give Databricks a rating of eight out of ten.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Head of Data & Analytics at a tech services company with 11-50 employees
Real User
Helpful integration with Python and notebooks, but it should be more user-friendly and less complicated to use
Pros and Cons
  • "The integration with Python and the notebooks really helps."
  • "Databricks is not geared towards the end-user, but rather it is for data engineers or data scientists."

What is our primary use case?

We are a consulting house and we employ solutions based on our customers' needs. We don't generally use products internally.

I am a certified data engineer from Microsoft and have worked on the Azure platform, which is why I have experience with Databricks. Now that Microsoft has launched Synapse, I think that there will be more use cases.

What is most valuable?

You can spin up an Azure Databricks cluster, and integrating with it is seamless.

The integration with Python and the notebooks really helps.

What needs improvement?

There is definitely room for improvement.

This is the type of solution where you need to have people with technical expertise to use it.  Other products are self-service and can be employed by end-users. Databricks is not geared towards the end-user, but rather it is for data engineers or data scientists. I'm not sure whether Databricks is working towards it, or not.

It would be nice if it were more user-friendly, so that you don't have to rely on Power BI or another visualization tool. I know that there is integration in the notebook where you can do it, but still, the relationships and semantics make it more difficult. It would be better to do it right in Databricks. If the visualizations could be built within the portal, I would not have to log out, bring the data into Power BI, and then visualize it.

What do I think about the stability of the solution?

We have not done any major implementation yet, although I think it's stable to an extent. I can't comment on it in terms of benchmark and experiencing any issues. It works seamlessly in the places where I've used it.

What do I think about the scalability of the solution?

Our implementations have been small and we haven't needed to scale as of yet. 

Databricks can help you to build a data lake, and it's something that they need to help make more popular. People are slowly understanding it because if you look, there are lots of data lakes that people are trying to create. I'm not intimate with it, but the concept seems complicated. I think they need to write up something where videos can explain it better. What I have seen on YouTube is quite complicated for an end-user to understand.

How was the initial setup?

The initial setup is easy. It's not difficult when you are used to Azure.

What's my experience with pricing, setup cost, and licensing?

I am based in South Africa, where it is expensive adapting to the cloud, and then there is the price for the tool itself. 

The cost is difficult to estimate. I've got customers who went to the cloud and then they realized that the costs were more, compared to what they used to be on-premises. Also, because our exchange rate is so weak, I would always advocate that prices being lower is better, although I don't know how feasible it is.

What other advice do I have?

From a purely technical perspective, I would rate Databricks an eight out of ten. However, there is a failure in terms of user adoption. After I look at other products, including Synapse, those are better. I still feel that Databricks is quite complicated for the average person.

I would rate this solution a five out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Engineer at a tech services company with 10,001+ employees
Real User
An easy initial setup with a good time travel feature, but needs better model scoring
Pros and Cons
  • "The time travel feature is the solution's most valuable aspect."
  • "Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with."

What is our primary use case?

We use the solution for multiple purposes. We do a lot of data crunching, development, and algorithm work on it.

What is most valuable?

The time travel feature is the solution's most valuable aspect.

What needs improvement?

The management of the solution needs to be modernized. Managing the radius data is hard.

The solution requires model scoring. There's not a good way of knowing how the models are performing from a data science perspective. The solution needs more model scoring abilities. It doesn't necessarily need more model monitoring, but more model scoring and performance evaluation from a data science perspective.

Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with.

For how long have I used the solution?

I've been using the solution for one year so far.

What do I think about the stability of the solution?

The solution is not exactly stable. We've faced a few bugs which have really affected it. There are bugs especially when it comes to connecting with Spark.

What do I think about the scalability of the solution?

It's hard to say how scalable the solution is. The scalability comes into play on the Spark side, not on the Databricks side.

We have about 20 people on the solution right now.

How are customer service and technical support?

We've never been in touch with technical support, so I don't have any experience in terms of dealing with them.

How was the initial setup?

The initial setup is straightforward. I wouldn't say that it's complex in any way.

Deployment times vary and really depend on multiple factors. It can take anywhere from a few weeks to a few months to deploy the solution. In our case, it took us about three months to fully deploy it.

It takes two to three people to deploy the solution.

What about the implementation team?

I deployed the solution with the help of my team.

What's my experience with pricing, setup cost, and licensing?

I'm not sure what the licensing costs are on the solution.

Which other solutions did I evaluate?

We did evaluate Amazon SageMaker before ultimately choosing Databricks. It's the only other solution we evaluated at the time.

What other advice do I have?

We're partners with Databricks.

We're using the latest version of the solution, but I can't recall what version number we are on.

I'd advise others considering the solution to look at usage. They shouldn't adopt the solution blindly. How the implementation and usage will go will depend on the skill of the data engineer and what your requirements are.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user