it_user1050483 - PeerSpot reviewer
CEO at Inosense
Real User
Great for dealing with huge amounts of data and it is easy to connect to different sources of data
Pros and Cons
  • "We are completely satisfied with the ease of connecting to different sources of data or Parquet files in the search"
  • "The integration features could be more interesting, more involved."

What is our primary use case?

Our primary use case is really DevOps, for integration and continuous development. We've combined Databricks with some components from Azure to deploy elements in a sandbox for our data scientists and our data engineers. 

What is most valuable?

Valuable features would have to include the Notebook for pipelining some models and the feature of executing notebooks in parallel, in batches, which is also something that we use. We use the Notebook on Spark with Python. 

What needs improvement?

Improvements could include the pricing; the product is a little expensive, although I think it is comparable to other similar options. The integration features could be more interesting, more involved. For example, we use the Databricks Notebook, which does not provide as good a user experience as Jupyter Notebook. The look and feel are not the same, and we've had complaints from some of our users. They say that it's easier and more productive for them to use Jupyter Notebook.

And then there is the integration feature for connecting from other tools, for example, connecting Jupyter Notebook through Databricks Connect. The problem is that when you do that, you don't get all the Jupyter features, which is a shame for us. 

For additional features, having some PyTorch or TensorFlow type features inside would definitely be great. For now, my users are developing for themselves by importing their libraries into their Notebook and then creating models based on TensorFlow or PyTorch. That requires a lot of imports, particularly library imports, something that is now available in the new version of Machine Learning services. These things are very important because the data science community has shifted from the traditional way of preparing models to deep learning. It's now more common to have those features. 

For how long have I used the solution?

I've been using the product inside Azure for about six months now. 

Buyer's Guide
Databricks
May 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
769,662 professionals have used our research since 2012.

What do I think about the stability of the solution?

Given my experience, the product is very stable. 

What do I think about the scalability of the solution?

The product is quite easy to scale and increasing the number of users is quite simple. 

Which solution did I use previously and why did I switch?

We previously used the earlier version of Azure Machine Learning services and we decided to move over because over time it became more difficult to deploy. That was two years ago, but now with the new version, it's much easier to deploy Machine Learning.

How was the initial setup?

The setup is straightforward, I did it myself. 

What other advice do I have?

The product has improved and I'm sure this will continue in the next versions. We are completely satisfied with it, particularly the ease of connecting to different sources of data or Parquet files in the search. 

I think Databricks could be very interesting for users looking for a framework. I would, however, recommend a more complex architecture around Databricks to achieve a great result for end users. 

I would rate this product an eight out of 10. 

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Scientist at a retailer with 5,001-10,000 employees
Real User
Quick development, reliable, has interactive clusters, and is priced per usage
Pros and Cons
  • "One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often."
  • "I would like to see more documentation in terms of how an end-user could use it, and users like me can easily try it and implement use cases."

What is our primary use case?

Currently, I am using this solution for a forecasting project.

What is most valuable?

One of the features provides nice interactive clusters, or compute instances, that you don't really need to manage often. You can just spin one up and use it for a lot of your pre-processing, which is very convenient. 

The normal features are very good for doing some quick development or some exploratory data analysis (EDA).

Also, one of the newest features provides a way to serve, deploy, and train models using the platform itself. Alternatively, it can connect to Azure Machine Learning to train, deploy, and productionize machine learning models.

What needs improvement?

Since the Databricks community is not that old, there is not a lot of information about some of the issues that we face. We have to go back to the Databricks team to get some of the issue resolutions. 

As time passes and more people put information about this technology out there, it will be helpful.

Even with the features that we currently have, they are still optimizing some of the clusters and working to parallelize reads from other types of data. That is going really well. One of the features they recently came up with is the Delta format for data, which is really good and speeds up a lot of the processes.

I would like to see more documentation on how end users can use it, so that users like me can easily try it and implement use cases.

For how long have I used the solution?

I have been using Databricks on a daily basis for over a year.

It's deployed on the cloud, so it's always up to date.

What do I think about the stability of the solution?

It's definitely quite stable, in terms of an enterprise solution. 

I'd say that it's pretty stable. 

You have these clusters running on-demand, and you can also come up with these clusters that are scheduled, and that can be run for your production jobs.

What's my experience with pricing, setup cost, and licensing?

The pricing depends on the usage itself. They measure the cost of the companies in town. It also depends on the type of cluster that you are using. If you are using a very heavy cluster, it would be the price per CPU.

What other advice do I have?

I would rate Databricks an eight out of ten.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Scientist at an energy/utilities company with 10,001+ employees
Real User
Has a good feature set but it needs samples and templates to help invite users to see results
Pros and Cons
  • "Imageflow is a visual tool that helps make it easier for business people to understand complex workflows."
  • "The product needs samples and templates to help invite users to see results and understand what the product can do."

What is our primary use case?

I am a data scientist here and that is my official role. I own the company. Our team is quite small at this point. We have around five people on the team and we are working with about five different businesses. The projects we get from them are massive undertakings. Each of us on the team takes multiple roles in our company and we use multiple tools to help best serve our clients. We are trying to look at creative ways that different solutions can be integrated and we try to understand what products we can use to create solutions for client companies that will be effective in meeting their needs.  

We are personally using Databricks for certain projects where we want to consider creating intelligent solutions. I have been working on Databricks as part of my role in this company, trying to see if there are any kind of standard products that we can use with it to create solutions. We know that Databricks integrates with Airflow, so that is something that we are exploring right now as a potential solution for enabling a creative response. We are exploring the cloud as an option. Databricks is available in Azure and we are currently figuring out the viability of using that as a cloud platform. So we are exploring the way Databricks and Azure integrate at the same time to give us this type of flexibility.  

What we use it for right now is more like asset management. If we have a lot of assets and we get a lot of real-time data, we certainly want to do some processing on some of this data, but you do not want to have to work on all of it in real-time. That is why we use Databricks. We push the data from Azure through Databricks and work on the data algorithm in Databricks and execute it from Azure with probably an RPA (Robotic Process Automation) or something of that sort. It intelligently offloads real-time processing.  

What is most valuable?

Of the available feature set, I like the Imageflow feature a lot. It is very interesting. It gives me clarity on the execution of a process. I can draw the complete flow from start to finish in the exact way that I want it to execute. It is more visual and it is also easier for the people in businesses where I make presentations to understand.  

When I demonstrate a process to a business and show them the approach I am taking using code and technical language, then of course not many are going to understand that. But when I show them the process in terms of the graphical layout Imageflow helps provide, then they will be able to understand it much easier. They understand why I am choosing a particular way of executing the process and why I am taking certain steps in the way I have chosen to do it. The point is to help other people understand the solution more clearly.  

What needs improvement?

I think the automatic categorization of variables needs to be improved. The current functionality does not always identify the features of the collected data efficiently. That is probably the only thing I can think of. Apart from that, I have not explored the product enough yet to go into more depth, because there is only one asset project that I have taken on right now. Because I own this company, I have been doing more to run it than to explore this product very deeply. But when you get any form of data in there, if it could understand what types of variables there are and what features the data has, it would help massively in taking processing to the next step. If it does not identify the variables exactly, you may have to modify them a little.

Apart from working with Databricks to understand its capabilities, I am also trying to learn Apache Spark right now. Some members of my team want to work with Apache Spark as a solution, so at this point we are evaluating both and planning to use either Spark or Databricks.  

As far as what might be added, some custom algorithm samples would be useful. All of the other products of this type — Azure, AWS, SageMaker — they all have customizable algorithms. You have the capability to implement a sort of workflow from that by modifying things in the sample and changing it to fit your purposes. Probably that is something that might help in doing some small NDP (Near-Data Processing) development. It might not help in the project directly, but it will help while we work on some NDP development of our own so that we can quickly evaluate how something is going to work. Templates or other samples could make working on things easier.  

That would also help massively in getting people to understand the potential of what the product can actually do. But I also think not many people would strongly agree with this. Many people go to the first solution they can think of that they know very well already in the IT field even if they could imagine that something could be better.  

To get the value out of this technology, people will need to come to accept it. Technical people will accept Databricks more if they understand that this is something that they can use and start working on without a lot of experience. Adopting it will take time for new users who have no experience. But to feel like they can have success with a product, they have to execute something in a very short time and see how it can work. When you talk about AI — or really when you talk about anything new — people do not initially want to invest the time in discovery. These processes do take time to learn, but with templates or samples, you get to see immediately what the possibilities are and what you might get out of it. Then when they try something of their own and are able to get it working in less than a week's time, they will be encouraged to look into the product and the technology some more.  

For how long have I used the solution?

We have been using the Databricks product for approximately three months.  

What do I think about the stability of the solution?

It is very hard to comment on the stability right now. We will need more time to experience the product in actual usage to render any opinions about stability accurately at that level.  

What do I think about the scalability of the solution?

We have not really gotten to the point of scaling and testing scalability at this point. We only have two people involved with the product. One is a data scientist and one is a data engineer.  

How was the initial setup?

The initial setup was not complex at all. The documentation is good. It is clear and not very difficult to understand. Because the documentation is good, the installation is fine.  

We did the implementation by ourselves — within our team and with the help of the documentation. But I would not say that we have already deployed the model yet. This is an ongoing process, as there are certain inputs that changed over time.  

So we have not implemented the product completely, but we have advanced with the product and in our understanding of it. It is good, but our company is still trying to get much better data from it. At this point, it is like the data is just junk and more junk. So we are now working toward improving the result. Whenever the data gets better, we'll try to implement the workflow to see how it performs. I would say it will probably take two to three more months before we actually get good data.  

Which other solutions did I evaluate?

I did have some experience with SageMaker before looking at Databricks, but apart from that, we have not been looking into any of the other solutions that are available. We were just exploring a few of the solutions that the members of the team already had experience with. Most of the team came to our company with some experience using Azure, most of them came with experience in EBS (Elastic Block Store), and some of them came with experience on various other platforms. We wanted to mine that knowledge and explore some of these possibilities to see which one works for all of us as a team.  

What other advice do I have?

On a scale from one to ten where one is the worst and ten is the best, I would rate Databricks overall as around a 7 or 7.5. If we had more experience with it and could be sure we had a solid understanding of what it could do and the reliability, I might recommend it with a better score. I do not think I should give it more than a seven for now.  

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data engineer
Real User
A stable solution that can be scaled depending on the project, but the price could be cheaper
Pros and Cons
  • "The setup was straightforward."
  • "The pricing of Databricks could be cheaper."

What is our primary use case?

I primarily use the solution for two purposes: machine learning and big data computing.

What needs improvement?

The pricing of Databricks could be cheaper. The solution can also improve by providing more intelligence to the coder.

For how long have I used the solution?

I have been using Databricks for the past two years.

What do I think about the stability of the solution?

The solution is stable. I would rate the stability a seven out of ten.

What do I think about the scalability of the solution?

The scalability depends on the project. At present, around 20 people use the solution in my company.

How was the initial setup?

The setup was straightforward. It also depends on the projects.

What about the implementation team?

The deployment process was automated.

Which other solutions did I evaluate?

Evaluating solutions is not my work. I depend on Databricks.

What other advice do I have?

I rate Databricks a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Mullai Selvan - PeerSpot reviewer
Project Manager at MAQ Software
Real User
Top 20
Integrates well, is scalable, and has high availability
Pros and Cons
  • "The most valuable feature of Databricks is the integration with Microsoft Azure."
  • "Databricks can improve by making the documentation better."

What is our primary use case?

I am using Databricks for creating business intelligence solutions.

What is most valuable?

The most valuable feature of Databricks is the integration with Microsoft Azure.

What needs improvement?

Databricks can improve by making the documentation better.

For how long have I used the solution?

I have been using Databricks for approximately one year.

What do I think about the stability of the solution?

Databricks is stable.

What do I think about the scalability of the solution?

The scalability of Databricks is good.

We have approximately 500 users using this solution in my organization.

How are customer service and support?

I have not used the support from Databricks.

Which solution did I use previously and why did I switch?

We previously used the Microsoft stack. We chose Databricks because the processing power was better and it was a better fit for our use case.

How was the initial setup?

The initial setup of Databricks was not straightforward. We had to do trial and error and we learned as we went along.

I rate the initial setup of Databricks a four out of five.

What about the implementation team?

We did the implementation of Databricks in-house. The solution requires ongoing maintenance.

What other advice do I have?

I would recommend this solution to others.

My advice to others is for them to first do a small proof of concept and then see how it works out and then take it from there.

I rate Databricks an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Natalia Raffo - PeerSpot reviewer
Co-Founder & Chief Data Officer (CDO) at Data360
Real User
Allows us to automate the creation of a cluster, optimized for machine learning, and construct AI machine learning models for the client
Pros and Cons
  • "Databricks allows me to automate the creation of a cluster, optimized for machine learning and construct AI machine learning models for the client."
  • "There could be more support for automated machine learning in the database. I would like to see more ways to do analysis so that the reporting is more understandable."

What is our primary use case?

I use Databricks for machine learning, to construct different models for supermarkets, drug store management, and market involvement, to identify business opportunities for clients.

We provide different statistical models and use different algorithms depending on the client.

I was a Lead Data Scientist in different companies. I implement data and build and optimize processes using machine learning techniques, aided by science and advanced analytics.

What is most valuable?

Databricks allows me to automate the creation of a cluster, optimized for machine learning and construct AI machine learning models for the client.

What needs improvement?

There could be more support for automated machine learning in the database. I would like to see more ways to do analysis so that the reporting is more understandable.

What do I think about the stability of the solution?

It's stable.

What do I think about the scalability of the solution?

It's scalable.

How are customer service and support?

I would rate technical support 4 out of 5.

How was the initial setup?

Setup isn't difficult. We used about 15 people for deployment and maintenance. We have data scientists and statisticians using this solution and doing different analyses.

What other advice do I have?

I would rate this solution 9 out of 10.

My advice is to use the different advanced analytics methodologies, plan for the project, and recognize the different activities for the design.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor. The reviewer's company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Business Intelligence Coordinator Latam at a construction company with 5,001-10,000 employees
Real User
The capacity of use of the different types of coding is valuable
Pros and Cons
  • "The capacity of use of the different types of coding is valuable. Databricks also has good performance because it is running in spark extra storage, meaning the performance and the capacity use different kinds of codes."
  • "There would also be benefits if more options were available for workers, or the clusters of the two points."

What is our primary use case?

My company is a customer of Databricks. We use Data Science products for machine learning, engineering, and data preparation.

We have between five and eight people working on coding in Databricks. Indirectly, we have 1,500 people consuming the data. We plan to increase our usage of Databricks by 30% next year.

What is most valuable?

The capacity to use different types of code is valuable. Databricks also has good performance because it runs on Spark extra storage, meaning the performance holds up and you have the capacity to use different kinds of code.

What needs improvement?

Databricks does not always have clear updates. Often we find an update in the tool but we are not really sure what has changed. We would appreciate better communication from Databricks. It could be in the form of a friendly warning that talks about the updates. 

There would also be benefits if more options were available for workers, or the clusters of the two points.

For how long have I used the solution?

I have been using Databricks for two years.

What do I think about the stability of the solution?

Databricks is stable, however, we do find some errors and don't understand what has happened. Usually, they are resolved within a few minutes. I would say it is 95% stable.

What do I think about the scalability of the solution?

Scalability is really good.

How are customer service and support?

I have not had to contact Databricks support other than during the deployment, with which they helped a lot. 

How was the initial setup?

The initial setup of Databricks is straightforward and simple. It is not complex because they provide a lot of documentation. The deployment was fast, it took less than three days with five people assigned to the task.

What about the implementation team?

We implemented in-house. It is difficult to find a good consultant or reseller for Databricks in Brazil.

What's my experience with pricing, setup cost, and licensing?

We pay monthly on a pay-as-you-go plan.

What other advice do I have?

With Databricks, you may have a lot of devices. It is important to use a dedicated cluster for each kind of process and to avoid the small clusters. With a bigger cluster you will get better performance, the usage time is shorter, and that will save you money. 

It is important to write the code in parts, because if you write it all in one piece you could run into performance problems.

I would rate Databricks a 9 out of 10.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Chief Research Officer at a consumer goods company with 1,001-5,000 employees
Real User
Ability to work collaboratively without concerns regarding the infrastructure is very beneficial to us
Pros and Cons
  • "Ability to work collaboratively without having to worry about the infrastructure."
  • "Would be helpful to have additional licensing options."

What is our primary use case?

Our primary use case of Databricks is for advanced analytics. I'm the chief research officer of the company and we're customers of Databricks.  

What is most valuable?

I think the features I like the most are the scalability of the solution as well as its ability to share. We work with multiple people on notebooks and it enables us to work collaboratively in an easy way without having to worry about the infrastructure. I think the solution is very intuitive, very easy to use. And that's what you pay for.

What needs improvement?

I'd like to see more licensing options for the solution, the availability of additional pricing tiers. I understand it's not easy to achieve because it's a kind of platform-as-a-service type of solution. If you wanted to be more specific about the parts, and what you might or might not need, then you could save some money, and go for a lower level. Of course, that would then mean you'd have to manage more configurations which, as a user, would make things more complex but it would be good to have that option. The pricing is not the cheapest but it's understandable because it's a very high-end solution and easy to use, there's a lot of complexity masked away.

I would like to see additional monitoring tools and, in general, anything that can improve visualization of data. I know it's not the main point of Databricks and there are other tools that can be used, but anything that facilitates the integration of Databricks with visualization tools could be really useful. Increasing data scalability would also be great. 

For how long have I used the solution?

I've been using this solution for a year. 

What do I think about the stability of the solution?

The solution has been very stable. 

What do I think about the scalability of the solution?

Scalability of the solution seems very easy to achieve. 

How are customer service and technical support?

We haven't had contact with technical support. 

How was the initial setup?

The initial setup was very straightforward because it's in our Azure cloud, so it was quite easy to set up and configure. Very intuitive.

What other advice do I have?

I would rate this solution an eight out of 10. 

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user