I am a developer and I do a lot of consulting using Databricks.
We have been primarily using this solution for ETL purposes. We also do some migration of on-premises data to the cloud.
The most valuable feature is the ability to switch loads between multiple clusters.
Automation with Databricks is very easy when using the API.
The ability to write code and SQL in the same interface is useful.
It is easy to connect notebooks to a cluster.
There are a large number of inbuilt functions that help to make things easier.
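As an illustration of the automation point above, here is a minimal sketch of preparing a call to the Databricks Jobs REST API (the `run-now` endpoint) using only the Python standard library. The workspace URL, token, job ID, and parameter name are placeholders, not values from this review, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id, notebook_params=None):
    """Build (but do not send) a Jobs API 'run-now' request."""
    payload = {"job_id": job_id}
    if notebook_params:
        payload["notebook_params"] = notebook_params
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, not a real workspace, token, or job.
request = build_run_now_request(
    host="https://example-workspace.cloud.databricks.com",
    token="dapi-EXAMPLE",
    job_id=123,
    notebook_params={"run_date": "2024-01-01"},
)
# To actually trigger the job: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to check the payload before wiring it into a scheduler or CI pipeline.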
Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems. As it is now, we have to go into the driver logs to identify the error messages properly.
There is not much information about Databricks available online, such as cost. Whenever we want to find the actual costing, we have to send an email to Databricks, so having the information available on the internet would be helpful.
I would like to see integration with Power BI or Tableau for the business users. They may use Databricks to check on things, but it will be a little bit complicated for them. The GUI interfaces for Tableau and Power BI are ones that they are used to, so the integration would help.
I have been using Databricks for about five and a half years.
We have found that in the development environment, Databricks is pretty stable. We have had problems where something works in development but does not work in production, and this can happen when the version is updated and certain features have been deprecated. This means that more testing is required before moving to production, but this is the only drawback that we have seen.
Basically, we have found issues when moving across versions, but otherwise it's pretty stable.
Scalability is one of the main features of Databricks. We have used datasets that are one hundred megabytes in size up to one terabyte, and we can manage, so it's easily scalable.
We have a large company with between 400 and 500 people using this solution.
We have not reached out for technical support on Databricks.
I found the initial setup easy because I had previously worked on Spark.
If somebody goes through the training, which is available on the website, then it should be straightforward. I don't think that it is very hard.
When it comes to developing things based on use cases, it can take between three days and two weeks, plus two to three days for testing and deploying it. I would say that for an entire use case, it will take a maximum of three weeks.
My advice for developers who are interested in working with this solution is to first go through the Spark architecture.
I would rate this solution a nine out of ten.
Our primary use case is really DevOps, for integration and continuous development. We've combined Databricks with some components from Azure to deploy elements in a sandbox for our data scientists and data engineers.
Valuable features would have to include the notebook for piping some models and the feature of executing notebooks in parallel, in batches, which is also something that we use. We use the notebook on Spark with Python.
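One way to run a batch of notebooks in parallel is to fan the runs out over a thread pool. The sketch below is an assumption about how this could be done, not the reviewer's actual setup: the notebook paths are hypothetical, and `fake_runner` stands in for a real notebook run so the example works outside Databricks.

```python
from concurrent.futures import ThreadPoolExecutor


def run_notebooks_in_parallel(paths, run_notebook, max_workers=4):
    """Run each notebook path through run_notebook concurrently.

    Returns a dict mapping each path to the value its run returned.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_notebook, paths))
    return dict(zip(paths, results))


def fake_runner(path):
    # Stand-in for a real notebook run so the sketch works anywhere.
    return f"finished {path}"


# Hypothetical notebook paths, for illustration only.
batch = ["/Shared/etl/load_orders", "/Shared/etl/load_customers"]
results = run_notebooks_in_parallel(batch, fake_runner)

# Inside a Databricks notebook, the runner could instead be:
#   lambda p: dbutils.notebook.run(p, 3600, {})
```

Threads are enough here because each worker mostly waits on a remote notebook run rather than doing local computation.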
Improvements could include the pricing: the product is a little expensive, although I think it is comparable to other similar options. The integration features could be more interesting and more involved. For example, we use the Databricks Notebook, which is not as good as Jupyter Notebook at providing a great user experience. The look and feel are not the same and we've had complaints from some of our users. They say that it's easier and more productive for them to use Jupyter Notebook.
And then there is the integration feature for connecting to data sources, for example, from Jupyter Notebook through Databricks Connect. The problem is that when you do that, you don't get all the Jupyter features, which is a shame for us.
For additional features, having some PyTorch or TensorFlow type features built in would definitely be great. For now, my users are developing for themselves by importing their libraries into their notebook and then creating models based on TensorFlow or PyTorch. That requires a lot of imports, particularly library imports, something that is now available in the new version of Machine Learning services. These things are very important because the data science community has shifted from the traditional way of preparing models to deep learning. It's now more common to have those features.
I've been using the product inside Azure for about six months now.
Given my experience, the product is very stable.
The product is quite easy to scale and increasing the number of users is quite simple.
We previously used the earlier version of Azure Machine Learning services and we decided to move over because over time it became more difficult to deploy. That was two years ago, but now with the new version, it's much easier to deploy Machine Learning.
The setup is straightforward, I did it myself.
The product has improved and I'm sure this will continue in the next versions. We are completely satisfied with it, particularly the ease of connecting to different sources of data or Parquet files in storage.
I think it could be very interesting for users looking for a framework to use Databricks. I would, however, recommend a more complicated architecture for using Databricks and achieving a great result for end-users.
I would rate this product an eight out of 10.
I primarily use the solution for two purposes: machine learning and big data computing.
The pricing of Databricks could be cheaper. The solution could also improve by providing more intelligent assistance to the coder.
I have been using Databricks for the past two years.
The solution is stable. I would rate the stability a seven out of ten.
The scalability depends on the project. At present, around 20 people use the solution in my company.
The setup was straightforward. It also depends on the projects.
The deployment process was automated.
Evaluating solutions is not my work. I depend on Databricks.
I rate Databricks a seven out of ten.
Our primary use case for this solution is data ingestion and the data quality (DQ) rules we are implementing. We deploy the solution on Azure cloud.
Whenever we send data to downstream applications for creating a file, multiple business rules are involved, and this solution assists with quickly computing a considerable amount of historical data.
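To make the idea of rule-based checks concrete, here is a minimal plain-Python sketch of applying named data quality rules to records before they go downstream. The rule names, fields, and sample rows are hypothetical and not taken from the reviewer's project; on Databricks the same idea would more likely be expressed as DataFrame filters so that it scales to historical data.

```python
def apply_dq_rules(rows, rules):
    """Split rows into (valid, rejected) according to named rule predicates.

    Each rejected entry records the row and the names of the rules it failed.
    """
    valid, rejected = [], []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            rejected.append({"row": row, "failed_rules": failed})
        else:
            valid.append(row)
    return valid, rejected


# Hypothetical rules and records, for illustration only.
rules = {
    "customer_id_present": lambda r: r.get("customer_id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}
rows = [
    {"customer_id": 1, "amount": 25.0},
    {"customer_id": None, "amount": 10.0},
    {"customer_id": 3, "amount": -5.0},
]
valid, rejected = apply_dq_rules(rows, rules)
```

Recording which rules each rejected row failed keeps the output useful for whoever has to trace a bad record back to its source.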
Its lightweight and fast processing are valuable.
The product could include some UI features to improve the ease of use, like drag and drop for a few aggregated functions. Additionally, the Databricks cluster can be improved.
We have been using Databricks for approximately two years and are currently using the latest version.
The solution is very stable. However, sometimes it intermittently restarts. I rate the stability an eight out of ten.
The solution is scalable, and we are trying to implement more use cases with Databricks in our organization as we advance. I rate the scalability an eight out of ten.
I rate customer service and support a nine out of ten. My experience with them has been positive.
The initial setup was not very complex. We deploy the solution manually and the time required depends on the complexity of the business logic. I rate it an eight out of ten.
We implemented the solution through an in-house team.
I rate the solution an eight out of ten.
Databricks is a full data analytics platform. It covers the end-to-end data analytics process.
Databricks covers the end-to-end data analytics workflow in one platform, and this is the best feature of the solution.
Databricks could improve in some of its functionality.
I have been using Databricks for approximately a year and a half.
Databricks is very stable.
The scalability of Databricks is good.
We have 30 to 40 people using this solution in my company.
I rate Databricks a nine out of ten.
We use the product as a data science platform that enables me to handle and analyze large datasets efficiently.
Databricks can switch easily between cloud providers, such as Azure and GCP. It allows seamless integration with various data platforms and cloud providers, facilitating better data handling and analysis.
The product could be improved regarding the delay when switching to higher-performing virtual machines compared to other platforms like Snowflake. The ease and speed of managing clusters can also be enhanced, especially when scaling up resources. They could add more advanced data storage solutions like Iceberg and Delta files.
I have been using Databricks for approximately two years.
I rate the product stability a seven out of ten.
I rate the product scalability an eight.
The technical support services are good, and my experience with them has been positive.
The initial setup was straightforward. However, configuring policies could have been simpler.
The product pricing is moderate.
I evaluated other options, including Snowflake, before choosing Databricks.
Databricks is a robust solution for big data processing, offering flexibility and powerful features. While there are areas for improvement, especially in performance and cluster management, it remains a highly valuable tool in my data science toolkit.
I rate it a seven.
