Data Science Consultant at Syniti
Consultant
Good performance, easy to set up, and easy to use if you have a Python background
Pros and Cons
  • "I work in the data science field and I found Databricks to be very useful."
  • "It would be very helpful if Databricks could integrate with platforms in addition to Azure."

What is our primary use case?

We are building internal tools and custom models for predictive analysis. We are currently building a platform where we can integrate multiple data sources, such as data that is coming from Azure, AWS, or any SQL database. We integrate the data and run our models on top of that.

We primarily use Databricks for data processing and for SQL databases.

What is most valuable?

I found that PySpark is the most useful tool. It uses in-memory calculation and when you want to run a model it does it very quickly. We used to use Python and when we migrated to PySpark the performance was much better.

What needs improvement?

It would be very helpful if Databricks could integrate with platforms in addition to Azure.

Having an open-source version or having the option to get a trial version of Databricks would be very helpful.

It would be very useful for beginners if there were tutorials and examples on how to write code for PySpark, R, or Scala. Having examples would give people something to refer to and play with.

For how long have I used the solution?

We have been using Databricks for the past two or three years.

Buyer's Guide
Databricks
May 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
769,662 professionals have used our research since 2012.

What do I think about the stability of the solution?

A couple of times I faced an issue where a long-running process was consuming a lot of time and then stopped abruptly. It necessitated starting the process again.

What do I think about the scalability of the solution?

We are in the prototyping stage so we do not plan on increasing our usage yet.

How are customer service and support?

We have not been in contact with technical support.

Which solution did I use previously and why did I switch?

Before using Databricks, we were running our own cluster with a web server that executed our Python queries.

How was the initial setup?

The initial setup is straightforward. With respect to deployment, the development can be done within half an hour and we can use code and deploy from there.

What about the implementation team?

We implemented Databricks on our own. We haven't deployed as such, as we are just running our queries and it is not in production yet.

What other advice do I have?

I work in the data science field and I found Databricks to be very useful. If I want to run any models then I can code them in PySpark. If you are coming from a Python background then you can write code in PySpark and it runs quickly. This is a good solution in terms of performance. 

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Machine Learning Engineer at a tech vendor with 51-200 employees
Real User
A convenient notebook, good stability, and a straightforward setup
Pros and Cons
  • "The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of research and development and for the final deployment. I can just declare the Spark jobs by the load tables. It's quite convenient."
  • "The solution could be improved by integrating it with data packets. Right now, the load tables provide a function, like team collaboration. Still, it's unclear whether there's a function to create different or additional branches. Our team had used data packets before; however, I feel it's difficult to integrate the current solution with the previous data packets."

What is our primary use case?

We primarily use the solution to run current jobs; to run the spark jobs as the current job.

What is most valuable?

The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of research and development and for the final deployment. I can just declare the Spark jobs by the load tables. It's quite convenient.

What needs improvement?

The solution could be improved by integrating it with data packets. Right now, the load tables provide a function, like team collaboration. Still, it's unclear whether there's a function to create different or additional branches. Our team had used data packets before; however, I feel it's difficult to integrate the current solution with the previous data packets.

The support could be improved a bit around the database. When we stream it to Data Lake, some data cannot be loaded. It should be a priority to fix this.

For how long have I used the solution?

I've been using the solution for half a year.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

The solution is scalable. However, it still needs us to manually set the number of nodes in a cluster. It's really dependent on the application. Sometimes, when the tasks are bigger, it gets a little difficult for us to define the number of nodes in a cluster. If the solution could allow users to set up the clusters, I think that would be good.

Currently, we have three people using the solution. We may increase usage in the future.

How are customer service and technical support?

The technical support is quite good. In the beginning, when we had a few POC projects, they were very supportive.

Which solution did I use previously and why did I switch?

We didn't previously use a different solution; however, we built our own from scratch. This is the first unified platform that we've used.

How was the initial setup?

The initial setup is very straightforward. We just use their job functions. Deploying as a Spark job is quite straightforward.

In our use case, we also had some external databases to handle the deployment. For example, we only generated some prediction results. We saved the results into an external database. The solution takes time to deploy to the external database, but the spark job is quite easy.

What other advice do I have?

I'm a software development engineer. I'm working with the latest version.

As long as the developers have an understanding of Spark and of its technical tricks, it's very fast in terms of using the database.

I'd rate the solution eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Chief Data Scientist at a tech services company with 11-50 employees
Real User
Effective integration, helpful support, and simple cloud implementation
Pros and Cons
  • "Databricks integrates well with other solutions."
  • "Databricks doesn't offer the use of Python scripts by itself and is not connected to GitHub repositories or anything similar. This is something that is missing. If they could integrate with Git tools it would be an advantage."

What is our primary use case?

We use Databricks for experimentation. For example, we do ML model building and training that is connecting to our data which resides in Azure. It offers very good integration with Azure. We've deployed some of our model inference tools in Databricks.

What is most valuable?

Databricks integrates well with other solutions.

What needs improvement?

Databricks doesn't offer the use of Python scripts by itself and is not connected to GitHub repositories or anything similar. This is something that is missing. If they could integrate with Git tools it would be an advantage.

Along with having connections to different databases for Git tools, adding libraries for easy access would be a benefit. As data scientists, we connect to different databases and different sources of data, so having a library would be useful.

For how long have I used the solution?

I have been using Databricks for approximately one year.

What do I think about the stability of the solution?

The solution is stable. We did not face any downtime.

What do I think about the scalability of the solution?

Databricks is scalable. It operates three times faster than any of the other ecosystems which we have experimented on.

We have approximately five data scientists using this solution in my organization. We are a small company and as we grow, all our data scientists would be using this platform. We plan to increase usage.

How are customer service and support?

The technical support is good. We didn't need a lot of support. There were a few times we needed some help on how to do certain operations.

How was the initial setup?

The installation was straightforward because it is on the cloud. The full deployment took approximately one week.

What about the implementation team?

We did the implementation of Databricks in-house. It only requires one person for the maintenance of the solution.

What other advice do I have?

My advice to others wanting to implement this solution is to use a cloud environment. For example, we are using Azure with Databricks. It is much better than doing an on-premise implementation.

I rate Databricks an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Technical Architect at a tech services company with 10,001+ employees
Real User
Facilitates robust solutions through collaboration but non-SQL users may struggle
Pros and Cons
  • "I like the ability to use workspaces with other colleagues because you can work together even without seeing the other team's job."
  • "Anyone who doesn't know SQL may find the product difficult to work with."

What is most valuable?

I like the ability to use workspaces with other colleagues because you can work together even without seeing the other team's job. So you can create a robust solution by working together with other professionals.

What needs improvement?

One area for improvement would be that anyone who doesn't know SQL may find the product difficult to work with. It would also be useful to have a remote support team inside Databricks, which would collect and analyze user feedback.

For how long have I used the solution?

I have been using Databricks since 2018.

How are customer service and support?

I had a little trouble with customer support but this was solved.

How was the initial setup?

The initial setup was a little complex because it was a new architecture for the customer, so there was nothing to compare it to in order to accelerate the project. This meant the deployment of the first project using Databricks took almost nine months and the second took almost a year.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Vice President, Business Intelligence and Analytics at a tech services company with 10,001+ employees
Real User
Stable cloud platform for data engineering and has a straightforward setup
Pros and Cons
  • "I haven't heard about any major stability issues. At this time I feel like it's stable."
  • "Pricing is one of the things that could be improved."

What is our primary use case?

We are still exploring the solution. We utilize it much, much better than their star schema models that they are trying to replace it with. We bring in Databricks and then see how they can leverage the additional analytical functionalities around the Databricks cloud. It's more in exploratory ways. We recommend Databricks, especially with the Azure cloud frameworks.

What needs improvement?

Pricing is one of the things that could be improved.

Also, there could be improvement in the visual analytics space and in the machine learning functions. I haven't explored these, so I don't know which functions and features are already there. If they are not there, then I think that's something they should consider including.

For how long have I used the solution?

My team has been exploring Databricks for close to five or six months.

What do I think about the stability of the solution?

I haven't heard about any major stability issues. At this time I feel like it's stable.

What do I think about the scalability of the solution?

In terms of scalability, I think once we put it across for larger use-cases the scalability question will really arise. So we'll need detailed information. I assume that we will be able to scale up.

I think we do not have more than 10 people working on it now. Because we are in the earlier stages of implementation, it's more like a POC now. I really don't know whether it's been open for the larger audience yet.

How was the initial setup?

The initial setup was straightforward.

What about the implementation team?

It is better to be installed with the help of integrators, or consultants, or with an experienced team.

What other advice do I have?

It's more data scientists using Databricks. I would call them power users trying to see how they can get a hand on it, though they are not data scientists. They try to understand it a little bit better for their future use.

On a scale of one to ten, I would rate it an eight, easy. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Business Development Specialist at a tech services company with 51-200 employees
Real User
Top 20
Useful end-to-end data analytics, highly stable, and scalable
Pros and Cons
  • "Databricks covers the end-to-end data analytics workflow in one platform; this is the best feature of the solution."
  • "Databricks could improve in some of its functionality."

What is our primary use case?

Databricks is a full data analytics platform. It covers the end-to-end data analytics process.

What is most valuable?

Databricks covers the end-to-end data analytics workflow in one platform; this is the best feature of the solution.

What needs improvement?

Databricks could improve in some of its functionality.

For how long have I used the solution?

I have been using Databricks for approximately a year and a half.

What do I think about the stability of the solution?

Databricks is very stable.

What do I think about the scalability of the solution?

The scalability of Databricks is good.

We have 30 to 40 people using this solution in my company.

What other advice do I have?

I rate Databricks a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partners
PeerSpot user
Buyer's Guide
Download our free Databricks Report and get advice and tips from experienced pros sharing their opinions.
Updated: May 2024