
What is your experience regarding pricing and costs for Apache Spark?

Miriam Tover - PeerSpot reviewer

18 Answers

Suriya Senthilkumar - PeerSpot reviewer
Real User
2024-02-26T16:01:50Z
Feb 26, 2024

They provide an open-source license for the on-premise version. However, we have to pay for the cloud version, including data centers and virtual machines.

Hamid M. Hamid - PeerSpot reviewer
Real User
Top 5 Leaderboard
2024-02-05T09:17:45Z
Feb 5, 2024

Apache Spark is an open-source tool. It is not an expensive product.

SS
Real User
Top 5 Leaderboard
2023-12-06T10:45:56Z
Dec 6, 2023

It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project. If I propose using Spark for a project, one of the first questions I get from management is about the cost of Databricks Spark on the cloud platform we're using, whether it's Azure, GCP, or AWS. If we could reduce the collection, system conversion, and transformation network costs by even just 2% to 3%, it would be a significant benefit for us.

NB
MSP
Top 5
2023-11-10T13:04:33Z
Nov 10, 2023

It is an open-source solution, so it is free of charge.

Jagannadha Rao - PeerSpot reviewer
Real User
Top 10
2023-10-20T07:41:27Z
Oct 20, 2023

Apache Spark is an expensive solution.

Miodrag Milojevic - PeerSpot reviewer
Real User
Top 5 Leaderboard
2023-07-25T11:39:52Z
Jul 25, 2023

Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is an open-source option without Cloudera, but in that case, you don't have any support. If you face a problem, you might find something in the community, but you cannot ask Cloudera about it. With open source you don't have support, but you do have a community. Cloudera offers different packages, which are licensed versions of products like Apache Spark; in that case, you can ask Cloudera for everything.

ML
Real User
2023-07-17T11:51:53Z
Jul 17, 2023

We are using the free version of the solution.

Armando Becerril - PeerSpot reviewer
Real User
Top 5
2023-02-13T20:14:00Z
Feb 13, 2023

Licensing costs depend on where you source the solution.

Ilya Afanasyev - PeerSpot reviewer
Real User
Top 5 Leaderboard
2022-08-03T04:09:48Z
Aug 3, 2022

It's an open-source product. I don't know much about the licensing aspect.

Salvatore Campana - PeerSpot reviewer
Real User
Top 5
2022-04-27T08:19:19Z
Apr 27, 2022

Spark is an open-source solution, so there are no licensing costs.

AR
Consultant
2022-02-22T10:00:42Z
Feb 22, 2022

This is an open-source tool, so it can be used free of charge. There is no cost involved.

Suresh_Srinivasan - PeerSpot reviewer
Real User
2021-12-28T09:52:00Z
Dec 28, 2021

Since we are using the Apache Spark version, not the Databricks version, it is the Apache-licensed version, and support and bug resolution are actually late or delayed. The Apache license is free.

Oscar Estorach - PeerSpot reviewer
Real User
Top 10
2021-08-18T14:51:07Z
Aug 18, 2021

We use the open-source version. It is free to use. However, you do need to have servers; we have three or four, and they can be on-premises or in the cloud.

NK
Real User
Top 20
2021-02-01T12:04:16Z
Feb 1, 2021

Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera.

RV
Real User
2020-07-23T07:58:35Z
Jul 23, 2020

I'm unsure as to how much the licensing is for the solution. It's not an aspect of the product I deal with directly.

GK
Real User
2020-06-10T05:20:34Z
Jun 10, 2020

Apache Spark is available through cloud services like AWS and Azure. We have to use the specific service that fits our use case. For example, we can use AWS Glue, which runs Spark for ETL processes, or AWS EMR / Azure Databricks for on-demand data processing in the cloud. Basically, it depends on how much data we will be processing. It is recommended to start with a minimal configuration and stop the services when not in use.
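
As a rough illustration of that pattern, here is a minimal PySpark job of the kind that could run on Glue, EMR, or Databricks; the bucket paths and column names are made up for the example and would need to match your own data.

```python
# Minimal sketch of a Spark ETL job for a managed cloud service.
# The input/output paths and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("minimal-etl").getOrCreate()

# Read raw data, keep only the columns we need, and aggregate.
raw = spark.read.json("s3://my-bucket/raw/events/")           # hypothetical path
daily = (raw
         .select("event_date", "amount")                       # hypothetical columns
         .groupBy("event_date")
         .agg(F.sum("amount").alias("total_amount")))

# Write the result back out in a columnar format.
daily.write.mode("overwrite").parquet("s3://my-bucket/curated/daily_totals/")

spark.stop()  # release cluster resources when the job is done
```

Keeping the job this small is also what makes the "stop the services when not in use" advice practical: the cluster only needs to exist for the duration of the run.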

KK
Real User
Top 20
2020-02-02T10:42:14Z
Feb 2, 2020

The initial setup is straightforward. It took us around one week to set it up, and then the requirements and creation of the project flow and design needed to be done. The design stage took three to four weeks, so in total, it required between four and five weeks to set up.

SA
Consultant
2019-12-23T07:05:00Z
Dec 23, 2019

I would suggest not trying to do everything at once. Identify the area where you want to solve the problem, start small, expand it incrementally, and slowly broaden your vision. For example, if I have a problem where I need to do streaming, I just focus on the streaming and not on the machine learning that Spark offers. It offers a lot of things, but you need to focus on one thing so that you can learn. That is what I have learned from the little experience I have with Spark. You need to focus on your objective and let the tools help you rather than letting the tools drive the work. That is my advice.
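
As one illustration of that "focus on one thing" approach, here is a minimal Spark Structured Streaming word count that uses nothing from MLlib or the rest of Spark; the socket source, host, and port are placeholders for local testing, not a production setup.

```python
# A minimal streaming-only sketch: word count over a TCP socket
# (e.g. start `nc -lk 9999` locally to feed it lines of text).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-only").getOrCreate()

# Read lines from the socket as an unbounded streaming DataFrame.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split lines into words and count occurrences.
counts = (lines
          .select(F.explode(F.split(lines.value, " ")).alias("word"))
          .groupBy("word")
          .count())

# Print each micro-batch result to the console.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```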

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function...
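
To make the RDD description above concrete, here is a small sketch using the RDD API; the numbers and partition count are arbitrary illustration values, and it assumes a local Spark installation.

```python
# Small sketch of the RDD API: data partitioned across the cluster and
# transformed with chained functional operations instead of MapReduce stages.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-example")

# Create an RDD partitioned across the cluster (read-only, fault-tolerant).
numbers = sc.parallelize(range(1, 1001), numSlices=4)

# Chain transformations in memory; nothing runs until an action is called.
result = (numbers
          .map(lambda x: x * x)          # square each element
          .filter(lambda x: x % 2 == 0)  # keep even squares
          .sum())                        # action: triggers execution

print(result)
sc.stop()
```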
