Apache Spark vs Pentaho Business Analytics comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Apache Spark
Average Rating: 8.4
Number of Reviews: 60
Ranking in other categories: Hadoop (1st), Compute Service (5th), Java Frameworks (2nd)

Pentaho Business Analytics
Average Rating: 8.0
Number of Reviews: 42
Ranking in other categories: BI (Business Intelligence) Tools (19th), Cloud Operations Analytics (4th), Reporting (17th)
 

Mindshare comparison

As of June 2024, in the Hadoop category, the mindshare of Apache Spark is 14.3%, down from 23.4% compared to the previous year. The mindshare of Pentaho Business Analytics is 1.3%, up from 0.6% compared to the previous year. It is calculated based on PeerSpot user engagement data.
Mindshare by unique category:
Apache Spark: Compute Service 11.0%, Java Frameworks 6.5%
Pentaho Business Analytics: BI (Business Intelligence) Tools 0.6%, Cloud Operations Analytics 50.0%
 

Featured Reviews

JK
Jul 6, 2023
Seamless in distributing tasks, including its impressive map-reduce functionality
Predominantly, I use Spark for data analysis on top of datasets containing tens of millions of records. I have an example: we had a single-threaded application that used to run for about four to five hours, but with Spark, it got reduced to under one hour. The distribution of tasks, like the…
Azhagarasan Annadorai - PeerSpot reviewer
Nov 27, 2023
Great automation, easy migration, and a rather simple setup
The fuzzy logic match component came in handy to identify AML (anti-money laundering) suspects. The SHA key generation component helped in generating a unique identifier for our customers without revealing personal data, especially when we send data to the AWS cloud. Regularly, we had to send the new policy data files to medical centers over email. Using Pentaho, we were able to compress the files in ZIP format with a password and email them to an unlimited number of partners. We could confidently use Pentaho for sharing data with internal and external customers through mediums such as FTP servers, LAN (local area network) folders, email, etc. Connectivity to platforms such as Snowflake is great.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The scalability has been the most valuable aspect of the solution."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"The main feature that we find valuable is that it is very fast."
"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
"The distribution of tasks, like the seamless map-reduce functionality, is quite impressive."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"The solution is very stable."
"The solution is scalable."
"The most valuable feature of Pentaho is the Tableau report."
"Easy to use components to create the job."
"The initial setup is pretty straightforward."
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"We were able to install it without any assistance from tech support."
 

Cons

"One limitation is that not all machine learning libraries and models support it."
"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."
"Technical expertise from an engineer is required to deploy and run high-tech tools, like Informatica, on Apache Spark, making it an area where improvements are required to make the process easier for users."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use."
"Apache Spark can improve the use case scenarios from the website. There is not any information on how you can use the solution across the relational databases toward multiple databases."
"Apache Spark should add some resource management improvements to the algorithms."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
"Another concern is that Pentaho is not customizable or interactive."
"Version control would be a good addition."
"Logging capability is needed."
"Pentaho Business Analytics' user interface is outdated."
"Deployment is not simple. It is not simple because we are dealing with a lot of data; we are dealing with a lot of storage. So, it's not a simple process."
"The repository should be improved."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
 

Pricing and Cost Advice

"On the cloud model, it can be expensive, as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
"Apache Spark is an open-source tool."
"Apache Spark is an expensive solution."
"The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks."
"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
"Considering the product version used in my company, I feel that the tool is not costly since the product is available for free."
"It is an open-source platform. We do not pay for its subscription."
"The solution is affordable and there are no additional licensing costs."
"We were lucky enough to find a Pentaho OEM partner who offered a data warehouse model and the ETL software for about 60K SGD per year."
"Pentaho is expensive."
"Free and commercial versions are available."
789,135 professionals have used our research since 2012.
 

Comparison Review

it_user6978 - PeerSpot reviewer
Jun 10, 2013
Jaspersoft vs. Pentaho – Which one to use & is there any need to purchase the commercial edition
Any company (be it technology, manufacturing, human resources, e-commerce, SME, etc.) needs Business Intelligence to some extent. If cost is a consideration, then the two BI tools at the forefront are Pentaho and Jaspersoft. But, often the same…
 

Top Industries

By visitors reading reviews
Apache Spark: Financial Services Firm 25%, Computer Software Company 13%, Manufacturing Company 7%, Retailer 5%
Pentaho Business Analytics: Financial Services Firm 23%, Computer Software Company 12%, Government 10%, Educational Organization 9%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What needs improvement with Apache Spark?
In data analysis, you need to take real-time data from different data sources. You need to process it and do the transformation in under a second.
Seeking lightweight open source BI software
There are many... It would depend on what BI system architecture or enterprise legacy you have at your end... I would recommend as follows: 1) If you have legacies of SAP or Oracle, look for SAP...
What is your experience regarding pricing and costs for Pentaho Business Analytics?
The organization has both options based on their needs and budget constraints. The Enterprise Edition is expensive, given its larger feature set.
What needs improvement with Pentaho Business Analytics?
The product to me is not as user-friendly as other players in the market. It also still needs improvement in the reporting module. You will need to search for deployment examples or need to have a ...
 

Also Known As

Apache Spark: No data available
Pentaho Business Analytics: Pentaho, Kettle, Hitachi Pentaho Business Analytics
 

Overview

 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Pentaho Business Analytics: Cargo 2000 Lufthansa, Marketo, ModCloth, Cardiac Science, Telefonica, ExactTarget, Active Broadband Networks, and Brussels Airport.
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: May 2024.