
AWS Lambda vs Apache Spark vs Google Cloud Dataflow comparison

 

Comparison Buyer's Guide

Executive Summary


Mindshare comparison

Hadoop
Compute Service
Streaming Analytics
 

Featured Reviews

Dunstan Matekenya - PeerSpot reviewer
Open-source solution for data processing with portability
Apache Spark is known for its ease of use. Compared to other available data processing frameworks, it is user-friendly. While many choices now exist, Spark remains easy to use, particularly with Python. You can utilize familiar programming styles similar to Pandas in Python, including object-oriented programming. Another advantage is its portability. I can prototype and perform some initial tasks on my laptop using Spark without needing to be on Databricks or any cloud platform. I can transfer it to Databricks or other platforms, such as AWS. This flexibility allows me to improve processing even on my laptop. For instance, if I'm processing large amounts of data and find my laptop becoming slow, I can quickly switch to Spark. It handles small and large datasets efficiently, making it a versatile tool for various data processing needs.
Andrew-Wong - PeerSpot reviewer
Convenience in deployment process with room for code preview improvement
Having a better preview would be helpful. Sometimes, if my Lambda code is too big, it can be inconvenient as I'm unable to see my code when it exceeds a certain size. AWS has a limit, like a three-megabyte limit, beyond which I cannot view or edit the code easily.
Jana Polianskaja - PeerSpot reviewer
Build Scalable Data Pipelines with Apache Beam and Google Cloud Dataflow
As a data engineer, I find several features of Google Cloud Dataflow particularly valuable. The ability to test solutions locally using Direct Runner is crucial for development, allowing me to validate pipelines without incurring the costs of full Dataflow jobs. The unified programming model for both batch and streaming processing is exceptional - requiring only minor code adjustments to optimize for either mode. This flexibility extends to language support, with robust implementations in both Java and Python, allowing teams to leverage their existing expertise. The platform's comprehensive monitoring capabilities are another standout feature. The intuitive interface, Grafana integration, and extensive service connectivity make troubleshooting and performance tracking highly efficient. Furthermore, seamless integration with Google Cloud Composer (managed Airflow) enables sophisticated orchestration of data pipelines.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"We use Spark to process data from different data sources."
"The good performance. The nice graphical management console. The long list of ML algorithms."
"Spark is used for transformations from large volumes of data, and it is usefully distributed."
"The deployment of the product is easy."
"The product’s most valuable features are lazy evaluation and workload distribution."
"AWS Lambda is serverless."
"It makes configurations more convenient as changes can be made through the environmental variables without altering the main code."
"I have used AWS Lambda for simple messaging for SQS, creating a cron job, and delay messaging."
"It is easy to use."
"The solution is designed very well. You don't need to keep a server up. You just need some router to route your API request and Lambda provides a very well-designed feature to process the request."
"The fact that it is serverless is really important."
"They have the built-in IDE, so everything happens without integration issues."
"We use AWS Lambda because it provides a solution for our needs without requiring us to manage our infrastructure. With the tool, we only pay for the resources we use. Additionally, it is straightforward to implement and integrates with other services like API Gateway."
"It allows me to test solutions locally using runners like Direct Runner without having to start a Dataflow job, which can be costly."
"The service is relatively cheap compared to other batch-processing engines."
"The most valuable features of Google Cloud Dataflow are scalability and connectivity."
"I don't need a server running all the time while using the tool. It is also easy to set up. The product offers a pay-as-you-go service."
"The best feature of Google Cloud Dataflow is its practical connectedness."
"The most valuable feature of Google Cloud Dataflow is the integration; it's very simple if you have the complete stack, which we are using. It is overall very easy to use, user-friendly, and cost-effective if you know how to use it. The solution is very flexible for programmers: if you know how to write scripts or program in Python or any other language, it's extremely easy to use."
"The support team is good and it's easy to use."
"The integration within Google Cloud Platform is very good."
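Two of the AWS Lambda excerpts above mention changing configuration through environment variables and using Lambda for simple SQS messaging. The following is a minimal Python sketch of both, not any reviewer's actual code. It assumes a hypothetical environment variable named `GREETING_PREFIX` and a JSON message body with a `name` field (neither comes from the reviews); the `Records`, `body`, and `messageId` keys follow the event shape Lambda passes to a handler for SQS messages.

```python
import json
import os


def handler(event, context):
    """Process a batch of SQS messages delivered to a Lambda function."""
    # Read configuration at invocation time so it can be changed in the
    # function's settings without altering the code.
    # GREETING_PREFIX is a hypothetical variable name used for illustration.
    prefix = os.environ.get("GREETING_PREFIX", "Hello")

    results = []
    for record in event.get("Records", []):
        # Each SQS record carries the message text in its "body" field.
        payload = json.loads(record["body"])
        results.append({
            "messageId": record["messageId"],
            "greeting": f"{prefix}, {payload['name']}!",
        })
    return {"processed": len(results), "results": results}


if __name__ == "__main__":
    # Local smoke test with a hand-built event; no AWS account is needed.
    sample_event = {
        "Records": [
            {"messageId": "1", "body": json.dumps({"name": "Ada"})},
        ]
    }
    print(handler(sample_event, None))
```

Because the handler is a plain function, it can be called locally with a hand-built event before deployment, as the last lines do.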
 

Cons

"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."
"Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use."
"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
"Apache Spark lacks geospatial data."
"The solution’s integration with other platforms should be improved."
"Spark could be improved by adding support for other open-source storage layers than Delta Lake."
"Technical expertise from an engineer is required to deploy and run high-tech tools, like Informatica, on Apache Spark, making it an area where improvements are required to make the process easier for users."
"They could improve the issues related to programming language for the platform."
"Lambda has limitations on the amount of memory you can use and is not a good solution for long running processes."
"AWS Lambda should support additional languages."
"Having a better preview would be helpful."
"They should work on the solution's stability and pricing."
"It could be cheaper."
"AWS Lambda can improve its file system-based sharing capabilities and restrictions."
"What could be improved in AWS Lambda is a tricky question because I base the area for improvement on a specific metric, for example, latency, so I'm still determining if I can be the judge on that. However, room for improvement could be when you're using AWS Lambda as a backend, it can be challenging to use it for monitoring. Monitoring is critical in development, and I don't have much expertise in the area, but you can use other services such as X-Ray. I found that monitoring on AWS Lambda is a challenge. The tool needs better monitoring. Another area for improvement in AWS Lambda is the cold start, where it takes some time to invoke a function the first time, but after that, invoking it becomes swift. Still, there's room for improvement in that AWS Lambda process. In the next release of AWS Lambda, I'd like AWS to improve monitoring so that I can monitor code better."
"I wish to see better execution time in the next release."
"Occasionally, dealing with a huge volume of data causes failure due to array size."
"I would like to see improvements in consistency and flexibility for schema design for NoSQL data stored in wide columns."
"Google Cloud Dataflow can improve by having full, simple integration with Kafka topics. It's not that complicated, but it could improve a bit. The UI is easy to use but the experience could be better. There are other tools available that do a better job."
"The technical support has slight room for improvement."
"The deployment time could also be reduced."
"I would like Google Cloud Dataflow to be integrated with IT data flow and other related services to make it easier to use as it is a complex tool."
"They should do a market survey and then make improvements."
"Promoting the technology more broadly would help increase its adoption."
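One reviewer above describes the AWS Lambda cold start: the first invocation takes some time, and later ones are swift. A common way to keep the later invocations fast is to do expensive setup once and cache it at module level, since Lambda can reuse an execution environment between invocations. The following is a minimal Python sketch of that pattern, assuming a hypothetical `load_resource` function that stands in for whatever slow initialization a function needs (SDK clients, models, connections); it is an illustration, not a fix for cold starts themselves.

```python
import time

# Cached across invocations for as long as the execution environment is reused.
_resource = None


def load_resource():
    """Hypothetical stand-in for slow setup work."""
    time.sleep(0.05)  # Simulated initialization cost.
    return {"ready": True}


def handler(event, context):
    """Return whether this invocation had to pay the initialization cost."""
    global _resource
    cold = _resource is None
    if cold:
        # Only the first invocation in a fresh environment runs the setup.
        _resource = load_resource()
    return {"cold_start": cold, "ready": _resource["ready"]}
```

Calling `handler({}, None)` twice in the same process shows the effect: only the first call runs `load_resource`, and the second reuses the cached value.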
 

Pricing and Cost Advice

"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
"They provide an open-source license for the on-premise version."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"Since we are using the Apache Spark version, not the Databricks version, it is an Apache-licensed version, so support and bug resolution are late or delayed. The Apache license is free."
"The cloud model can be expensive, as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
"It is an open-source platform. We do not pay for its subscription."
"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."
"The product is expensive, considering the setup."
"It costs maybe less than $10 per month in my use case."
"The pricing is on-demand and based on runs or times that are billed out monthly."
"The price is expensive and is based on usage. The more users you have the higher the cost."
"AWS Lambda is cheap."
"Its pricing is on the higher side."
"It computes by the cycle, and it's very cheap."
"AWS Lambda is priced very low."
"AWS Lambda cost is pretty decent."
"On a scale from one to ten, where one is cheap, and ten is expensive, I rate Google Cloud Dataflow's pricing a four out of ten."
"The solution is not very expensive."
"Google Cloud Dataflow is a cheap solution."
"On a scale from one to ten, where one is cheap, and ten is expensive, I rate the solution's pricing a seven to eight out of ten."
"The price of the solution depends on many factors, such as how they pay for tools in the company and its size."
"The solution is cost-effective."
"The tool is cheap."
"Google Cloud is slightly cheaper than AWS."
855,347 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Apache Spark
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 7%
Comms Service Provider: 6%

AWS Lambda
Educational Organization: 59%
Financial Services Firm: 9%
Computer Software Company: 6%
Manufacturing Company: 4%

Google Cloud Dataflow
Financial Services Firm: 18%
Manufacturing Company: 12%
Retailer: 11%
Computer Software Company: 10%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Apache Spark is open-source, so it doesn't incur any charges.
What needs improvement with Apache Spark?
There is complexity when it comes to understanding the whole ecosystem, especially for beginners. I find it quite com...
Which is better, AWS Lambda or Batch?
AWS Lambda is a serverless solution. It doesn’t require any infrastructure, which allows for cost savings. There is n...
What do you like most about AWS Lambda?
The tool scales automatically based on the number of incoming requests.
What is your experience regarding pricing and costs for AWS Lambda?
The pricing of AWS Lambda is reasonable. It's beneficial and cost-effective for users regardless of the number of ins...
What do you like most about Google Cloud Dataflow?
The product's installation process is easy...The tool's maintenance part is somewhat easy.
What is your experience regarding pricing and costs for Google Cloud Dataflow?
Pricing is normal. It is part of a package received from Google, and they are not charging us too high.
What needs improvement with Google Cloud Dataflow?
I am not sure, as we built only one job, and it is running on a daily basis. Everything else is managed using BigQuer...
 

Also Known As

Apache Spark: No data available
AWS Lambda: No data available
Google Cloud Dataflow: Google Dataflow
 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
AWS Lambda: Netflix
Google Cloud Dataflow: Absolutdata, Backflip Studios, Bluecore, Claritics, Crystalloids, Energyworx, GenieConnect, Leanplum, Nomanini, Redbus, Streak, TabTale
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: June 2025.