Data Engineer at a tech company with 10,001+ employees
Real User
Top 20
Aug 11, 2025
I don't use a big data solution such as a Data Lake. We don't have a Data Lake or these newer technologies, but I have used Apache Spark code on a lot of big data, pulling data from CRM and Siebel. Siebel is the application that runs in telecom branches in Egypt, and customers interact with it. We have a lot of data from the finance team, CRM, Siebel, and other sources. We consolidate all of this data and perform transformations. With Apache Spark, we can perform various transformations. For instance, when a customer calls their mother and consumes many minutes, we should consolidate all of those calls and calculate the net minutes. For the monthly invoice, we should determine how much the customer should pay for ADSL, phones, and family phones. We take all of this information, transform it, and use it to generate the invoice.

We enhance the data processing by using Apache Spark SQL. I haven't used Apache Spark machine learning, but I've used Apache Spark SQL because we have data in HDFS tables. We take the Apache Spark code, get the data, and then compute the aggregate. For example, we can request aggregated data about a customer's consumption since last week. You can use Apache Spark code or Python code itself, and if you know SQL, you can type SQL within the same script and output the result to any table or Excel file. It's durable and easy.

When a customer has an issue with their phone number and can't call or access the internet, they visit a branch and speak with an agent. We need to take action based on the data, so we need real-time data processing to get aggregated data for this customer from the last week or month. All data solutions serve the customer and business needs, which is what I appreciate about data solutions.
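As a rough illustration of the Spark SQL pattern described here, the following minimal sketch aggregates a customer's consumption for the last week from an HDFS-backed table and writes the result out; the table and column names (usage_records, customer_id, minutes, call_date) are hypothetical.

    # Minimal PySpark sketch; table and column names are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("customer-consumption")
             .enableHiveSupport()
             .getOrCreate())

    # Expose the HDFS-backed table to Spark SQL.
    spark.read.table("usage_records").createOrReplaceTempView("usage")

    weekly = spark.sql("""
        SELECT customer_id,
               SUM(minutes) AS net_minutes
        FROM usage
        WHERE call_date >= date_sub(current_date(), 7)
        GROUP BY customer_id
    """)

    # Write the aggregate to a table; it could also be exported to CSV/Excel downstream.
    weekly.write.mode("overwrite").saveAsTable("weekly_consumption")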
I use Apache Spark for data engineering work. I handle computation processes where it is necessary to process big data.
I have hands-on experience with Spark, roughly six months to one year of working with it. We use it for faster processing, especially compute. Spark is used for transformations on large volumes of data, and it distributes the work usefully. We receive data from various sources and need to transform it. The data is enormous, in terabytes, and often comes from specific databases. We perform transformations, aggregations, and deduplication. We meet business requirements by computing the data, minimizing it, aggregating it, or performing other operations. We typically write to Hive downstream.
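A minimal sketch of that transform/aggregate/deduplicate flow with a Hive target might look like the following; the source path and column names (record_id, account_id, amount) are assumptions.

    # Deduplicate, aggregate, and write to Hive; paths and columns are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    raw = spark.read.parquet("/data/incoming/")          # hypothetical source

    cleaned = (
        raw.dropDuplicates(["record_id"])                # deduplication
           .groupBy("account_id")                        # aggregation
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("txn_count"))
    )

    # Write the minimized, aggregated result to Hive for downstream consumers.
    cleaned.write.mode("overwrite").saveAsTable("analytics.account_summary")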
The primary use case for Apache Spark is to process big data in memory, distributing the engine to process that data. It is used for various tasks such as running the association rules algorithm in Spark ML, running XGBoost in parallel on the Spark engine, and preparing data for online machine learning using Spark Streaming.
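For the association-rules part, Spark ML ships an FPGrowth implementation; the tiny transactions dataset below is invented purely to show the shape of the API.

    # Association rules with Spark ML's FPGrowth; the data is made up.
    from pyspark.sql import SparkSession
    from pyspark.ml.fpm import FPGrowth

    spark = SparkSession.builder.appName("assoc-rules").getOrCreate()

    transactions = spark.createDataFrame(
        [(0, ["bread", "milk"]),
         (1, ["bread", "butter", "milk"]),
         (2, ["butter", "jam"])],
        ["id", "items"],
    )

    model = FPGrowth(itemsCol="items", minSupport=0.3, minConfidence=0.5).fit(transactions)
    model.associationRules.show()   # antecedent, consequent, confidence, lift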
Data Scientist at a financial services firm with 10,001+ employees
Real User
Top 5
Jul 10, 2024
Most of my use cases involve data processing. For example, someone tried to run sentiment analysis on Databricks using Apache Spark. They had to handle data from many countries and languages, which presented some challenges. Besides that, I primarily use Apache Spark for data processing tasks. I work with mobile phone datasets, around one terabyte in size. This involves extracting and analyzing data before building any models.
CEO International Business at a tech services company with 1,001-5,000 employees
MSP
Top 5
Nov 10, 2023
In AI deployment, a key step is aggregating data from various sources, such as customer websites, debt records, and asset information. Apache Spark plays a vital role in this process, efficiently handling continuous streams of data. Its capability enables seamless gathering and feeding of diverse data into the system, facilitating effective processing and analysis for generating alerts and insights, particularly in scenarios like banking.
Apache Spark can be used for multiple use cases in big data and data engineering tasks. We are using Apache Spark for ETL, integration with streaming data, and real-time predictions such as anomaly detection and price prediction, as well as data exploration on large volumes of data.
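For the streaming-integration side of this, a minimal Structured Streaming sketch could look like the following; the Kafka brokers, topic, and output paths are placeholders.

    # Structured Streaming sketch: read events from Kafka, cast, and persist.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("stream-etl").getOrCreate()

    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
              .option("subscribe", "events")                      # placeholder topic
              .load())

    parsed = events.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/data/stream-output/")
             .option("checkpointLocation", "/data/stream-checkpoint/")
             .start())

    query.awaitTermination()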
It's a root product that we use in our pipeline. We have some input data; for example, one system supplies data to MongoDB, and we pull this data from MongoDB, enrich it with additional fields from other systems, and write it to S3 for other systems. Since we have a lot of data, we need a parallel process that runs hourly.
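A hedged sketch of that hourly job, assuming the MongoDB Spark connector (v10+) and made-up database, collection, bucket, and join-key names:

    # Read from MongoDB, enrich with reference data, write to S3.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hourly-enrichment").getOrCreate()

    orders = (spark.read.format("mongodb")                  # MongoDB Spark connector
              .option("connection.uri", "mongodb://mongo-host:27017")
              .option("database", "shop")
              .option("collection", "orders")
              .load())

    reference = spark.read.parquet("s3a://reference-bucket/customers/")

    # Enrich the MongoDB records with additional fields from the other system.
    enriched = orders.join(reference, on="customer_id", how="left")

    enriched.write.mode("append").parquet("s3a://output-bucket/enriched-orders/")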
Apache Spark is a processing framework that you program with languages such as Java or Python. In my most recent deployment, we used Apache Spark to build engineering pipelines to move data from sources into the data lake.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
Aug 18, 2021
You can do a lot of things in terms of data transformation. You can store, transform, and stream data. It's very useful and has many use cases.
Senior Solutions Architect at a retailer with 10,001+ employees
Real User
Mar 27, 2021
We use Apache Spark to prepare data for transformation and encryption, depending on the columns. We use AES-256 encryption. We're building a proof of concept at the moment. We prepare patches on Spark for Kubernetes on-premise and Google Cloud Platform.
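As a sketch of column-level encryption in Spark (not necessarily the exact approach used in this project), Spark 3.3+ exposes an aes_encrypt SQL function; the input path, column name, and key handling below are placeholders, and a 32-byte key gives AES-256.

    # Encrypt a single column with AES-256 (GCM); paths, column, and key are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    df = spark.read.parquet("/data/customers/")      # hypothetical input

    key = "0" * 32   # 32-byte key => AES-256; in practice, fetch the secret from a KMS

    encrypted = df.withColumn(
        "email_enc",
        F.expr(f"base64(aes_encrypt(email, '{key}', 'GCM'))")
    ).drop("email")

    encrypted.write.mode("overwrite").parquet("/data/customers_encrypted/")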
We just finished a central front project called MFY for our in-house fraud team. In this project, we are using Spark along with Cloudera, and in front of Spark, we are using Couchbase. Spark is mainly used for aggregations and AI (for future usage). It gathers data from Couchbase and does the calculations. We are not actively using the Spark AI libraries at this time, but we are going to. This project is for classifying transactions and finding suspicious activities, especially those that come from internet channels such as internet banking and mobile banking. It executes rules developed and written by our business team to detect suspicious activity. An example of a rule is: if the transaction count or transaction amount is greater than 10 million Turkish lira and the user's device is new, then raise an exception. The system sends an SMS to the user, and the user can choose whether or not to continue with the transaction.
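The quoted rule translates naturally into a Spark filter; the table and column names below are assumptions, and the SMS challenge would be handled downstream.

    # Flag transactions over 10M TRY (by count or amount) on a new device.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    txns = spark.read.table("fraud.transactions")        # hypothetical table

    suspicious = txns.filter(
        ((F.col("txn_count") > 10_000_000) | (F.col("amount_try") > 10_000_000))
        & F.col("is_new_device")
    )

    # Each flagged row would trigger an SMS challenge to the user downstream.
    suspicious.write.mode("append").saveAsTable("fraud.alerts")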
When we receive data from the messaging queue, we process everything using Apache Spark. Databricks does the processing and sends everything back to files in the data lake. The machine learning program then does some analysis using an ML prediction algorithm.
Managing Consultant at a computer software company with 501-1,000 employees
Real User
Feb 2, 2020
Our use case for Apache Spark was a retail price prediction project. We were using retail pricing data to build predictive models. To start, we analyzed the prices and created the dataset to be visualized in Tableau. We then used that visualization tool to create dashboards and graphical reports showcasing the predictive modeling results. Apache Spark was used to host this entire project.
We have built a product called "NetBot." We take any form of data, such as large email datasets, images, videos, or transactional data, transform the unstructured textual and video data into structured form, combine it with the transactional data, and create an enterprise-wide smart data grid. That smart data grid is then used by downstream analytics tools. We also provide machine learning model building so people can get faster insight into their data.
Technical Consultant at a tech services company with 1-10 employees
Consultant
Dec 23, 2019
We are working with a client that has a wide variety of data residing in other structured databases, as well. The idea is to make a database in Hadoop first, which we are in the process of building right now. One place for all kinds of data. Then we are going to use Spark.
We primarily use the solution to integrate very large data sets from other environments, such as our SQL environment, and to extract purposeful data before checking it. We also use the solution for streaming from very large servers.
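Pulling a large table out of a SQL environment is typically done over JDBC with partitioned reads; the connection URL, table, credentials, and bounds below are placeholders.

    # Partitioned JDBC read of a large SQL table, landed as Parquet.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://sql-host:1433;databaseName=sales")
          .option("dbtable", "dbo.transactions")
          .option("user", "reader")
          .option("password", "***")
          .option("numPartitions", 8)            # parallel reads for large tables
          .option("partitionColumn", "id")
          .option("lowerBound", 1)
          .option("upperBound", 10_000_000)
          .load())

    df.write.mode("overwrite").parquet("/data/landing/transactions/")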
Senior Consultant & Training at a tech services company with 51-200 employees
Consultant
Oct 13, 2019
We use this solution for information gathering and processing. I use it myself when I am developing on my laptop. I am currently using an on-premises deployment model. However, in a few weeks, I will be using the EMR version on the cloud.
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function...
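A tiny PySpark example of the RDD API described here: parallelize a dataset across the cluster, map a function over it, and reduce the results.

    # Minimal RDD example: distribute data, map, and reduce.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
    sc = spark.sparkContext

    rdd = sc.parallelize(range(1, 1001))        # distributed, read-only dataset
    squares_sum = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)
    print(squares_sum)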
We use the product in our environment for data processing and performing Data Definition Language (DDL) operations.
In my company, the solution is used for batch processing or real-time processing.
Our primary use case is for interactively processing large volumes of data.
We use it for real-time and near-real-time data processing. We use it for ETL purposes as well as for implementing the full transformation pipelines.
We use Apache Spark for storage and processing.
Our customers configure their software applications, and I use Apache to check them. We use it for data processing.
Predominantly, I use Spark for data analysis on top of datasets containing tens of millions of records.
We use Spark for machine learning applications, clustering, and segmentation of customers.
I am using Apache Spark for the data transition from databases. We have customers who have one database as a data lake.
I use Spark to run automation processes driven by data.
I mainly use Spark to prepare data for processing because it has APIs for data evaluation.
The solution can be deployed on the cloud or on-premise.
I use it mostly for ETL transformations and data processing. I have used Spark on-premises as well as on the cloud.
We primarily use the solution for security analytics.
We use the solution for analytics.
Streaming telematics data.
Ingesting billions of rows of data all day.
Used for building big data platforms for processing huge volumes of data. Additionally, streaming data is critical.