
What is your primary use case for Apache Spark?

Miriam Tover - PeerSpot reviewer
PeerSpot user

30 Answers

Suriya Senthilkumar - PeerSpot reviewer
Real User
2024-02-26T16:01:50Z
Feb 26, 2024

We use the product in our environment for data processing and performing Data Definition Language (DDL) operations.
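
As a sketch of what DDL through Spark looks like, here is a minimal PySpark example using Spark SQL; the table name and columns are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ddl-demo").getOrCreate()

# Create and evolve a table entirely through DDL statements
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        order_id BIGINT,
        amount   DOUBLE,
        order_ts TIMESTAMP
    ) USING parquet
""")
spark.sql("ALTER TABLE sales ADD COLUMNS (region STRING)")
spark.sql("DESCRIBE TABLE sales").show()
```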

Hamid M. Hamid - PeerSpot reviewer
Real User
Top 5 Leaderboard
2024-02-05T09:17:45Z
Feb 5, 2024

In my company, the solution is used for batch processing or real-time processing.

SS
Real User
Top 5 Leaderboard
2023-12-06T10:45:56Z
Dec 6, 2023

We use it for real-time and near-real-time data processing. We use it for ETL purposes as well as for implementing the full transformation pipelines.
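
To make the ETL pattern concrete, here is a minimal batch sketch; the paths and columns are hypothetical, not from the reviewer:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-demo").getOrCreate()

# Extract: read raw CSV (path is illustrative)
raw = spark.read.option("header", True).csv("/data/raw/orders.csv")

# Transform: cast types, drop bad rows, derive a partition column
cleaned = (raw
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount") > 0)
    .withColumn("order_date", F.to_date("order_ts")))

# Load: write partitioned Parquet to the curated zone
cleaned.write.mode("overwrite").partitionBy("order_date").parquet("/data/curated/orders")
```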

NB
MSP
Top 5
2023-11-10T13:04:33Z
Nov 10, 2023

In AI deployment, a key step is aggregating data from various sources, such as customer websites, debt records, and asset information. Apache Spark plays a vital role in this process, efficiently handling continuous streams of data. Its capability enables seamless gathering and feeding of diverse data into the system, facilitating effective processing and analysis for generating alerts and insights, particularly in scenarios like banking.
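
A sketch of that streaming shape with Structured Streaming; the broker, topic, and JSON fields are assumptions, the spark-sql-kafka package is assumed on the classpath, and the console sink stands in for a real alerting sink:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream-alerts").getOrCreate()

# Continuous source: one Kafka topic fed by several upstream systems
events = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "bank-events")
    .load())

# Pull fields out of the JSON payload
parsed = events.select(
    F.get_json_object(F.col("value").cast("string"), "$.account").alias("account"),
    F.get_json_object(F.col("value").cast("string"), "$.amount").cast("double").alias("amount"),
    "timestamp")

# Windowed totals per account; a rule on `total` would drive the alerts
totals = (parsed
    .withWatermark("timestamp", "10 minutes")
    .groupBy(F.window("timestamp", "5 minutes"), "account")
    .agg(F.sum("amount").alias("total")))

totals.writeStream.outputMode("update").format("console").start().awaitTermination()
```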

Jagannadha Rao - PeerSpot reviewer
Real User
Top 10
2023-10-20T07:41:27Z
Oct 20, 2023

We use Apache Spark for storage and processing.

FK
Real User
Top 5 Leaderboard
2023-07-26T09:09:50Z
Jul 26, 2023

Our customers configure their software applications, and I use Apache Spark to check them. We use it for data processing.

JK
Real User
Top 20
2023-07-06T10:55:23Z
Jul 6, 2023

Predominantly, I use Spark for data analysis on top of datasets containing tens of millions of records.

Armando Becerril - PeerSpot reviewer
Real User
Top 5
2023-02-13T20:14:00Z
Feb 13, 2023

We use Spark for machine learning applications, clustering, and segmentation of customers.
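
As an illustration of that kind of segmentation, a self-contained MLlib k-means sketch (the customer features are toy data):

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("segmentation").getOrCreate()

# Toy customer features: id, total spend, visit count
customers = spark.createDataFrame(
    [(1, 52.0, 3.0), (2, 880.0, 41.0), (3, 75.0, 5.0), (4, 910.0, 38.0)],
    ["customer_id", "total_spend", "visits"])

# Assemble numeric columns into the feature vector MLlib expects
features = VectorAssembler(
    inputCols=["total_spend", "visits"], outputCol="features").transform(customers)

# Two segments for the toy data; `prediction` is the cluster id
model = KMeans(k=2, seed=42).fit(features)
model.transform(features).select("customer_id", "prediction").show()
```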

Ilya Afanasyev - PeerSpot reviewer
Real User
Top 5 Leaderboard
2022-08-03T04:09:48Z
Aug 3, 2022

It's a root product that we use in our pipeline. We have some input data - for example, one system that supplies data to MongoDB. We pull this data from MongoDB, enrich it with additional fields from other systems, and write it to S3 for other systems. Since we have a lot of data, we need a parallel process that runs hourly.
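
A compressed sketch of that pipeline shape, assuming the MongoDB Spark connector 10.x (the format name and options differ in older connector versions); the URIs, bucket, and join key are invented:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hourly-enrich").getOrCreate()

# Read the source collection from MongoDB (connector jar on the classpath)
orders = (spark.read.format("mongodb")
    .option("connection.uri", "mongodb://mongo:27017")
    .option("database", "shop")
    .option("collection", "orders")
    .load())

# Enrich with reference data from another system (path is illustrative)
customers = spark.read.parquet("s3a://my-bucket/reference/customers/")
enriched = orders.join(customers, "customer_id", "left")

# Append one partition per hourly run so each execution writes a clean slice
(enriched
    .withColumn("run_hour", F.date_format(F.current_timestamp(), "yyyy-MM-dd-HH"))
    .write.mode("append").partitionBy("run_hour")
    .parquet("s3a://my-bucket/enriched/orders/"))
```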

SK
Real User
Top 20
2022-07-04T15:18:53Z
Jul 4, 2022

I am using Apache Spark to move data from databases. We have customers who use one database as a data lake.

AmitMataghare - PeerSpot reviewer
Real User
Top 5
2022-04-27T08:19:23Z
Apr 27, 2022

Apache Spark is a data processing framework that you can program in languages like Java or Python. In my most recent deployment, we used Apache Spark to build engineering pipelines to move data from sources into the data lake.

Salvatore Campana - PeerSpot reviewer
Real User
Top 5
2022-04-27T08:19:19Z
Apr 27, 2022

I use Spark to run automation processes driven by data.

GK
Real User
2020-06-10T05:14:07Z
Jun 10, 2020

Apache Spark can be used for multiple use cases in big data and data engineering tasks. We are using Apache Spark for ETL, integration with streaming data, and real-time predictions such as anomaly detection and price prediction, as well as data exploration on large volumes of data.
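
One simple way to express the anomaly-detection piece in plain DataFrame code: flag points far from the mean. The readings and the 1.5-sigma threshold are illustrative only:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("anomaly-demo").getOrCreate()

# Toy metric readings; one obvious outlier
readings = spark.createDataFrame(
    [(1, 10.2), (2, 9.8), (3, 10.5), (4, 55.0), (5, 10.1)], ["id", "value"])

# Global mean/stddev, then flag points more than 1.5 sigma away
stats = readings.agg(F.mean("value").alias("mu"), F.stddev("value").alias("sigma"))
flagged = (readings.crossJoin(stats)
    .withColumn("zscore", (F.col("value") - F.col("mu")) / F.col("sigma"))
    .withColumn("is_anomaly", F.abs(F.col("zscore")) > 1.5))
flagged.show()
```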

Onur Tokat - PeerSpot reviewer
Consultant
2022-02-15T16:44:00Z
Feb 15, 2022

I mainly use Spark to prepare data for processing because it has APIs for data evaluation.

Suresh_Srinivasan - PeerSpot reviewer
Real User
2021-12-28T09:52:00Z
Dec 28, 2021

The solution can be deployed on the cloud or on-premise.

Oscar Estorach - PeerSpot reviewer
Real User
Top 10
2021-08-18T14:51:07Z
Aug 18, 2021

You can do a lot in terms of data transformation. You can store, transform, and stream data. It's very useful and has many use cases.

GA
Real User
2021-03-27T15:39:24Z
Mar 27, 2021

We use Apache Spark to prepare data for transformation and encryption, depending on the columns. We use AES-256 encryption. We're building a proof of concept at the moment. We prepare batches on Spark for Kubernetes on-premises and on Google Cloud Platform.
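
For the column-level encryption piece, newer Spark versions (3.3 and later) ship built-in aes_encrypt/aes_decrypt SQL functions; a sketch with dummy data and a dummy 32-byte key (a real key would come from a secrets manager, not the source code):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("column-encrypt").getOrCreate()

df = spark.createDataFrame([("alice", "1234-5678-9012")], ["name", "card_no"])
key = "0123456789abcdef0123456789abcdef"  # 32 bytes -> AES-256 (dummy key)

# Encrypt one column and drop the plaintext; base64 keeps the output printable
encrypted = (df
    .withColumn("card_no_enc",
                F.expr(f"base64(aes_encrypt(card_no, '{key}', 'GCM'))"))
    .drop("card_no"))
encrypted.show(truncate=False)
```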

NK
Real User
Top 20
2021-02-01T12:04:16Z
Feb 1, 2021

I use it mostly for ETL transformations and data processing. I have used Spark on-premises as well as on the cloud.

KK
Real User
2020-10-28T02:27:29Z
Oct 28, 2020

We just finished a central front project called MFY for our in-house fraud team. In this project, we are using Spark along with Cloudera. In front of Spark, we are using Couchbase. Spark is mainly used for aggregations and AI (for future usage). It gathers data from Couchbase and does the calculations. We are not actively using Spark AI libraries at this time, but we are going to use them.

This project is for classifying transactions and finding suspicious activities, especially those that come from internet channels such as internet banking and mobile banking. It executes rules developed and written by our business team. An example of a rule is: if the transaction count or transaction amount is greater than 10 million Turkish Liras and the user device is new, then raise an exception. The system sends an SMS to the user, and the user can choose whether or not to continue with the transaction.
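
The example rule quoted above maps almost one-to-one onto a DataFrame filter; the column names below are hypothetical stand-ins for the real schema:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fraud-rules").getOrCreate()

# Toy transactions: amount in Turkish Liras, whether the device is new
tx = spark.createDataFrame(
    [("t1", 12_500_000.0, True), ("t2", 12_500_000.0, False), ("t3", 900.0, True)],
    ["tx_id", "amount", "is_new_device"])

# "amount > 10 million TL and the user device is new" -> raise an exception
suspicious = tx.filter((F.col("amount") > 10_000_000) & F.col("is_new_device"))

# Each surviving row would trigger the SMS challenge described above
suspicious.show()
```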

RV
Real User
2020-07-23T07:58:35Z
Jul 23, 2020

When we receive data from the messaging queue, we process everything using Apache Spark. Databricks does the processing and writes everything back to the data lake. The machine learning program then runs analysis on it using an ML prediction algorithm.
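
One common shape for that flow is scoring a stream with a model trained offline. In this sketch the paths, topic, and two-field schema are all invented, and it assumes the saved pipeline contains only stages that can transform a streaming DataFrame:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, DoubleType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("stream-scoring").getOrCreate()

# A pipeline (feature stages + predictor) trained offline and saved earlier
model = PipelineModel.load("s3a://my-bucket/models/predictor/")

events = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load())

# Parse the JSON payload into the columns the saved pipeline expects
schema = StructType([StructField("f1", DoubleType()), StructField("f2", DoubleType())])
parsed = (events
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# model.transform runs per micro-batch; scored rows land in the data lake
(model.transform(parsed).writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/lake/scored/")
    .option("checkpointLocation", "s3a://my-bucket/chk/scored/")
    .start())
```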

KK
Real User
Top 20
2020-02-02T10:42:14Z
Feb 2, 2020

Our use case for Apache Spark was a retail price prediction project. We were using retail pricing data to build predictive models. To start, the prices were analyzed and we created the dataset to be visualized using Tableau. We then used a visualization tool to create dashboards and graphical reports to showcase the predictive modeling data. Apache Spark was used to host this entire project.
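
A toy version of that modeling step with MLlib's linear regression; the pricing rows and feature names are invented:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("price-model").getOrCreate()

# Toy retail rows: cost, demand index, observed price (the label)
data = spark.createDataFrame(
    [(4.0, 0.9, 9.99), (6.5, 1.2, 14.50), (3.2, 0.7, 7.25), (8.0, 1.5, 18.90)],
    ["cost", "demand_index", "price"])

features = VectorAssembler(
    inputCols=["cost", "demand_index"], outputCol="features").transform(data)

model = LinearRegression(labelCol="price").fit(features)

# These predictions are what would feed dashboards in a tool like Tableau
model.transform(features).select("price", "prediction").show()
```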

Suresh_Srinivasan - PeerSpot reviewer
Real User
2020-01-29T11:22:00Z
Jan 29, 2020

We have built a product called "NetBot." We take any form of data - large email datasets, images, videos, or transactional data - transform the unstructured text and video into structured form, combine it with the transactional data, and create an enterprise-wide smart data grid. That smart data grid is then used by downstream analytics tools. We also provide model-building so people can get faster insight into their data.

SA
Consultant
2019-12-23T07:05:00Z
Dec 23, 2019

We are working with a client that has a wide variety of data residing in structured databases as well. The idea is to first build a database in Hadoop, which we are in the process of building right now: one place for all kinds of data. Then we are going to use Spark.

MG
Real User
2019-12-09T10:58:00Z
Dec 9, 2019

We primarily use the solution to integrate very large data sets from other environments, such as our SQL environment, and extract purposeful data before checking it. We also use the solution for streaming from very, very large servers.

AD
Consultant
2019-10-13T05:48:00Z
Oct 13, 2019

We use this solution for information gathering and processing. I use it myself when I am developing on my laptop. I am currently using an on-premises deployment model. However, in a few weeks, I will be using the EMR version on the cloud.

LC
Real User
2019-07-14T10:21:00Z
Jul 14, 2019

We primarily use the solution for security analytics.

it_user946074 - PeerSpot reviewer
Real User
2019-07-10T12:01:00Z
Jul 10, 2019

We use the solution for analytics.

it_user1059558 - PeerSpot reviewer
Real User
2019-04-08T13:04:00Z
Apr 8, 2019

Streaming telematics data.

SP
Real User
2019-03-17T03:12:00Z
Mar 17, 2019

Ingesting billions of rows of data all day.

reviewer894894 - PeerSpot reviewer
User
2018-06-27T19:19:00Z
Jun 27, 2018

Used for building big data platforms for processing huge volumes of data. Additionally, streaming data is critical.

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function...
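
A minimal PySpark illustration of the RDD behavior described above (the numbers are arbitrary); the lineage (parallelize -> map) is what lets Spark rebuild lost partitions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# An RDD: a read-only collection partitioned across the cluster
nums = sc.parallelize(range(1, 1_000_001), numSlices=8)

squares = nums.map(lambda x: x * x)         # transformation: lazy, extends lineage
total = squares.reduce(lambda a, b: a + b)  # action: triggers the distributed job
print(total)
```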
