Buyer's Guide
Java Frameworks
August 2022
Get our free report covering Eclipse Foundation, Apache, Eclipse Foundation, and other competitors of Spring Boot. Updated: August 2022.
632,779 professionals have used our research since 2012.

Read reviews of Spring Boot alternatives and competitors

Ilya Afanasyev - PeerSpot reviewer
Senior Software Development Engineer at Yahoo!
Real User
Top 5 Leaderboard
Reliable, able to expand, and handles large amounts of data well
Pros and Cons
  • "There's a lot of functionality."
  • "I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."

What is our primary use case?

It's a core product that we use in our pipeline.

We have some input data. For example, one system supplies data to MongoDB. We pull this data from MongoDB, enrich it with additional fields from other systems, and write it to S3 for other systems to consume. Since we have a lot of data, we need a parallel process that runs hourly.
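The enrich step in the flow described above can be pictured with a small plain-Python sketch. This is not our Spark job: the function name `enrich`, the `id` key, and the sample fields are hypothetical, and in-memory lists and dictionaries stand in for MongoDB, the other systems, and S3.

```python
def enrich(records, lookup):
    """Attach extra fields from a second system to each source record."""
    enriched = []
    for record in records:
        # Records with no match in the other system pass through unchanged.
        extra = lookup.get(record["id"], {})
        # On a key collision the source record's value wins.
        enriched.append({**extra, **record})
    return enriched


# Hypothetical sample data standing in for the MongoDB pull and a second system.
source_records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
other_system = {1: {"region": "eu"}}

result = enrich(source_records, other_system)
```

In a real cluster the same idea would usually be expressed as a join between two distributed datasets, so that the work is split across nodes.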

What is most valuable?

We use batch processing. It works well with our formats and file versions. There's a lot of functionality. 

In our pipeline, each hour we copy the changes from MongoDB to a specific file. Previously, the pipeline copied all of the data from all of the tables every time, even when nothing had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000.
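The difference between a full copy and reading only changed data can be sketched as follows. This is an illustration under assumptions: I am not describing the actual MongoDB feature here, and the `updated_at` field, the `changed_since` helper, and the sample rows are all hypothetical.

```python
def changed_since(rows, last_run):
    """Return only the rows modified after the previous hourly run."""
    return [row for row in rows if row["updated_at"] > last_run]


# Hypothetical table with a modification timestamp on each row.
table = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 250},
    {"id": 3, "updated_at": 310},
]

last_run = 200                           # timestamp of the previous run (assumed)
delta = changed_since(table, last_run)   # rows 2 and 3 only
```

A full copy would process every row each hour, while the incremental read only touches the rows that changed, which is where the smaller cluster comes from.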

The solution is scalable.

It's a stable product.

What needs improvement?

The primary language for developers on Spark is Scala, although Java is now supported as well. I prefer Java to Scala, so it is good that both are supported. I know there is always discussion about which language to write applications in, and some people do love Scala. However, I don't like it.

They currently use a JDK version that is a little old, and not all features are available on it. Maybe they should add support for a newer JDK version.

For how long have I used the solution?

I've used the solution for a year and a half. 

What do I think about the stability of the solution?

The solution is stable. There are no bugs or glitches. It doesn't crash or freeze. 

What do I think about the scalability of the solution?

The product scales well. It's fine to expand if needed. 

Many teams use Spark. For example, we have a few kinds of pipelines, huge pipelines. One of them processes 300 billion events each day. It's our core technology currently.
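To give a feel for that volume, here is a rough back-of-the-envelope calculation. It assumes the events arrive evenly across the day, which is a simplification and not something we have measured.

```python
EVENTS_PER_DAY = 300_000_000_000      # figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

# Average rates, assuming a perfectly even spread over the day.
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
events_per_hour = EVENTS_PER_DAY // 24
```

Under that assumption the pipeline averages roughly 3.5 million events per second, or 12.5 billion events per hour.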

We do not plan to increase usage. We keep our legacy systems on Spark, and we are now discussing whether we prefer Flink or Spark. However, most people are already migrating new systems to Flink. We will still keep Spark for a few more years.

How are customer service and support?

We have an internal team that participates in the development of Spark. They are Spark contributors, and if we have problems, we turn to them. They are our own people, yet they work on Spark. Generally, if the problem is more minor, we look at websites and discussions about Spark, or ask internal colleagues who have experience with Spark.

Which solution did I use previously and why did I switch?

We also use Flink.

Before Spark, I worked at another company where we used different technologies, including Kafka, Redis, PostgreSQL, S3, and Spring.

How was the initial setup?

I didn't handle the initial setup. We were using this pipeline and clusters already. I just installed it on my local server. However, in terms of difficulty, I didn't see any problem. The deployment might only take a few hours. 

I found some documentation on the site, downloaded the archive, unzipped it, and installed it. I can't say that I used any special configuration. I just installed a few nodes for debugging and for running locally, and that's all. In one case, I also used a Docker configuration with Spark. It all worked fine.

What's my experience with pricing, setup cost, and licensing?

It's an open-source product. I don't know much about the licensing aspect. 

Which other solutions did I evaluate?

We have compared Flink and Spark as two possible options. 

What other advice do I have?

I can recommend the product. It's a nice system for batch processing huge amounts of data.

I'd rate the solution eight out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.