Share your experience using Thorntail

The easiest route - we'll conduct a 15 minute phone interview and write up the review for you.

Use our online form to submit your review. It's quick and you can post anonymously.

Your review helps others learn about this solution
The PeerSpot community is built upon trust and sharing with peers.
It's good for your career
In today's digital world, your review shows you have valuable expertise.
You can influence the market
Vendors read their reviews and make improvements based on your feedback.
Examples of the 83,000+ reviews on PeerSpot:

UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis
Real User
Top 10
Helps to build ETL pipelines load data to warehouses
Pros and Cons
  • "The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
  • "Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."

What is our primary use case?

We're using Apache Spark primarily to build ETL pipelines. This involves transforming data and loading it into our data warehouse. Additionally, we're working with Delta Lake file formats to manage the contents.

What is most valuable?

The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily.

What needs improvement?

Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial.

For how long have I used the solution?

I have been using the product for six years. 

What do I think about the stability of the solution?

Apache Spark is generally considered a stable product, with rare instances of breaking down. Issues may arise in sudden increases in data volume, leading to memory errors, but these can typically be managed with autoscaling clusters. Additionally, schema changes or irregularities in streaming data may pose challenges, but these could be addressed in future software versions.

What do I think about the scalability of the solution?

About 70-80 percent of employees in my company use the product. 

How are customer service and support?

We haven't contacted Apache Spark support directly because it's an open-source tool. However, when using it as a product within Databricks, we've contacted Databricks support for assistance.

Which solution did I use previously and why did I switch?

The main reason our company opted for the product is its capability to process large volumes of data. While other options like Snowflake offer some advantages, they may have limitations regarding custom logic or modifications.

How was the initial setup?

The solution's setup and installation of Apache Spark can vary in complexity depending on whether it's done in a standalone or cluster environment. The process is generally more straightforward in a standalone setup, especially if you're familiar with the concepts involved. However, setting up in a cluster environment may require more knowledge about clusters and networking, making it potentially more complex.

What's my experience with pricing, setup cost, and licensing?

The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks.

What other advice do I have?

If you're new to Apache Spark, the best way to learn is by using the Databricks Community Edition. It provides a cluster for Apache Spark where you can learn and test. I rate the product an eight out of ten.

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Flag as inappropriate
Suriya Senthilkumar - PeerSpot reviewer
Analyst at Deloitte
Real User
Processes a larger volume of data efficiently and integrates with different platforms
Pros and Cons
  • "The product’s most valuable features are lazy evaluation and workload distribution."
  • "They could improve the issues related to programming language for the platform."

What is our primary use case?

We use the product in our environment for data processing and performing Data Definition Language (DDL) operations.

What is most valuable?

The product’s most valuable features are lazy evaluation and workload distribution.

What needs improvement?

They could improve the issues related to programming language for the platform. 

For how long have I used the solution?

We have been using Apache Spark for around two and a half years.

What do I think about the stability of the solution?

The platform’s stability depends on how effectively we write the code. We encountered a few issues related to programming languages.

What do I think about the scalability of the solution?

We have more than 100 Apache Spark users in our organization.

Which solution did I use previously and why did I switch?

Before choosing Apache Spark for processing big data, we evaluated another option, Hadoop. However, Spark emerged as a superior choice comparatively.

How was the initial setup?

The initial setup complexity depends on whether it's on the cloud or on-premise. For cloud deployments, especially using platforms like Databricks, the process is straightforward and can be configured with ease. However, if the deployment is on-premise, the setup tends to be more time-consuming, although not overly complex.

What's my experience with pricing, setup cost, and licensing?

They provide an open-source license for the on-premise version. However, we have to pay for the cloud version including data centers and virtual machines.

What other advice do I have?

Apache Spark is a good product for processing large volumes of data compared to other distributed systems. It provides efficient integration with Hadoop and other platforms.

I rate it a ten out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Flag as inappropriate