Examples of the 83,000+ reviews on PeerSpot:

UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis
Real User
Top 10
Helps to build ETL pipelines and load data into warehouses
Pros and Cons
  • "The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike Python or JavaScript, which may struggle with parallel processing, it lets us handle large volumes of data easily and with more power."
  • "Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."

What is our primary use case?

We're using Apache Spark primarily to build ETL pipelines. This involves transforming data and loading it into our data warehouse. Additionally, we're working with Delta Lake file formats to manage the contents.

What is most valuable?

The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike Python or JavaScript, which may struggle with parallel processing, it lets us handle large volumes of data easily and with more power.

What needs improvement?

Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial.

For how long have I used the solution?

I have been using the product for six years. 

What do I think about the stability of the solution?

Apache Spark is generally a stable product and rarely breaks down. Issues may arise with sudden increases in data volume, which can lead to memory errors, but these can typically be managed with autoscaling clusters. Additionally, schema changes or irregularities in streaming data may pose challenges, but these could be addressed in future software versions.

What do I think about the scalability of the solution?

About 70-80 percent of employees in my company use the product. 

How are customer service and support?

We haven't contacted Apache Spark support directly because it's an open-source tool. However, when using it as a product within Databricks, we've contacted Databricks support for assistance.

Which solution did I use previously and why did I switch?

The main reason our company opted for the product is its capability to process large volumes of data. While other options like Snowflake offer some advantages, they may have limitations regarding custom logic or modifications.

How was the initial setup?

The setup and installation of Apache Spark can vary in complexity depending on whether it's done in a standalone or cluster environment. The process is generally more straightforward in a standalone setup, especially if you're familiar with the concepts involved. However, setting up in a cluster environment may require more knowledge about clusters and networking, making it potentially more complex.

What's my experience with pricing, setup cost, and licensing?

The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks.

What other advice do I have?

If you're new to Apache Spark, the best way to learn is by using the Databricks Community Edition. It provides a cluster for Apache Spark where you can learn and test. I rate the product an eight out of ten.

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior IT Application Architect at an insurance company with 5,001-10,000 employees
Real User
Completely secure and supports many tools, but the competitors have better functionalities
Pros and Cons
  • "The product is completely secure."
  • "The competitors provide better functionalities."

What is our primary use case?

We use the product for computing. We mainly use Spark, Hive, HDFS, and Impala.

How has it helped my organization?

The product has been instrumental for all our computing needs. We have a data warehouse and a data lake. We read from S3 and load the data into different databases. We run all the transformations and logic we write in PySpark or Spark Scala. Spark is very valuable for data processing.

What is most valuable?

The product is completely secure. It meets our protection needs. We have a dedicated on-premise cluster. Every year, the vendor introduces new versions and supports many of the available tools. It offers different hosting options, including private cloud and public cloud.

What needs improvement?

The competitors provide better functionalities.

For how long have I used the solution?

I have been using the solution for six years.

What do I think about the stability of the solution?

The tool’s stability is good. I rate the stability an eight or nine out of ten.

What do I think about the scalability of the solution?

We have 2,000 to 3,000 users in our organization.

How was the initial setup?

I rate the ease of setup a seven out of ten. The deployment takes 48 hours. We need six Hadoop administrators for the deployment.

What's my experience with pricing, setup cost, and licensing?

The tool is not expensive. However, it has a cost to it. I rate the pricing a seven out of ten.

Which other solutions did I evaluate?

Databricks has a Runtime version. It works well with the cloud.

What other advice do I have?

We have an analytics data mart. It is built on top of SQL Server. We use Spark for computing. We use SSIS and SSRS for SQL Server. There is a path set for analytics to migrate to Azure. I would recommend the solution to others. Cloudera is the best option if we need an on-premise implementation of Hadoop. If an organization wants to choose a cloud version, then Databricks might be a good option. Overall, I rate the solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.