We performed a comparison between Hitachi Lumada Data Integration and IBM InfoSphere DataStage based on real PeerSpot user reviews. Find out in this report how the two Data Integration Tools solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with StreamSets, since it works on a Data Collector-based mechanism, it completes the same process in 15 minutes. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"StreamSets' data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically in the data lake without any intervention from us, but that information is crucial for fixing downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, and then we needed to spend about a week fixing them. Right now, we are saving one to two weeks. Though it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
"In StreamSets, everything is in one place."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one DataOps solution."
"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started, build data pipelines, and connect to databases. Within a couple of weeks, they would be as comfortable as any technical person."
"It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there."
"The area where Lumada has helped us is the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we use Lumada to gather data from each industry in each country, we can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in one place, like our data warehouse. This improves our production performance and meets our need for information about the industry, production data, and commercial data."
"I absolutely love Hitachi. I'm one of the forefront supporters of Hitachi for my firm. It's so easy to integrate within our environments. In terms of being able to quickly build ETL jobs, transform, and then automate them, it's really easy to integrate throughout for data analytics."
"The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."
"The abstraction is quite good."
"I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source."
"The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product."
"We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule."
"It is quite useful and powerful."
"We like the flexibility of modeling."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms."
"Offers great flexibility."
"When we have needed help from the IBM team, they were helpful. Our company is a premium partner so we get fast responses."
"As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."
"The best feature of IBM InfoSphere DataStage for me was that it was very user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
"It's a robust solution."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"The logging mechanism could be improved. If I work on a pipeline, create a job from it, and run that job, it generates constant logs. So, the logging mechanism could be simplified. Right now, it is a bit difficult to understand and filter the logs. It takes some time."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
"The testing and quality could really improve. Every time that there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and do some effort, it is with the quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi."
"Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."
"I'm still at a very early stage with Pentaho Data Integration, but it can't really handle what I describe as 'extreme data processing', i.e., when there is a huge amount of data to process. That is one area where Pentaho is still lacking."
"I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You have to search all these different places using a mouse, clicking everywhere... each report is coded in a binary file... You cannot search with a text search tool..."
"The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products."
"I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
"It could be better integrated with programming languages, like Python and R. Right now, if I want to run Python code in one of my ETLs, it is a bit difficult to do. It would be great if we had some modules where we could code directly in Python. We don't really have a way to run Python code natively."
"If you develop on a MacBook, it'll be quite a hassle."
"It doesn't have any big data connections. It would be good to have them because most of the systems are moving towards big data. There should also be a user-friendly way to interact with the cloud. Its loading process is very slow. It takes a lot of time for around 5 or 6 million records, and we are not able to provide real-time data to the vendors due to this delay. Its performance needs to be improved. It is also like a legacy system. It is not updated much. In higher versions, they only do small changes. We would like to have new features and new technologies."
"The error messaging needs to be improved."
"Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
"Their web interface is good but the on-prem sites are outdated. The solution could also be improved if they could integrate the data pipeline scheduling part of their interface."
"In the future, I would like to see more integration with cloud technologies."
"The initial setup could be more straightforward."
"It would be useful to provide support for Python, R, and Java."
"What needs improvement in IBM InfoSphere DataStage is its pricing. The solution is priced higher than its competitors, so a lot of the clients my company has worked with prefer other tools over IBM InfoSphere DataStage because of the high price tag. Another area for improvement stems from the many new types of databases that have become available, for example, cloud databases and big data stores. IBM InfoSphere DataStage is working on connectors for different data sources, but they are still not up to date, meaning that some connectors for modern data sources are missing. The latest version of IBM InfoSphere DataStage also has a complex architecture, so my team faced frequent outages, and that should be improved as well."
StreamSets offers an end-to-end data integration platform to build, run, monitor and manage smart data pipelines that deliver continuous data for DataOps, and power the modern data ecosystem and hybrid integration.
Only StreamSets provides a single design experience for all design patterns for 10x greater developer productivity; smart data pipelines that are resilient to change for 80% fewer breakages; and a single pane of glass for managing and monitoring all pipelines across hybrid and cloud architectures to eliminate blind spots and control gaps.
With StreamSets, you can deliver the continuous data that drives the connected enterprise.
Hitachi Lumada Data Integration is a top-ranking data integration tool that aims to deliver accurate data from various sources to end users. This is a complete data integration platform that uses visual tools to deliver analytics-ready data. The product eliminates coding and complexity so that its services are equally accessible to IT users and to businesses that do not specialize in the field.
The solution offers powerful data integration, which is achieved through:
Users of Hitachi Lumada Data Integration can collaborate to build, deploy, and monitor dataflows in order to streamline data delivery. The visual tools of the product reduce the time of operation and lower complexity, allowing even beginners to operate the platform seamlessly. The onboarding process is initiated through broad connectivity to a wide variety of data sources and applications.
A drag-and-drop interface and ready-made templates allow users to easily create data pipelines that run from edge to cloud. The product lets users blend data on premises or in cloud services, including Azure, Amazon Web Services (AWS), and Google Cloud Platform (GCP). The tool allows a seamless switch between the native engine and Apache Spark, and operationalizes Python, Scala, and Weka machine-learning models.
The tool offers features for extensive business analytics through:
Hitachi Lumada Data Integration offers its clients modern data architectures for data analytics. Through interactive visualizations and easy integration, users are able to increase data integrity for their organizations. The product offers a web-based drag-and-drop dashboard for a flexible experience, collaboration with other applications, and advanced multi-tenancy. Its enterprise reporting includes operational self-service reporting, security with content permissions, and additional protection through locking and expiration.
Hitachi Lumada Data Integration Features
The tool offers its clients various features which can be used to achieve efficient data integration and further analysis. These features include:
Hitachi Lumada Data Integration Benefits
The tool offers increased work productivity through efficient data integration. A number of the benefits include:
Reviews from Real Users
Philip R., a senior engineer at a comms service provider, says this product "Saves time and makes it easy for our mixed-skilled team to support the product".
Ryan F., a senior data engineer at Burgiss, appreciates Hitachi Lumada Data Integration because its low-code approach makes development faster than coding in Python.
IBM InfoSphere DataStage is a high-quality data integration tool that helps organizations of different sizes design, develop, and run jobs that move and transform data. The product integrates data across multiple systems through a high-performance parallel framework. It supports extended metadata management, enterprise connectivity, and integration of all types of data.
The solution is the data integration component of IBM InfoSphere Information Server, providing a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data stores, and other enterprise applications. The tool supports two integration patterns: extract, transform, and load (ETL) and extract, load, and transform (ELT). The platform achieves scalability through parallel processing and enterprise connectivity.
The solution has various versions, catering to different types of companies, which include the Server Edition, the Enterprise Edition, and the MVS Edition. Depending on which version a company has bought, different goals can be achieved. They include the following:
IBM InfoSphere DataStage can be deployed in various ways, including:
IBM InfoSphere DataStage Features
The tool has various features through which users can integrate and utilize their data effectively. The components of IBM InfoSphere DataStage include:
IBM InfoSphere DataStage Benefits
This solution offers many benefits for the companies that utilize it for data integration. Some of these benefits include:
Reviews from Real Users
A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.
Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.
Hitachi Lumada Data Integration is ranked 6th in Data Integration Tools with 26 reviews, while IBM InfoSphere DataStage is ranked 9th in Data Integration Tools with 9 reviews. Both Hitachi Lumada Data Integration and IBM InfoSphere DataStage are rated 7.8. The top reviewer of Hitachi Lumada Data Integration writes "Saves time and makes it easy for our mixed-skilled team to support the product, but more guidance and better error messages are required in the UI". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "Robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data". Hitachi Lumada Data Integration is most compared with SSIS, Talend Open Studio, Informatica Enterprise Data Catalog, Oracle Data Integrator (ODI) and Spring Cloud Data Flow, whereas IBM InfoSphere DataStage is most compared with SSIS, Talend Open Studio, AWS Glue, Azure Data Factory and Informatica PowerCenter. See our Hitachi Lumada Data Integration vs. IBM InfoSphere DataStage report.
We monitor all Data Integration Tools reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.