What is our primary use case?
The Data Intelligence suite helps us manage all of our data governance activities: setting up metadata, data lineage, business glossaries, and business metadata definitions are all managed in this tool. There is also extended use of the metadata we have built: we integrate with other ETL tools to pull that metadata and use it in our data transformations, so the tool is very tightly coupled with our data processing in general. We also use erwin Data Modeler, which helps us build our metadata, business definitions, and the physical data model of the data structures that we have to manage. These two tools work hand-in-hand to manage our data governance metadata capabilities and support many business processes.
I manage the data architecture as well as the whole data governance team that designs the data pipelines. We designed the overall data infrastructure plus the actual governance processes. The stewards, who work with the data in the business, set up the metadata and manage this tool end-to-end every day.
How has it helped my organization?
The benefit of the solution has been broad adoption: many business partners use and leverage our data through our governance processes, and we have metrics on how many users have been capturing and using it. We have data consultants and other data governance teams set up to review these processes and ensure that nobody is bypassing them. We use this tool in the middle of our work processes so that, on the tail end, the business can self-service its use of the data and build its own solutions without heavy IT involvement.
When we manage our data processes, we know that there are upstream sources and downstream systems that could be impacted by changes coming in from the source. Using the lineage and impact analysis that this tool brings to the table, we have been able to identify source-system changes that could impact all downstream systems. That is a big plus because IT and production support teams are now able to use this tool to identify the impact of any issues with the data or any data quality gaps, and they can notify all the recipients upfront with proactive business communications about any impacts.
For any company mature enough to have implemented data governance rules or principles, these are the building blocks of the actual process. This is critical because we want the business to self-service. We can build data lakes or data warehouses using our data pipelines, but if nobody can actually see what information is available and use the data without going through IT, that defeats the whole purpose of doing this additional work. It is a data platform that allows any business process to come in, self-service, and build its own processes without a lot of IT dependencies.
There is a data science function where a lot of critical operational reporting can be done. Users leverage this tool to be able to discover what information is available, and it's very heavily used.
If we need to capture additional information about our metadata, we can define our own user-defined attributes and begin capturing it. The tool provides all the information that we want to manage. For our own processes, we have some special tags that we were able to configure quickly through this tool to start capturing that information.
We have our own homegrown solutions built around the data that we are capturing in the tool. We build our own pipelines and have our own homegrown ETL tools built using Spark and cloud-based ecosystems. We capture all the metadata in this tool and all the transformation business rules are captured there too. We have API-level interfaces built into the tool to pull the data at the runtime. We then use that information to build our pipelines.
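A minimal sketch of this metadata-driven pattern in Python: field-level mapping records, as stewards might capture them in the tool, are turned into the transformation logic a pipeline executes. The mapping schema, field names, and example data here are hypothetical, not erwin DI's actual API payloads.

```python
# Sketch: turn governance mapping metadata into ETL transformation logic.
# The mapping record shape below is invented for illustration -- erwin DI's
# real API and payload schemas differ.

def mapping_to_select_sql(table: str, mappings: list[dict]) -> str:
    """Build a SQL SELECT from field-level mapping records.

    Each record names a source column, a target column, and an optional
    transformation rule expressed as a SQL snippet.
    """
    exprs = []
    for m in mappings:
        rule = m.get("rule")  # business rule captured by a data steward
        source = rule if rule else m["source_column"]
        exprs.append(f"{source} AS {m['target_column']}")
    return f"SELECT {', '.join(exprs)} FROM {table}"

# Example mapping, as a steward might capture it:
mappings = [
    {"source_column": "cust_nm", "target_column": "customer_name"},
    {"source_column": "dob", "target_column": "birth_date",
     "rule": "CAST(dob AS DATE)"},
]
print(mapping_to_select_sql("raw.customers", mappings))
# -> SELECT cust_nm AS customer_name, CAST(dob AS DATE) AS birth_date FROM raw.customers
```

In a real setup, the mapping records would be fetched from the governance tool's API at runtime and the generated SQL handed to the Spark-based pipeline, so a rule change by the business flows into the next ETL run without a code change.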
This tool allows us to bring in any data stewards in the business area to use this tool and set up the metadata, so we don't have to spend a lot of time in IT understanding all the data transformation rules. The business can set up the business metadata, and once it is set up, IT can then use the metadata directly, which feeds into our ETL tool.
Impact analysis is a huge benefit because it gives us access to our pipeline and data mapping. It captures the source systems from which the data came, and for each source system there is good lineage, so we can identify where the data originated. From there, it is loaded into our clean zone and data warehouse, where I have reports, data extracts, API calls, and the web application layer. This provides visibility into all the interfaces and how information is consumed. Impact analysis, at the IT and field levels, lets me determine:
- What kind of business rules are applied.
- How data has been transformed from each stage.
- How the data is consumed and moved to different data marts or reporting layers.
Our visibility is now huge, creating good IT and business processes. With confidence, teams can assess where the information is, who is using it, and what applications are impacted if that information is unavailable or inaccurate, or if there are issues at the source. That impact analysis is a very strong use case of this tool.
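The impact-analysis idea described above amounts to a graph walk: given lineage edges from sources through the clean zone to reports and extracts, find everything downstream of a change. The asset names below are invented for illustration; the real tool renders this visually.

```python
from collections import deque

# Sketch of impact analysis over a lineage graph: each edge maps an
# asset to the assets that consume it. Asset names are illustrative.

def downstream_impact(edges: dict[str, list[str]], changed: str) -> set[str]:
    """Breadth-first walk of the lineage graph from a changed asset,
    returning every downstream asset it can reach."""
    impacted, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in edges.get(node, []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted

lineage = {
    "source.orders": ["clean.orders"],
    "clean.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["report.daily_sales", "api.orders_extract"],
}
print(sorted(downstream_impact(lineage, "source.orders")))
# -> ['api.orders_extract', 'clean.orders', 'report.daily_sales', 'warehouse.fact_orders']
```

This is how a production support team can, from one changed source field, produce the full list of reports and extracts to notify.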
What is most valuable?
The most critical features are the metadata management and data mapping, which include reference data management and code set management. Its capabilities allow us to capture metadata and use it to define how the data lineage should be built, i.e., the data mapping aspects. The data mapping component is a little unique to this tool, as it allows the entire data lineage and impact analysis to be done easily. It has very good visuals that display the data lineage for all the metadata we are capturing.
Our physical data mapping is done using this tool. Capturing the metadata and integrating the code set management and reference data management aspects with the data pipeline are unique to this tool. They are definitely the key differentiators we were looking for when picking this tool.
erwin DI provides visibility into our organization’s data for our IT, data governance, and business users. There is an IT version of the tool that allows our IT users and data stewards, who work with the data, to set up and manage the metadata. The same suite then has a very good business portal that takes the same information, read-only, and presents it back in a very business-user-friendly way. This suite of applications provides us end-to-end data governance from both the IT and business users' perspectives.
It is a central place for everybody to start any ETL data pipeline builds. This tool is being heavily used, plus it's heavily integrated with all the ETL data pipeline design and build processes. Nobody can bypass these processes and do something without going through this tool.
The business portal allows us to search the metadata and do data discovery. Business users come in and are presented with data catalog-type information, meaning all the metadata that we capture, such as PII masking indicators and the data dictionary, is set up as well. That aspect is very heavily used.
There are a lot of Data Connectors that gather metadata from all the different source systems and data stores. We configure those Data Connectors and then install them. The Data Connector that loads all the metadata from the erwin Data Modeler tool is XML-based.
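The shape of such an XML-based connector can be sketched as follows: parse a model export and pull out entities and their attributes. The element and attribute names here are invented; erwin Data Modeler's real export schema is different.

```python
import xml.etree.ElementTree as ET

# Sketch of loading model metadata from an XML export. The schema below
# (model/entity/attribute) is hypothetical, chosen only to show the
# connector pattern; the real erwin export format differs.

SAMPLE = """\
<model name="Sales">
  <entity name="Customer">
    <attribute name="customer_id" type="INTEGER"/>
    <attribute name="customer_name" type="VARCHAR(100)"/>
  </entity>
</model>"""

def load_entities(xml_text: str) -> dict[str, list[tuple[str, str]]]:
    """Return {entity name: [(attribute name, data type), ...]}."""
    root = ET.fromstring(xml_text)
    return {
        e.get("name"): [(a.get("name"), a.get("type"))
                        for a in e.findall("attribute")]
        for e in root.findall("entity")
    }

print(load_entities(SAMPLE))
# -> {'Customer': [('customer_id', 'INTEGER'), ('customer_name', 'VARCHAR(100)')]}
```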
The solution delivers up-to-date and detailed data lineage. Using visualization, it shows all the business rules that data fields go through. The visualization is very good, allowing us to quickly assess the impact in an understandable way.
All the metadata and business glossaries are captured right there in the tool. All of these data points are discoverable, so we can search through them. Once you know the business attribute you are looking for, then you are able to find where in the data warehouse this information lives. It provides you technical lineage right from the business glossary. It provides a data discovery feature, so you are able to do a complete discovery on your own.
What needs improvement?
Data quality has many facets, but we are definitely not using the core data quality features of this tool. Data quality has improved because the data engineers, data stewards, and business sponsors know what data they are looking for and how the data should move, and they are setting up those rules. However, we still need another layer of data quality assessment at the source, to see whether it is sending us the wrong data or whether there are issues with the source data. For that, we need a rule-based data quality assessment or scoring capability, whether in this tool or another technology stack, where the business can come in, define business rules, execute those rules, and score the data quality of all those attributes. Data quality is definitely not something we are leveraging from this tool as of today.
For how long have I used the solution?
I have been using it for four or five years.
What do I think about the stability of the solution?
We had a couple of issues here and there, but nothing drastic. Adoption of the tool has increased data usage considerably, and there have been a few issues with this, but not blackout-type issues, and we were able to recover.
There were some stability issues in the very beginning. Things are getting better with its community piece.
What do I think about the scalability of the solution?
Scalability has room for improvement. The tool tends to slow down with large volumes of data and takes more time. They could scale better, as we have seen some performance degradation when working with large data sets.
How are customer service and support?
We open tickets with them from time to time. They have responded promptly and provided solutions. There have been no issues.
Support has changed hands many times, though we always land on a good support model. I would rate the technical support as seven out of 10.
They cannot just custom-build solutions for us; requested changes are things they deliver and add to releases.
Which solution did I use previously and why did I switch?
We were previously using Collibra and Talend data management. We switched to this tool to help us build our data mapping, not just field-level mapping. There are also aspects of code set management, where we translate different codes, standardizing them to enterprise codes. With the reference data management aspects, we can build our own data sets within the tool, and those data sets are also integrated with our data pipeline.
We were definitely not sticking with the Talend tool because it increased our delivery time for data. When we were looking at other platforms, we needed a tool that captured data mapping in a way that a systematic program could actually read and understand, then generate the dynamic code for an ETL process or pipeline.
How was the initial setup?
It was through AWS. The package was very easy to install.
What was our ROI?
If I used a traditional ETL tool and built through IT, it would take five days to get even a very simple data mapping to the deployment phase. Using this solution, the IT cost is cut down to less than a day. Since the business requirements are now captured directly in the tool, I don't need IT support to execute them. The only part being executed and deployed from the metadata is my ETL code, which is generated from the information that the business captures. So, we can build data pipelines at a very rapid rate with a lot of accuracy.
During maintenance windows, when things are changing and updating, the business would not normally have access to the ETL tool, the code, or the rules executed in the code. However, using this tool with its data governance and data mapping, the mapping the business captures is what actually gets executed: the rules are defined first, then fed into the ETL process. This is done weekly because we dynamically generate the ETL from our business users' mappings. That is definitely a big advantage; our data will never deviate from the rules that the business has set up.
If people cannot do discovery on their own, then you will need to add a lot of resources, i.e., manpower, to support the business usage of the data. A lot of money is saved because we can run a very lean shop and don't have to onboard a lot of resources. This saves a lot on manpower costs as well.
What's my experience with pricing, setup cost, and licensing?
The licensing cost was very affordable at the time of purchase. The product has since been taken over by erwin, then Quest. The tool has gotten a bit more costly, but they are adding more features very quickly.
Which other solutions did I evaluate?
We did a couple of demos with data catalog-type tools, but they didn't have the complete package that we were looking for.
What other advice do I have?
Our only systematic process for refreshing metadata is from the erwin Data Modeler tool. Whenever updates are made there, we have a systematic way to update the metadata in our reference tool.
I would rate the product as eight out of 10. It is a good tool with a lot of good features. We have a whole laundry list of things that we are still looking for, which we have shared with them, e.g., improving stability and the product's overall health. The cost is going up, but it provides us all the information that we need. The basic building blocks of our governance are tightly coupled with this tool.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
*Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.