
Talend Data Quality Overview

Talend Data Quality is the #1 ranked solution among top Data Scrubbing Software and the #5 ranked solution among top Data Quality tools. PeerSpot users give Talend Data Quality an average rating of 9.0 out of 10. Talend Data Quality is most commonly compared to Informatica Data Quality: Talend Data Quality vs Informatica Data Quality. Talend Data Quality is popular among the large enterprise segment, accounting for 68% of users researching this solution on PeerSpot. The top industry researching this solution is computer software, accounting for 28% of all views.
Buyer's Guide

Download the Data Quality Buyer's Guide including reviews and more. Updated: July 2022

What is Talend Data Quality?
The data quality tools in Talend Open Studio for Data Quality enable you to quickly take the first big step towards better data quality for your organization: getting a clear picture of your current data quality. Without having to write any code, you can perform data quality analysis tasks ranging from simple statistical profiling, to analysis of text fields and numeric fields, to validation against standard patterns (email address syntax, credit card number formats) or custom patterns of your own creation.
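Talend performs this kind of profiling through its GUI, without code. Purely as an illustration of the idea, here is a minimal Python sketch of pattern-based profiling of a text column; the regex, function name, and sample values are made up and are not Talend components:

```python
import re

# Simplified email pattern for illustration only; production validators
# are considerably more involved.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def profile_emails(values):
    """Return simple pattern-match statistics for a text column."""
    non_null = [v for v in values if v]
    matches = [v for v in non_null if EMAIL_RE.match(v)]
    return {
        "rows": len(values),
        "non_null": len(non_null),
        "pattern_matches": len(matches),
        "match_rate": len(matches) / len(non_null) if non_null else 0.0,
    }

stats = profile_emails(["a@b.com", "bad-email", None, "c@d.org"])
```

The same shape of report (row count, null count, pattern-match rate) is what a profiling run surfaces for each analyzed column.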
Talend Data Quality Customers
Aliaxis, Electrocomponents, MÜNCHENER VEREIN, The Sunset Group
Talend Data Quality Pricing Advice

What users are saying about Talend Data Quality pricing:
  • "It is cheaper than Informatica. Talend Data Quality costs somewhere between $10,000 to $12,000 per year for a seat license. It would cost around $20,000 per year for a concurrent license. It is the same for the whole big data solution, which comes with Talend DI, Talend DQ, and TDM."
  • "It's a subscription-based platform, we renew it every year."
Talend Data Quality Reviews

IT Manager at an insurance company with 10,001+ employees
    Real User
Top 5 Leaderboard
    Saves a lot of time, good ROI, seamless integration with different databases, and stable
    Pros and Cons
• "It is saving a lot of time. Today, we can mask around a hundred million records in 10 minutes. It has seamless integration with different databases, and it is a drag and drop kind of tool."
    • "They don't have any AI capabilities. With Talend DQ, I cannot generate any reports today, so I need an ETL tool. It would be really awesome if they could provide concurrent licenses for the cloud."

    What is our primary use case?

    Talend has different modules. Talend has Talend Data integration (DI), Talend Data Quality (DQ), Talend MDM, and Talend Data Mapper (TDM). We have Talend DI, Talend DQ, and TDM. Our use cases span across these modules. We don't use Talend MDM because we have a different solution for MDM. Our EDF team is using an Informatica solution for that.

    We have a platform that deals with MongoDB, Oracle, and SQL Server databases. We also have Teradata and Kafka. The first use case was to ensure that when the data traverses from one application to another, there is no data loss. This use case was more around data reconciliation, and it was also loosely tied to the data quality.
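The reconciliation check described here (confirming that no records are lost as data moves between applications) can be sketched minimally. The function and keys below are illustrative only; in practice this runs as a Talend job against the actual source and target systems:

```python
# Compare key sets between a source extract and a target extract to
# spot records lost (or unexpectedly created) during a data hand-off.
def reconcile(source_keys, target_keys):
    """Report records missing on either side of a data transfer."""
    src, tgt = set(source_keys), set(target_keys)
    return {
        "source_count": len(src),
        "target_count": len(tgt),
        "missing_in_target": sorted(src - tgt),
        "unexpected_in_target": sorted(tgt - src),
    }

# Hypothetical claim IDs for illustration.
report = reconcile(["c1", "c2", "c3"], ["c1", "c3", "c4"])
```

An empty `missing_in_target` list is the "no data loss" condition the use case is after.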

The second use case was related to data consistency. We wanted to make sure that the data is consistent across various applications. For example, we are a healthcare company. If I'm validating the claim system, I need to see how to inject the data into those systems without any issues.

    The third use case was related to whether the data is matching the configurations. For example, in production, I want to see:

• Is there any data issue or duplicate data?
    • Is the data coming from different states getting fed into the system and matching the configurations that have been set in our different engines, such as enrollment, billing, and all those things?
    • Is it able to process this data with our configuration?
    • Is it giving the right output?
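The duplicate check in the list above can be sketched as follows. The field names are made up for illustration; in Talend this would be a drag-and-drop job rather than hand-written code:

```python
from collections import Counter

# Find rows that share the same business key (here, a hypothetical
# member_id + state combination) -- the duplicate-data check above.
def find_duplicates(rows, key_fields):
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return [key for key, n in counts.items() if n > 1]

rows = [
    {"member_id": "M1", "state": "CA"},
    {"member_id": "M2", "state": "TX"},
    {"member_id": "M1", "state": "CA"},  # duplicate of the first row
]
dupes = find_duplicates(rows, ["member_id", "state"])
```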

The fourth use case was to see if I can virtually create data. For example, I want to test with some data that is not available in the current environment, or I'm trying to create some EDI files, which are 834 and 837 transaction files. These are the enrollment and claims-processing files that come from different providers. If I want to test these files, do I have the right information within my systems, and who can give me that information?

The fifth use case was related to masking the information so that in your environment, people don't have access to certain data. For example, across the industry, people pull the data from production and then just push it into the lower environment and test, but because this is healthcare data, we have a lot of PHI and PII information. If you have your PHI and PII information in production and I am pulling that data, I have everything that is in production in the test environment. So, I know your address, and I know your residence. I can hack into your systems, and I can do anything. This is the main issue for us with HIPAA compliance. How do we mask that information so that in your environment, people don't have access to it?

    These are different use cases on which we started our journey. Now, it is going more into the cloud, and we are using Talend to interact with various cloud environments in AWS. We are also interacting with Redshift and Snowflake by using Talend. So, it is expanding. We are using version 7.1, and we are migrating to version 7.3 very soon.

    How has it helped my organization?

    It is saving a lot of time. A person doesn't need to sit and create a file to test. Instead, there are automation processes that are like self-service, and with a few clicks, people are able to generate the data and process it to complete the testing. This gives more confidence in the quality of the deployment that happens in production. The outages have also reduced.

    Overall, from 2017 to 2020, we have almost saved around 140,000 to 160,000 hours, which is only with respect to the data. I don't know how much we have saved because of masking. If masking is not there and compliance-related things come up, it could be $2 billion to $3 billion of expense that a company has to bear. Because masking is there, it gives more confidence. Not having the PHI and PII footprint in the lower environment has helped our organization.

    What is most valuable?

It is saving a lot of time. Today, we can mask around a hundred million records in 10 minutes. Masking is one of the key pieces that is used heavily by the business and IT folks. Normally in the software development life cycle, before you promote anything into the production environment, you have to test it in the test environment to make sure that when the data goes into production, it works, but these are all production files. For example, we acquired a new company or a new state for which we're going to do the entire back office, which is related to claims processing, payments, and member enrollment every year. If you get the production data and process it again, it becomes a compliance issue. Therefore, for any migrations that are happening, we have developed a new capability called pattern masking. This feature looks at those files, masks that information, and processes it through the system. With this, there is no PHI and PII element, and there is data integrity across different systems.
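As an illustration only (this is not Talend's masking component, and the key and field values are made up): one common way to mask a field deterministically is keyed hashing, so the same input always yields the same mask. That determinism is what preserves data integrity across different systems, since masked values still join correctly:

```python
import hashlib
import hmac

# Illustrative secret; a real deployment would manage this securely.
SECRET = b"not-a-real-key"

def mask(value: str, length: int = 10) -> str:
    """Deterministically mask a PII value with a keyed hash."""
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return digest[:length]

# The same input masks to the same token in every system,
# so cross-system joins on the masked field still line up.
m1 = mask("123-45-6789")
m2 = mask("123-45-6789")
```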

It has seamless integration with different databases. It has components with which you can easily integrate with different databases on the cloud or on-premises.

    It is a drag and drop kind of tool. Instead of writing a lot of Java code or SQL queries, you can just drag and drop things. It is all very pictorial. It easily tells you where the job is failing. So, you can just go quickly and figure out why it is happening and then fix it.

    What needs improvement?

They don't have any AI capabilities. Talend DQ is specifically for data quality, which only has data profiling. With Talend DQ, I cannot generate any reports today, so I need an ETL tool. It provides general Excel files, or I have to create some views. If, instead of requiring us to buy a new tool, Talend provided a reporting capability or solution, it would be great. It would reduce the development effort for creating these kinds of reports.

We also manage the infrastructure for Talend. From the licensing perspective, for cloud, they only have seat licenses where one person is tied to one license, but for on-premises, they have concurrent licenses. It would be really awesome if they could provide concurrent licenses for the cloud so that if one person is not there, somebody else can use that license. Currently, it is not possible unless a person deactivates his or her license and moves the same seat license to someone else. We are one of the biggest customers in the central zone of the US for Talend, and this is the feedback that we have provided them again and again, but they come back and say that they aren't able to provide concurrent licenses on the cloud.

In version 7.3, there is a feature for tokenization and de-tokenization of data. This is the feature that we are looking for. It is useful if somebody wants to see what we have masked and how we demask it. This feature is not there in version 7.1. There are also a few other capabilities on the cloud, but we don't yet have a big footprint in the cloud.


    For how long have I used the solution?

    We have been using this solution since 2017. I was the person who brought this solution into this organization.

    What do I think about the stability of the solution?

    It is stable. I haven't seen any kind of outages for Talend DQ.

    What do I think about the scalability of the solution?

    Scalability depends on how many job servers you have. For example, if you have one job server and you are trying to process 2 million, 3 million, or 1 billion records, it might take more time. If you have more job servers so that you can run these jobs in parallel, your jobs will run faster. Networking also comes into play. For example, I am in California, and if I am trying to access something in North Carolina and process data, it could be slow. If my server is located in California, it would be pretty fast.

In terms of the number of users, DQ is specific to the data governance team, which has five to seven people. For the Talend solution as a whole, we have around 150 people. It is a big solution, but its maintenance is not that big of an effort because you are not writing any code. If you know Talend and a little bit of Java, managing it should not be that big of an effort.

    How are customer service and support?

    Sometimes, we have challenges because they don't understand the business, but we have to explain it to them. I can't expect them to understand everything about healthcare and then give me a solution. They provide services to a lot of different industries. 

    They have been pretty responsive. We are in the high-tier, and they have defined SLAs in terms of the turnaround time for any kind of issues. They have definitely been very helpful. In the past, when we were not in that particular tier, we had some challenges where it took a little bit of time in getting a response. Sometimes, they also sent some weird responses, and we had to go back and forth, but for a showstopper, their response has been pretty good.

    What was our ROI?

We were able to save 140,000 to 160,000 hours based on the solutions and capabilities that we have built from 2017 to 2020. If I multiply that by $80 an hour, it comes to somewhere around $11 million to $13 million that we have already saved. If I take five licenses for three years, the savings would be $350,000. I don't know the ROI with respect to Informatica. Slowly, our EDF team is switching over from Informatica to Talend, and they say that it is pretty huge.
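As a quick sanity check on that figure, multiplying the hours saved by the $80 hourly rate given in the review:

```python
# Savings arithmetic using the figures stated in the review.
hours_low, hours_high = 140_000, 160_000
rate = 80  # USD per hour, as stated in the review

savings_low = hours_low * rate    # 140,000 h * $80/h
savings_high = hours_high * rate  # 160,000 h * $80/h
# 140,000-160,000 hours at $80/hour works out to roughly $11.2M-$12.8M.
```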

    What's my experience with pricing, setup cost, and licensing?

    It is cheaper than Informatica. Talend Data Quality costs somewhere between $10,000 to $12,000 per year for a seat license. It would cost around $20,000 per year for a concurrent license. It is the same for the whole big data solution, which comes with Talend DI, Talend DQ, and TDM.

    What other advice do I have?

    I would rate Talend Data Quality a nine out of ten.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    Practice Manager (Digital Solutions) at a computer software company with 201-500 employees
    MSP
Top 10 Leaderboard
    Data management platform that is cost effective and easily implements data from multiple sources
    Pros and Cons
    • "The features that I find to be the most valuable are the extensibility, the integration, and the ease of integration with multiple platforms."
    • "I would say that some of the support elements need improvement."

    What is our primary use case?

Our use cases vary, but mainly we are using it for implementing a master data management platform. We get data from multiple sources and create a golden record that can be used for ingesting the data from that single source into any of the platforms.

    What is most valuable?

The features that I find to be the most valuable are the extensibility, the integration, and the ease of integration with multiple platforms. The integration is one of the great features, which is mainly what we use it for.

    What needs improvement?

I would say that some of the support elements need improvement. It is built on open-source technology and they provide platinum support, but that support needs improvement. We have a large customer base and need more customized support from them.

I would like to see more advancements with certain big data technologies that they have but that haven't been added to the platform yet. It's something that they could add in the future.

    For how long have I used the solution?

    We have been using this solution for two years.

    What do I think about the stability of the solution?

    It's a stable platform. We process large volumes of data and have not experienced any challenges.

    What do I think about the scalability of the solution?

    We have not had any challenges, so I would say that it's a scalable solution.

    One of our clients has ten developers. The licensing is based on developers and ten is a large number to have. It's a sizable project.

It's being used on a daily basis by a global client. This organization has data stewards available across different countries around the globe. They access it through the master record that has been created.

    They currently have ten developers but plan to increase up to 25 users in the future.

    How was the initial setup?

    We have team members who have worked on the deployment. From their experience and the feedback that I have received, it is very easy to deploy and to manage this solution.

    It takes approximately four weeks to deploy Talend itself, but there are some use cases that are very long and take some time. 

    The average project for us is approximately three to four months because we build the integrations, cleanse the data, and many other things that are required once the platform is set up.

The initial setup is straightforward and there is not a lot of complexity.

    What's my experience with pricing, setup cost, and licensing?

    It's very cost-effective in terms of licensing and implementation.

    It's a subscription-based platform, we renew it every year.

    There are different packages that are Data Integration, Data Management, and Data Fabric. The Data Fabric is the highest package and they sell it based on the number of end-users/developers.

There are not really additional fees, but we have to be mindful of the add-ons, which are not part of the standard package. They have a standard package with some inclusions, but there are normally add-ons that would have to be purchased. Customers need to be mindful of that and share their requirements before getting the licensing for their project.

    Which other solutions did I evaluate?

We also evaluated Oracle. Oracle was more of a closed platform that was good for Oracle's own technologies, but we wanted something that is more extensible and supports the technologies we need for integration.

    Talend is partly open-source and cost-wise, it is quite optimized compared to Oracle.

    What other advice do I have?

    It's a great technology.

Be careful of the add-ons, and note that there are different packages. Most customers are fine with starting off with Data Management, but there are add-ons. I would suggest sharing the requirements or the scope with Talend so that they understand which features are required and can recommend the right package, whether Data Management or Data Fabric.

It's one of the leading platforms for customers who are looking for a good, cost-effective data integration platform that can connect with multiple technologies.

    It has data integration and data management capabilities, which is why many customers use Talend, but there are some elements with the support that they need to look at.

    I would rate this solution an eight out of ten.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
    PeerSpot user
    Jugal Dhrangadharia - PeerSpot reviewer
    Associate Team Lead at a tech services company with 51-200 employees
    Real User
    Leaderboard
    We needed to stop manually finding and cleaning data through Excel spreadsheets.

    How has it helped my organization?

Data quality issues are now easily identifiable, instead of manually finding and cleaning the data through Excel (as we used to do) before ETL.

    What is most valuable?

It is currently the best open-source data quality tool available, compared to other open DQ tools ('DataCleaner', 'Open Source Data Quality & Profiling'), for a variety of reasons:

    1. Vast connectors to different DB, Web, CRM, etc
    2. Custom code is allowed
    3. Wide range of advanced algorithms
    4. Recommended for advanced users
    5. Detailed analysis, etc
    6. Large community of users

    The most valuable features for us are: custom code, connectors, algorithms.

    What do I think about the stability of the solution?

As it is an open-source tool, there are some minor bugs.

    How was the initial setup?

    Fairly straightforward. Lots of user guides and tutorials are available to get started.

    What's my experience with pricing, setup cost, and licensing?

    The best part is that it is open source.

    What other advice do I have?

Great product; definitely give it a try.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    Karthik Babu - PeerSpot reviewer
    Senior Consultant at a tech services company with 201-500 employees
    Consultant
    Leaderboard
    Customizable and straightforward implementation

    What is most valuable?

    The solution is customizable.

    For how long have I used the solution?

    I have been using Talend Data Quality for approximately four and a half years.

    What do I think about the scalability of the solution?

    The performance is one area that Talend Data Quality could improve in because large volumes take a lot of time.

    How are customer service and support?

    I have not needed the technical support.

    How was the initial setup?

The implementation is not difficult; it has been straightforward for the implementations we have done.

    What about the implementation team?

    We do the implementation of this solution.

    What other advice do I have?

    I rate Talend Data Quality a nine out of ten.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user