Examples of the 83,000+ reviews on PeerSpot:

PeerSpot user
Data Architect at World Vision
Real User
Top 5 | Leaderboard
SSIS MatchUp Component is Amazing
Pros and Cons
  • "The high value in this tool is its relatively low cost, ease of use, tight integration with SSIS, superior performance (compared to competitors), and attribute-level advanced survivor-ship logic."
  • "The tool needs to provide resizable forms/windows like all other SSIS windows. Vendor claims its an SSIS limitation however all SSIS components are resizable so that isn't true. This is just an annoyance but needless."

What is our primary use case?

We use this tool for B2B and B2C customer de-duplication/matching, for generating a golden version of our customer records, and for householding.

How has it helped my organization?

We use Melissa Data MatchUp for SSIS to de-duplicate our customer data on a daily basis, which has enabled us to reduce marketing costs and increase the quality of our communication with customers.

It replaced a primitive weekly custom de-duplication process that matched at the record level.

Its survivor-ship logic handles very complex column-level rules efficiently, providing us for the first time with a single version of the truth for our customer data. Its inherent intelligence in name and address parsing provides very accurate exact matching with no false positives and no unexpected false negatives. We are continually impressed by its sophistication and ease of use. The tool does not require a middle tier or specialized staff like every other tool on the market.

What is most valuable?

The high value in this tool is its relatively low cost, ease of use, tight integration with SSIS, superior performance (compared to competitors), and attribute-level advanced survivor-ship logic. There's no separate server needed and no separate application to maintain.

This vendor offers a large variety of components, from on-prem to cloud SaaS, as well as hybrids of cloud and on-prem. This review is specific to the "MatchUp for SSIS" component.

For us, this tool had very high value because we didn't have to become experts in some overly complicated DQ tool, and because it is fully integrated with our EDW ETL rather than requiring us to stand up and integrate an external application.

We use it daily for 1) direct matching, 2) column-level survivor-ship, and 3) mail householding. We started with B2C customers and later added B2B customers. The tool supports matching logic specific to organization names and individual names (as well as a variety of other specialized types of data values) and works well in both cases. For example, it can pull out nicknames and match on those.

One of the business and operational benefits for us is feeding the end result to Adobe Campaign for marketing automation. But the primary output is simply creating and managing an analytical golden record for our customer data. This has provided a very effective, holistic, maintenance-free, and extremely cost-effective solution for us.

The initial POC was up and running in just a few days with no training needed. The plug-in to our ETL tool was seamless and fully integrated into our existing processes. Most of our effort went into identifying customer survivor-ship requirements and validation. Any needed adjustments could be made very quickly, allowing us to focus on business requirements instead of implementing technology.

What needs improvement?

- Scalability is a limitation, as it is single-threaded. You can bypass this limitation by partitioning your data (say, by alphabetic ranges) into multiple dataflows, but even within a single dataflow the tool starts to really bog down if you are doing survivorship on a lot of columns. It's just very old technology that's starting to show its age, since it's been fundamentally the same for many years. To stay relevant, they will need to replace it with an ADF- or SSIS-IR-compliant version.

- Licensing could be greatly simplified. As soon as a license expires (which is specific to each server) the product stops functioning without prior notice and requires a new license by contacting the vendor. And updating the license is overly complicated. 

- The tool needs to provide resizable forms/windows like all other SSIS windows. The vendor claims it's an SSIS limitation, but that isn't true, since pretty much all SSIS components are resizable except theirs! This is just an annoyance, but it's a needless impact on productivity when developing new data flows.

- The tool needs to provide for incremental matching in the MatchUp for SSIS component (they provide this for other solutions, such as the standalone tool and the MatchUp web service). We had to code our own incremental logic to work around this; a sketch of that kind of delta logic follows this list.

- The tool needs the ability to sort mapped columns in the GUI when using advanced survivorship (sorting is only allowed when not using column-level survivorship).

- It should provide an option for a procedural language (such as C# or VB) for survivor-ship expressions rather than relying on the SSIS expression language.

- It should provide a more sophisticated ability to concatenate groups of data fields into common blocks of data for advanced survivor-ship prioritization (we do most of this in SQL prior to feeding the data to the tool).

- It should provide the ability to do survivor-ship only, with no matching (matching is currently required when running data through the tool).

- The tool should provide a component similar to BDD (Balanced Data Distributor) to enable splitting matching and survivor-ship across multiple threads based on data partitions, rather than requiring you to custom-code a parallel-capable solution. We broke our customer data down into ranges by first letter of last name so we could run parallel data flows.

- Documentation specific to MatchUp for SSIS needs to be provided. Most of their wiki pages were written for the MatchUp Object web service API rather than the SSIS component.

- They need to keep their wiki documentation current, as much of it is not. It's also very basic, offering little in the way of guidelines. For example, the tool is single-threaded, so getting great performance requires running multiple parallel data flows, or BDD within a data flow; you can figure that out on your own, but many SSIS practitioners aren't familiar with those techniques.

- The tool can hang or crash on rare occasions for unknown reasons. Restarting the package resolves the problem. I suspect it has something to do with running on a VM (the vendor doesn't recommend running on a VM), but I have no evidence to support that. When it crashes, it creates a dump file with just a vague message saying the executable stopped running.
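To illustrate the incremental-matching workaround mentioned in the list above, here is a minimal sketch of the kind of delta selection we mean, written in T-SQL. The table and column names (dbo.Customer, ModifiedDate, MatchGroupID) and the @LastRunDate variable are hypothetical, for illustration only:

    -- Select only new/changed customers, plus the existing members of any
    -- match group a changed customer previously belonged to, so the tool
    -- re-matches a small delta instead of the full customer base.
    SELECT c.*
    FROM   dbo.Customer AS c
    WHERE  c.ModifiedDate >= @LastRunDate          -- the delta itself
       OR  c.MatchGroupID IN (                     -- prior groupmates of the delta
               SELECT d.MatchGroupID
               FROM   dbo.Customer AS d
               WHERE  d.ModifiedDate >= @LastRunDate
                 AND  d.MatchGroupID IS NOT NULL );

The result set feeds the MatchUp component as usual; groups untouched by the delta are left alone.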

For how long have I used the solution?

We have been using this product for over 7 years.  

What do I think about the stability of the solution?

No stability issues, as long as you don't try to match on null last names or on lots of duplicate (exact-match) records, or try to run it in the default 64-bit mode of SSIS (the latter issue applies only to newer versions).

What do I think about the scalability of the solution?

We can run exact matches on 9 million customer records in 10 minutes using 5 partitions/parallel dataflows. Survivorship takes another 50 minutes. I'm sure you could run faster with dedicated hardware and more parallel dataflows. The tool starts to slow down exponentially once you pass about 2 million customers in a single dataflow, so it's best to keep it at or under that number, although mileage will vary depending on the complexity of your matching. It's unfortunate that the vendor hasn't built in parallelism, which would eliminate the need to do this yourself. They should be able to auto-scale based on the number of CPUs you're running.
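As a rough illustration of that partitioning approach, here is what one of the alphabetic range filters might look like in T-SQL (the table and column names are hypothetical):

    -- Each parallel dataflow sources one alphabetic range of last names,
    -- so each range gets its own MatchUp instance running concurrently.
    SELECT CustomerID, FirstName, LastName, Address1, City, PostalCode
    FROM   dbo.Customer
    WHERE  LastName >= 'A' AND LastName < 'F';  -- partition 1 of 5
    -- The other dataflows repeat the query with ranges such as F-J, K-O,
    -- P-S, and T-Z (with NULL/blank last names handled separately).

Keeping each partition at or under roughly 2 million rows avoids the slowdown described above.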

Even with that limitation, this tool is orders of magnitude faster than the last matching tool I used, and that one wasn't a simple plug-in to an ETL tool. I recently heard of a competing tool that takes longer to match just a few thousand customers than this tool takes to run millions of them.

Note: We probably run higher volumes than many organizations. For B2B and daily matching, you could probably process a delta in a matter of a few minutes with this tool.

Note: I suspect an essential consideration for scalability is whether you're calling a web service for matching or running on-prem. Their SSIS component is on-prem only, but they also offer a web service, which we have not tested.

Combining survivorship and matching in the same data flow slows performance. We got much better performance by running two separate dataflows: the first for just matching, and the second for just survivorship (re-using the grouping numbers from the first match). That was enough to make it perform to our requirements.
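To show the shape of that column-level survivorship logic, here is a sketch in T-SQL that re-uses the group numbers produced by the matching pass (all table and column names are hypothetical; in practice the tool handles this for us):

    -- Within each match group, pick the best value per column (non-null
    -- first, then by source priority and recency) instead of one whole row.
    SELECT MatchGroupID,
           MAX(CASE WHEN rnEmail = 1 THEN Email END) AS SurvivingEmail,
           MAX(CASE WHEN rnPhone = 1 THEN Phone END) AS SurvivingPhone
    FROM  (SELECT MatchGroupID, Email, Phone,
                  ROW_NUMBER() OVER (PARTITION BY MatchGroupID
                      ORDER BY CASE WHEN Email IS NULL THEN 1 ELSE 0 END,
                               SourcePriority, ModifiedDate DESC) AS rnEmail,
                  ROW_NUMBER() OVER (PARTITION BY MatchGroupID
                      ORDER BY CASE WHEN Phone IS NULL THEN 1 ELSE 0 END,
                               SourcePriority, ModifiedDate DESC) AS rnPhone
           FROM   dbo.CustomerMatched) AS ranked
    GROUP BY MatchGroupID;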

How are customer service and support?

Customer Service:

Fairly typical vendor support. They are immediately attentive to problems and provide email notifications of new software versions. The main technical contact we work with has been there for the last decade, which is very refreshing!

Technical Support:

They regularly release new versions of the product with bug fixes and enhancements, although the MatchUp tool itself has changed very little in the past 5 years.

However, unless you can interact directly with the development team, problems may not get resolved in a timely manner. I have usually ended up devising my own solution in the time spent waiting for answers from their support team.

Which solution did I use previously and why did I switch?

I have used Datamentors and SAS Dataflux in the past with good success, although I would easily take this product over those for matching/survivorship purposes alone. We had tested Oracle's cloud-based Fusion product, which wasn't actually a functioning product at the time. The MelissaData tool is light-years ahead of Datamentors, far easier to use, and the price can't be compared. The SAS tool was very expensive. All other matching tools require a separate middle-tier application, versus this product, which is just a plug-in to SSIS.

How was the initial setup?

Initial setup on the first install was VERY easy. Propagating the matching rules to the next server was easy IF you know which file to copy, which isn't well documented. The tool is extremely easy to use once you know a few little things that aren't documented. Their development staff were very helpful in providing simple tips on how to set it up.

What about the implementation team?

This was an in-house implementation. The vendor was very responsive in answering questions.

What was our ROI?

I have no numbers for ROI, but the tool has saved us from spending six figures for similar functionality in another tool. Plus, since it's fully integrated with SSIS, there is no need for a separate server - more money saved.

What's my experience with pricing, setup cost, and licensing?

This vendor has no equal in pricing for equivalent functionality. First, no one else offers this level of integration with SSIS. Second, other vendors with equal functionality all cost many times as much. Third, it doesn't require a separate server or the large learning curve of new software. Fourth, this is one of the "go to" vendors for matching purposes: some master data and data quality tools actually call the MelissaData MatchUp object on the backend and then charge you a lot for the pretty GUI that does this for you.

Which other solutions did I evaluate?

I evaluated Microsoft's DQS, which could not scale beyond 100,000 customer records. DQS actually supported calling MelissaData MatchUp via the old Microsoft Marketplace (no longer available) to use its more sophisticated matching, but that was a moot point since DQS couldn't handle the volume.

What other advice do I have?

This tool is a dream compared to my previous experience with batch matching/de-duplication tools. And the pricing is incredible given its functionality and simplicity. High value and very low cost. If you're an SSIS shop (they support other ETL tools as well) and you need to de-duplicate, household, and/or do column-level survivorship, then this tool can't be beat.

I highly advise running parallel threads by splitting your dataflow into multiple paths. This allows parallel matching and increases throughput significantly.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
WesamHabboub
Chief Consultant at Insight360
Consultant
Top 20 | Leaderboard
Stands out for its user-friendly interface, robust community support, competitive pricing, and strategic approach to improving data accuracy
Pros and Cons
  • "The most valuable feature lies in the capability to assign data quality issues to different stakeholders, facilitating the tracking and resolution of defective work."
  • "In terms of the solution's technical support, the interactions were satisfactory, but there is room for improvement, especially in managing expectations."

What is our primary use case?

We recently deployed it for one of our clients, who uses it to enhance the quality of their government-related customer data. The primary focus is on ensuring compliance with government policies, and it serves as a crucial component in achieving data quality improvements.

How has it helped my organization?

The primary advantage revolves around enhancing the quality of the customer's data through the utilization of Talend Data Quality. By initiating the process with the tool, users can identify and address various data issues through profiling. This proactive approach results in an improvement in data quality, ultimately contributing to more informed and effective decision-making.

What is most valuable?

Its greatest asset lies in its user-friendly interface, specifically within the Talend Open Studio, known for its ease of use and familiarity among users. The robust community support proves invaluable when encountering challenges, providing a reliable resource for issue resolution. Moreover, the pricing structure stands out as highly competitive compared to other offerings in the market, making it a cost-effective choice for users. The most valuable feature lies in the capability to assign data quality issues to different stakeholders, facilitating the tracking and resolution of defective work. This functionality enables a streamlined process for identifying, assigning, and subsequently addressing data quality issues.

What needs improvement?

The Talend suite might be missing a product, particularly on the master data side; adding it would help complete the overall picture, though the focus isn't necessarily on economic considerations. It would also be beneficial to have greater openness in the tool, allowing data quality results to be presented in alternative tools, which would provide increased flexibility in sharing and utilizing data quality outcomes.

For how long have I used the solution?

I have been working with it for two years.

What do I think about the stability of the solution?

It provides good stability.

What do I think about the scalability of the solution?

We haven't applied scalability to any existing customer implementations so far.

How are customer service and support?

In terms of the solution's technical support, the interactions were satisfactory, but there is room for improvement, especially in managing expectations. During recent interactions, there was a sense that the support provided fell short of expectations. The support team communicated that a paid service was available for installation and configuration, but other support needs were not adequately addressed. While there is an understanding of the limitations, better assistance could have been provided. On a scale of one to ten, I would rate the support experience at a six.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup proved to be challenging for our team. The challenges were more pronounced when deviating from the default setup, especially when opting for a database other than Postgres. The manual installation process appeared less streamlined, leaving room for improvement in its execution. I remember the team investing at least three to four days in the installation process.

What about the implementation team?

For a relatively straightforward scenario, where a single customer addresses Data Quality from one source, the deployment process follows a strategic approach. Initially, the strategy involves focusing on one source system, with the deployment executed by customer engineers and the Talend tool. The deployment doesn't require an extensive team initially; it relies on adequate resources for the deployment phase. However, even in this streamlined process, collaboration with the customer's team is crucial. The deployment necessitates involving other team members from the customer side to ensure the tool is effectively utilized. The process involves deploying, training, and initiating the setup with the initial system. Subsequently, the customer is empowered to continue and expand the deployment journey autonomously. The entire process can be concluded within a month, contingent upon the active participation of the customer team. However, the timeline isn't solely contingent on technical implementation; a significant factor is the adoption on the customer side. Realistically, substantial results become more apparent between three to six months, a duration influenced by factors such as the size of the customer and the complexity of their processes.

What other advice do I have?

The key to success lies in the adoption of the solution within the customer's processes and services. My recommendation is to initiate the implementation by focusing on critical data. By starting with essential data sets, you can swiftly demonstrate tangible results to the business. This approach is strategic because, often, the technical aspects of the technology are not easily comprehensible to the business stakeholders. Begin with a small yet high-value segment to enhance data quality, and then gradually extend the implementation to cover the entire organization. This phased approach ensures a smoother transition and a more significant impact on overall business processes. Overall, I would rate it eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner