What is our primary use case?
Our use cases include connecting a lot of legacy data systems to our logical components. For example, if somebody poses a question to us and says, "Tell me everywhere in our organization where we have a policy stored," the primary use case is to logically define what a policy is, and then we use Collibra to tie that logical construct to a technical implementation. We may have six or eight, however many, different admin systems. We bring in the schemas of those systems, and then record how a policy exists in this database, this table, and this column, for example, in that legacy system.
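The mapping described above, from one logical term to the physical places it is stored, can be pictured with a minimal Python sketch. This is not Collibra's API or data model; the class, the catalog contents, and every system, database, table, and column name are hypothetical placeholders for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalColumn:
    """One technical location where a logical term is stored."""
    system: str
    database: str
    table: str
    column: str


# Hypothetical catalog: logical business term -> physical implementations.
# None of these names come from a real system.
CATALOG = {
    "Policy": [
        PhysicalColumn("AdminSystemA", "POLDB", "POLICY_MASTER", "POLICY_NO"),
        PhysicalColumn("AdminSystemB", "LIFE01", "CONTRACT", "CNTRCT_ID"),
    ],
    "Customer": [
        PhysicalColumn("AdminSystemA", "POLDB", "CLIENT", "CLIENT_ID"),
    ],
}


def where_is(term: str) -> list[str]:
    """Answer 'where do we store <term>?' as system.database.table.column paths."""
    return [
        f"{c.system}.{c.database}.{c.table}.{c.column}"
        for c in CATALOG.get(term, [])
    ]


if __name__ == "__main__":
    for path in where_is("Policy"):
        print(path)
```

Asking for "Policy" returns one path per admin system that holds it, which is the kind of answer the business question above is looking for.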
The second use case that we implement is the ability to track the provenance or the lineage of how something changes over time. For example, if we bring data in from a legacy system, use some tool set (e.g. Azure Data Factory) to extract the data into a Hadoop data lake, and then perform some transformations on it, we want to be able to track it: "It came from the source system here, and this field got changed to this name, and we applied this transformation on this field and it eventually shows up on this report here."
We use it to track where a policy exists and also how it got there: it exists on this report and here's how it got on that report, here are all the steps that it took getting through to that particular report from the actual source system itself. Because quite often what we're finding is that our business users will get a report and they'll say, "I think your report's wrong. How did you get that value on that report?" That provenance or lineage is what helps answer those questions.
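The lineage question above ("How did you get that value on that report?") amounts to walking backwards from a report field to its source. Here is a minimal Python sketch of that idea, assuming lineage is stored as simple "derived from" links. The field names and step descriptions are hypothetical, and this is not how Collibra stores or exposes lineage.

```python
# Hypothetical lineage links: each field maps to the field it was derived
# from and a description of the step that produced it.
LINEAGE = {
    "report.policy_summary.policy_id": (
        "lake.curated.policy_id", "select field for report"),
    "lake.curated.policy_id": (
        "lake.raw.POLICY_NO", "rename POLICY_NO to policy_id"),
    "lake.raw.POLICY_NO": (
        "legacy.POLICY_MASTER.POLICY_NO", "extract from legacy system"),
}


def trace_to_source(field: str) -> list[tuple[str, str]]:
    """Walk upstream from a field, returning (upstream_field, step) hops in order."""
    hops = []
    seen = {field}
    while field in LINEAGE:
        upstream, step = LINEAGE[field]
        if upstream in seen:  # guard against accidental cycles in the links
            break
        hops.append((upstream, step))
        seen.add(upstream)
        field = upstream
    return hops


if __name__ == "__main__":
    for upstream, step in trace_to_source("report.policy_summary.policy_id"):
        print(f"{step}: came from {upstream}")
```

The last hop in the returned list is the source-system field, and the hops in between are the steps a business user would be shown when they challenge a value on the report.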
We have data stewards. If somebody proposes a new logical asset based on what they think the customer means, the data stewards are the ones who get together, look at what's being proposed, and make sure it works across all of our business units for a generic implementation, or create business-unit-specific terms if required. They're the ones that say a particular system, term, or logical construct is ready for consumption by end users.
Another group we have is the end users. We try to have people use Collibra by asking, "Don't tell me what system you want to get access to, tell me what you're looking for in business terms/constructs." In our example, it would be the question, "Tell me about all the policies in our system." They would go to Collibra and "shop" for that data and pick a policy and put it into the shopping cart basket that Collibra provides as part of their interface. Then they would submit that request for approval/access to the underlying data.
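The "shop, add to basket, submit for approval" flow described above can be sketched as a small state machine in Python. This is only an illustration of the process under assumed states; the class, the status names, and the method names are hypothetical and do not reflect Collibra's actual workflow engine or interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class AccessRequest:
    """A basket of business terms that a user wants access to."""
    requester: str
    terms: list[str] = field(default_factory=list)
    status: Status = Status.DRAFT

    def add_to_basket(self, term: str) -> None:
        if self.status is not Status.DRAFT:
            raise ValueError("basket is locked once the request is submitted")
        if term not in self.terms:  # ignore duplicate picks
            self.terms.append(term)

    def submit(self) -> None:
        if self.status is not Status.DRAFT:
            raise ValueError("request was already submitted")
        if not self.terms:
            raise ValueError("cannot submit an empty basket")
        self.status = Status.SUBMITTED

    def decide(self, approved: bool) -> None:
        if self.status is not Status.SUBMITTED:
            raise ValueError("only submitted requests can be decided")
        self.status = Status.APPROVED if approved else Status.REJECTED


if __name__ == "__main__":
    request = AccessRequest(requester="analyst")
    request.add_to_basket("Policy")
    request.submit()
    request.decide(approved=True)
    print(request.status.value)
```

The point of the design is that the user asks for a business term ("Policy"), not a system name, and the approval step sits between the request and access to the underlying data.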
In short, the data stewards approve the use of new or updated business terminology, and the end users look for the data they need to make business decisions. We also have some power users who set the direction for where we want to go with the application (e.g. new workflows or new functionality within Collibra).
For us, the Collibra application is an on-premise installation (although we use IaaS VMs to host it on cloud); it is not their SaaS implementation.
How has it helped my organization?
One of the biggest questions that we had was that we didn't know what to do with our tons of legacy systems. The company I work for is a fairly old company; it's over 120 years old within the insurance industry. There are lots of systems that have been around for upwards of, say, 40 or 50 years, so we're trying to consolidate and bring those down to target, to go from, say, 15 systems down to three. But not knowing what's in those other systems makes it difficult to do that rationalization. Collibra has enabled us to first understand what we have and then to figure out how we get down to the target state architecture with a reduced number of target systems.
What is most valuable?
Out-of-box ingestion of technical metadata, as well as ease of use in setting up new business metadata so that users can represent their business terms.
What needs improvement?
Collibra is very good at talking to modern database systems such as a typical RDBMS (e.g. DB2, SQL Server, or Oracle). Where it isn't great is with older technologies that you'll typically find in the finance or insurance industries (e.g. VSAM or ISAM). It just doesn't connect with them very easily. They do provide the ability to use a separate product called MuleSoft, which they licensed as a bundle up until last year. Since Salesforce bought MuleSoft, that bundle is going away, and the split is happening in 2021. With this 'bolt-on', you could go and get that data, but you had to write that code and maintain it yourself. It wasn't an out-of-box (OOB) feature, and OOB is what we really liked about the Collibra offering. Our only way to access these older technologies was to create a MuleSoft flow, then maintain and deploy it.
This leaves us with technical debt that will need continual maintenance. In fact, we built all our custom MuleSoft flows using Mule 3.x and will soon be pushed to upgrade to Mule 4.x. This will not be a simple upgrade and will likely result in additional cost to bring in consulting resources more familiar with the technology. Since we have a lot of older legacy systems, things that aren't greenfield, if you will, it adds a lot more overhead than we were led to believe when we originally purchased the product.
We're not that deep into the Collibra product yet because it's only been a couple of years. We do like their ability to automate workflows. For example, if somebody comes in to say, "I want to request access to this data," you can build your own workflows to automate the approval process. Some workflows are out-of-box, but I think they could go a little further with them, so that you don't have to create a workflow manually, get somebody to code it, and implement it. I think they could offer a bit more in that respect.
The second item that I think they could do better at is to provide a taxonomy per industry that says, "Here's what a policy is. Here's what a customer is," that kind of thing. They don't implement that out-of-box in Collibra; you have to do that yourself, whereas other products bring that to the table. Informatica, I believe, has its own insurance-industry or industry-specific taxonomy that comes with the product.
It makes adding new logical constructs to Collibra a more manual job. The classification becomes more manual because you don't get something out-of-box that says, "Hey, I recognize that that's a policy, because I know that from the taxonomy." You have to manually make that connection.
For how long have I used the solution?
I have been using Collibra Governance for about two and a half years.
What do I think about the stability of the solution?
Collibra Governance's stability is quite good. It doesn't take a lot of maintenance; it just runs. It doesn't cause a ton of issues and it doesn't require a lot of upgrades (we usually upgrade once a year). In the couple of years we've had it, we've done, I think, two upgrades. The one thing that we're disappointed with is that 5.7.7 is the last on-premise version you can install. After that release, which just came out at the end of November, you have to go to Collibra's SaaS offering.
Being in the industry that we're in, we're very risk averse, so our use of SaaS offerings isn't that large, and our company isn't prepared to put a lot into the cloud, especially when it comes to personally identifiable information (PII). We're very nervous about that. With that limitation, we would have preferred that Collibra had extended the timeline of their on-premise offerings beyond this.
What do I think about the scalability of the solution?
I don't have a lot to say about scalability because we haven't had the system pushed that hard. I think we started out with an initial 25 users, and we might have a couple of hundred now. We haven't had any complaints from end users about information not being returned in a timely fashion or the system not working as well as they would expect. We haven't had enough experience to comment beyond that. Our current installation is approximately 175 users, with about 15-25 concurrent users. We went with the vendor-recommended VM sizings, although we did put all Collibra services on one VM (except JobServer and Connect, as recommended). For larger implementations, Collibra recommends splitting out services (e.g. DGC, Search, Repository) onto separate VMs to allow performance tuning, but our implementation hasn't come to that yet.
How are customer service and support?
In my experience, technical support is pretty good. They're fairly responsive. If I enter a case, I'll usually hear back later that day, so maybe a five- to eight-hour turnaround, or definitely within two business days. I find that if it's beyond a basic question, it takes a little while to get it pushed to their second-level support. Sometimes it takes a while for them to say, "I don't know the answer, now I'll ask second level to assist me with that." Getting past the first level, as with most vendors, is a bit difficult because they want the call answered there, but it is not unreasonable in any respect.
Which solution did I use previously and why did I switch?
We previously used the IBM Information Governance Catalog (IGC). We had used that as part of the whole suite (e.g. Information Analyzer, InfoSphere, etc.). We went out and did vendor assessments and had vendors come in to give demos, in order to set a strategic direction. We determined what our strategic platform was going to be in terms of a data catalog. IGC, quite frankly, wasn't anywhere near what Collibra could offer in comparison. It felt like comparing a Windows 3.1 interface to a Windows 95 interface. Collibra is known as the 'Cadillac' offering from a user perspective. There are some things that it is not as technically good at; Alation, for example, is quite good at the crowdsourcing or crowd-approval approach. But in our opinion, Collibra offered the most features from one product overall. It's a bit on the pricier end, but when we looked at the Gartner Quadrants and Forrester Waves, it was consistently either one or two, up there with, say, Informatica or other tool sets like that.
How was the initial setup?
That's actually what my role is, as the technical lead. I'm the one who did the installation, and I'm responsible for patching and that kind of stuff. I'm not so much an end user of it. I don't go into it every day to do workflows or create data, but if there's a technical request or something, that's where I get engaged.
The initial setup is fairly straightforward. I found the Collibra pre-sales and their support pretty helpful. They got back to you in a timely manner to be able to do the setup. It wasn't a difficult implementation by any stretch. It was about what I expected in terms of the timeline that they had provided for us and what we needed to do.
In terms of the actual installation process, it was maybe a couple of days start to finish once the hardware and everything was there. Then you continue to do your configuration as time goes on to connect to different systems and whatnot.
Most of that was put forth on advice from the vendors. We said here's the usage count that we plan to have, here's how many systems we're targeting originally. We looked to Collibra to give us the recommendation as to VM sizing and implementing. We didn't really create our own, we used theirs and customized it slightly for our environments, but it was mostly a vendor-provided plan of implementation.
What about the implementation team?
We used in-house resources to build and deploy the IaaS environment and complete the installation of Collibra. We have used third-party firms to develop custom workflows and custom MuleSoft flows for connecting legacy systems.
What was our ROI?
We've had good ROI, because when we look at the amount of time invested, it's not necessarily dollars out the door; it's more about manual work avoidance. Instead of having somebody have to manually enter all of these different systems and characteristics, we can do integrations between our source systems and Collibra to get that automatically and refresh it. As people make changes to source systems as time goes on, we can automatically bring those into Collibra. It has allowed us to do one of the projects that we had on the books for this year, which was to understand what our critical systems were. Not only for disaster recovery, but where is our most important data about our customers? Where does that reside and how can we take that data and join it to understand more about our customers and their needs?
In our scenario, we have different business units with the same customer, but we can't make that realization that it's the same customer in different business units because of the way the systems grew separately over the years. Collibra is the one that's allowing us to tie that together. It opens up additional revenue streams with the ability to say, "Hey, I noticed you bought a product for this business unit from us. Did you know we also sell this product for this other business unit?" It allows us that cross-selling opportunity or upselling if you will (aka Revenue generation). That's a bit difficult to articulate or quantify in hard dollars, because there are so many steps going from a lead all the way to a sale. But we certainly believe that the information that Collibra has been able to provide us has helped or augmented our revenue generation streams. In a way it is a sales enablement tool.
What's my experience with pricing, setup cost, and licensing?
In terms of pricing, it's not bad. You pay more money for the author licenses, which is where you do most of your entry and whatnot. Whereas consumers are basically viewing information and using the tool to say, "Hey, I want to look at this data." I think what we would like to get to eventually might be an enterprise license, rather than having to say, "I'm going to pay for 50 authors or 100 authors." At some point in the future, I could see us wanting an enterprise license.
They may offer that now, but it wasn't at a price that was palatable for our company at this point. Plus, we needed a few years to get uptake in it to justify going to that high level. It's just more money licensing wise, but not unrealistic, in my opinion. The money is well spent for the product and the services we're getting.
Which other solutions did I evaluate?
We just found that IGC was way behind the times. IBM had not really put any money into the product, and it didn't connect with any of the systems that we wanted it to. It simply didn't fill our needs.
We did look at the Informatica product and we did look at Alation.
I think what we found with Alation is that it was good. The user interface was impeccable, but it was not what we would consider the whole package. It was very good at the catalog portion, but when it came to interconnectivity with different systems, it did not have workflow, which was a key characteristic that we were looking for. Alation was a fairly new company, only maybe three or four years old at the time we looked, so there was concern about the staying power of that particular vendor. Not that their product wasn't good; it just wasn't as full a product as Collibra. We would have had to build on something else for workflow, which we weren't interested in. We were looking for one product to do that.
The Informatica offering was quite good as well, but from our investigations and interviews with other companies in our industry, Informatica is quite a complex product to get up and running and to maintain. It's not cheap either. When we compared what it would take to care for and maintain the Informatica side of the house with what we could do with Collibra, we chose Collibra.
What other advice do I have?
Everything seems to be going the route of software as a service these days. It does take away some of your ability to customize the way you want. Some products allow you to do that better with their SaaS offering than others. I would say that the data catalog space changes quite rapidly. When we did our investigation a couple of years ago, Alation hadn't been in business that long. They've continued to grow, and maybe their offering has become better. Just because we chose something two or three years ago doesn't mean that we shouldn't re-evaluate it in another couple of years to say, "Is this still the strategic product for us?"
There tends to be a lot of vertical integration going on. We once thought, "Well, let's just buy IBM because everything works with IBM." That doesn't seem to work any more. There seems to be a lot of best-of-breed buying now. But when you do that, there can be a lot of interoperability that just doesn't work out. The IBMs of the world say, "Just buy our product because everything integrates." It truly doesn't, in our experience.
You have to do your homework and definitely interview other customers to understand their experience for what is good and bad, because of course, sales isn't going to tell you that. But do your homework and make sure that you're talking to people who have not only installed the system, but have been able to use it for a few years, to see what's good about it, what's bad, and what they might have done differently. We talked to a number of different customers in the insurance field, in Canada, the U.S. and in Europe, and learned different things that we would have never considered on our criteria had we not talked to them.
On a scale of one to ten, I would peg it at a seven and a half or eight. I would put it higher, except that it doesn't connect as well to our legacy systems without additional programming and a separate tool. They used to license that tool as part of the whole product, but when MuleSoft got bought out by Salesforce, that business relationship was severed. Now we have a data governance product that used to include MuleSoft but no longer does, and we have to deal with a second vendor to buy MuleSoft separately. It was nice when it was all one product. If they're going to say, "Use MuleSoft to get at your legacy systems," fine, sell me that product. But they won't do that anymore because Salesforce owns it.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
*Disclosure: My company does not have a business relationship with this vendor other than being a customer.