Master Data Management (MDM) and Continuous (CI/CD) Technology

Make your data beautiful again.

Over the past few years I have become an accidental expert in implementing Continuous Integration and Continuous Delivery for Master Data Management projects, specifically around one of the more ubiquitous technologies in the space, from a company called Informatica. One glaringly obvious issue with many MDM vendors is that they talk about being agile and state clearly that they fully support such an approach, but very few of them actually provide the tools out of the box to support that implementation methodology. This becomes fairly obvious if you look at the documentation sets that ship with the software and are used by their consultants: they are skewed towards a more “traditional” waterfall approach to implementation.

One thing I have learnt about implementing MDM using agile is that it really requires you to adopt, up front, a faster and better approach to deployment and releases. Many projects are happy to build in agile and then deploy once at the end of the development phase of the project, but I don’t think this is a true and valid agile approach. We should continuously test our hypotheses with our clients at every logical stage of the software development process, not just to show how clever we are but, more appropriately, to make sure our customers have an accurate and up-to-date view of the direction we have taken.

There are some very cool tools in the CI and CD space that can really help, and many of them are free or virtually free to use: tools such as Jenkins, Bamboo and, more recently, Pipelines and Deployments in the Bitbucket world. All of them rely on one key thing: a code repository, and its ability to hold something that can actually be built and then deployed. This may not seem so obvious in MDM, as we often have a huge “blob” of configuration and we simply don’t have the luxury of dealing with smaller units of work or classes, as the Java and C++ folks do.

So we have to start to think about how we could effectively promote “code” into higher environments on a regular basis (as frequently as daily in one of my implementations) – and how this could relate to our slightly more packaged world of software configuration. Then we need to think about how “releases” work – and how configuration changes by multiple developers can be promoted into a code repository.

Releases in a CI/CD world allow us to have parallel streams of development. When the development team has finished the current sprint, we “release” artifacts to higher environments such as QA (Test), then on to SIT (Systems Integration Testing), UAT (User Acceptance Testing) and finally Production. Our release reaches QA, is tested over a period and, if it passes, gets pushed onwards in the life-cycle to the next logical step. In the meantime the development team is busy beavering away on the next set of features for the next release.

This sounds simple, but unfortunately, where the granularity of our metadata is not defined down to the feature level, it becomes harder to attach code artifacts to an agile task or issue, and this causes problems if we want a partial release of the code for a sprint. Let’s say only 9 out of a possible 10 of our tasks will be released into UAT: how do we effectively take out the offending task?

Well, I will be frank with you: there are only two choices in the MDM world. Either you check changes into the code repository element by element and hope that they are autonomous enough for their dependencies to be resolved and carried through in the higher environment, or you get real, accept that the world is not perfect, and check in the entire metadata set every time a change has been made. I have tried both options. I have had developers isolate elements of metadata and check them in tagged with feature ids, and I have taken the view that all the elements in the metadata are in fact interacting with each other and asked the development team to check in the whole metadata set.

I found the latter works best, and here’s how. If a release fails, it fails. We need to stop seeing failure in a development team as a bad thing; the way I see it, a failure is in fact a step towards success, simply a learning step. So if a release fails because of a single element, then perhaps we should consider smaller iterations, or better still a Kanban style of agile, where instead of time-boxed iterations we take features and tasks on one by one and potentially deliver more frequently to higher environments. At the end of a sprint we check the metadata into the code repository and it gets delivered to QA. QA tests it and we accept or reject the release. When a reject happens, we simply raise a bug and it gets dealt with next time round.

So how do you structure branches and releases? It becomes simple: every sprint you create a code repository branch, and at the end of the sprint it contributes to a corresponding release, which gets pushed onwards. Either the release is good or it is bad: it goes further, or we blame it and it stops there. Subsequent sprints fix the issues and we move on.
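To make the branch-per-sprint idea concrete, here is a minimal sketch in Python. The environment order comes from the description above; the branch naming convention, the `Release` class and its method names are my own assumptions for illustration and are not part of Jenkins, Bamboo, Bitbucket or Informatica MDM.

```python
from dataclasses import dataclass, field
from typing import List

# Environment order as described in the article.
STAGES = ["QA", "SIT", "UAT", "Production"]


def sprint_branch(sprint: int) -> str:
    """Hypothetical naming convention: one repository branch per sprint."""
    return f"sprint/{sprint:02d}"


@dataclass
class Release:
    """A sprint's release: it either keeps moving forward or stops where it failed."""
    sprint: int
    stage: str = STAGES[0]          # a new release lands in QA first
    rejected: bool = False
    bugs: List[str] = field(default_factory=list)

    def accept(self) -> None:
        """Testing passed in the current stage: push the release onwards."""
        if self.rejected:
            raise RuntimeError("a rejected release is not promoted; fix it next sprint")
        position = STAGES.index(self.stage)
        if position + 1 < len(STAGES):
            self.stage = STAGES[position + 1]

    def reject(self, bug: str) -> None:
        """Testing failed: the release stops here and a bug is raised."""
        self.rejected = True
        self.bugs.append(bug)


release = Release(sprint=7)
print(sprint_branch(release.sprint), "->", release.stage)
release.accept()                     # QA passed, move to SIT
release.reject("match rule fails on merged records")
print(release.stage, release.rejected, release.bugs)
```

The point of the sketch is that a release never gets patched in place: it is either accepted and promoted, or rejected and left where it stopped, with the fix arriving in a later sprint's release.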

Metadata check-in with tools like Informatica MDM can be done because the key API endpoints for extracting and applying metadata changes are exposed. We can therefore extract the entire metadata and check it in; it is a complete set of rebuild information for the Development ORS (Operational Reference Store) at that point.
Deployment processes can be built for higher environments or shadow environments for dev testing, but even better, this gives us the ability to roll our development ORS forwards and backwards. We can therefore theoretically roll back to a broken release, fix the change, check it in and then roll forwards to where we were before the fix started (and obviously merge the fix in here too).
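As a rough illustration of the "extract everything and check it in" step, the sketch below writes a full metadata export into a repository working folder and returns the git commands a CI job could then run. The `export_metadata` callable is purely hypothetical: it stands in for whatever vendor API call returns the complete metadata, and no real Informatica endpoint is shown. The file name and XML format are assumptions too.

```python
from pathlib import Path
from typing import Callable, List

# Assumed location of the snapshot inside the repository (illustrative only).
SNAPSHOT_PATH = "metadata/ors_metadata.xml"


def snapshot_metadata(
    export_metadata: Callable[[], bytes],
    repo_dir: str,
    sprint_label: str,
) -> List[List[str]]:
    """Write the whole metadata export into the repo working tree.

    Returns the git commands to run, or an empty list when the export is
    identical to what is already checked out (nothing to commit).
    """
    target = Path(repo_dir) / SNAPSHOT_PATH
    target.parent.mkdir(parents=True, exist_ok=True)

    exported = export_metadata()  # hypothetical stand-in for the vendor API call
    if target.exists() and target.read_bytes() == exported:
        return []

    target.write_bytes(exported)
    return [
        ["git", "-C", repo_dir, "add", SNAPSHOT_PATH],
        ["git", "-C", repo_dir, "commit", "-m",
         f"Full metadata snapshot for {sprint_label}"],
    ]
```

Each command list can be handed to `subprocess.run(cmd, check=True)` by the build job. Because every commit holds the complete metadata set, rolling the development ORS backwards or forwards becomes a matter of checking out an earlier or later commit and re-applying it.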

Of course, when we apply a metadata change to the target environment, we isolate the changes in the code repository that do not exist in the target environment and then apply only those minor changes. This means we can be assured that any data in our target environment is not lost. Theoretically.
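The comparison described above can be sketched as a simple diff between two sets of metadata elements. This is a simplified model, assuming each environment's metadata can be reduced to a mapping of element name to definition; real MDM metadata also carries dependencies that this ignores. Leaving elements that exist only in the target untouched is my assumption about how data loss is avoided, and `plan_changes` is a hypothetical helper, not a vendor API.

```python
from typing import Dict, List, NamedTuple


class ChangePlan(NamedTuple):
    to_create: List[str]    # in the repository, missing from the target
    to_update: List[str]    # in both, but the definitions differ
    left_alone: List[str]   # only in the target: never dropped, so no data is lost


def plan_changes(repo: Dict[str, str], target: Dict[str, str]) -> ChangePlan:
    """Work out the minimal set of changes to bring the target in line with the repo."""
    to_create = sorted(name for name in repo if name not in target)
    to_update = sorted(
        name for name in repo if name in target and repo[name] != target[name]
    )
    left_alone = sorted(name for name in target if name not in repo)
    return ChangePlan(to_create, to_update, left_alone)


repo_metadata = {"Party": "v2", "Address": "v1", "Phone": "v1"}
qa_metadata = {"Party": "v1", "Address": "v1", "LegacyCode": "v1"}
print(plan_changes(repo_metadata, qa_metadata))
```

Only the create and update lists would be applied to the target; unchanged elements and anything unique to the target are skipped.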

The unfortunate part about using tools like Jenkins and Bamboo is that the metadata manipulation has to be done in scripting and programming languages like Java. However, it is an investment well spent and can seriously improve the productivity of development teams working with technologies such as Informatica MDM.

Contact us for more details on how to implement MDM in a fully CI/CD agile approach.


Robert Haynes is a veteran MDM solutions consultant and product director. He has been involved in the MDM world for over 15 years and helps to implement solutions for some of the world’s largest companies. He heads EntityStream as its CEO. EntityStream is a full-function MDM solution that can be deployed on a single desktop or across multiple machines, but more importantly it requires no complex setup process or lengthy configuration of models and feeds.
