
CI/CD Pipelines in data platforms

CI/CD pipelines are now an integral part of modern software development, whether in web, backend or AI/ML development or in data platforms. However, how a pipeline is designed, and what it contributes to the development workflow, can vary greatly. This article outlines a concept for providing a decentralized CI/CD pipeline in a data platform.

The main objective of an internal data platform is to simplify data processing within a company. It achieves this by managing the dependencies between data products and by automatically providing components and tools for data processing. Platform processes must scale so that, ideally, the maintenance effort stays constant as the number of users grows. Self-service concepts make this possible, and their design can be derived by answering the following questions:

  • How can new users be onboarded onto the pipeline?
  • How can users customize the pipeline to their needs?
  • How can users operate the pipeline?
  • How can governance guidelines be implemented?

Onboarding new users onto the CI/CD pipeline

Using a CI/CD pipeline is part of a solution pattern (“What is to be achieved and how?”), so it must be considered in the context of the entire provisioning process. New users are integrated as part of the platform onboarding process, which provisions and initializes a project's entire tool stack. Besides the CI/CD pipeline, this includes project resources, identity and access management (IAM) configuration and other elements.
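To illustrate that the pipeline is provisioned as one step of onboarding rather than in isolation, here is a minimal Python sketch. The step functions, the `ProjectContext` class and the naming scheme are hypothetical placeholders; a real platform would call its infrastructure and IAM tooling inside each step.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectContext:
    """State collected while onboarding provisions a project's tool stack."""
    name: str
    team: str
    resources: dict = field(default_factory=dict)


def provision_project_resources(ctx: ProjectContext) -> None:
    # Placeholder for real provisioning, e.g. a storage bucket or database schema
    ctx.resources["storage"] = f"{ctx.name}-data"


def configure_iam(ctx: ProjectContext) -> None:
    # Placeholder: a group that grants the owning team access to this project only
    ctx.resources["iam_group"] = f"{ctx.team}-{ctx.name}-developers"


def provision_pipeline(ctx: ProjectContext) -> None:
    # The CI/CD pipeline is provisioned as one onboarding step among others
    ctx.resources["pipeline"] = f"{ctx.name}-ci"


# Steps run in list order; here the pipeline comes after the resources and IAM it would use
ONBOARDING_STEPS = [provision_project_resources, configure_iam, provision_pipeline]


def onboard(name: str, team: str) -> ProjectContext:
    """Run every onboarding step for a new project and return the result."""
    ctx = ProjectContext(name=name, team=team)
    for step in ONBOARDING_STEPS:
        step(ctx)
    return ctx


if __name__ == "__main__":
    print(onboard("sales-forecast", "analytics").resources)
```

Keeping the steps in one list means a new tool can be added to every future onboarding by registering one more function.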

Customizing the pipeline to user requirements

First, it is important to distinguish how the CI/CD pipeline is made available to users:

  1. Provisioning of resources in an environment managed by the project team itself.
  2. Provisioning of resources in a central platform environment that hosts the resources of several projects.

To offer developers maximum autonomy and flexibility, I would generally recommend option 1. Provisioning in an environment managed by the development team itself lets developers customize resources as needed. They keep their flexibility and can use the full potential of the respective runtime to tailor their tooling to their specific requirements. At the same time, the platform can take some of the work off their hands through predefined provisioning and initialization.

However, we must consider the reality of the respective system landscape. Inadequate authorization concepts, missing platform features, governance requirements or legacy technology can make decentralized management of environments infeasible.

In both scenarios, however, the platform must be precisely tailored to the respective solution pattern (“What is to be achieved and how?”) and must provide the required resources correctly. Otherwise, an internal platform can do more harm than good: at best, users have to set up their resources again themselves; at worst, the platform makes developers' work so difficult that they reject it.

Operating in a central environment

In a centrally managed environment, two requirements must be met. First, one team's project resources must be decoupled from those of other teams so that users can configure them flexibly. Second, the authorization concept should support adjustments by users, allowing them to log into the respective component and configure their own resources as required.
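Both requirements can be sketched in a few lines of Python, assuming a simple in-memory model: each project gets its own isolated resource entry, and a membership check decides who may change it. `CentralEnvironment`, `AuthorizationError` and the setting names are hypothetical; a real environment would delegate these checks to the authorization system of the respective component.

```python
from dataclasses import dataclass, field


class AuthorizationError(Exception):
    """Raised when a user tries to modify a project they do not belong to."""


@dataclass
class CentralEnvironment:
    # project name -> that project's resource settings (one isolated entry per project)
    resources: dict = field(default_factory=dict)
    # project name -> users allowed to configure that project
    members: dict = field(default_factory=dict)

    def register_project(self, project: str, owners: list) -> None:
        self.resources[project] = {}
        self.members[project] = set(owners)

    def configure(self, user: str, project: str, key: str, value: str) -> None:
        # Authorization: only members of the project may change its resources
        if user not in self.members.get(project, set()):
            raise AuthorizationError(f"{user} may not modify {project}")
        # Decoupling: the write touches this project's entry only
        self.resources[project][key] = value


if __name__ == "__main__":
    env = CentralEnvironment()
    env.register_project("sales-forecast", ["alice"])
    env.register_project("churn-model", ["bob"])
    env.configure("alice", "sales-forecast", "runner_size", "large")
    print(env.resources)
```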

If these requirements are not met, an interface for modifying the project-specific resources can be an alternative. It can be implemented as a configuration file, a browser-based UI or in other ways.
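For the configuration-file variant, the platform would typically validate each file before acting on it. The following Python sketch assumes a JSON file (chosen because the standard library can parse it) with three hypothetical keys, `project`, `stages` and `runtime`, and a hypothetical set of stages the platform offers.

```python
import json
from dataclasses import dataclass

# Hypothetical set of stages the platform offers; users can only pick from these
ALLOWED_STAGES = {"lint", "test", "build", "deploy"}
REQUIRED_KEYS = {"project", "stages", "runtime"}


@dataclass(frozen=True)
class PipelineConfig:
    project: str
    stages: tuple
    runtime: str


def load_config(text: str) -> PipelineConfig:
    """Parse and validate a project's JSON configuration file."""
    raw = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad syntax
    if not isinstance(raw, dict):
        raise ValueError("configuration must be a JSON object")
    missing = REQUIRED_KEYS - set(raw)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    unknown = set(raw["stages"]) - ALLOWED_STAGES
    if unknown:
        raise ValueError(f"unsupported stages: {sorted(unknown)}")
    return PipelineConfig(raw["project"], tuple(raw["stages"]), raw["runtime"])


if __name__ == "__main__":
    example = '{"project": "sales-forecast", "stages": ["lint", "test"], "runtime": "python3.11"}'
    print(load_config(example))
```

Rejecting an invalid file up front means a faulty change never reaches the processes that modify the project's resources.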

Changes made through this interface then trigger processes that modify the project-specific resources accordingly.

There are two ways to do this: either the configuration is read and processed dynamically at runtime during each CI/CD run, or it is fixed at build time. In the latter case, every modification via the interface triggers a new build that reinitializes and reconfigures the project's CI/CD pipeline. Providing one CI/CD pipeline per project also prevents project-specific resources from interfering with each other.
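The difference between the two approaches can be shown with a small Python model. `ConfigInterface`, `BuildTimePipelines` and `run_dynamic` are hypothetical names, and a pipeline is reduced to a list of step labels. In the build-time variant, every change fires a callback that re-renders only the affected project's pipeline; in the runtime variant, each run reads the current configuration itself.

```python
class ConfigInterface:
    """Hypothetical store behind the configuration interface (file, UI, ...)."""

    def __init__(self):
        self._stages = {}        # project -> list of stage names
        self.on_change = []      # callbacks notified after every modification

    def set_stages(self, project, stages):
        self._stages[project] = list(stages)
        for callback in self.on_change:
            callback(project)

    def stages(self, project):
        return list(self._stages[project])


class BuildTimePipelines:
    """Build-time variant: one rendered pipeline per project, rebuilt on each change."""

    def __init__(self, interface):
        self.interface = interface
        self.definitions = {}    # project -> pipeline produced by the last build
        interface.on_change.append(self.rebuild)

    def rebuild(self, project):
        # Re-render only the affected project's pipeline; others stay untouched
        self.definitions[project] = [f"{project}:{s}" for s in self.interface.stages(project)]

    def run(self, project):
        # A run executes the definition produced by the last build
        return list(self.definitions[project])


def run_dynamic(interface, project):
    """Runtime variant: resolve the configuration at the start of every run."""
    return [f"{project}:{s}" for s in interface.stages(project)]


if __name__ == "__main__":
    iface = ConfigInterface()
    pipelines = BuildTimePipelines(iface)
    iface.set_stages("sales-forecast", ["lint", "test"])
    iface.set_stages("sales-forecast", ["lint", "test", "deploy"])
    print(pipelines.run("sales-forecast"), run_dynamic(iface, "sales-forecast"))
```

One trade-off worth noting: the build-time variant yields a fixed pipeline definition per project that can be inspected before it runs, while the runtime variant avoids rebuilds but makes every run depend on reading the current configuration.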

Implementing governance guidelines

Governance policies can be enforced effectively when the platform engineering team controls the central environment, the modification options and the pipeline implementation. In decentrally managed environments, additional techniques such as interceptors or log scanning can help ensure that all executed commands comply with governance requirements.
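Both techniques can be sketched in Python, assuming governance rules expressed as regular expressions over command strings. The three rules below are hypothetical examples, and `execute` stands in for whatever actually runs a pipeline command. The interceptor blocks a command before it runs; the log scanner finds violations after the fact.

```python
import re
from typing import Callable

# Hypothetical governance rules; a real platform would load these from a central policy
FORBIDDEN_PATTERNS = [
    re.compile(r"\bcurl\b.*\|\s*(sh|bash)\b"),  # piping a downloaded script into a shell
    re.compile(r"--no-verify\b"),               # skipping commit hooks
    re.compile(r"\bchmod\s+777\b"),             # world-writable permissions
]


class PolicyViolation(Exception):
    """Raised by the interceptor when a command breaks a governance rule."""


def intercept(command: str, execute: Callable[[str], object]) -> object:
    """Check a command against the rules before handing it to the real executor."""
    for pattern in FORBIDDEN_PATTERNS:
        if pattern.search(command):
            raise PolicyViolation(f"blocked by rule {pattern.pattern!r}: {command}")
    return execute(command)


def scan_log(lines: list) -> list:
    """After-the-fact check: return (line number, line) for every violating entry."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if any(pattern.search(line) for pattern in FORBIDDEN_PATTERNS)
    ]


if __name__ == "__main__":
    print(intercept("pytest -q", lambda cmd: f"ran {cmd}"))
    print(scan_log(["pip install -r requirements.txt", "git commit --no-verify"]))
```

Interception prevents a violation, but only where the platform controls how commands are executed; log scanning also works in team-managed environments, at the cost of detecting violations only after they have happened.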
