Mar 5 – 7, 2024
Julius-Maximilians-Universität Würzburg
Europe/Berlin timezone

Refactoring and isolating data pipelines through the use of software containerization and continuous integration

Not scheduled
20m
Mathematisch-Naturwissenschaftliches Hörsaalgebäude (Julius-Maximilians-Universität Würzburg)

Am Hubland 97074 Würzburg
Poster Computational Workflows Poster Session

Speaker

Mr Benjamin Bruns (Forschungszentrum Jülich GmbH)

Description

At the IAS-8 institute of Forschungszentrum Jülich (FZJ), many projects depend on the accurate and complete collection of measurement and environmental data for subsequent analysis and modeling. The BayEOS server (https://github.com/BayCEER/bayeos-server) used at FZJ provides an open, standardized platform for such data, but importing and transforming data from different sources is often difficult with respect to provisioning, traceability and later adjustments. To address this, a flexible import and transformation pipeline for time series data was developed, based on Python and a PostgreSQL-based integration database. Import, transformation and aggregation are cleanly separated, which makes each process easy to customize. Each step of a defined pipeline runs as a container in a Docker environment. A template for a basic pipeline can be extended with additional pre- and post-processing steps, and it has been successfully adapted for several existing data pipelines. Once a pipeline has been adapted, its containers are built automatically by the CI/CD pipeline of the DevOps platform GitLab. In addition, GitLab's built-in container registry simplifies deploying and updating the pipeline elements.
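The separation of import, transformation and aggregation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: all names (Sample, import_csv_lines, drop_out_of_range, aggregate_hourly, run_pipeline), the CSV input format and the hourly mean are assumptions chosen for illustration. In the setup described in the abstract, each step would run in its own container and exchange data through the PostgreSQL integration database; here plain in-memory lists stand in for that.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sample:
    """One time series point (hypothetical record type)."""
    timestamp: datetime
    value: float


# A pipeline step takes a list of samples and returns a new list.
Step = Callable[[list[Sample]], list[Sample]]


def import_csv_lines(lines: Iterable[str]) -> list[Sample]:
    """Import step: parse 'ISO-timestamp,value' lines, skipping blank lines."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ts, raw = line.split(",")
        samples.append(Sample(datetime.fromisoformat(ts), float(raw)))
    return samples


def drop_out_of_range(low: float, high: float) -> Step:
    """Transformation step: keep only values inside [low, high]."""
    def step(samples: list[Sample]) -> list[Sample]:
        return [s for s in samples if low <= s.value <= high]
    return step


def aggregate_hourly(samples: list[Sample]) -> list[Sample]:
    """Aggregation step: one mean value per full hour, sorted by time."""
    buckets: dict[datetime, list[float]] = {}
    for s in samples:
        hour = s.timestamp.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, []).append(s.value)
    return [Sample(hour, mean(vals)) for hour, vals in sorted(buckets.items())]


def run_pipeline(samples: list[Sample], steps: list[Step]) -> list[Sample]:
    """Apply the steps in order; adding a pre- or post-processing step
    means adding another entry to the list."""
    for step in steps:
        samples = step(samples)
    return samples


# Made-up example input, including one implausible reading.
raw = [
    "2024-03-05T10:00:00,20.0",
    "2024-03-05T10:30:00,22.0",
    "2024-03-05T10:45:00,999.0",
    "2024-03-05T11:15:00,18.0",
]
result = run_pipeline(
    import_csv_lines(raw),
    [drop_out_of_range(-50.0, 60.0), aggregate_hourly],
)
for sample in result:
    print(sample.timestamp.isoformat(), sample.value)
```

Because every step shares the same signature, a step can be replaced or reordered without touching the others, which mirrors the customization the template is meant to allow.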

Primary author

Mr Benjamin Bruns (Forschungszentrum Jülich GmbH)
