Love your data? Make it reproducible! A workshop on reproducibility in data science





Reproducibility in science is receiving increasing attention from the international academic community and is steadily gaining importance. Making your research project transparent not only improves its quality, but also contributes to a scientific community that can validate and strengthen research outcomes in the long run. In this event, organized by the Helmholtz Open Science Office and HIDA, we will focus on digital reproducibility, using and citing software, as well as best practices in data science.

We will start the event with two keynotes on reproducibility, the first in the context of open science (Bernadette Fritzsch) and the second in relation to data science and AI methods (Tingying Peng). Afterwards, you can choose to attend and actively take part in one of these three workshops:

1. Data and reproducibility management with DataLad by Adina Wagner (FZ Jülich)

2. Practical steps towards reproducible science by Heidi Seibold

3. Software Citation - Current Practice and Recent Developments by Tobias Schlauch (HIFIS)


The workshop's documentation can be found below: see the respective talks for the published slides and accompanying materials. Please note that the two keynote talks have been made available as recordings as well. 


This workshop is part of the International Love Data Week 2023

The Helmholtz Open Science Office supports the Helmholtz Association as a service provider in shaping the cultural change towards open science. It represents Helmholtz in various open science initiatives, is involved in third-party funded projects, and in this way communicates the Helmholtz positions on open science on a national and international level.

HIDA - the Helmholtz Information & Data Science Academy - is Germany’s largest postgraduate training network in the field of information and data science. We prepare the next generation of scientists for a data-heavy future of research.


Bernadette Fritzsch: After studying physics and completing a doctorate in solid-state physics, Bernadette switched to climate research. At the data center of the Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research, in Bremerhaven, she provides user support in the areas of high-performance computing for earth system modeling and research data. She has been involved with research software since her studies, is a founding member of de-RSE, and has been on the board of the association since it was founded. She is also a founding member of the German Reproducibility Network (GRN) and is active in its steering group.

Tingying Peng leads the Helmholtz AI young investigator group "AI for microscopy image analysis". As the group title indicates, its mission is to create new AI methods that help life scientists and pathologists analyze microscopic images more quantitatively and efficiently, allowing them to extract more knowledge. Her group has worked on various microscopy imaging types, including histopathological images for computational pathology, classic brightfield and fluorescence images, and more advanced ones such as cryo-electron tomography (Cryo-ET), 3D light-sheet microscopy, and extended depth-of-field (EDOF) microscopy with "Electrically Tunable Lenses". Before joining Helmholtz, Tingying obtained her PhD at the University of Oxford and was a Humboldt postdoc at the Technical University of Munich.

Sophia Wagner is a PhD student in computational pathology, computer vision, and deep learning at Helmholtz AI and the Technical University of Munich.

Tobias Schlauch has worked at the Institute for Software Technology at the German Aerospace Center (DLR) since 2005. He has contributed to various research projects as a software engineer in workflow and data management and has supported them in software quality assurance. Since 2009, he has served as the representative of the DLR software engineering initiative.

Adina Wagner is a research associate at the Forschungszentrum Jülich and a doctoral researcher at the Heinrich Heine University Düsseldorf. She is a software developer for the DataLad project, an open source data management tool built upon Git and git-annex, and a proponent of open science, open source, and reproducible research.

Heidi Seibold is an expert for open and reproducible research, with a focus on data science and health research. She is the host of two podcasts: "Open Science Stories" and ">reboot academia". You can follow Heidi on Twitter under @HeidiBaya.

    • 9:30 AM - 9:40 AM
      • 9:30 AM
        Welcome 10m
        Speakers: HIDA, Helmholtz Open Science Office
    • 9:40 AM - 10:20 AM
      Keynotes
      • 9:40 AM
        Keynote 1: Open Science and Reproducibility 20m

        Open Science comprises a range of scientific work practices that make research transparent and comprehensible. It is thus closely related to reproducibility as a cornerstone of the trustworthiness of research. Publishing research data and software, and providing access to them, is indispensable for understanding the results. The FAIR principles developed for research data have now also been transferred and adapted to research software. Many existing local and topic-specific initiatives want to promote a cultural change towards open science and thus increase the quality of scientific work and the robustness of results. The lecture presents the role of the German Reproducibility Network (GRN) as a platform for networking such groups, so that different scientific communities can learn from each other and move step by step towards more reproducibility.

        Speaker: Bernadette Fritzsch (AWI)
      • 10:00 AM
        Keynote 2: Reproducibility in the context of AI methods in Medicine 20m

        Artificial intelligence faces a reproducibility crisis, as unpublished code and sensitivity to training conditions make many claims hard to verify. This is also the case for AI in medicine. In computational pathology, for example, despite an ever-growing number of publications, only a few methods are reused by other researchers, and even fewer have entered a routine clinical workflow. A team of Helmholtz Munich researchers has analyzed how to improve the reusability and reproducibility of these deep learning algorithms and will present its findings in the workshop.

        Speakers: Sophia Wagner (Helmholtz AI), Tingying Peng (Helmholtz Munich)
    • 10:20 AM - 10:30 AM
      Break 10m
    • 10:30 AM - 12:30 PM
      Workshop session
      • 10:30 AM
        Data and reproducibility management with DataLad 2h

        The path to reproducible science is paved with many open source software tools, and this workshop introduces one of them. Using hands-on examples centered around the tool DataLad, we will discover core concepts for reproducibility, such as version control, digital provenance, containerization, and data publication. The workshop focuses on technical and conceptual aspects alike, and aims to equip everyone with skills they can transfer to the real-world data collected during their research.

        Speaker: Adina Wagner (FZJ)
      • 10:30 AM
        Practical steps towards reproducible research 2h

        In this workshop, we will discuss and implement steps to make your research reproducible. Reproducible research means that with the same data and the same analysis, you get the same results every time you run the analysis. That sounds simple, but it is not, and as it turns out, most researchers fail at it. Join this workshop to improve your research through simple and impactful practices!

        Speaker: Heidi Seibold
      • 10:30 AM
        Software Citation - Current Practice and Recent Developments 2h

        Software citation is an important part of the provenance of research results and enables their reproducibility. In addition, software citation offers a way to improve the credit given for software-related work in the academic system. In the workshop, we present the current practice of how to cite software and how to make software citable, and we provide an outlook on recent developments. Besides presenting information, we plan different interactive formats to give you hands-on experience with software citation.

        Speaker: Tobias Schlauch (DLR / HIFIS)
    • 12:30 PM - 1:00 PM
      • 12:30 PM
        Wrap-up & Farewell 30m
        Speakers: Adina Wagner (FZJ), Bernadette Fritzsch (AWI), HIDA, Heidi Seibold, Helmholtz Open Science Office, Tingying Peng (Helmholtz Munich), Tobias Schlauch (DLR / HIFIS)