Enabling reproducibility in data science - learn why it matters and how you can do it

Name: Enabling reproducibility in data science - learn why it matters and how you can do it
Start: 2022-06-09T09:30:00+02:00
End: 2022-06-09T13:00:00+02:00
Location: online

Thursday 9 Jun 2022, 09:30 → 13:00 Europe/Berlin

Description

How do I make sure that independent researchers will be able to reproduce my research results? Why is that even important and what tools or methods are there to achieve this goal? Keeping reproducibility in mind from the very start of a project will help you in the long run and greatly add to the quality of your research. In this event organized by the Helmholtz Open Science Office and HIDA, we will focus on digital reproducibility and best practices for data science.

We will start off the event with two short lectures on reproducibility, first in the context of open science and second in relation to data science and machine learning. Afterwards, you will have the choice to attend and actively take part in one of these three workshops:

Practical steps towards reproducible research
Foundations of research software publication
Scientific metadata: Fundamentals of structured and standardized research data annotation

How to participate?

Participation is free of charge. Please register for the event by clicking on "Apply for Participation". During the registration process, you will be asked to choose one of the three workshops on offer. They will take place in parallel so you can only attend one. Please contact us if you would like to see more training events like this one.

Organizers

The Helmholtz Open Science Office supports the Helmholtz Association as a service provider in shaping the cultural change towards open science. It represents Helmholtz in various open science initiatives, is involved in third-party funded projects, and in this way communicates the Helmholtz positions on open science on a national and international level.

HIDA - the Helmholtz Information & Data Science Academy - is Germany’s largest postgraduate training network in the field of information and data science. We prepare the next generation of scientists for a data-heavy future of research.

Speakers

Silke Christine Gerlich is a Postdoctoral Associate in the Helmholtz Metadata Collaboration (HMC) Hub Information, hosted at Forschungszentrum Jülich (FZJ) Institute for Materials Data Science and Informatics (IAS-9). She holds a PhD in Molecular Plant Physiology and contributes to the German National Research Data Infrastructure (NFDI) consortium DataPLANT with a special interest in ontology development.

Tobias Schlauch has worked at the Institute for Software Technology of the German Aerospace Center (DLR) since 2005. He has contributed to various research projects as a software engineer working on workflow and data management, and has supported them in software quality assurance. Since 2009, he has served as the representative of the DLR software engineering initiative.

Heidi Seibold is an expert in open and reproducible research, with a focus on data science and health research. She is the host of two podcasts: "Open Science Stories" and ">reboot academia". You can follow Heidi on Twitter at @HeidiBaya.

Peter Steinbach is a trained particle physicist and currently leads the Helmholtz AI consulting team at Helmholtz-Zentrum Dresden-Rossendorf. He is passionate about applied statistics, machine learning and HPC, when used appropriately.

Annika Strupp is a Data Steward in the Helmholtz Metadata Collaboration (HMC) Hub Information, hosted at Forschungszentrum Jülich (FZJ) Institute for Materials Data Science and Informatics (IAS-9). She worked as a Web Analytics Consultant in Marketing Technology before joining HMC. She is an extra-occupational master's student in Digital Data Management (DDM) at Humboldt-Universität zu Berlin and Potsdam University of Applied Sciences (FHP).

Contact

open-science@helmholtz.de

hida-courses@helmholtz.de

- 09:30 → 09:40
  
  Welcome 10m
  
  Speakers: HIDA, Helmholtz Open Science Office
- 09:40 → 10:00
  
  Impulse lecture: Open science & reproducibility 20m
  
  Research is reproducible when it is possible to (independently) recreate the same results from the same data and same code/analysis as used by the original researcher or team of researchers. Reproducibility enhances collaboration and transparency in science and supports reusability of scientific products. This closely links with the open science endeavour towards the cultural change in science and science communication. Open science aims for a more effective and open exchange of information within science and the promotion of methods and the transfer of scientific results to society, the economy, and politics.
  
  Speaker: Helmholtz Open Science Office
  
  20220609_OpenScience-Reproducibility_Ferguson_Schrader.pdf
- 10:00 → 10:20
  
  Impulse lecture: Reproducibility in data science and machine learning 20m
  
  Machine learning is becoming ubiquitous in many scientific domains. However, practitioners struggle to apply each new addition to the machine learning market to their own data with effects comparable to those published. In this talk, I'd like to present recent observations on the reproducibility of machine learning results and how the community strives to tackle the related challenges.
  
  Speaker: Peter Steinbach (HZDR)
  
  slides
- 10:20 → 10:30
  
  Break 10m
- 10:30 → 12:30
  Workshop session
  - 10:30
    
    Foundations of research software publication 2h
    
    We will provide you with actionable advice about how to prepare your research code before publishing it or submitting it alongside a research publication.
    
    This talk will cover the following topics:
    • Code repository structuring
    • Minimum coding practices
    • Documentation
    • Open source licensing
    • Minimum software release practices
    • Software citation
    
    We will discuss these topics using the example of a data analysis script, focusing on minimum practices for each topic.
    
    Speaker: Tobias Schlauch (DLR)
    
    2022_06_09_Foundations-of-Research-Software_Schlauch.pdf
    
    online pad
  - 10:30
    
    Practical steps towards reproducible research 2h
    
    In this workshop we will look at four key steps to get started with reproducible research:
    
    (1) Get organized
    (2) Use Open Source Software
    (3) Use Version Control
    (4) Make your work available online.
    
    Although reproducible research practices can seem intimidating at first, we will find a way to get started that works for you.
    
    Speaker: Heidi Seibold
    
    2022_06_practical-steps-Reproducibility_Seibold.pdf
    
    online pad
  - 10:30
    
    Scientific metadata: Fundamentals of structured and standardized research data annotation 2h
    
    This course is limited to 25 participants
    
    Have you ever felt lost in incomprehensible research data documentation? This session introduces the basics of machine-readable research data annotation with domain-specific metadata schemas and standards. Learn why an accurate and harmonized description of research data is key to scientific exchange and how to find a suitable metadata framework for your research domain.
    
    Speakers: Annika Strupp, Silke Gerlich (HMC)
    
    2022_06_09_scientific-metadata-hmc_Gerlich-Strupp.pdf
    
    online pad
- 12:30 → 13:00
  
  Wrap-up 30m
  
  Speakers: Annika Strupp, Heidi Seibold, Silke Gerlich (HMC), Tobias Schlauch (DLR)
