LEAPS-INNOV Workflow Co-Working Sprint Kickoff

Europe/Berlin
zoom

Description

What is this about?

In LEAPS-INNOV WP7, we are working on novel compression schemes for data recorded at synchrotron light sources. In the process, we discovered many original ways to reconstruct and record data. Unfortunately, many of these workflows were never shared or archived with reproducibility in mind.

With this co-working sprint, we would like to change that! The goal is to pair scientists and engineers with people experienced in using a workflow engine that fosters reproducibility. Each pair then works together (remotely) over the course of 6 weeks to build an automated workflow (e.g. one that is fit for HPC execution). The final goal is to publish these workflows on platforms like WorkflowHub so that the community can reuse them.

This event marks the kick-off of our co-working sprint.

How to apply as a team

We hope to attract beamline scientists, coders and (data) engineers who have never heard of workflow engines. If you do, however, have experience with workflow engines, you can apply as well. We will give priority to teams working on data from synchrotron light sources.

We assume that your team already has a workflow available in a programming language of your choice, for example as a sequence of Python/shell scripts (potentially for use on an HPC cluster) or as a single Jupyter notebook. To participate, you need to be able to share your data (or at least example data) and code in a FAIR (findable, accessible, interoperable, reusable) fashion.

If you can answer yes to all of the above, fill out the call for abstracts by Jan 31, 2023, and block February 10, 2023 in your calendar. That is all you have to do. We will get in touch with you shortly after your submission.

Sprint Timeline

  • Feb 10, 2023: Kick-off workshop with talks and, if required, short tutorials (teams will be paired with their mentors on that day)
  • Feb 10 - Mar 20, 2023: Teams work autonomously with their mentors; one weekly meeting synchronizes progress among teams
  • Mar 30, 2023: Presentation of sprint results at DESY Hamburg

Our Mentors

We are happy to have the following workflow mentors on board:

  • Chris Hakkaart / Seqera Labs, experience with Nextflow
  • Maxime Garcia / Seqera Labs, experience with Nextflow
  • Felicita Gernhardt / HZDR, experience with Snakemake
  • David Pape / HZDR, experience with Snakemake
  • Peter Steinbach / HZDR, experience with Snakemake
  • <your name here?>

If you are interested in supporting our event as a mentor, please get in touch with the organizers. The more mentors we have, the more teams we can host.

We are open with respect to tooling and will not limit ourselves to one particular workflow engine. The workshop is organized to advocate open tools that foster reproducible open science. The engines for which we offer mentoring should empower every scientist to reuse and reproduce workflows from other institutes or centers, irrespective of hardware platform and operating system.

About us

The organisation of this workshop is supported by

  • Peter Steinbach / HZDR

Timetable

    • 10:00 10:10
      Welcome 10m

      A short introduction to the sprint

      Speaker: Peter Steinbach (HZDR)
    • 10:10 10:40
      Transparent, reproducible, and adaptable data analysis with Snakemake 30m

      The Snakemake workflow management system is a tool to create transparent, reproducible and adaptable data analyses. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition. Finally, Snakemake workflows can include a description of the required software, which will be automatically deployed to any execution environment. With over 600,000 downloads and over 1600 citations (on average >7 per week), Snakemake is one of the most widely used systems for reproducible data analysis.

      Speaker: Johannes Köster (Universitätsklinikum Essen)
    • 10:40 11:10
      An Introduction to Efficient and Scalable Pipeline Management with Nextflow 30m

      Nextflow is an open-source workflow orchestration tool for
      data-intensive pipelines. It has rapidly become an industry standard,
      enabling scalable and reproducible scientific workflows. Nextflow has a
      vibrant community with thousands of bioinformaticians as part of the
      nf-core project, which provides ready-to-use pipelines, ready-to-plug-in
      modules, and sub-workflows. Nextflow simplifies the implementation and
      deployment of complex workflows across almost all compute infrastructures -
      from HPC job schedulers to all of the main cloud providers. It has built-in
      support for software packaging tools such as Docker, Podman, Singularity,
      and conda, and it automatically manages workflow toolchains.

      In this talk I will introduce Nextflow and nf-core, explaining how the
      workflow manager works, the community tools available to streamline
      development, and how to get started building your own pipelines.

      Write your pipeline once, and run it anywhere

      Speaker: Marcel Ribeiro-Dantas (Seqera Labs)
    • 11:10 11:15
      Short Break 5m
    • 11:15 11:45
      Introduction to CWL and WorkflowHub 30m
      Speaker: Michael R. Crusoe
    • 11:45 11:52
      Team Spotlight: P11 / DESY HH 7m
      Speaker: Helena Tabermann
    • 11:52 11:59
      Team Spotlight: FAXTOR / ALBA 7m
      Speaker: Alessandra Patera
    • 11:59 12:06
      Team Spotlight: KARA / KIT 7m
      Speaker: Yaroslav Zharov
    • 12:06 12:13
      Team Spotlight: Michal & Mikhail 7m
      Speaker: Michal Smid
    • 12:13 12:18
      Team Pairing 5m
      Speaker: Peter Steinbach (HZDR)