2nd Forum Helmholtz Research Data Commons: Enhancing Research Data Workflows for and with AI

Europe/Berlin
Zoom (online)

Description

The Helmholtz Metadata Collaboration (HMC) and the Helmholtz Open Science Office invite you to the second iteration of the Helmholtz Research Data Commons on September 30, 10:00 to 12:00 CEST, this time focussing on "Enhancing Research Data Workflows for and with AI". In this online event, colleagues from the AI team at the Jülich Supercomputing Centre will provide practical insights into preparing research data for AI and showcase their AI tool BLABLADOR, a free and privacy-aware Helmholtz AI LLM service, along with its possible applications in research data analysis. The event will include an open discussion with the speakers, in which everyone is invited to participate.

For participation, please register here.

Research Data Commons is a joint, recurring Helmholtz forum for the exchange and discussion of topics related to research data at Helmholtz, initiated in 2024 by the Helmholtz Metadata Collaboration (HMC) and the Helmholtz Open Science Office. The events are open to employees from all Helmholtz centers, who are invited to share their experiences and approaches around specific focus topics.

Organised by

Helmholtz Open Science Office
Helmholtz Metadata Collaboration

Registration
    • 1
      Welcome and Introduction
      Speakers: Mathijs Vleugel (Helmholtz Open Science Office), Sören Lorenz (Helmholtz Metadata Collaboration)
    • Talks and Presentations
      Convener: Marc Lange (Helmholtz Open Science Office)
      • 2
        Is Your Data Ready for AI? A Practitioner's Perspective

        The recent successes of Artificial Intelligence and Machine Learning have been possible only as a consequence of published datasets. Benchmark datasets such as ImageNet (Deng et al., 2009) were developed as tools to measure progress in the field and have become quasi-standards. More recently, extremely large data collections such as The Pile (Gao et al., 2020) and FineWeb2 (Penedo et al., 2025) have become fertile ground for the development of open-source large language models. Scientific datasets such as the Protein Data Bank are the basis for breakthroughs such as AlphaFold (Jumper et al., 2021). A highly visible, high-quality dataset in the right context can contribute significantly to advancing a scientific field. In my talk, I'll discuss the success factors for datasets and different strategies for making them visible to AI method experts. Breakthroughs in AI are steered by the challenges that scientists are trying to solve. Help them find the most exciting ones.

        Speaker: Stefan Kesselheim (Forschungszentrum Jülich)
      • 3
        BLABLADOR – The experimental Helmholtz AI LLM server
        Speaker: Alexandre Strube (Forschungszentrum Jülich)
      • 4
        Use Case: LLM Applications in Ground-Based Gamma Astronomy

        We present a multi-agent application for the next-generation Cherenkov Telescope Array Observatory, designed to automate the generation of Pydantic Python models directly from free-text descriptions or structured files. It also explores the use of the multi-agent framework AutoGen, as well as the minimal function tools available in new OpenAI interfaces, and incorporates a feedback loop that verifies and refines generated code before it is presented to the user, streamlining the workflow for astrophysical data management. The app is backed by Blablador's GPT-OSS, the best-performing model for the task among the selected models.

        Speakers: Elisa Jones (DESY), Dmitriy Kostunin (DESY)
      • 5
        Q & A
    • Discussion – Enhancing Research Data Workflows for and with AI
    • 6
      Closing