deRSE25 and SE25 Timetables

Europe/Berlin
Audimax (Building 30.95)

Audimax

Building 30.95

Straße am Forum 1, 76131 Karlsruhe, Germany
Anne Koziolek (KIT), René Caspart (Karlsruhe Institute of Technology (KIT))
Description

Facts

  • What:
    5th conference for Research Software Engineering in Germany
    and
    Fachtagung Software Engineering (SE)
  • Begin: February 24, 2025
  • End: February 28, 2025
  • Organized and hosted by
  • Location: Karlsruher Institut für Technologie, Campus South, Karlsruhe

For details on the two conferences, see the respective websites for deRSE25 and SE25.

Important Dates

  • deRSE25 Conference: 25–27 Feb 2025
  • SE25 Workshops: 24–25 Feb 2025 (programme)
  • SE25 Conference: 26–28 Feb 2025
    • 10:00 12:00
      RSQkit Contentathon: Collaborative Development and Integration of Research Software Quality Resources 2h SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe

      The Research Software Quality toolkit (RSQkit, https://everse.software/RSQKit/) is a knowledge hub developed within the EVERSE project (https://everse.software/) that aims to become a permanent, community-driven resource for research software quality expertise. This interactive workshop provides hands-on experience with collaborative content development, where participants will have a chance to contribute their expert knowledge and will be given recognition for their contribution.

      Through guided exercises, participants will experience the complete contribution cycle - from content creation through peer review to proper attribution of their work. Contributions will be formally recognized and credited, ensuring participants receive acknowledgment for their expertise and input. Attendees will work in small groups to develop and refine content covering key aspects of research software quality, including:

      • Technical best practices
      • FAIR principles
      • Sustainability guidelines
      • Software management approaches
      • Metadata standards

      The workshop focuses on enhancing research software quality through collaborative knowledge sharing, providing participants with practical experience in developing, reviewing, and integrating community-driven content.

      Key Details:

      • Duration: 2 hours
      • Location: Karlsruhe Institute of Technology (KIT)
      • Start Time: 10:00 AM

      Expected Outcomes:

      • Practical contributions to the RSQkit
      • Experience in collaborative documentation development
      • Formal recognition of individual contributions
      Speaker: Giacomo Peru (University of Edinburgh)
    • 10:30 11:30
      HPC Carpentry Community Meetup 1h Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      This meet-up is for anyone teaching HPC skills, and anyone who is interested in community-led training.
      HPC Carpentry is an open source community of RSEs, facility operators, developers, and others, who want to empower researchers through better, more inclusive training in HPC skills.
      The project is currently in incubation to join The Carpentries as a new lesson program.
      The HPC Carpentry project's mission is to provide interactive, hands-on instructional material focused around HPC in an inclusive and accessible model, where feedback from learners and instructors is used to continuously improve the material and the learner experience.
      Open to newcomers, existing HPC Carpentry community members, and anyone else interested in the project, this event is an opportunity for people who want to increase HPC competencies.
      The goals of the session will include
      - Providing a welcoming space for existing instructors to connect and share their experience and perspectives,
      - Connecting with potential instructors and workshop hosts and providing them with more information about the project and how our training is delivered, and
      - Better informing project coordinators about the training needs, and opportunities where we can contribute, at institutions across Germany, to help guide the project going forward.

      No prior knowledge of or affiliation to HPC Carpentry or The Carpentries is needed to be able to participate in this session, but some experience of or insight into the teaching of high-performance computing, especially to novices, would be beneficial.

      Speakers: Marc-Andre Hermanns (RWTH Aachen University), Toby Hodges (The Carpentries)
    • 10:30 12:00
      Webby FDOs with RO-Crates and Signposting 1h 30m Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      While the FAIR principles provide some guidelines for research artifacts to be findable, accessible, interoperable and reusable, the FAIR Digital Objects (FDOs) add layers so implementations are more machine-actionable, e.g., guidelines for identifiers, typing and operations. Using web-based technologies makes it easier for researchers to implement FAIR and FDO guidelines as it reuses technologies already familiar to them.

      In this 1.5-hour tutorial, we will introduce “webby FDOs”, a practical approach to FDOs using Research Object Crate (RO-Crate) and FAIR Signposting. RO-Crate is a lightweight method to package research outputs along with their metadata while FAIR Signposting provides a simple yet powerful approach to navigate FAIR aspects of the scholarly objects on the Web. Our tutorial will give a brief introduction to the FDO principles, and show how they have been implemented using HTTP, HTML and JSON. After the FDO introduction, we will briefly introduce JSON-LD and will show how to use it to expose software metadata with GitHub pages. We will then present RO-Crate and Signposting and will proceed to enrich the previously created GitHub pages with them. We will finish the session with a practical example of webby FDOs in production, using a Biodiversity use case from the context of Common European Data Spaces (Green Deal Data Space).

      This session will follow an interactive walk-through with opportunities to discuss use cases and challenges. It aims to give the participants an overview of the technologies so they can go deeper into the hands-on exercises used during the session. The hands-on exercises will include enough information for participants to follow them and apply them to their own (basic) use case. Basic knowledge of Web technologies (HTTP, HTML, JSON) is an advantage, but not a requirement.
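
      To make the "webby" approach more concrete, the following minimal Python sketch (not part of the tutorial materials; names and URLs are placeholders) embeds schema.org JSON-LD metadata for a piece of software into an HTML page of the kind that could be served via GitHub Pages. FAIR Signposting would additionally expose typed links (e.g. HTTP Link headers or <link> elements) pointing at such metadata.

      ```python
      # Minimal sketch: write an HTML page with embedded JSON-LD software metadata.
      # Project name, repository URL and author are placeholders.
      import json

      metadata = {
          "@context": "https://schema.org",
          "@type": "SoftwareSourceCode",
          "name": "my-research-tool",
          "codeRepository": "https://github.com/example/my-research-tool",
          "license": "https://spdx.org/licenses/MIT",
          "author": [{"@type": "Person", "name": "Jane Doe"}],
      }

      html = f"""<!DOCTYPE html>
      <html>
        <head>
          <title>my-research-tool</title>
          <!-- Signposting-style typed link to a standalone metadata file -->
          <link rel="describedby" href="codemeta.json" type="application/ld+json">
          <script type="application/ld+json">
      {json.dumps(metadata, indent=2)}
          </script>
        </head>
        <body><h1>my-research-tool</h1></body>
      </html>"""

      with open("index.html", "w") as fh:
          fh.write(html)
      ```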

      Speakers: Leyla Jael Castro (ZB MED Information Centre for Life Sciences), Jonas Grieb (Senckenberg Gesellschaft für Naturforschung), Rohitha Ravinder (ZB MED - Information Centre for Life Sciences, Cologne, Germany), Stian Soiland-Reyes (University of Manchester), Claus Weiland (Senckenberg Gesellschaft für Naturforschung)
    • 13:00 13:15
      deRSE25 - Welcome 15m Audimax A+B

      Audimax A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Speaker: René Caspart (Karlsruhe Institute of Technology (KIT))
    • 13:15 14:15
      Keynote: Research Software and Its Developers: Insights Gained and Future Directions 1h Audimax A+B

      Audimax A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe

      Simulation software packages are fundamental for advancing modern scientific
      research. These tools vary widely in scale, from a few thousand lines of code
      to millions, demanding significant human expertise and computational resources
      for their development and long-term maintenance. Yet, despite this critical
      role, both the developers and the process of scientific software development
      are often underappreciated in academic settings.

      In this talk, I will explore aspects and challenges specific to scientific
      code development within academia, aiming to stimulate a broader discussion
      around several core questions: Is the current academic model the most
      effective way to develop research software? How much should society invest in
      building and maintaining research software? What strategies can we adopt to
      ensure that scientific code is robust and reliable? What unique questions and
      obstacles do scientific software development projects encounter? Furthermore,
      how can we ensure the long-term accessibility, maintenance, and
      reproducibility of simulation software and simulation results?

      Drawing on two decades of experience as a co-developer and co-manager of the
      DFTB+ quantum mechanical atomistic simulation software, and as an active
      member of the Fortran developer community, I will share some lessons learned
      so far and offer some views on the future of research software engineering.

      Speaker: Dr Bálint Aradi (University of Bremen)
    • 14:15 14:30
      Coffee Break 15m Audimax Foyer

      Audimax Foyer

      Building 30.95

    • 14:30 15:30
      Facets of large Software Infrastructures Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131 Karlsruhe
      Convener: Inga Ulusoy (University of Heidelberg)
      • 14:30
        Advantages and Challenges of a Gitlab CI/CD Pipeline Architecture for the Build and Release System of a Multi-Project Satellite Simulation Software 20m

        The aim of this presentation is to demonstrate the benefits and constraints of using a Continuous Integration and Continuous Delivery (CI/CD) pipeline for the testing, documentation, build and release of numerous GitLab projects (science modules) as well as the desktop GUI application for the "Modelling software for quantum sensors in space" (MoQSpace) project at the DLR (German Aerospace Centre) Institute of Satellite Geodesy and Inertial Sensing. The objective of MoQSpace is to provide a satellite simulation toolchain for the advanced orbital propagation and test mass dynamics of novel sensors with a software library of science modules developed in various programming languages with many contributions from different authors over the past two decades.

        With such diverse and large legacy research code, it was crucial to plan standardized workflows to develop, test, document and publish modules with a module versioning and dependency management system. Consequently, the modules were grouped in packages such that the package constituents are inter-compatible for a given development environment. The defined structure and development processes accelerate the development of new additions to the library. The users can view which modules can be selected and downloaded together in a desktop GUI application called VENQS that is developed in Python. The CI/CD in Gitlab is a powerful tool for the aforementioned facets of the project but involves several challenges.

        Firstly, each module is a separate GitLab project with parallel releases for multiple software and OS versions. It was essential to automate the process with build scripts kept at a single source of truth for traceability and agile development. Hence, a separate project in the parent GitLab group was created with modular CI/CD scripts that are imported into all the modules. However, the reusability of the code for parallel jobs with varying metadata variables is limited to the ‘parallel’ keyword, which cannot be mapped one-to-one to the corresponding jobs in the next stage of the CI/CD pipeline with the existing methods.

        Secondly, two release architectures were needed. In the down-to-top scenario, modules are built and released individually. When a new package of modules is published in the VENQS App, it fetches the published releases from the Package Registry of each module. In the top-to-down approach, for an existing package, all modules need to be re-built in a new version, e.g. for a new OS version. This requires one parent CI/CD script that triggers all the module pipelines to re-run in order to add a release for an existing package in the Package Registry. The existing tools in GitLab can achieve this but require a well-planned workflow to optimize traceability and parallel pipelines across multiple GitLab projects based on the project requirements.
        In conclusion, one objective of this talk is to shed light on the multi-faceted benefits we achieved by automating the build and release of research software with GitLab CI/CD. Moreover, the lessons learned and the challenges encountered in the design and implementation will be discussed, in the hope of generating a fruitful discussion.
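
        To make the top-to-down scenario more concrete, here is a minimal Python sketch of how a parent job could trigger the module pipelines through GitLab's pipeline trigger API. It is not the MoQSpace project's actual tooling; the project IDs, the variable name and the GitLab URL are assumptions.

        ```python
        # Sketch: trigger downstream module pipelines from a parent CI job.
        # Assumes it runs inside GitLab CI (CI_JOB_TOKEN is provided there) and
        # that the requests package is installed.
        import os
        import requests

        GITLAB_URL = os.environ.get("CI_SERVER_URL", "https://gitlab.example.com")
        JOB_TOKEN = os.environ["CI_JOB_TOKEN"]

        MODULE_PROJECT_IDS = [101, 102, 103]  # hypothetical IDs of the module projects

        def trigger_module_pipeline(project_id, ref="main", variables=None):
            """Start a pipeline in one module project and return its JSON description."""
            data = {"token": JOB_TOKEN, "ref": ref}
            for key, value in (variables or {}).items():
                data[f"variables[{key}]"] = value  # forwarded to the downstream pipeline
            response = requests.post(
                f"{GITLAB_URL}/api/v4/projects/{project_id}/trigger/pipeline", data=data
            )
            response.raise_for_status()
            return response.json()

        # Top-to-down rebuild: re-run every module pipeline for a new package release.
        for pid in MODULE_PROJECT_IDS:
            pipeline = trigger_module_pipeline(pid, variables={"PACKAGE_VERSION": "2.0.0"})
            print(f"Triggered pipeline {pipeline['id']} in project {pid}")
        ```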

        Speaker: Suditi Chand (DLR - Institute for Satellite Geodesy and Inertial Sensing)
      • 14:50
        Software Infrastructure for fully Containerized Computing Cluster at GSI / FAIR 20m

        Scientific research at the FAIR accelerator facility spans a wide range of fields, including Nuclear Physics, Atomic Physics, and Heavy Ion Physics. Workflows for simulations and data analysis in FAIR experiments range from High Throughput Computing to OpenMPI calculations and traditional batch processing. Operating a shared computing cluster that is scalable enough to meet the diverse needs of such a heterogeneous user community presents a significant challenge.
        We present the principles of a fully containerized approach used in the GSI/FAIR computing cluster, which has been successfully operating for five years. This approach is based on a complete separation of user application environments from the host system, providing greater flexibility and scalability.
        This new approach involves additional software infrastructure to support a reproducible containerized environment and enable reliable testing. To enhance the user experience in working with containers, we introduced the concept of a Virtual Application Environment, allowing users to interactively log into the container with access to an extensive software stack.

        Speaker: Dmytro Kresan (GSI, Darmstadt)
      • 15:10
        Three Lessons Learned: How RSEs Succeed in License Management 20m

        Software license management is a critical but often overlooked aspect of Research Software Engineering (RSE). For both open-source and proprietary software projects, proper license management is increasingly important for sustainability, compliance, and collaboration. Our talk presents three key lessons learned from our experiences in license management, based on interdisciplinary projects and case studies at KIT. These lessons should help RSEs to overcome the challenges of license compliance in academic and industrial environments and to ensure long-term software value.

        1. Generate Software Bill of Materials (SBOM) for Transparency
        A key takeaway is the importance of creating and maintaining a Software Bill of Materials (SBOM) from the start of any RSE project. An SBOM provides a comprehensive inventory of all components and their associated licenses. It ensures transparency by clarifying which licenses apply to which parts of the code, and is especially valuable when collaborating with industry. In one case, a partner required software that had to be compliant with industry standards (e.g. ISO 5230). The team had to do a lot of retrospective work to meet these requirements, highlighting the need for an SBOM from the beginning to avoid legal and financial complications later.
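
        As a minimal illustration of SBOM generation (not a replacement for dedicated SBOM tooling), the following Python sketch collects the packages installed in the current environment into a simplified, CycloneDX-shaped inventory:

        ```python
        # Sketch: build a simplified, CycloneDX-shaped SBOM of installed Python packages.
        # The output mimics the overall shape only; it is not a validated CycloneDX document.
        import json
        from importlib.metadata import distributions

        def collect_components():
            components = []
            for dist in distributions():
                meta = dist.metadata
                components.append({
                    "type": "library",
                    "name": meta["Name"],
                    "version": dist.version,
                    "license": meta.get("License", "UNKNOWN"),  # as declared by the package
                })
            return sorted(components, key=lambda c: (c["name"] or "").lower())

        sbom = {
            "bomFormat": "CycloneDX",
            "specVersion": "1.5",
            "components": collect_components(),
        }

        with open("sbom.json", "w") as fh:
            json.dump(sbom, fh, indent=2)
        ```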

        2. Carefully Manage Contributions
        Using version management in your DevOps platform (e.g. GitLab) is essential for research software projects, among other things to track the development process, coordinate between participating developers, and provide access to current and previous versions of the project. Within such a structured approach, it’s also important to take care of license management, especially for handling incoming contributions.
        Effective license management must extend beyond outbound licensing to the contributions RSEs accept from others. Inbound contributions must align with the project’s outbound licensing strategy. For example, third-party contributions may introduce incompatible licenses, which can disrupt a project’s legal position. This lesson emphasizes the need to carefully evaluate all external code to avoid issues like improperly licensed "snippets" from public forums. Tools like Fossology and REUSE help streamline this process by checking for license compliance, ensuring that all contributions are consistent with the project's overall license model.

        3. Maintain your Flexibility to Adapt Your License Model to the Community
        The third lesson is to remain adaptable in your licensing decisions. Different communities and industries may require different licensing strategies. In one project, an RSE team had to deal with dual licensing issues when an industry partner requested a non-copyleft version of RSE’s “GPL-ed“ software. By adapting their licensing model, they were able to serve both the open-source community and the proprietary software market. Such flexibility can extend the reach and value of the software, allowing RSEs to balance community engagement with commercialization opportunities.

        Conclusion
        By integrating these three lessons (generating SBOMs, carefully managing contributions, and maintaining flexibility with licensing), RSEs can navigate the complexities of license compliance. These strategies not only improve the sustainability of research software, but also open doors for broader collaboration and industry adoption. Our presentation will provide real-world examples, tools, and techniques to help RSEs master license management for long-term project success.

        Speakers: Drees T., Feuchter D., Stary T., Winandi A.
    • 14:30 15:30
      Implement an automated release and publication workflow for your GitLab software repository using FACILE-RS 1h Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Research software development is a fundamental aspect of modern academic research, and it has now been acknowledged that the FAIR (Findable, Accessible, Interoperable, Reusable) principles, historically established for research data, should also be applied to research software.
      As software is by nature executable and evolves over time, the FAIR principles had to be adapted to this particular type of digital asset, and the FAIR principles for Research Software (FAIR4RS) were introduced in 2021.
      It can be challenging for software developers to adopt the FAIR4RS principles, as this requires, for example, archiving every software release in a persistent data repository together with relevant metadata, which can be time-consuming if done manually.

      In this context, the Python package FACILE-RS simplifies the maintenance of software metadata by automating its generation and synchronization in various formats, from a single manually maintained CodeMeta metadata file. It also provides automated pipelines for releasing software on GitLab as well as publishing on the persistent research data repositories RADAR and Zenodo.

      In this tutorial, you will use FACILE-RS in your own GitLab software repository (or using our template repository) to automate the creation and synchronization of metadata files for your software, and to implement a semi-automated release pipeline for creating releases on GitLab and Zenodo.

      In practice, during this tutorial, you will:
      - Create a CodeMeta metadata file for your software.
      - Implement a GitLab CI/CD (Continuous Integration/Continuous Delivery) pipeline to generate and synchronize a CFF file and a DataCite metadata record automatically from this CodeMeta file.
      - Implement a GitLab CI/CD pipeline for creating releases of your software on GitLab and Zenodo, associated with a persistent identifier. You will then be able to trigger releases of your software just by creating a specific tag in your repository.
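
      To illustrate the kind of conversion the pipeline automates in the second step, here is a hand-written Python sketch (not FACILE-RS itself) that derives a reduced CITATION.cff from a codemeta.json file; it assumes PyYAML is available and maps only a handful of fields.

      ```python
      # Sketch: derive a minimal CITATION.cff from codemeta.json.
      # Real converters also normalize licenses, ORCIDs, dates, etc.
      import json
      import yaml

      with open("codemeta.json") as fh:
          codemeta = json.load(fh)

      cff = {
          "cff-version": "1.2.0",
          "message": "If you use this software, please cite it using these metadata.",
          "title": codemeta.get("name"),
          "version": codemeta.get("version"),
          "license": codemeta.get("license"),
          "repository-code": codemeta.get("codeRepository"),
          "authors": [
              {"given-names": a.get("givenName"), "family-names": a.get("familyName")}
              for a in codemeta.get("author", [])
              if isinstance(a, dict)
          ],
      }

      with open("CITATION.cff", "w") as fh:
          yaml.safe_dump({k: v for k, v in cff.items() if v}, fh, sort_keys=False)
      ```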

      Speaker: Marie Houillon (Karlsruhe Institute of Technology)
    • 14:30 15:30
      Metadata in Research Software SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Joerg Schaarschmidt (Karlsruhe Institute of Technology)
      • 14:30
        Metadata-Annotated Modelling with FAME: An Open Electricity Market Model Example 20m

        Abstract

        The purpose of the open Framework for distributed Agent-based Modelling of Energy systems FAME is to support the rapid development and fast execution of complex agent-based energy system simulations. With upcoming releases of its main components FAME-Io and FAME-Core, full support for metadata annotation is achieved. This enables modellers to add and track metadata not only to data associated with model runs, but also to model components and model outputs. To support data integrity, a single binary file encapsulates all model inputs, outputs, and their associated metadata. Using FAME-Io, this binary file can be extracted for further processing, resulting in tabular files in CSV format and a single accompanying JSON metadata file. The structure of the metadata file follows the Open Energy Platform’s (OEP) metadata schema. We also provide a full-scale example model annotation with metadata using the open Agent-based Market model for the Investigation of Renewable and Integrated energy Systems AMIRIS.

        Model Metadata

        In FAME, model descriptions are specified in YAML files. Such files describe each type of agent that can be part of a model. For each agent type, required inputs, capabilities, and outputs are listed. This allows FAME to validate input data provided by model users before running a simulation. Metadata can now be added to any aspect of agent type descriptions, thus fostering a better documentation of models. For example, in the input section of an agent type, attached metadata could explain a parameter’s application or what unit is expected for connected data. Metadata annotations on outputs directly feed the JSON file that accompanies the model results. Currently, no specific form of metadata is enforced, but it is recommended to follow the OEP metadata schema.

        Input Metadata

        To start a simulation, each instance of an agent must receive its input parameters. Again, FAME uses YAML files to parameterise simulations, and again, each input parameter can be annotated with metadata. In this way, all data used in a simulation can be accurately described. FAME takes all the above data and metadata and stores it in a single binary file. Once a simulation is run, all output data is added to this file along with additional metadata describing the simulation process. This includes, for example, versions of used FAME tools, wall-time information, or processor configurations. All data and metadata of simulations can be reconstructed from these FAME binary files.
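
        As a purely illustrative sketch of attaching metadata to an input parameter, the snippet below writes a YAML fragment from Python (PyYAML assumed). The field names are placeholders and do not follow FAME's actual schema; the point is that a metadata block, e.g. following the OEP metadata schema, can travel with the parameter it describes.

        ```python
        # Sketch: a metadata block attached to one agent input parameter.
        # Field names are illustrative placeholders, not FAME's real schema.
        import yaml

        agent_input = {
            "Agent": {
                "Type": "DemandTrader",  # placeholder agent type
                "Attributes": {
                    "LoadTimeSeries": {
                        "value": "./timeseries/load.csv",
                        "metadata": {
                            "description": "Hourly electricity demand of the agent",
                            "unit": "MWh",
                            "source": "ENTSO-E transparency platform",
                            "licence": "CC-BY-4.0",
                        },
                    }
                },
            }
        }

        with open("agent_input.yaml", "w") as fh:
            yaml.safe_dump(agent_input, fh, sort_keys=False)
        ```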

        Metadata Example

        We provide an example of metadata annotation with FAME for the electricity market model AMIRIS in the repository AMIRIS-Examples. In order to enable automatic matching of input parameter types, agent parameters have been mapped to their closest concept in the Open Energy Ontology. We aim to extend our current set of metadata and provide a complete set of relevant metadata for each agent type and input parameter with future releases of AMIRIS-Examples.

        Speaker: Christoph Schimeczek (Deutsches Zentrum für Luft- und Raumfahrt e.V.)
      • 14:50
        Data Model Creation with MetaConfigurator 20m

        In both research and industry, significant effort is devoted to the creation of standardized data models that ensure data adheres to a specific structure, enabling the development and use of common tools. These models (also called schemas) enable data validation and facilitate collaboration by making data interoperable across various systems. Tools can assist in the creation and maintenance of data models. We introduce MetaConfigurator [1], an open-source web-based schema editor and form generator for JSON Schema and for JSON/YAML documents. It differs from other schema-to-UI approaches in the following ways:
        1) It allows data editing and schema editing within the same tool,
        2) It offers a unified view, which combines the benefits of a GUI, a text editor and a UML-like diagram view, and
        3) It supports advanced schema features, including conditions, constraints and composition.

        In this talk, we demonstrate MetaConfigurator based on a real-world application in the field of Chemistry. We show how the tool can be used to streamline and simplify the process of data model creation. Furthermore, we use the tool to visualize data models and communicate them to others. With MetaConfigurator, fewer mistakes are made and the entry barrier for data model creation is lowered.

        Fig 1: Excerpt of the Schema Editor, with the raw schema text editor view on the left, the interactive diagram view in the middle and the GUI view on the right.
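
        As a small illustration of what such a data model enables, documents can be validated programmatically against a JSON Schema; the schema below is a made-up, chemistry-flavoured example and the snippet assumes the jsonschema package.

        ```python
        # Sketch: validate a document against a small JSON Schema.
        import jsonschema

        schema = {
            "$schema": "https://json-schema.org/draft/2020-12/schema",
            "type": "object",
            "required": ["sample_id", "temperature_k"],
            "properties": {
                "sample_id": {"type": "string"},
                "temperature_k": {"type": "number", "minimum": 0},
                "solvent": {"type": "string", "enum": ["water", "ethanol", "acetone"]},
            },
        }

        document = {"sample_id": "S-042", "temperature_k": 298.15, "solvent": "water"}

        # Raises jsonschema.ValidationError if the document violates the schema.
        jsonschema.validate(instance=document, schema=schema)
        print("document is valid")
        ```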

        [1]: Neubauer, Felix & Bredl, Paul & Xu, Minye & Patel, Keyuriben & Pleiss, Juergen & Uekermann, Benjamin. (2024). MetaConfigurator: A User-Friendly Tool for Editing Structured Data Files. Datenbank-Spektrum. 24. 10.1007/s13222-024-00472-7.

        Speaker: Felix Neubauer (University of Stuttgart)
      • 15:10
        SMECS: A Software Metadata Extraction and Curation Software 20m

        Metadata have been shown to be one of the success factors for the so-called FAIRification of research software, especially in improving the findability and reusability of research software [1], [2]. Creating high-quality metadata can be resource-intensive [3]. Moreover, users often find it challenging to utilize metadata effectively for retrieval [4], [5]. To support researchers from various domains in creating metadata for their research software, we developed the Software Metadata Extraction and Curation Software (SMECS).

        SMECS lowers the barrier to creating metadata for research software by combining extraction from existing sources with easy-to-use curation of metadata. SMECS is Python-based and available as open source. As a first step, SMECS can extract existing metadata from different sources. Afterwards, the tool allows the researcher to curate the metadata and add additional information. Finally, SMECS provides the researcher with the metadata as a JSON file in the CodeMeta format, which is a common standard for research software metadata [6].

        For research software, a lot of metadata is already present in online repositories, e.g., on GitHub or GitLab. Sometimes, even structured metadata is available in CFF [7] or CodeMeta files. Therefore, after the user provides a link to a repository, SMECS extracts as much metadata as possible from the API of the corresponding repository, e.g., the name and contributors. All extracted metadata is mapped to CodeMeta.
        In the second step, SMECS presents the extracted metadata to the researcher in a user interface to allow further curation. The researcher can check and change the extracted metadata as well as add additional metadata. Finally, the researcher can export the metadata as a CodeMeta file.

        SMECS was designed with an emphasis on User-Centered Design (UCD), ensuring that the needs, preferences, and behaviors of users were prioritized throughout the development process. The objective of User-Centered Design for SMECS is not only to develop useful metadata extraction and curation but also to enhance user satisfaction and task performance within SMECS. To strengthen the overall usability and better meet user needs, SMECS was improved based on feedback from usability experiments during the iterative design process. In the first usability experiment, participants interacted with the software by completing a series of predefined tasks. Following this, each participant filled out the System Usability Scale (SUS) questionnaire to assess their experience with the tool. Finally, semi-structured interviews were conducted with each participant to gather qualitative data regarding their user experience with SMECS, providing deeper insights into their interactions and perceptions of the tool's usability.

        Our results reveal that SMECS provides a good user experience. To simplify metadata creation for researchers even further, we plan to expand SMECS’ capability to extract metadata from a wider range of sources (e.g., CFF and CodeMeta files by including functionalities of HERMES [8], and README files by including functionalities of SOMEF [9]).
        In our talk, we will present the current state of SMECS and discuss SMECS with the audience.
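
        The following Python sketch illustrates the extraction step in spirit (it is not SMECS itself): it pulls a few fields from the GitHub REST API for a placeholder repository and maps them onto CodeMeta terms.

        ```python
        # Sketch: map a few GitHub repository fields onto CodeMeta.
        # The repository is a placeholder; requires the requests package.
        import json
        import requests

        OWNER, REPO = "example-org", "example-repo"

        response = requests.get(f"https://api.github.com/repos/{OWNER}/{REPO}")
        response.raise_for_status()
        gh = response.json()

        codemeta = {
            "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
            "@type": "SoftwareSourceCode",
            "name": gh.get("name"),
            "description": gh.get("description"),
            "codeRepository": gh.get("html_url"),
            "license": (gh.get("license") or {}).get("spdx_id"),
            "dateCreated": gh.get("created_at"),
        }

        with open("codemeta.json", "w") as fh:
            json.dump(codemeta, fh, indent=2)
        ```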

        References
        [1] https://doi.org/10.1016/j.patter.2021.100222.
        [2] https://doi.org/10.3233/DS-190026.
        [3] https://doi.org/10.1002/meet.2009.1450460397.
        [4] https://doi.org/10.1177/0165551513507405.
        [5] https://doi.org/10.1016/j.lisr.2005.01.012.
        [6] http://ssi1.eprints-hosting.org/id/eprint/2/
        [7] https://citation-file-format.github.io/
        [8] http://arxiv.org/abs/2201.09015
        [9] https://doi.org/10.1109/BigData47090.2019.9006447

        Speaker: Stephan Alexander Ferenz (Carl von Ossietzky Universität Oldenburg; OFFIS)
    • 14:30 15:30
      Nation-wide networks of RSEs Audimax A+B

      Audimax A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Stephan Janosch (MPI-CBG)
      • 14:30
        Supporting Community Building through regular Knowledge Exchange Workshops 20m

        Regular community events are important to bring and keep relevant people of the community together. At the German Aerospace Center (DLR), this aspect is particularly important because the DLR RSE community members are distributed across many locations and different organizational units. For that reason, we run knowledge exchange workshops on a yearly basis. These events are usually organized as in-person meetings and allow the DLR RSE community to come together, to exchange new ideas and to develop a sense of belonging.

        In this talk, we introduce the concept of knowledge exchange workshops and reflect upon our ten years of experience organizing these workshops at DLR. In addition, we present the initial findings of our evaluation of the effects on the DLR RSE community, for which we surveyed former participants of these workshops. The results show that the knowledge exchange workshops help to increase the sense of community and the number of community interactions. Finally, we want to share the lessons we learned from organizing such community events with the broader RSE community.

        Speaker: Tobias Schlauch (DLR)
      • 14:50
        Updates on building a Swiss-wide RSE community 20m

        We started building an RSE community at ETH Zurich at the end of 2023. After receiving some funding, we and others have been starting similar activities at other Swiss research institutions. The presentation will give an overview of our activities so far and lessons learned. We will present our ideas for the future of a Swiss-wide RSE community.

        Speaker: Uwe Schmitt (Scientific IT Services, ETH Zurich)
      • 15:10
        Research Squirrel Engineers: How an independent RSE-driven network may help the NFDI 20m

        The comprehensible/collaborative creation and FAIRification of research data is becoming increasingly important in the Citizen Science community to become part of an interdisciplinary knowledge graph and enrich the already interconnected data network with qualified data. Only in this way can this data be linked to other data and actively integrated into international initiatives (e.g. NFDI) and community hubs (e.g. Wikidata, FactGrid, Semantic Kompakkt, OpenStreetMap). Unfortunately, open-source (FOSS) research and FAIRification tools are often unavailable. However, these, in combination with Linked Open Data projects as demonstrators, can be created and curated by community and voluntary initiatives such as the Research Squirrel Engineers Network.

        This paper presents the Research Squirrel Engineers Network initiative, three research and FAIRification tools, and three Research Squirrels projects, as well as how Research Software Engineering may help make it even more helpful for the NFDI. These can serve as digital services for digital data management in archaeology and so be part of substantial interdisciplinary initiatives such as the NFDI. The paper, therefore, also presents the aims, benefits and implementation of the squirrel tools.

        The Research Squirrel Engineers Network (founded in 2019 to implement the SPARQL Unicorn) is a loose association of Linked Open Data/Wikidata enthusiasts, Research Software Engineers and Citizen Scientists focusing on computational archaeology, digital humanities and geoinformatics. The members develop and maintain research and FAIRification tools and implement them in concrete projects.

        A FAIRification tool for digital data management is the SPARQL Unicorn and its implementation for QGIS. The "SPARQLing Unicorn QGIS Plugin" allows sending Linked Data queries in (Geo)SPARQL to triple stores and prepares the results for the geo-community in QGIS. It currently offers three main functions: (A) simplified querying of Semantic Web data sources, (B) enrichment of geodata, and (C) transformation of QGIS vector layers to RDF. In addition, the SPARQL Unicorn Ontology Documentation Tool enables the automated creation of HTML pages for Linked Open Data publications, e.g. via GitHub Actions. Examples are Irish Ogham sites on the Dingle Peninsula and data from Sophie C. Schmidt's dissertation project on "Brandenburg 5,000 BC", created by converting a CIDOC CRM data model into Linked Open Data and visualising it as HTML with the help of the SPARQL Unicorn.
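
        For readers unfamiliar with Linked Open Data queries, the following Python sketch shows the kind of SPARQL query against Wikidata that tools like the SPARQLing Unicorn QGIS Plugin send on the user's behalf; it assumes the SPARQLWrapper package, and the class QID is a placeholder.

        ```python
        # Sketch: query Wikidata for labelled items with coordinates.
        from SPARQLWrapper import SPARQLWrapper, JSON

        endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                                 agent="deRSE25-example/0.1")
        endpoint.setQuery("""
        SELECT ?item ?itemLabel ?coords WHERE {
          ?item wdt:P31 wd:Q000000 ;   # placeholder class QID; substitute the type of interest
                wdt:P625 ?coords .     # coordinate location
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
        LIMIT 10
        """)
        endpoint.setReturnFormat(JSON)

        results = endpoint.query().convert()
        for binding in results["results"]["bindings"]:
            print(binding["itemLabel"]["value"], binding["coords"]["value"])
        ```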

        Another service is the "Fuzzy Spatial Locations Ontology", in which georeferencing's vagueness, uncertainties and ambiguities are made FAIR and comprehensible with the help of semantics and an ontology (based on PROV-O). An example of this is the modelling of sites of the eruption of the Campanian Ignimbrite in the Phlegraean Fields (39,940 yr b2k ± 150 years), which often correspond to archaeological sites, e.g. the Toplitsa Cave in Bulgaria. The "Squirrel Papers" complement the services to create a platform for publishing working papers, data, software, presentation slides and posters for citation.

        These services are accompanied by LOD / Wikidata / Open Street Map and Wikimedia Commons projects, such as Linked Open Ogham, Holy Wells in Ireland, or Linked Reindeers, where Scripts (primarily written in Python) help transform the tabular data into RDF or Quick Statements.

        Speaker: Florian Thiery (Research Squirrel Engineers Network)
    • 14:30 15:30
      the teachingRSE project working meeting 1h Seminarroom 006 (30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      The teachingRSE project is a community of interest formed around the idea of finding structures for the most effective education of new and developing RSEs in an academic landscape.
      In this working group meeting, we plan to work on our forthcoming publication detailing how to create structures for the future training of both new and practising RSEs.

      Speakers: Florian Goth (Universität Würzburg), Frank Löffler (Friedrich-Schiller-Universität Jena), Jan Philipp Thiele (Weierstrass Institute Berlin), Dr Jeremy Cohen (Imperial College London)
    • 15:30 16:00
      Coffee Break 30m Audimax Foyer

      Audimax Foyer

      Building 30.95

    • 16:00 17:00
      An introduction to Machine-actionable Software Management Plans 1h Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      The concept of software management plans (SMPs) is similar to that of data management plans (DMPs), but focuses on the research software lifecycle, aligned with the FAIR for Research Software (FAIR4RS) principles. DMPs consist of a series of questions and answers that outline how data will be handled during and after a research project. Similarly, an SMP helps us outline some important elements for handling and sharing our research software, resulting in more reproducible and reusable software. An SMP questionnaire covers research and technical information including, for instance, aspects such as licenses, releases, and public availability. A machine-actionability layer can be added to SMPs, turning them into maSMPs: a semantically structured description (i.e., metadata) of the research software and its lifecycle.

      The ELIXIR SMP, proposed by the Software Best Practices focus group in ELIXIR Europe, aims at a low entrance barrier so that both research software engineers and researchers who code can benefit from it. We have collaborated with them to add a machine-actionable layer based on schema.org, and thus compatible with CodeMeta. We provide types and profiles (i.e., usage recommendations on top of schema.org and our own types) to describe SMPs, including source code and releases. We have aligned our maSMP with the SMP created by the eScience Center in the Netherlands and the one created by the Max Planck Digital Libraries. We have also analyzed its compatibility with respect to the Research Software Metadata guidelines proposed by EOSC.

      In this short tutorial, we will briefly introduce the FAIR4RS principles and discuss how they relate to software metadata. We will then show some sources of software metadata and practical steps to support FAIR4RS. Afterwards, we will introduce SMPs, including how they differ from project planning and project management. We will then present our approach to maSMPs, including a practical way to obtain the corresponding metadata.
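
      As a generic, schema.org-flavoured sketch of what machine-actionable SMP metadata can look like (this is not the actual maSMP profile; see the Resources below for the authoritative schema, and all names and URLs here are placeholders):

      ```python
      # Sketch: generic JSON-LD describing an SMP and the software it covers.
      import json

      smp_metadata = {
          "@context": "https://schema.org",
          "@type": "CreativeWork",
          "name": "Software Management Plan for my-research-tool",
          "about": {
              "@type": "SoftwareSourceCode",
              "name": "my-research-tool",
              "codeRepository": "https://github.com/example/my-research-tool",
              "license": "https://spdx.org/licenses/Apache-2.0",
          },
          "creator": {"@type": "Person", "name": "Jane Doe"},
          "dateCreated": "2025-02-25",
      }

      with open("smp.jsonld", "w") as fh:
          json.dump(smp_metadata, fh, indent=2)
      ```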

      Resources
      - FAIR4RS https://doi.org/10.15497/RDA00068 and https://doi.org/10.1038/s41597-022-01710-x
      - ELIXIR SMP https://doi.org/10.37044/osf.io/k8znb
      - maSMP metadata schema https://doi.org/10.5281/zenodo.7806638
      - maSMP profiles https://doi.org/10.5281/zenodo.10582120
      - An example of maSMP in action https://doi.org/10.37044/osf.io/t94g8
      - maSMP project pages https://zbmed-semtec.github.io/maSMPs/

      Funding
      The first version of the maSMP was funded by the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 101017536, part of the Research Data Alliance and European Open Science Cloud Future call 2022.
      The alignment to the EOSC RSMD guidelines was part of the FAIR-Impact support action calls funded by the European Commission, grant “FAIR-IMPACT – Expanding FAIR Solutions across EOSC” number 101057344.
      The maSMP project is part of NFDI4DataScience consortium funded by the German Research Foundation (DFG), project number 460234259.

      Speaker: Leyla Jael Castro (ZB MED Information Centre for Life Sciences)
    • 16:00 17:00
      Breaking the Chat Barrier: A Workshop on Dynamic LLM Interfaces 1h Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Large Language Models (LLMs) have revolutionized the field of artificial intelligence, offering numerous new applications in natural language processing, such as text generation, translation, sentiment analysis and conversational interfaces. Early studies show that LLMs are not only being used in everyday life but have also found their way into the daily work of researchers, for example, in assisting with writing code and data analysis [1,2]. This integration into the daily work of researchers has boosted workflows by outsourcing mundane tasks, allowing scientists to focus on complex problem-solving and creative endeavors.

      However, the conversational interfaces widely used to interact with these models come with inherent limitations in context management. The design of chat-based interfaces, with their linear and chronological structure, constrains users' ability to refine, reorder, or selectively manage the interaction history. This often leads to bloated contexts and diminished response quality, especially in complex scenarios where LLMs are used for tasks like programming, iterative analysis, or text revision that require more nuanced and dynamic interactions.

      With support from AI-Hub@LMU, we aim to develop a novel, user-friendly interface for Large Language Models that overcomes the limitations of traditional chat-based systems. Our goal is to create a dynamic user interface that empowers users with fine-grained context management, providing greater flexibility and control over structuring the LLM’s input during interactions. Our development process follows the User-Centered Design (UCD) methodology, in accordance with DIN EN ISO 9241-210, encompassing four phases: understanding the context of use, determining user requirements, drafting design solutions, and evaluating these solutions.
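
      To make the idea of fine-grained context management concrete, here is a minimal, model-agnostic Python sketch (not the planned interface itself): context items can be tagged, pinned and selectively composed into a prompt instead of always replaying the full chat history.

      ```python
      # Sketch: a non-linear context store for LLM interactions.
      from dataclasses import dataclass, field

      @dataclass
      class ContextItem:
          role: str                      # "system", "user" or "assistant"
          content: str
          tags: set = field(default_factory=set)
          pinned: bool = False

      class DynamicContext:
          """Items are tagged and pinned rather than replayed chronologically."""

          def __init__(self):
              self.items: list[ContextItem] = []

          def add(self, role, content, tags=(), pinned=False):
              self.items.append(ContextItem(role, content, set(tags), pinned))

          def compose(self, include_tags=(), budget_chars=4000):
              """Select pinned items plus tag-matching items, within a rough budget."""
              selected = [item for item in self.items
                          if item.pinned or item.tags & set(include_tags)]
              prompt, used = [], 0
              for item in selected:
                  if used + len(item.content) > budget_chars:
                      break
                  prompt.append({"role": item.role, "content": item.content})
                  used += len(item.content)
              return prompt

      context = DynamicContext()
      context.add("system", "You are a careful data-analysis assistant.", pinned=True)
      context.add("user", "Here is my plotting code ...", tags={"plotting"})
      context.add("user", "Unrelated question about citations ...", tags={"writing"})
      print(context.compose(include_tags={"plotting"}))
      ```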

      In this workshop, we want to gather the participants' previous experiences of using LLMs, particularly in the context of scientific work, and identify the challenges and barriers they face. Based on these insights, we will collaboratively identify requirements for a dynamic chat interface with the attendees and discuss different visualization concepts. The workshop will include exercises such as persona creation and journey mapping to better understand user needs and explore different design solutions.

      The outcomes of this workshop will directly inform the development of the dynamic chat interface, which we plan to release as open-source software. As this project is still in its early stages, the workshop will focus primarily on conceptual discussions rather than hands-on coding. We welcome all deRSE participants, regardless of prior programming experience or familiarity with LLMs.

      This workshop aligns with the deRSE25 topic of “AI and ML in research contexts”, by addressing the intersection of AI and usability in research software, fostering collaborative engagement to gather insights on how to enhance the practical integration of LLMs in the daily work of researchers.

      [1] Nejjar, M., Zacharias, L., Stiehle, F., & Weber, I. (2023). Llms for science: Usage for code generation and data analysis. Journal of Software: Evolution and Process, e2723. https://doi.org/10.1002/smr.2723
      [2] Le, F. (2023). How ChatGPT is transforming the postdoc experience. Nature, 622, 655. https://doi.org/10.1038/d41586-023-03235-8

      Speakers: Mr Maximilian Frank (LMU), Mr Simon Lund
    • 16:00 17:00
      Domain Specific Languages Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131
      Convener: Jan Philipp Thiele (Weierstrass Institute Berlin)
      • 16:00
        Test-Driven Software Experimentation with LASSO 20m

        The field of empirical software engineering faces a significant gap in standardized tools for conducting rapid and efficient Test-Driven Software Experiments (TDSEs). These experiments involve executing software subjects and observing their runtime behavior (i.e., dynamic program analysis). To address this gap, I present LASSO, a general-purpose software code analysis platform that provides a minimal set of domain-specific languages and data structures to conduct TDSEs [1]. Inspired by the architectural designs of modern big data systems, LASSO is a scalable (distributed) workflow system for software engineering research based on the data-driven programming paradigm.

        LASSO empowers users with an executable scripting language, allowing them to design and execute complex workflows efficiently. Unlike traditional ad-hoc approaches, LASSO offers a unified platform for creating automated and reproducible TDSEs at scale, while fostering Open Science principles (note that LASSO also offers additional, specific services like code recommendation).

        Talk Overview:

        My talk will showcase the practical benefits of using LASSO in evaluating software reliability for a particular software engineering scenario. I will present an example use case demonstrating how LASSO's domain-specific scripting language, LSL, seamlessly translates study designs into executable scripts that capture essential analysis steps and parameters. This reproducible example highlights the platform's capabilities in empowering users to quickly develop complex workflows.

        Through this talk, I aim to demonstrate how LASSO can be leveraged as a research software platform for various applications, including evaluating code generation tasks [2]. I will discuss the key features and data structures within the LASSO platform, highlighting opportunities for customization and extension to meet specific needs. Additionally, I will point out core challenges faced in the platform’s development.

        By providing researchers with a unified platform for TDSEs, LASSO has the potential to significantly enhance the field of empirical software engineering. Its impact will not be limited to researchers, but will also benefit practitioners (e.g., facilitating tool evaluations) and educators (e.g., test-driven assessments in programming courses). The LASSO platform is freely available at https://softwareobservatorium.github.io/, and a demo video is available on YouTube: https://youtu.be/tzY9oNTWXzw. Looking ahead, because of the open-source nature of LASSO, we anticipate a growing community, leading to a thriving ecosystem around TDSEs, characterized by shared repositories of experiments among users.

        Young RSE Prize:

        As the LASSO platform was developed as part of my dissertation, which I successfully defended with “summa cum laude” honors in February 2023, I believe that this talk on LASSO as a research software platform aligns perfectly with the “Young RSE Prize”.

        Literature:

        [1] Marcus Kessel, Colin Atkinson, Promoting open science in test-driven software experiments, Journal of Systems and Software, Volume 212, 2024, 111971, ISSN 0164-1212, https://doi.org/10.1016/j.jss.2024.111971.

        [2] M. Kessel and C. Atkinson, "N-Version Assessment and Enhancement of Generative AI," in IEEE Software, doi: 10.1109/MS.2024.3469388, Preprint: https://arxiv.org/abs/2409.14071

        [3] Marcus Kessel, LASSO – an observatorium for the dynamic selection, analysis and comparison of software, Dissertation, 2023, https://madoc.bib.uni-mannheim.de/64107/

        Speaker: Marcus Kessel (University of Mannheim)
      • 16:20
        SUS: A new language for efficient Hardware Design 20m

        SUS is a new HDL under development at the Paderborn Center for Parallel Computing. At its core, SUS is an RTL language intended to be used side-by-side with existing SystemVerilog and VHDL codebases. SUS has many interesting features, ranging from compile-time metaprogramming to IDE information about clock domains, pipelining depths and metaprogramming debugging. This talk will mostly focus on the Latency Counting system of SUS. Latency Counting is SUS's approach to pipelining. People scoff at manual pipelining, but it is key to squeezing the last bits of performance out of resource-constrained hardware. Latency Counting, however, relieves the mental burden of pipelining, as it allows only local pipelining adjustments, and the compiler will, through the type system, adjust the surrounding hardware to handle the change.

        Speaker: Lennart Van Hirtum (Universität Paderborn)
      • 16:40
        OpenLB: On the Software Architecture of an Efficient and Flexible Lattice Boltzmann Method Framework 20m

        OpenLB is one of the leading open source software projects for Lattice Boltzmann Method (LBM) based simulations in computational fluid dynamics and beyond. Developed since 2007 by an international and interdisciplinary community, it not only provides a flexible framework for implementing novel LBM schemes but also contains a large collection of academic and advanced engineering examples. It runs efficiently on target platforms ranging from smartphones over multi-GPU workstations up to supercomputers.

        This talk will give an overview of the current software architecture of OpenLB with a special focus on automatic code generation and performance engineering. Recent performance benchmarks and large-scale applications will be showcased.

        Specifically, the talk will discuss how long-term investments in ensuring full differentiability via automatic differentiation (AD) are utilized for automatic common subexpression elimination (CSE), user-friendly model introspection and the generation of adjoint simulation setups. The talk can be viewed as a direct continuation of the author's talk at deRSE23 on the refactoring journey towards state-of-the-art performance.

        Speaker: Adrian Kummerlander (KIT)
    • 16:00 17:00
      Education of RSEs SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Stephan Janosch (MPI-CBG)
      • 16:00
        From Design to Delivery: Building Better Online RSE Workshops 20m

        How do you build online workshops that are engaging throughout the event, accessible to both novices and experts, and effective in helping students apply tools to their work? This talk introduces a proven approach to teaching Research Software Engineering (RSE) through learner-centered methodologies. Our workshops, designed around 90-minute teaching units, use concise lectures, small social group activities, and live coding reviews to create a dynamic and interactive learning environment. By focusing on applied, novice-friendly topics, we ensure accessibility for diverse participants.

        Delivered through the iBOTS platform, these workshops achieve exceptional results: a 90% completion rate, five new workshops annually, and consistent positive feedback. Participants benefit from a focus on communication and collaboration, fostering a sense of community that supports long-term learning. This success reflects the application of active learning principles and strategies that address the unique challenges of online teaching.

        In this talk, we will outline the practical steps of designing, organizing, and delivering successful workshops. Topics include curriculum development, accessibility enhancements, and techniques for encouraging engagement in virtual spaces. Attendees will gain actionable strategies for improving their own online teaching, ensuring workshops that are not only effective but also enjoyable for learners.

        Speaker: Nicholas Del Grosso (iBehave Open Technology Support Group, Uni-Bonn)
      • 16:20
        The SSC fellowship program - Personalized support and RSE education for young researchers 20m

        Young researchers often are highly dependent on research software for their work. While some software skills are a basic
        necessity in many, if not most, scientific fields today, the skill set so acquired is usually not sufficient to effectively develop, maintain, and design the larger software projects upon which much of modern scientific collaborative work is built.
        This deficit not only influences their own scientific output negatively but also leaves a gap in their ability to teach future students and limits their scientific flexibility and opportunities for collaboration.

        To help address these issues, the SSC Fellowship Programme was established at the Scientific Software Centre (SSC) at the University of Heidelberg in Summer 2024. It complements the SSC's activities of providing software development services, consultations and courses with a dedicated support programme for young researchers at the PhD or Postdoc level at Heidelberg University.
        Following a competitive selection process, SSC fellows are allocated a mentor from the SSC research software engineers, and are provided with regular one-to-one support for the research software aspects of their project, encompassing the full spectrum of research software engineering, from support on coding to algorithm selection, licensing and software architecture.
        In addition to the benefits for the mentees, the fellowship programme provides mentors with the opportunity to gain more in-depth knowledge about software demands and development practices in a particular scientific field, as well as to expand their professional horizons and learn about new tools, techniques, packages or teaching.

        In the proposed talk, I will give an overview of the concept, goals, and scope of the SSC fellowship programme. Secondly, I will discuss our experiences so far, the challenges this programme presents for us as well as potential improvements. The presentation will be multifaceted, presenting the benefits and challenges for mentees, mentors, and our group as a whole. By sharing our experiences with the SSC fellowship programme, we hope to inspire further discussion on RSE outreach and education projects and how to improve and maintain them.

        Speaker: Dr Harald Mack (Interdisciplinary Center of scientific computing, Heidelberg University)
      • 16:40
        CFF and JSON for More Impactful RSE Training Materials 20m

        The RSE community has created and contributed to many high quality, Open Source training materials (Code Refinery, HiDA, The Carpentries Incubator, UNIVERSE-HPC, etc). Taken in isolation, these are valuable resources. But learners and project contributors would benefit from increased findability and interoperability of individual lessons and curricula.
        Simultaneously, it remains a challenge to ensure that contributions to Open Source lesson materials are acknowledged and visible, especially to an audience who may be unfamiliar with the terminology and infrastructure of Open Source projects.
        This talk will explore how Citation File Format and other standards are being used to describe Open Source RSE training materials, with particular focus on The Carpentries Workbench. We will identify opportunities for the RSE community to collaborate on the description of learning pathways between resources from different projects. And we will discuss how such efforts could help to close gaps in the training available to help RSEs develop the competencies they need to succeed.

        Speaker: Toby Hodges (The Carpentries)
    • 16:00 17:00
      The role of RSE(s) Audimax A+B

      Audimax A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Inga Ulusoy (University of Heidelberg)
      • 16:00
        Research software phases, sustainability, and RSE types 20m

        Research software projects often are initially funded by a grant that supports development of the software. But when the grant ends, the projects have to shift to another model to support the required software maintenance, if the software is going to continue being used. This talk will look at the Parsl project and its effort to become sustainable, across a set of project phases. It will also look at the different kinds of RSE work that have taken place during the project. These activities, phases, and developer types appear to be useful concepts for planning or studying other research software projects, or research software as a whole. The talk will be partly aimed at finding others who want to collaborate on understanding how general these results are, and how much they can benefit other projects.

        Speaker: Daniel S. Katz
      • 16:20
        Research Software: A Critical Ingredient Across Diverse Funding Models and Disciplines 20m

        Research software engineering (RSE) plays a pivotal role in advancing science, yet its integration and funding vary significantly across different research programs and disciplines. In this talk, we will explore how RSE expertise has been crucial in three distinct contexts, revealing the diverse ways funding structures support—or overlook—the critical need for sustainable software development.

        Graduate College (GRK 1422 - Metal Sites in Biomolecules: Structures, Regulation and Mechanisms): Graduate schools often focus on training the next generation of researchers but rarely provide direct funding for dedicated RSE support. Through our work with quantum chemistry researchers (qmbench.net), we filled a critical gap by developing custom tools that enabled graduate students to engage deeply with their data and build software skills—a need that could be more sustainably addressed with embedded RSE positions.

        Special Research Area (SFB 1633 - Pushing Electrons with Protons): Large, interdisciplinary programs often have complex technical needs that RSEs are uniquely positioned to address. Our integration of electronic lab notebooks with research data repositories demonstrated how targeted software solutions can optimize workflows and enhance collaboration across subprojects. However, dedicated RSE funding in SFBs remains inconsistent, often relying on temporary, project-based support.

        Research Group (FOR 2064 STRATA - Stratification Analyses of Mythic Narrative Materials and Texts in Ancient Cultures): Smaller, highly specialized research groups face unique challenges, often requiring bespoke software like Hyleme (the smallest plot-carrying unit of a narrative material) for data modeling and input. These groups typically lack the financial flexibility to hire RSEs, even though the impact of tailored software is profound.

        These examples not only highlight the transformative role of RSEs in diverse contexts but also underscore a broader systemic issue: the need for explicit and sustainable RSE funding across all program types. By embedding RSE expertise within these programs from the outset, we can ensure long-term impact, foster innovation, and provide the foundational support researchers need to focus on discovery.

        Speakers: Ms Kristine Schima-Voigt (Niedersächsische Staats- und Universitätsbibliothek Göttingen), Mr Zeki Mustafa Dogan (Niedersächsische Staats- und Universitätsbibliothek Göttingen)
      • 16:40
        Definition and integration of RSE Roles in the context of a modern research organisation 20m

        In a modern research organisation, the recognition, career paths and visibility of RSEs depend on their integration into the organisational structure. In this talk we present our approach at the DLR Institute of Networked Energy Systems for integrating RSE roles into a knowledge hierarchy in our institute. We created a role-skill matrix with different RSE focus areas, which helps RSEs identify the skills they need to learn and helps leaders in the organisation conduct job interviews and define job requirements. The role-skill matrix is accompanied by a description of each role, defining its function in the institution, and is also linked to available education resources such as trainings, books or internal knowledge sources.
        The goal of this role-skill matrix is to achieve transparency, visibility and recognition for RSE activities in our institute, and to guide and improve knowledge transfer, the efficiency of research software development and the education around it.

        Speaker: Benjamin Fuchs (Deutsches Zentrum für Luft- und Raumfahrt e.V.)
    • 17:00 17:15
      Short Session Break 15m
    • 17:15 17:55
      Experiences from large tool stacks SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Joerg Schaarschmidt (Karlsruher Institute of Technology)
      • 17:15
        Open-Source toolchain to support building energy systems from digital planning to optimal operation 20m

        Buildings and their energy systems account for 16 % of global greenhouse gas emissions. These emissions may be reduced during planning and operation. However, most buildings are unique in terms of architecture, communication protocols, energy carriers, etc. Deployment of optimal planning and operation therefore does not scale as well as in, for example, the automotive industry, and both digital planning and cloud-based optimal operation remain scarce in practice, even though research demonstrates the potential of software-based solutions. To increase the share of optimal, software-based solutions in practice, it is vital to reduce the upfront cost of development by publishing research findings as open-source software.

        Thus, at the Institute for Energy Efficient Buildings and Indoor Climate, we provide the expertise from various research theses and publicly funded projects in the form of open-source tools. In this work, we present our open-source tool chain from digital planning to optimal operation of building energy systems. Using digital Building Information Model (BIM) data, we generate detailed Modelica simulation models with our Python library bim2sim. Missing data is enriched with typical information provided by TEASER. The models use our open-source model libraries BESMod and AixLib, as well as detailed heat pump models generated by our Python library VCLibPy. To automate, analyze, and optimize the planning stage of the building envelope and energy system, ebcpy interfaces with Modelica simulation tools for fast and parallelized simulations. Once the optimal design is found and built, measurements and simulations may be compared. If measurements deviate from simulation, AixCaliBuHA enables an automated calibration of unknown parameters. The validated models are then used to optimize operation. Here, the Python library Agentlib and its plugins (Agentlib-MPC, Agentlib-FIWARE) allow decentralized, cloud-based deployment of model predictive controllers, communicating over an IoT platform, e.g. FIWARE, interfaced by our Python library FiliP. Be it physics-based or data-driven, the detailed simulation models generated during design help to develop efficient and reliable model predictive control.
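        To make the chaining of these stages more concrete, here is a deliberately simplified sketch of such a workflow; every function below is a hypothetical placeholder standing in for one stage and is not the actual API of bim2sim, TEASER, ebcpy, AixCaliBuHA or the Agentlib packages.

```python
# Hypothetical placeholders only -- NOT the real APIs of the tools named in the abstract.

def generate_model(bim_file: str) -> dict:
    """bim2sim-like step: derive a simulation model from BIM data."""
    return {"source": bim_file, "parameters": {}}

def enrich_with_typical_data(model: dict) -> dict:
    """TEASER-like step: fill missing parameters with archetype values."""
    model["parameters"].setdefault("u_value_W_m2K", 0.24)
    return model

def optimise_design(model: dict) -> dict:
    """ebcpy-like step: run (parallelised) simulations to size the energy system."""
    return {"heat_pump_kW": 8}

def calibrate(model: dict, measurements: list) -> dict:
    """AixCaliBuHA-like step: adjust unknown parameters to match measurements."""
    model["parameters"]["calibrated"] = True
    return model

def deploy_mpc(model: dict, design: dict) -> str:
    """Agentlib-like step: deploy a model predictive controller for operation."""
    return f"MPC deployed for a {design['heat_pump_kW']} kW system"

model = enrich_with_typical_data(generate_model("building.ifc"))
design = optimise_design(model)
model = calibrate(model, measurements=[20.9, 21.1, 21.0])
print(deploy_mpc(model, design))
```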

        This rich and diverse stack of tools is continuously developed by researchers with different levels of expertise. To ensure high code quality and the integrity of results, we employ cloud-based continuous integration pipelines for unit testing, code quality, documentation, releases, creating Jupyter notebooks, and pull-request management to guide new developers. Future work will focus on improving usability and providing frontends to support application in both research and practice.

        Speaker: Fabian Wuellhorst (RWTH Aachen University, E.ON Energy Research Center, Institute for Energy Efficient Buildings and Indoor Climate)
      • 17:35
        Design decisions for software stacks in experimental research 20m

        Design decisions for research software and IT infrastructure must reflect the unique needs of academia.
        Deviations from conventional best practices may be necessary to meet the requirements of academic work environments and scientific purposes.
        We present our lessons learned and best practice guidelines derived from building a new specialized software environment for a large-scale experimental research facility.

        In our ongoing work to modernize the IT infrastructure and data flows of the atmospheric simulation chamber facility ‘AIDA’ at KIT, we create a single, generalized software stack for multiple atmospheric simulation chambers.
        To create a state-of-the-art research environment, we follow the FAIR-RS principles (Findable, Accessible, Interoperable and Reusable Research Software); further aspects of our design decisions are transparency, reproducibility and the maintainability of the new software stack.

        We have adopted a modular approach to reduce complexity, while preserving flexibility, by separating different stages of the data flow such as data acquisition and data analysis, provision of analyzed data through an API and metadata handling.
        Our open-source toolbox consists of carefully selected, established technologies (Python, pytest, MariaDB, GitLab, CI pipelines, Sphinx) alongside specialized tools for metadata management (Sensor Management System) and automated testing of time series data (SaQC).

        The combination of a modular architecture, a focus on low complexity, and a thorough selection of open-source tools ensures longevity and low cost, while providing a flexible and robust IT and software infrastructure that safeguards scientists' good research practice.
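        As a small illustration of the kind of automated check such a stack relies on, the sketch below shows two pytest tests over a toy time series; the column names and thresholds are invented, and tools such as SaQC provide far more elaborate quality tests in practice.

```python
# test_timeseries.py -- illustrative only; column names and limits are invented.
import pandas as pd

def load_chamber_data() -> pd.DataFrame:
    # Stand-in for reading acquired data from the facility's data store.
    return pd.DataFrame({
        "timestamp": pd.date_range("2025-02-26 10:00", periods=4, freq="1min"),
        "temperature_K": [223.1, 223.2, 223.0, 222.9],
    })

def test_timestamps_are_monotonically_increasing():
    data = load_chamber_data()
    assert data["timestamp"].is_monotonic_increasing

def test_temperature_in_plausible_range():
    data = load_chamber_data()
    assert data["temperature_K"].between(150.0, 350.0).all()
```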

        Speaker: Tobias Schorr
    • 17:15 17:55
      Julia - The Language Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131
      Convener: Dr Maria Guadalupe Barrios Sazo (Forschungszentrum Juelich)
      • 17:15
        Automatic Differentiation in Julia with Enzyme 20m

        Automatic Differentiation (AD) is an important technique for both scientific computing and machine learning. AD frameworks from the machine learning world often lack the ability to differentiate programming patterns common in scientific computing, such as mutation and parallelism.

        In my talk, I will cover the AD framework in Enzyme and how it can be used to differentiate scientific codes in Julia. While my talk will focus on Julia, Enzyme is LLVM-based and can also be used to differentiate C/C++/Fortran.

        I will show how one can use Enzyme to differentiate scientific codes in Julia, how to extract Jacobians through directional derivatives, and how to use these to formulate matrix-free methods.
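        Enzyme itself is used from Julia (or, via LLVM, from C/C++/Fortran), so no Enzyme code is shown here; as a language-neutral illustration of the underlying idea, the sketch below computes a Jacobian-vector product, i.e. a directional derivative, with JAX in Python, which is the same building block that matrix-free methods rely on.

```python
import jax
import jax.numpy as jnp

def residual(u):
    # Toy nonlinear function standing in for a scientific model's residual.
    return jnp.sin(u) + u ** 2

u = jnp.array([0.1, 0.2, 0.3])   # point at which we differentiate
v = jnp.array([1.0, 0.0, 0.0])   # direction

# Forward-mode AD: J(u) @ v without ever forming the Jacobian explicitly.
value, jvp = jax.jvp(residual, (u,), (v,))
print(value, jvp)

# Such Jacobian-vector products are exactly what matrix-free Krylov solvers need.
```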

        Speaker: Valentin Churavy (Johannes-Gutenberg Universität Mainz & Universität Augsburg)
      • 17:35
        BinaryBuilder.jl - Robust deployment of binaries to the world 20m

        Reliably deploying binary dependencies to users on various architectures is a non-trivial problem for package authors. More often than not, this task is delegated to the user or automated using assumptions that don't always hold.
        The Julia community has built a toolchain for the robust deployment of binaries and binary dependencies, BinaryBuilder.jl, that is useful well outside the world of Julia.
        In this talk, you will learn how to make use of it for your own work.

        Speaker: Simon Christ (Leibniz Universität Hannover)
    • 17:15 18:15
      MAUS - Machine-AUtomated Support for Software Management Plans 1h Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Data are now recognised as an essential research output. Data Management Plans (DMPs) have therefore become an integral part of research project planning, and are usually required by funding organisations. Research software (ranging from data-specific scripts to standalone software products) plays a crucial role in the reproducibility of scientific results, and, similar to research data, is gaining growing recognition as a research output in itself.

      The creation of research software can be a major project that requires good planning and management. For example, the necessary infrastructure (software dependencies and hardware requirements such as computing power) must be addressed, as well as the human resources needed to develop and maintain the software and write its documentation. Software Management Plans (SMPs) document all these requirements, and thus improve software quality and reusability. This is not only useful for the research group producing the software, but also for those who use or contribute to the software during and after the project runtime. The Research Data Management Organiser (RDMO) is a well-established tool among the research community for creating such DMPs and SMPs. It provides specific question catalogues from the main funding organisations, addressing the important aspects which need to be considered when planning a software project.

      In order to avoid redundancy, the structured information in an SMP should be reusable, for example by being shareable with other tools, information sources and repositories, such as GitHub/GitLab. In our project "Machine-AUtomated Support for Software Management Plans" (MAUS), we plan to create plugins for RDMO that will enable SMPs to be machine-readable and -actionable.

      Our aim is to improve RDMO as a planning tool for research software engineers like you. Therefore, we will start the workshop with a short introduction to SMPs and the MAUS project, giving first proposals for improvements. Then we are eager to hear your comments on this approach, your demands, wishes and ideas, and to discuss with you which features and interfaces we should implement.
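      Purely as a hypothetical illustration of what "machine-actionable" could mean in practice (the MAUS plugins themselves are still being designed), the sketch below turns an invented JSON rendering of SMP tasks into GitLab issues using the python-gitlab client; the JSON structure, URL, project path and token are all placeholders.

```python
import json
import gitlab  # python-gitlab

# Invented, simplified JSON export of an SMP -- not an actual RDMO or MAUS format.
smp_json = """
{
  "project": "my-group/my-research-code",
  "tasks": [
    {"title": "Write user documentation", "due": "2025-06-30"},
    {"title": "Set up CI pipeline with unit tests", "due": "2025-04-15"}
  ]
}
"""

smp = json.loads(smp_json)

gl = gitlab.Gitlab("https://gitlab.example.org", private_token="YOUR_TOKEN")  # placeholders
project = gl.projects.get(smp["project"])

for task in smp["tasks"]:
    project.issues.create({
        "title": task["title"],
        "description": f"Created from the software management plan (due {task['due']}).",
    })
```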

      Speakers: David Walter, Laura Bahamón
    • 17:15 18:15
      Meet-Up: RSEs within Leibniz Association 1h Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      After a first successful meeting of RSEs within the Leibniz Association at deRSE24, we would like to use the opportunity again and invite all RSEs who either work at a Leibniz institute or are interested in the work of RSEs in Leibniz to join us and discuss the current state of affairs within Leibniz and how we can further grow and raise awareness as RSEs in Leibniz.
      Before looking into the future, we plan to recap what happened during the last year, including the kick-off meeting of the "Arbeitsgruppe Softwareentwicklung" in April. We hope for broad participation of Leibniz-related RSEs and good discussions.

      Speaker: Jan Philipp Thiele (Weierstrass Institute Berlin)
    • 17:15 17:55
      Security of Research Software Audimax A+B

      Audimax A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Florian Goth (Universität Würzburg)
      • 17:15
        Privacy-preserving scientific computing with fully homomorphic encryption 20m

        With the rise of cloud computing in many areas of industry, commercial services, or science, data privacy is a growing concern for researchers and practitioners alike. In addition, with more data being processed in the cloud, the impact of a potential data breach increases as well, especially when sensitive information such as engineering, financial, or medical data is concerned. The use of fully homomorphic encryption (FHE) can provide a solution to this issue: Since all data is encrypted before being sent to the cloud, all information remains secure even if a malicious party is able to gain access to the cloud computing environment.
        In this talk, we will take a look at homomorphic encryption for securely processing numerical data and assess its potential for privacy-preserving applications in the context of scientific computing. After a brief introduction to the CKKS scheme for FHE, we will discuss the accuracy and performance implications of its basic operations for computations with floating point numbers. Finally, we will evaluate the potential of FHE for scientific computing by demonstrating the secure numerical simulation of partial differential equations with a finite difference approach.
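        The abstract does not prescribe a particular implementation; as one openly available option, the sketch below uses the TenSEAL library to encrypt two vectors under the CKKS scheme and add them while encrypted, with parameters chosen purely for illustration rather than as a security recommendation.

```python
import tenseal as ts

# CKKS context; parameters are illustrative only, not a security recommendation.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

# Encrypt two small vectors of floating-point values.
enc_a = ts.ckks_vector(context, [1.0, 2.0, 3.0])
enc_b = ts.ckks_vector(context, [0.5, 0.5, 0.5])

# Arithmetic happens on ciphertexts; only the key holder can decrypt the result.
enc_sum = enc_a + enc_b
print(enc_sum.decrypt())  # approximately [1.5, 2.5, 3.5]
```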

        Speaker: Michael Schlottke-Lakemper (University of Augsburg)
      • 17:35
        Collaboration without data sharing: the Federated Secure Computing architecture 20m

        In domains with relevant security or privacy concerns, open data sharing among cooperation partners is often not an option. Here, cryptography offers alternative solutions to reconcile cooperation and data protection. Participants engage in peer-to-peer computation on encrypted data, arriving jointly at the intended result, without ever having access to each other’s input data. While elegant in theory, this approach has its own challenges in terms of complexity, DevSecOps and cloud federation.

        Federated Secure Computing is a free and open source initiative hosted by LMU Munich and financed by Stifterverband. The middleware between client-side business logic and the server-side cryptography backend is designed to let research software engineering practitioners use secure computing with ease. It lets students write simple secure logic with as little as ten lines of Python code and can be run on IoT hardware such as Raspberry Pi Zeros.

        In this talk, we present real-world use cases and learnings in terms of state-of-the-art administrative and technical data protection measures.

        Speaker: Hendrik Ballhausen
    • 19:00 22:00
      Dinner at The Q Quadro Hotel (formerly ACHAT Plaza Karlsruhe): deRSE Social Event The Q Quadro Hotel

      The Q Quadro Hotel

      Mendelssohnplatz 2, 76131 Karlsruhe, Germany
    • 09:00 10:30
      Joint Keynote: Innovating at the Intersection: Software Engineering for Science and Industry 1h 30m Audimax A+B

      Audimax A+B

      Building 30.95

      Since the inception of the discipline at the NATO Software Engineering Conferences in the late 1960s, software engineering research and practice have primarily concentrated on business and embedded software, particularly in industrial sectors like finance and automotive. Research software that is designed and developed to facilitate research activities in various fields of science or engineering has been largely overlooked by software engineering research. However, there is an increasing acknowledgment of research software as an essential artifact and of research software engineers as a vital profession. On the one hand, research software propels scientific advancements, fosters open science principles, and plays a pivotal role in informing significant policy decisions, such as those related to climate action. On the other hand, it frequently serves as the foundation for software stacks in cutting-edge technologies like Quantum Computing, Artificial Intelligence, and Digital Twin applications. Thus, there is an increasing demand for software engineering methods specifically tailored to research software, with the potential to benefit software development in traditional business domains as well.

      Drawing from my experiences in academic research at universities, research software engineering at the German Aerospace Center (DLR), as well as software engineering in industry, in this talk I will explore the commonalities and differences between software engineering in industrial and scientific settings. I will also shed light on the landscape of research software engineering and clarify its significance to modern software engineering research.

      Speaker: Michael Felderer (DLR)
    • 10:30 11:00
      Coffee Break incl. group photo 30m Audimax Foyer

      Audimax Foyer

      Building 30.95

    • 11:00 12:30
      A short introduction to Nextflow: Bring your data science pipelines to the next level 1h 30m Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Analysing data typically consists of several steps with dedicated tools, chained one after another. In theory, all this can be achieved with well-written scripts. However, workflow managers help developers increase the reproducibility of their pipelines and results by providing features for workflow and data provenance, portability, readability, and fast prototyping.

      Nextflow is a reproducible, versatile and powerful workflow management system designed to simplify the development and execution of data-driven computational pipelines. It is predominantly used in bioinformatics, but its flexibility makes it suitable for other scientific domains and data science. It’s designed to be environment-agnostic, meaning that workflows can be executed across a variety of computing platforms, including high performance clusters and cloud services, without modification. It comes with close integration with software containers (e.g., Docker, Singularity) and Conda and Spack environments.

      Nextflow has an active community, including nf-core: a community-driven movement that aims to establish best practices, tools, and guidelines for developing, testing, and curating Nextflow pipelines. The nf-test test framework enables you to test all components of your data science pipeline, from end-to-end testing of the entire pipeline to specific tests of processes or custom functions.

      In this workshop, we will briefly introduce Nextflow and valuable nf-core resources: We will cover the fundamental components of a Nextflow script and work on a small hands-on example workflow. Our goal is to lay the foundation with Nextflow, empowering you to optimize your pipeline development and enhance the efficiency and scalability of your scientific research.


      Training material and prerequisites:
      - Bring your own laptop 😊
      - Basic knowledge of how to work in a terminal
      - It is highly recommended to use Gitpod
      - If this is not an option for you, please check out the Environment Setup
      - Training material: Hello Nextflow


      Speakers: Marie Lataretu, Paul Wolk (Robert Koch-Institut)
    • 11:00 12:30
      BoFs: SE and Research Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe
      • 11:00
        (Research) Software Engineering (Research): Creating knowledge and busting myths at the intersection of software engineering and research. 45m

        Software Engineering Researchers (SERs) and Research Software Engineers (RSEs) can potentially benefit from each other: SERs can provide RSEs with state-of-the-art research knowledge, methods and tools from software engineering that can help create better software for better research. RSEs can help SERs understand the specific challenges they face in research software engineering, and thus provide interesting new research questions. This can create a virtuous circle of mutual benefit through collaboration. However, known and unknown gaps in knowledge about what the respective other community does, and how the practices of software engineering in, e.g., industry and in research differ may have led to preconceptions and myths that obstruct the pathway to fruitful and reciprocal collaboration. In this session, we create an opportunity to bust some of these myths - and help build understanding, transparency and trust - by engaging members of both communities in discussion, in a fishbowl format. The session discusses myths, preconceptions and questions about the "other" community and their practice, challenges and outputs. Preconceptions and questions are fielded anonymously before and during the session, and participants can engage personally by joining the discussion fishbowl if and when they want. At the end of this session, participants have ideally increased knowledge about, and understanding of, another community that operates at the intersection of software and research, and potentially also their own community.

        Speakers: Anna-Lena Lamprecht (Universität Potsdam, Institut für Informatik und Computational Science), Florian Goth (Universität Würzburg), Stephan Druskat (German Aerospace Center (DLR))
      • 11:45
        Software Engineering for and with Researchers: What is required? 45m

        With the role of the Research Software Engineer in the academic landscape now better defined, it is time to ask a broader audience how we can adapt traditional Software Engineering practices to most effectively fit the needs of the research community. What differentiates researchers who write code from RSEs? How do their aims, drivers and motivations differ, and how does this affect the application of technical skills? Can we, indeed do we need to, change the way traditional software skills are applied in order to more effectively support the RSE domain? Research is becoming increasingly reliant on digital skills and infrastructure, but we won't be able to capitalise on their capabilities if we don't ensure that they are used correctly. The teachingRSE project, together with EVERSE, aims to have a discussion among a diverse set of participants on how to adapt SE practices for domain researchers and software professionals working within the research community.

        Speakers: Florian Goth (Universität Würzburg), Guido Juckeland (Helmholtz-Zentrum Dresden-Rossendorf), Jan Philipp Thiele (Weierstrass Institute Berlin), Jean-Noël Grad (University of Stuttgart), Dr Jeremy Cohen (Imperial College London)
    • 11:00 12:30
      Large Language Models(LLMs) in RSE Audimax A

      Audimax A

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Felipe Donoso Aguirre
      • 11:00
        MLentory: A Machine Learning model registry with natural language queries 20m

        The rapid increase of Machine Learning (ML) models and the research associated with them has created a need for efficient tools to discover, understand, and utilize these resources. Researchers often need help traversing the large collection of ML repositories and finding models that align with their specific requirements, such as open-source availability, FAIR principles, and performance metrics. MLentory addresses this challenge by providing a comprehensive solution for ML model discovery.

        MLentory is a system that extracts, harmonizes, and stores metadata from diverse ML model repositories, including Hugging Face and OpenML. This metadata is harmonized using the RDA FAIR4ML schema, stored in FAIR Digital Objects (FDOs), and indexed to enable efficient natural language-based search. By leveraging different information retrieval techniques, MLentory enables researchers to discover, compare, and dive into ML models tailored to their needs.

        The core components of MLentory are an ETL pipeline, a backend service, and a frontend interface. The ETL pipeline, implemented using Python scripts, extracts metadata from various sources, transforms it into a standardized format, and loads it into a PostgreSQL database for historical tracking, a Virtuoso database for RDF-based knowledge representation, and an Elasticsearch module for efficient data indexing. Each stage of the pipeline operates independently within its own container.

        Then there is the backend module, built with FastAPI, which serves as the query engine, enabling users and other systems to retrieve information from the different data stores in MLentory. The natural language-based search leverages Elasticsearch for initial retrieval and then employs a self-hosted LLM powered by Ollama to refine search results through Retrieval Augmented Generation (RAG).
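        As a rough sketch of what such a query endpoint can look like, the snippet below combines FastAPI with the Elasticsearch Python client; the index name, field name and URL are invented for illustration and do not describe MLentory's actual schema.

```python
from elasticsearch import Elasticsearch
from fastapi import FastAPI

app = FastAPI()
es = Elasticsearch("http://elasticsearch:9200")  # placeholder URL

@app.get("/models/search")
def search_models(q: str, size: int = 10):
    # Full-text match over a hypothetical "description" field in a hypothetical "ml_models" index.
    response = es.search(index="ml_models", query={"match": {"description": q}}, size=size)
    return [hit["_source"] for hit in response["hits"]["hits"]]
```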

        Finally, the frontend module, developed using Vue3.js, provides a user-friendly interface that allows users to explore models using natural language and different search filters, and then delve into their version history. MLentory maintains a history of metadata changes for each model, allowing users to track their evolution and identify versions with the right compatibility for their needs.

        One of the main features of MLentory is its highly decoupled architecture, where each component runs in its own Docker container, and Apache Kafka is used as a common framework for asynchronous communication between containers. Apache Kafka is built on the idea of having queues where publishers can write messages and consumers can read them. This modular design facilitates independent scaling, flexible technology choices, and isolated error handling.

        The downside of a decoupled architecture is that maintenance and code standards become more difficult to enforce. Therefore, to ensure the quality and reliability of the system, a testing framework was implemented. It encompasses unit, integration, and coverage tests. Additionally, linting checks are employed to maintain code style and consistency. This framework is automated through a continuous integration (CI) pipeline deployed on CircleCI, guaranteeing that tests are executed after every code commit.

        Speaker: Nelson David Quinones Virgen (ZB MED)
      • 11:20
        LLMs for Enhanced Code Review 20m

        Collaborative software development demands rigorous code review processes to ensure maintainability, reliability, and efficiency. This work explores the integration of Large Language Models (LLMs) into the code review process, with a focus on utilizing both commercial and open models. We present a comprehensive code review workflow that incorporates LLMs, integrating various enhancements such as multi-agent capabilities and reflection. By harnessing the capabilities of LLMs, the review process can uncover faults and identify improvements that traditional automated analysis tools may overlook. This integration shows promise for improving code quality, reducing errors, and fostering collaboration among software developers.

        Speaker: Alexey Rybalchenko (GSI Helmholtz Centre for Heavy Ion Research)
      • 11:40
        Helmholtz Blablador: An Inference Server for Scientific Large Language Models 20m

        Recent advances in large language models (LLMs) like ChatGPT have demonstrated their potential for generating human-like text and reasoning about topics with natural language. However, applying these advanced LLMs requires significant compute resources and expertise that are out of reach for most academic researchers. To make scientific LLMs more accessible, we have developed Helmholtz Blablador, an open-source inference server optimized for serving predictions from customized scientific LLMs.

        Blablador provides the serving infrastructure to make models accessible via a simple API without managing servers, firewalls, authentication or infrastructure. Researchers can add their pretrained LLMs to the central hub. Other scientists can then query the collective model catalog via the web or use the popular OpenAI API to add LLM functionality to other tools, such as programming IDEs.

        This enables a collaborative ecosystem for scientific LLMs:

        • Researchers train models using datasets and GPUs from their own lab. There is no need to set up production servers. They can even serve their models with inference running on CPUs, using tools like llama.cpp.
        • Models are contributed to the Blablador hub through a web UI or API call. Blablador handles loading the models and publishing them for general use.
        • Added models become available for querying by other researchers.
          A model catalog displays available LLMs from different labs and research areas.
          Besides that, one can train, quantize, fine-tune and evaluate LLMs directly with Blablador.

        The inference server is available at http://helmholtz-blablador.fz-juelich.de
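        Because Blablador exposes an OpenAI-compatible API, it can be queried with the standard OpenAI Python client; in the sketch below, the endpoint path, API key and model name are placeholders to be replaced with the values from the Blablador documentation.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://helmholtz-blablador.fz-juelich.de/v1",  # placeholder endpoint
    api_key="YOUR_BLABLADOR_API_KEY",                         # placeholder key
)

response = client.chat.completions.create(
    model="a-model-from-the-blablador-catalogue",             # placeholder model name
    messages=[{"role": "user", "content": "Summarise what an inference server does."}],
)
print(response.choices[0].message.content)
```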

        Speaker: Alexandre Strube (Helmholtz AI)
    • 11:00 12:30
      Research Software working on Medical Data Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131
      Convener: Frank Loeffler (Friedrich Schiller University Jena)
      • 11:00
        Developing user-centered research software for aeromedical image analysis 20m

        Cine cardiac magnetic resonance imaging (cineCMR) is a well-established imaging modality that is widely used in clinics and research centers to evaluate vital parameters like the exact stroke volume in the left and right ventricle or blood flow in the aortic root. By significantly shortening the recording times of individual images, it is now even possible to examine influences such as respiration, arrhythmia or severe heart disease using so-called real-time MRI. As real-time MRI currently only exists as research hardware, there is little or no software for analyzing the recorded data (semi-)automatically.
        As shorter recording times significantly increase the number of images, manual analysis by medical professionals becomes increasingly inefficient. Although frameworks such as nnU-Net significantly simplify the use of machine learning for the automatic analysis of medical image data for scientists, customized software solutions are needed to improve human-machine interaction.
        RCInsight is a research software developed at the Institute of Software Technology and the Institute of Aerospace Medicine within the German Aerospace Center. It is intended to enable research into the effects of aerospace environments on the cardiovascular system by utilizing recorded real-time MRI data. In the Scalable Machine Learning group of the High Performance Computing department, we are researching not only the productive use of state-of-the-art machine learning models, but also the collaboration between end users and models.
        We showcase how RCInsight is currently used both for research in aerospace and for clinical applications, and how we intend to cover the entire data pipeline from capturing to reporting in a transparent, reliable, interactive and high-performance manner. To that end, we present our web service, which allows users to monitor pipeline progress, investigate interim results and potentially interact with the pipeline configuration.

        Speaker: Jonas Levin Weber
      • 11:20
        JTrack: a digital biomarker platform for remote monitoring of daily-life behaviour in health and disease 20m

        The use of research software in digital health is becoming increasingly vital, particularly in the remote monitoring of neurological and psychiatric conditions. My work focuses on the development and implementation of the JTrack platform, an open-source solution designed for continuous data collection from smartphones, which serves as a scalable and privacy-compliant tool for digital biomarker acquisition. This software ecosystem includes JTrack Social for sensor data collection, JTrack EMA for ecological momentary assessment, and JDash for study management, allowing for comprehensive data handling in research studies. JTrack’s ability to securely collect health-related data, such as motion, social interactions, and geolocation, makes it a critical tool for digital phenotyping, particularly in the study of diseases like Parkinson’s and other neurological disorders.
        Our work has highlighted JTrack's potential in remote assessments, using longitudinal data collected via smartphones. For example, it was successfully integrated with DataLad, ensuring reproducibility, scalability, and data privacy in accordance with GDPR regulations. Applications of this software have already been demonstrated in publications that used JTrack. The use of research software like JTrack is a promising advancement in digital health, facilitating a more comprehensive understanding of patient health beyond the clinical environment.
        In this talk, I will discuss the technical architecture of JTrack, its applications in ongoing research projects, and its implications for future research. Specifically, I will explore how research software enhances reproducibility, scalability, and data security in digital health studies. Moreover, the talk will highlight the lessons learned from deploying these tools in real-world studies and address the challenges and opportunities that lie ahead in developing research software for health monitoring.
        By leveraging robust open-source platforms, researchers and clinicians can access real-time, actionable insights into patient health, paving the way for innovative digital therapeutics and more personalized healthcare solutions.

        Speaker: Dr Mehran Turna (Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich)
      • 11:40
        siibra: A Comprehensive Toolsuite for Reproducible Neuroscience Workflows Handling Big Image Data 20m

        An investigation of the intricacies of the human brain is contingent upon the ability to encompass the diverse array of its structural and functional organization within a common reference framework. Despite the substantial advancements in brain imaging and mapping, a significant challenge persists in using information from different scales and modalities in a coherent manner within the prevalent neuroscience workflows. In particular, with the massive increase in resolution and throughput in microscopic imaging, there is a clear need to access and use multi-resolution image data from cloud resources, which requires different handling than classical file-based data. This demands software solutions that unify access to image data regardless of format and size, and harness the wealth of information at hand, from visually guided exploration to computational workflows for analysis and simulation. We present siibra, a toolsuite that facilitates the seamless integration of data from a multitude of modalities and resources with anatomical structures, even at the terabyte scale. It provides users with convenient access to a comprehensive range of reference templates at varying spatial resolutions, complementary parcellation maps, and multimodal data features.

        The suite consists of a web-based 3D viewer (siibra-explorer), a Python library (siibra-python) designed to address a diverse range of use cases, and a REST API (siibra-api) to facilitate access to siibra-python features. We report how siibra was designed to implement a multilevel atlas of the human brain, linking macro-anatomical concepts and their inter-subject variability with measurements of the microstructural composition and intrinsic variance of brain regions, and allowing their study in a reproducible and robust manner. The framework employs EBRAINS as a data sharing platform and cloud infrastructure, and incorporates interfaces to a range of other neuroscience resources. Furthermore, siibra enables users to conveniently extend its configuration to incorporate additional neuroscience datasets, including potentially sensitive patient data, through the use of straightforward JSON schemas. This allows for the seamless sharing of these datasets with collaborators, along with the scripts that utilize them for a reproducible workflow.

        Speaker: Dr Ahmet Nihat Simsek (Forschungszentrum Juelich, INM-1)
      • 12:00
        Phenotyper - a software tool for collection and management of phenotypic data using controlled vocabulary 20m

        Plant breeding and genetics demand fast, exact and reproducible phenotyping. Efficient statistical evaluation of phenotyping data requires standardised data storage that ensures long-term data availability while maintaining intellectual property rights. This is state of the art at phenomics centres, which, however, are unavailable to most scientists. For them we developed a simple and cost-efficient system, the Phenotyper, which employs mobile devices (e.g. smartphones or tablets) or personal digital assistants (PDAs) for on-site data entry and open-source software for data management.
        A graphical user interface (GUI) on a PDA replaces paper-based form sheets and desktop data entry. Users can define their phenotyping schemes in a web tool without in-depth knowledge of the system. In the Phenotyper, schemes are built from controlled vocabulary gained from published ontologies. Vocabulary and schemes are stored in a database that also manages user access. From the web page, schemes are downloaded as Extensible Markup Language (XML) files for transfer to the PDA and for exchange between users. On the PDA, the GUI displays the schemes and stores data in XML format (and also in comma-separated value format). The user may define data types (integer, double, date, boolean, text, option lists) as well as validation patterns for text input and upper and lower value thresholds to enable a first validation step during data collection. The display of pictures (to support visual evaluation steps) as well as the storage of pictures taken during data collection is also possible.
        We decided to use the XML format for data transfer since it guarantees a high level of unambiguity and a certain robustness against corruption. The first version of the Phenotyper was developed for the operating system (OS) Windows CE about ten years ago. Since this OS, at least its mobile version, is no longer supported, we are now working on a completely new version for Android and, as a future plan, also for iOS. (Reference: DOI: 10.1186/s13007-015-0069-3)
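        To illustrate the kind of scheme-driven first validation step described above, the sketch below parses a small, invented XML scheme and applies type, range and option checks to raw values; the real Phenotyper XML schema is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Invented example scheme -- the actual Phenotyper XML schema differs.
scheme_xml = """
<scheme name="leaf_scoring">
  <trait id="leaf_length_mm" type="double" min="0" max="500"/>
  <trait id="infestation" type="option" options="none,low,medium,high"/>
</scheme>
"""

def is_valid(trait: ET.Element, raw_value: str) -> bool:
    """First validation step during data collection: type, range and option checks."""
    kind = trait.get("type")
    if kind == "double":
        try:
            value = float(raw_value)
        except ValueError:
            return False
        return float(trait.get("min")) <= value <= float(trait.get("max"))
    if kind == "option":
        return raw_value in trait.get("options").split(",")
    return True  # free text and other types pass the first validation step

scheme = ET.fromstring(scheme_xml)
traits = {t.get("id"): t for t in scheme.findall("trait")}
print(is_valid(traits["leaf_length_mm"], "42.5"))   # True
print(is_valid(traits["infestation"], "severe"))    # False: not in the option list
```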

        Speaker: Jurgen Gremmels (Max-Planck-Institute of Molecular Plant Physiology)
    • 11:00 12:30
      SE Architectures Audimax B

      Audimax B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Sven Peldszus (Ruhr University Bochum)
      • 11:00
        Architecture-based Issue Propagation Analysis 22m
        Speakers: Sandro Speth (Institute of Software Engineering, University of Stuttgart), Niklas Krieger (Karlsruher Institute of Technology), Robert Heinrich (Institute of Software Engineering, University of Stuttgart), Steffen Becker (University of Stuttgart)
      • 11:22
        Exploring Architectural Design Decisions in Mailing Lists and their Traceability to Issue Trackers 22m
        Speaker: Mohamed Soliman (Paderborn University)
      • 11:45
        The vision of on-demand architectural knowledge systems as a decision-making companion 22m
        Speakers: Maryam Razavian (Eindhoven University of Technology), Barbara Paech (Heidelberg University), Antony Tang (Swinburne University of Technology)
      • 12:07
        Learning From Each Other: How Are Architectural Mistakes Communicated in Industry? 22m
        Speakers: Marion Wiese (Universität Hamburg, FB Informatik), Axel-Frederik Brandt (Universität Hamburg, FB Informatik), André van Hoorn (University of Hamburg)
    • 11:00 12:30
      SE Reproducibility Vortragsraum 3. OG ( Building 30.51 (Bibliothek))

      Vortragsraum 3. OG

      Building 30.51 (Bibliothek)

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Matthias Tichy (Ulm University)
    • 11:00 12:30
      Workflows for data pipelines SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Joerg Schaarschmidt (Karlsruher Institute of Technology)
      • 11:00
        On Embedding Code Extracted From Coq Formalisations into Data Analysis Workflows 20m

        Formal methods are essential for ensuring the correctness of algorithms
        and they are used in many different variants. Algorithms are verified using
        temporal logic, separation logic and type-theoretic approaches, for example.
        In particular, type-theoretic formalisms, as implemented by the Coq Proof Assistant
        offer the possibility of even synthesising software from its specification by proving
        the existence of a function that satisfies the specification.
        And while these methods are commonly used in security and safety-critical applications,
        such as operating systems, cryptography, and network communications, their use in
        data analysis workflows appears to be rather limited.
        However, in the analysis of scientific data, the correctness of the results is clearly
        desirable, if not essential.
        Furthermore, proof assistants such as Coq are much less constrained by the architecture,
        and can, for example, compute on integral values of arbitrary length, thus providing much
        higher precision than general-purpose programming languages.
        Finally, the Coq ecosystem provides many proven correct algorithms, some of which
        could be reused as part of research software, e.g. for data analysis tasks.

        In this talk we present the advances in the Coq software extraction mechanism, namely
        the ability to generate foreign function interfaces for communication between extracted
        OCaml code and plain C code.
        We focus on improvements in both the type safety of the data exchange and the
        maintainability of the generated interfaces. We discuss a simple example application
        for data analysis implemented in Coq. We then show how this application, including the
        foreign function interface, can be extracted and integrated into a sample data analysis workflow.

        Speaker: Mario Frank
      • 11:20
        OpenGHG - A community platform for greenhouse gas data analysis 20m

        To address the urgent need to understand changes in greenhouse gas (GHG) emissions, there has been dramatic growth in GHG measurement and modelling systems in recent years. However, this growth has led to substantial challenges; to date, there has been little standardisation of data products, and the interpretation of GHG data requires combined information from numerous models.
        OpenGHG is a platform that offers data retrieval from various public archives, data standardisation, and researcher-friendly data analysis tools. It helps researchers overcome the challenges posed by independent networks, archival standards, and varying spatial and temporal scales in greenhouse gas research. OpenGHG has an internal set of standards into which different data formats are converted. It offers data analysis and visualisation tools, a Jupyter Notebook interface, and will offer options for both cloud and local installations. Additionally, to handle large data volumes we have employed the Zarr storage system for efficient file storage handling.
        In this presentation, we demonstrate how OpenGHG is being used in the development of a prototype “operational” emissions evaluation system for the UK, the Greenhouse gas Emissions Measurement and Modelling Advancement (GEMMA). This system will combine bottom-up (inventory-based) and top-down (observation-based) approaches to evaluate emissions in near-real time. We will also shed light on some of the challenges faced and the associated success stories that occurred during the development of this flexible and extensible community-led software to tackle scientific and technical challenges.

        Speaker: Prasad Sutar (University of Bristol)
      • 11:40
        NER4all or Context is All You Need: High-Performing Out-of-the-Box NER for Historical and Low-Resource Texts with LLMs through a Humanities-Informed Approach 20m

        When the release of ChatGPT focused ever more attention on large language models, it was clear that much would change, but the concrete consequences were not yet foreseeable. With this talk we want to show, for the historical sciences, what this new technology can concretely do for our discipline. Using the example of named entity recognition (NER), i.e. the detection and classification of proper names, or of those passages in a text that refer by name to entities such as persons, places or institutions, we want to show to what extent this technology can open up new paths in the historical sciences as well.

        NER is by no means as clear-cut as it may sometimes appear, and this is especially true for its application in the historical sciences. Unlike in medicine or biology, for example, where these techniques are usually applied to the same kinds of text (mostly scientific publications), the range of possible forms and formats of sources in the historical sciences is far greater. Different languages and text genres (from newspaper articles and administrative documents to letters or diary entries), changing subject areas and linguistic registers, historically shifting cultural practices, the earlier absence of spelling rules, and varying degrees of editorial processing applied when sources are catalogued all make named entity recognition considerably harder. This is all the more true because the decisive methods of recent years have been based on machine learning and therefore, above all, on extensively annotated training data. For the historical sciences this is difficult, since building suitable corpora is extremely resource-intensive, and in the end the effort of training one's own models outweighs their benefit.

        In this talk we will show that, contrary to previous claims in the research literature, LLMs can be a game changer at least for NER in the historical sciences. We will show that, by taking the particular characteristics of LLMs into account, it is now possible to achieve markedly better results through skilful prompting alone, without using specialised models or the costly preparation of training data and training of dedicated models, than with the current models of, for example, spaCy and flair. Our method is applicable to any text genre, domain and linguistic register.
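        As a minimal, hypothetical illustration of such a prompting approach (not the authors' actual prompts or models), the sketch below asks an OpenAI-compatible chat model to return named entities from a short German sentence as JSON; the endpoint, key and model name are placeholders, and it assumes the model answers with plain JSON.

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://llm.example.org/v1", api_key="YOUR_KEY")  # placeholders

text = "Im März 1848 reiste Robert Blum von Leipzig nach Wien."

prompt = (
    "Extract all named entities from the following historical German text. "
    "Answer only with JSON using the keys 'persons', 'places' and 'organisations'.\n\n" + text
)

response = client.chat.completions.create(
    model="some-instruction-tuned-model",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

# Assumes the model complied and returned plain JSON.
entities = json.loads(response.choices[0].message.content)
print(entities)
```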

        Speakers: Nicole Elisabeth Hitomi Dresselhaus (Humboldt-Universität zu Berlin), Till Grallert (Humboldt-Universität zu Berlin)
      • 12:00
        Assisting Data Analysis using Program Slicing with flowR 20m

        Consider that you are a reviewer checking the correctness of a research artifact, or a data scientist searching for a data cleaning step or visualization to reuse.
        Either way, you are confronted with hundreds of lines of code, usually involving various datasets and several different plots, making it difficult to understand the code's purpose and the data flow within the program.
        Addressing this issue, program slicing reduces scripts to only what is relevant for a plot or data transformation. Furthermore, it assists authors when writing code, indicating parts which are not relevant for the desired output and helping them to improve the analysis structure.

        With our talk, we introduce flowR, a novel program slicer and dataflow analyzer for the R programming language.
        Given a variable of interest, like a plot, flowR returns the resulting slice either directly or interactively within an IDE.
        We provide a flowR addin for RStudio and a more feature-rich extension for Visual Studio Code which will be the focus of the talk, offering features like multi-cursor support, highlighting, and more.
        Additionally, we provide a server session, a read-eval-print loop, and a GitHub Codespace to try flowR without any installation.
        flowR is developed as an open-source project (under the GPL-3.0 license) on GitHub and offers a docker image.

        We focus on R because the set of existing tools supporting its large and active community is relatively small, with no preexisting program slicer.
        Although the RStudio IDE and the R language server, as well as the {lintr} and {CodeDepends} packages, perform static analysis on R code, all of these tools rely on simple heuristics like XPath expressions on the abstract syntax tree (AST), causing their results to be imprecise, limited, and sometimes wrong.

        flowR first normalizes the AST, using it as the basis for a stateful fold, incrementally constructing the dataflow information of each subtree. For the analysis, we use a dynamic dispatch on an abstract interpretation of the active R environment to handle the language's dynamic nature. With the dataflow graph, the program slicing reduces to a reachability problem solved by a modified breadth first search. Finally, the slice is either reconstructed as R code or highlighted directly in the input.
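        To illustrate the reachability idea in a toy form (this is not flowR's actual implementation or data structures), the sketch below slices a hand-written dependency mapping of R statements with a breadth-first search.

```python
from collections import deque

# Toy dataflow information: each statement maps to the statements it depends on.
depends_on = {
    "plot(y)": ["y <- f(x)"],
    "y <- f(x)": ["f <- function(v) v * 2", "x <- read.csv('data.csv')"],
    "f <- function(v) v * 2": [],
    "x <- read.csv('data.csv')": [],
    "summary(z)": ["z <- rnorm(10)"],
    "z <- rnorm(10)": [],
}

def slice_for(criterion: str) -> set:
    """Collect every statement reachable from the slicing criterion (breadth-first search)."""
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        statement = queue.popleft()
        for dependency in depends_on[statement]:
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen

# The slice for the plot excludes the unrelated z/summary statements.
print(slice_for("plot(y)"))
```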

        Currently, we offer limited support for R's wide variety of side effects (e.g., modifying functions at runtime), defaulting to over-approximation and conservative results.
        However, we use automatic input-output validation on a wide set of sources to make sure that the generated slices and the dataflow graph are correct.

        Using real-world R code (written by social scientists and package authors) shows that flowR can calculate the dataflow graph (and the respective slice for a given variable reference, including the parsing and normalization) in an average of 200-500ms. Slicing for manually selected points of interest (e.g. plots), we reduce the program on average to just 12.7%[±11%] of its original size.

        In our talk, we focus especially on flowR's extensions and how they benefit R programmers in their everyday work.

        Speaker: Florian Sihler (Ulm University)
    • 12:30 14:00
      Lunch Break 1h 30m Audimax Foyer

      Audimax Foyer

      Building 30.95

    • 14:00 15:30
      Aspects of Usability in RSE 1h 30m Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Research software developers are usually very familiar with the functional requirements of the software they are developing, as it is often closely linked to their research discipline. Quality requirements are often less well understood due to a lack of background in computer science [1]. Usability is one of these quality requirements that may be critical for the research software to support its underlying research goal.
      The aim of this workshop is therefore to study and foster exchange on aspects of usability in RSE. These aspects vary, ranging, for example, from learnability and operability to inclusivity (see ISO/IEC 25010 for further aspects).

      To steer the discussion, we want to encourage the participants to bring their own projects, so we can work together to specifically answer the following questions:
      Who are the users of the RSE project based on the research goal?
      Which aspects of usability play an important role in the RSE project for these users?
      What does each of these usability aspects require to be fulfilled in the RSE project?
      How do these usability aspects influence the research goal that the RSE project contributes to?
      Is the usability aspect currently fulfilled within the RSE project and why (or why not)?

      We conclude the discussion by summarizing which usability aspects are important in the selected RSE projects, and how research software engineers could be supported with practical software engineering methods to take these usability aspects into account.

      Participants interested in aspects of usability who do not bring their own RSE project are also welcome to join the discussion.

      [1] Wiese, I.; Polato, I.; Pinto, G.: Naming the Pain in Developing Scientific Software. IEEE Software 37(4), pp. 75–82, 2020.

      Speakers: Jan Bernoth (Universität Potsdam), Prof. Leen Lambers (BTU Cottbus-Senftenberg)
    • 14:00 15:30
      BoF: Bringing together software engineering researchers and research software engineers (SE4Science @ deRSE25) Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe
      • 14:00
        BoF: Bringing together software engineering researchers and research software engineers (SE4Science @ deRSE25) 1h 30m

        Research software is attracting increasing attention from both society and funding agencies, such as the German Research Foundation (DFG). There are lots of exciting opportunities for research into how software engineering practices can be best applied to help the people who develop research software. However, many potential research projects in this area falter because of either misconceptions about software engineering research or misconceptions about research software engineering.

        This Birds of a Feather session takes advantage of the co-location of the 2025 German Software Engineering Conference (SE25) and 2025 German Research Software Engineering Conference (deRSE25) to bring together people from both communities who are interested in understanding how software engineering practices can support research software, either from a basic research perspective (creating new knowledge or insight on the development of research software) or applied research perspective (finding solutions to practical problems encountered while building research software).

        The expected audience is software engineering researchers (SERs) who are interested in extending their research to research software, and research software engineers (RSEs) who are interested in improvements to the way they build software and advancing the knowledge base around the development of research software. It will also be of interest to those engaged in related meta-science research topics looking at the impact of research software in different areas.

        The goals of this session are:

        1. Increase awareness of what each of the communities (software
          engineering researchers and research software engineers) do
        2. Identify topics of interest to both communities
        3. Connect software engineering researchers to research software engineers to help build possible collaborations

        The format of the session would be:

        • Introduction and explanation of the intersection
        • Lightning talks: 2-3 examples of research topics at the intersection and current research on that topic to give context.
        • SER/RSE "speed dating"
        • Collection and summarisation of potential topics

        Potential lightning talk topics / speakers include:

        • Wilhelm Hasselbring: applying software categorisation to research software
        • Anna Lena Lamprecht: RSE research to inform policymakers
        • Mining Software Repositories to identify differences between research software engineering approaches
        • Software development models for research software

        A blog post summarising the workshop (including the topics identified) will be published after the conference and cross-posted to relevant websites (including de-RSE, BSSw.io, SocRSE, Software Sustainability Institute and US-RSE).

        This BoF follows on from the 2024 Dagstuhl workshop on “Research Software Engineering: Bridging Knowledge Gaps” (Druskat et al., 2024).

        This BoF is part of the Software Engineering for Science series of events.

        Organisers

        • Prof Jeffrey C. Carver, University of Alabama
        • Prof Neil Chue Hong, University of Edinburgh
        • Prof Dr Michael Felderer, German Aerospace Centre (DLR) / University of Cologne

        References

        Stephan Druskat, Lars Grunske, Caroline Jay, and Daniel S. Katz. Research Software Engineering: Bridging Knowledge Gaps (Dagstuhl Seminar 24161). In Dagstuhl Reports, Volume 14, Issue 4, pp. 42-53, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
        https://doi.org/10.4230/DagRep.14.4.42

        Speakers: Prof. Jeffrey Carver (University of Alabama), Michael Felderer (Deutsches Zentrum für Luft- und Raumfahrt), Neil Chue Hong (University of Edinburgh)
    • 14:00 15:30
      Communities around Research Software Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131
      Convener: Inga Ulusoy (University of Heidelberg)
      • 14:00
        ICON Community Interface (ComIn) - An infrastructure overview 20m

        Numerical modeling has a long history in climate and weather forecasting, with advancements being made continually over the last century due to technological progress. In the early 2000s, the development of ICON as an icosahedral grid-based, nonhydrostatic model started. It is Germany's primary model for weather predictions and climate studies (https://www.icon-model.org/). ICON is a flexible, high-performance modelling framework that enhances our understanding of Earth's climate system, providing critical data for societal use. In technical terms, the ICON model constitutes a sophisticated software package designed to function on massively parallel hardware. Currently, ICON development impacts approximately 1.5 billion people globally with its numerous applications, and more than 200 developers actively contribute to the project.

        A major milestone in January 2024 was the transition of ICON into an open source software. This allows scientists to contribute their own codes as separate modules, greatly increasing ICON's versatility. However, certain limitations remain: The fact that ICON was originally programmed in Fortran presents some challenges, and integrating changes into ICON can be a difficult and lengthy process. This is most pronounced due to its use for operational numerical weather prediction, which imposes severe restrictions on runtime. These limitations fuelled the need for a standardized long term solution and recently, a team of ICON developers introduced an innovative Community Interface for ICON (ComIn). The objective of ComIn is to simplify the interaction with external software, so-called plugins, to further enhance ICON predictions and to significantly reduce maintenance efforts. Plugins can range from individual routines, e.g. diagnostics or physical parameterizations, to Earth System model components, e.g. atmospheric chemistry and land models.

        This ICON-ComIn infrastructure has multifold benefits:
        1. During ICON runtime, plugin functions can be called from fixed locations within ICON via ComIn.
        2. ComIn also provides access to ICON data and metadata, as well as the option to create additional ICON data.
        3. By providing APIs for Fortran, C/C++ and Python, ComIn bridges legacy code in Fortran with more "modern" Python-based developments.

        In this talk we will present the motivation behind the standardised interface for ICON and provide an in-depth examination of the details of ComIn's infrastructure. Additionally, we will introduce diverse ComIn use cases and present first results from a complex plugin, which drove innovation and development in ComIn. In summary, we will introduce the audience to a modern and innovative software infrastructure from the climate and weather domain.
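
        As a rough illustration of the callback-style plugin pattern described above, the following Python sketch mimics how a plugin might register a function that the host model invokes at a fixed entry point. All names here (register_callback, get_field, "after_timestep") are invented for this sketch and are not the actual ComIn API.

```python
# Hypothetical illustration of the plugin/callback pattern described above.
# The interface names are invented for this sketch and are NOT the ComIn API.

class FakeHostInterface:
    """Stand-in for the interface a host model would hand to a plugin."""

    def __init__(self):
        self._callbacks = {}
        self._fields = {"temperature": [271.0, 285.5, 290.2]}

    def register_callback(self, entry_point, func):
        self._callbacks.setdefault(entry_point, []).append(func)

    def get_field(self, name):
        return self._fields[name]

    def run_entry_point(self, entry_point):
        for func in self._callbacks.get(entry_point, []):
            func()


def register(comin):
    """Plugin entry point: register a diagnostic at a fixed model location."""

    def diagnose_after_timestep():
        temperature = comin.get_field("temperature")
        print("mean temperature:", sum(temperature) / len(temperature))

    comin.register_callback("after_timestep", diagnose_after_timestep)


if __name__ == "__main__":
    host = FakeHostInterface()
    register(host)                          # host loads the plugin
    host.run_entry_point("after_timestep")  # fixed location in the time loop
```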

        Speakers: Aparna Devulapalli (DKRZ), Kerstin Hartung
      • 14:20
        Standardizing the preCICE ecosystem 20m

        A rapidly emerging community is developing and publishing several software components and application cases on top of the coupling library preCICE for partitioned simulations. While several community-building measures have led to more users and contributors, the resulting contributions are often not readily findable, accessible, interoperable, and reusable. The DFG project preECO aims to form quality guidelines together with the community, standardizing how these components and application cases work together. A published component should, among other criteria, provide metadata in a machine-readable format, a configuration file adhering to a standard schema, document a minimum set of details, and be ready to interoperate with other standard components and application cases. This brings several benefits for each contributor and the community, providing development guidelines, more streamlined and faster reviews, wider discoverability and impact, and a pool of higher-quality components and reproducible application cases to reuse. This talk will present the planned measures and the results of a first structured discussion with the community at the preCICE Workshop 2024, which in parts led to conflicting views regarding the quality criteria and levels.

        See the working draft of these guidelines for adapters (software components) and application cases, developed and discussed (in the preCICE forum) in public, together with the community.
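
      As an illustration of what machine-readable component metadata checked against a standard schema could look like, here is a minimal Python sketch using the jsonschema package. The field names and the schema are invented for this example and are not the actual preECO specification, which is still being drafted with the community.

```python
# Illustrative only: check that a component's metadata record is
# machine-readable and conforms to a schema. Field names are invented.
from jsonschema import validate, ValidationError  # pip install jsonschema

ADAPTER_METADATA_SCHEMA = {
    "type": "object",
    "required": ["name", "version", "license", "precice_versions"],
    "properties": {
        "name": {"type": "string"},
        "version": {"type": "string"},
        "license": {"type": "string"},
        "precice_versions": {"type": "array", "items": {"type": "string"}},
    },
}

record = {
    "name": "my-solver-adapter",
    "version": "1.2.0",
    "license": "MIT",
    "precice_versions": ["2.5", "3.0"],
}

try:
    validate(instance=record, schema=ADAPTER_METADATA_SCHEMA)
    print("metadata record conforms to the schema")
except ValidationError as err:
    print("metadata record is invalid:", err.message)
```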

        Speaker: Mr Gerasimos Chourdakis (University of Stuttgart)
      • 14:40
        Maintaining Open Source BIMserver through Enhanced Contributor Support 20m

        Building Information Modelling (BIM) is extensively used in the AEC (Architecture, Engineering, and Construction) industry to optimize processes throughout the design, construction, and operation of buildings and to promote collaboration among stakeholders. Despite long-standing efforts to facilitate interoperability through open standards, most notably the Industry Foundation Classes (IFC), it remains a challenge to ensure all stakeholders have access to accurate, up-to-date and consistent information. The open source BIMserver, a server application that enables users to store and manage building information in IFC format, can play a vital role in facilitating information exchange in construction projects. With BIMserver, IFC data is maintained in a database, allowing for efficient storage, versioning, and merging, as well as the ability to query and filter the building data and generate different outputs.

        However, maintaining software requires significant effort, as it not only involves the creation of new features along a roadmap, but above all the regular updating of the codebase, continuous improvement of existing features, documentation, and user support. This is even more pronounced for open source software with limited resources. In this paper, we address the challenges of maintaining BIMserver and describe the structured approach and strategies we employed to support a new contributor. We exemplify our approach with actions taken in four pivotal areas we identified, namely documentation, issue tracking, dependency updates, and automated testing.

        Keywords: BIM; open-source software; BIMserver

        Speaker: Zaqi Fathis (HTW Dresden)
      • 15:00
        Towards Guidelines for Engineering of Energy Research Software 20m

        Energy research software (ERS) plays a vital role because it enables and supports numerous tasks in energy research. The complexity of ERS ranges from simple scripts and libraries, e.g., for Python, to full software solutions. [1]
        ERS is often developed by energy researchers with diverse backgrounds (e.g., physics, mechanical engineering, electrical engineering, computer science, social science) who are mostly not trained in software engineering. This leads to widely varying approaches to creating ERS, often with limited focus on testing and maintenance.
        To enhance the re-usability of research software, in line with the FAIR principles for research software [2], code quality and overall software management are essential. Since research software differs from industrial applications, not all software engineering methods apply to research software engineering. Collections of best practices of research software engineering can guide researchers to improve their software engineering and, consequently, the overall research process. Such collections already exist for other domains, e.g., [3], [4]. Since research software and especially its development process differs between domains, we argue that an energy-specific preparation of best practices of software engineering can support energy researchers. These best practices should cover areas like the conceptualization, development, maintenance, and publication of ERS, serving as structured processes within the ERS life-cycle.
        To collect best practices for ERS, we invited energy researchers with experience in engineering ERS to two on-site workshops in August and November 2024. The first workshop was held internally with 8 researchers from different groups of our research institution. The second workshop was open and attracted around 20 researchers from 12 different institutions across Germany. During these workshops, we collected relevant aspects and practical solutions for software engineering in energy research. Based on the workshop results, we will formulate around 10 ERS-specific guidelines for energy researchers to improve their software engineering. This collection will serve as a common starting guide for early-career researchers when they start to develop ERS. Additionally, this collection will improve the awareness of research software engineering in the energy domain.
        In our talk, we will present the results of the two workshops and the first version of the guidelines for energy research software engineering.

        References
        [1] S. Ferenz and A. Nieße, “Towards Improved Findability of Energy Research Software by Introducing a Metadata-based Registry,” ing.grid, vol. 1, no. 2, Art. no. 2, Nov. 2023, doi: 10.48694/inggrid.3837.
        [2] M. Barker et al., “Introducing the FAIR Principles for research software,” Sci. Data, vol. 9, no. 1, p. 622, Oct. 2022, doi: 10.1038/s41597-022-01710-x.
        [3] R. C. Jiménez et al., “Four simple recommendations to encourage best practices in research software,” F1000Research, vol. 6, p. 876, Jun. 2017, doi: 10.12688/f1000research.11407.1.
        [4] M. List, P. Ebert, and F. Albrecht, “Ten Simple Rules for Developing Usable Software in Computational Biology,” PLOS Comput. Biol., vol. 13, no. 1, p. e1005265, Jan. 2017, doi: 10.1371/journal.pcbi.1005265.

        Speaker: Stephan Alexander Ferenz (Carl von Ossietzky Universität Oldenburg; OFFIS)
    • 14:00 15:30
      ML-assisted and more general data workflows Audimax A

      Audimax A

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Joerg Schaarschmidt (Karlsruher Institute of Technology)
      • 14:20
        The OpenQDA Project - A Software for Collaborative Qualitative Analysis and Sustainable Open Research 20m

        Qualitative research involves a large number of different analysis methods, which are increasingly supported by the use of qualitative data analysis software. In addition to larger closed commercial software products, there are a number of open source projects that implement individual analysis methods. A major problem for the sustainability of these fragmented software projects is the lack of a large software community to support them.

        To address this problem, the OpenQDA project [QDA24] provides an extensible research platform for open science. It extends the qualitative data analysis standard REFI [Ev20] and implements a pluggable architecture for the following aspects of qualitative analysis: (1) data import and discovery for the use of different file formats, databases, archives and APIs; (2) media display and coding user interfaces for qualitative annotation of different data formats such as text, audio, video, geospatial or motion data; (3) encoding methods and workflows for systematic support of coding schemes; and (4) analysis and visualization methods. In the presentation we will give an architectural overview and discuss examples for the implementation of specific plug-ins, such as AI models for automatic transcription of audio files [Ha24] or the integration of visualization platforms.
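
        To illustrate the pluggable architecture described above, the following minimal registry sketch shows how plugins could be attached to the four extension points. OpenQDA itself is not implemented in Python, and all names in this sketch are invented for illustration only.

```python
# Language-agnostic illustration of a pluggable-architecture registry
# (OpenQDA itself is not written in Python; all names here are invented).

PLUGIN_SLOTS = {"importer": {}, "viewer": {}, "coding": {}, "analysis": {}}

def register_plugin(slot, name):
    """Decorator that registers a plugin class under one of the four slots."""
    def wrapper(cls):
        PLUGIN_SLOTS[slot][name] = cls
        return cls
    return wrapper

@register_plugin("importer", "refi-qdpx")
class RefiImporter:
    def load(self, path):
        # Parse a REFI-QDA project file (details omitted in this sketch).
        return {"documents": [], "codes": []}

@register_plugin("analysis", "code-frequencies")
class CodeFrequencyAnalysis:
    def run(self, project):
        return {code: 0 for code in project["codes"]}

# The host application only talks to slots, never to concrete plugins:
project = PLUGIN_SLOTS["importer"]["refi-qdpx"]().load("study.qdpx")
print(PLUGIN_SLOTS["analysis"]["code-frequencies"]().run(project))
```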

        The aim of the presented software is to provide a research platform for collaborative qualitative analysis, as well as for further method development by scientists with different levels of technical expertise, comparable to what projects such as the R project have already established for quantitative data analysis.

        [Ev20] Evers, J.; Caprioli, M. U.; Nöst, S.; Wiedemann, G.: What is the REFI-QDA Standard: Experimenting With the Transfer of Analyzed Research Projects Between QDA Software. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research 21 (2), 2020, issn: 1438-5627, doi: 10.17169/fqs-21.2.3439, url: https://www.qualitative-research.net/index.php/fqs/article/view/3439, Last access: 04/2024.

        [Ha24] Haberl, A.; Fleiß, J.; Kowald, D.; Thalmann, S.: Take the aTrain. Introducing an interface for the Accessible Transcription of Interviews. Journal of Behavioral and Experimental Finance 41, p. 100891, 2024, issn: 2214-6350, doi: 10.1016/j.jbef.2024.100891, url: https://linkinghub.elsevier.com/retrieve/pii/S2214635024000066, Last access: 04/2024.

        [QDA24] Belli, A., Küster, J., Hohmann, F., Sinner, P., Krüger, G., Wolf, K. D., & Hepp, A. (2024). OpenQDA (1.0.0-beta.0). Zenodo. https://doi.org/10.5281/zenodo.11656546 Code: https://github.com/openqda/openqda

        Speakers: Prof. Karsten D. Wolf (University of Bremen), Jan Kuester (University of Bremen)
      • 14:40
        Generative AI for Research Data Processing: Lessons Learnt From Three Use Cases 20m

        Generative AI has attracted enormous interest since ChatGPT was launched in 2022. However, adoption of this new technology in research has been limited due to concerns about the accuracy and consistency of the outputs it produces.

        In an exploratory study on the application of this new technology in research data processing, we identified tasks for which rule-based or traditional machine learning approaches were difficult to apply, and then performed these tasks using generative AI. We demonstrate the feasibility of using the generative AI model Claude 3 Opus in three research engineering projects involving complex data processing tasks:

        1) Information extraction: Extraction of plant species names from historical seedlists (catalogues of seeds) published by botanical gardens.
        2) Natural language understanding: Extraction of certain data points (name of drug, name of health indication, relative effectiveness, cost-effectiveness, etc.) from documents published by different Health Technology Assessment organisations in the EU.
        3) Text classification: Assignment of industry codes to projects on the crowdfunding website Kickstarter.

        We present the lessons learnt from this study:
        1. How to assess if generative AI is a suitable tool for a particular use case, and
        2. Strategies for enhancing the accuracy and consistency of the outputs produced by generative AI.
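
        As a rough illustration of the first use case (extracting species names from digitized seedlist text), a minimal sketch using the anthropic Python client could look as follows. The prompt, the snippet, and the output handling are simplified, and the model identifier may need updating; this is not the study's actual pipeline.

```python
# Minimal sketch (not the study's actual pipeline): ask a Claude model to
# extract plant species names from a snippet of digitized seedlist text.
# Requires the `anthropic` package and ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

seedlist_snippet = "12. Papaver rhoeas L. | 13. Viola tricolor L. | 14. ..."

response = client.messages.create(
    model="claude-3-opus-20240229",  # model IDs change over time
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": (
            "Extract all plant species names from the following seedlist "
            "entry and return them as a JSON list of strings:\n\n"
            + seedlist_snippet
        ),
    }],
)

print(response.content[0].text)  # e.g. ["Papaver rhoeas", "Viola tricolor"]
```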

        Speaker: Dr Modhurita Mitra (Utrecht University)
      • 15:00
        Exploring Autonomous Agent Architectures in the context of Data Analysis Workflows 20m

        In artificial intelligence (AI), the Autonomous Agent Architecture (AAA) is a fundamental framework through which agents interact with their environment. The AAA leverages the Observe-Think-Act loop that allows either software or physical agents to reason within dynamic domains. This loop consists of three stages: observing data, reasoning about it, and finally acting based on the decisions made.

        This talk will introduce the concepts of the AAA and explore the possibilities and benefits that combining the AAA with Data Analysis Workflows might offer. Using the ability to observe and adjust workflows, the robustness and reliability of workflows could be improved by adhering to domain constraints and semantically verifying intermediate steps. These intermediate results of workflows could be validated by comparing outputs against predefined rules or expected patterns. When inconsistencies arise, the system could flag them for human review, suggest alternative hypotheses, or adjust subsequent steps dynamically. Furthermore, if implemented using symbolic reasoning, users will be able to trace decisions back to encoded knowledge, thereby enhancing transparency and trust in the correctness of the results.

        By observing and applying symbolic reasoning techniques, it is possible to ensure that workflows comply with domain constraints and remain effective even with an incomplete knowledge base. The AAA can work in uncertain conditions, but to function, it has to operate using knowledge assumed to be true or false. As soon as conflicting observations occur its knowledge base gets updated. This mechanism could allow workflows to function despite having noisy or partial datasets. Research has shown how symbolic AI approaches can solve configuration and scheduling tasks efficiently, while insights from solving Multi-Agent Pathfinding (MAPF) problems demonstrate how symbolic reasoning can coordinate multiple agents to minimize conflicts and optimize shared objectives. These techniques could allow workflows to be both reliable, efficient and adaptable to changing conditions.
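
        A minimal Python sketch of the Observe-Think-Act loop applied to an intermediate workflow result could look like the following; the constraints and the workflow step are invented examples, not part of any existing framework.

```python
# Minimal sketch of an Observe-Think-Act loop around a data-analysis step,
# as described above; the rules and the workflow output are invented examples.

def observe(step_output):
    """Collect facts about an intermediate result."""
    return {"n_rows": len(step_output),
            "has_negatives": any(x < 0 for x in step_output)}

def think(observation, constraints):
    """Reason about the observation against domain constraints."""
    violations = [name for name, check in constraints.items()
                  if not check(observation)]
    return {"ok": not violations, "violations": violations}

def act(decision, step_output):
    """Accept the result, or flag it for human review."""
    if decision["ok"]:
        return step_output
    raise ValueError(f"workflow step flagged for review: {decision['violations']}")

constraints = {
    "non_empty": lambda obs: obs["n_rows"] > 0,
    "no_negative_concentrations": lambda obs: not obs["has_negatives"],
}

intermediate = [0.3, 1.2, 0.8]           # output of some workflow step
decision = think(observe(intermediate), constraints)
validated = act(decision, intermediate)  # raises if a constraint is violated
print("validated result:", validated)
```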

        Speaker: Nikolas Bertrand
    • 14:00 15:30
      Research Software Engineering in HPC SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Michele Martone (Leibniz Supercomputing Centre)
      • 14:00
        Fast GPU-powered and auto-differentiable forward modeling for cosmological hydrodynamical simulations 20m

        In the field of extragalactic astronomy we typically have two groups: the observers and the theorists. The nature of the data these two groups work with is very different: observers count photons with the instrument detectors, while theorists work with particles that have specific physical properties. This results in rather limited scientific exchange between the two groups.
        Generally, there are two ways to bring observational data and simulation data closer together to allow a direct comparison between them: Forward modeling and inverse modeling.
        Forward modeling calculates what should be observed for a particular model. The forward model takes certain parameters and produces data that are comparable to actual observations. Here, we present RUBIX, a novel Python framework that aims at bridging the gap between observation and modeling. RUBIX leverages modern GPU computing via the JAX ecosystem to implement forward modeling of the observation process in a telescope. It is a fully tested, well-documented, and modular open source tool developed in JAX, designed to forward model IFU cubes of galaxies from cosmological hydrodynamical simulations. We aim to establish RUBIX as a widely used community tool in astronomy. Because of its modular and flexible software design, RUBIX also has potential applications outside of astronomy and can thus become a valuable research tool for a wide community. The software is designed as a linear pipeline structure that takes pure JAX functions. The code automatically parallelizes computations across multiple GPUs, reducing computation time from the many hours common for state-of-the-art CPU computing frameworks to seconds.
        Inverse modeling is the process of starting with the result - the observational data - and calculating the causal factors that produce these data, which means constructing a model that accounts for the given set of observations. In the future, RUBIX will also allow for inverse modeling of observational data of galaxies. To this end, it leverages the sophisticated features of the JAX ecosystem and implements a pipeline structure that naturally supports differentiability of the computations.
        In this way, RUBIX aims at supporting the entire modeling spectrum that contemporary extragalactic astronomy demands. Our proposed contribution consists of a talk with a twofold focus: First, we will present the RUBIX project, its inherent challenges and the techniques we employ to overcome them on a technical level. Second and of equal importance, we will discuss the experiences and lessons learned in the journey from being a PhD student with no prior experience in research software engineering to developing an innovative open-source scientific software project that uses cutting edge technologies.
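
        As an illustration of the pipeline idea (not the actual RUBIX API), the following sketch composes pure JAX functions into a jit-compiled, differentiable forward model; the "physics" is a toy placeholder invented for this example.

```python
# Illustrative sketch (not the RUBIX API): composing pure JAX functions into
# a linear forward-modelling pipeline that is jit-compiled and differentiable.
import jax
import jax.numpy as jnp

def assign_spectra(particles):              # toy stand-in for a pipeline stage
    return particles["mass"][:, None] * jnp.linspace(1.0, 2.0, 8)

def apply_instrument(spectra, resolution):
    return spectra / resolution             # toy "telescope" response

def make_pipeline(*stages):
    def pipeline(x, resolution):
        x = stages[0](x)
        for stage in stages[1:]:
            x = stage(x, resolution)
        return x.sum()                      # scalar so we can differentiate
    return jax.jit(pipeline)

forward = make_pipeline(assign_spectra, apply_instrument)
particles = {"mass": jnp.array([1.0, 2.0, 3.0])}

print(forward(particles, resolution=2.0))
# Differentiability "for free": gradient w.r.t. the instrument parameter.
print(jax.grad(forward, argnums=1)(particles, 2.0))
```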

        Speaker: Anna Lena Schaible (Heidelberg University)
      • 14:20
        A Tale of Two Clusters: HPC Infrastructure for the Humanities and Applied Sciences 20m

        This presentation explores experimental platform and software engineering approaches for providing high-performance computing infrastructure to interdisciplinary research projects in the humanities and the applied sciences. Two research projects are presented that demonstrate very different use cases, both in terms of scale and functional requirements.

        "#Vortanz" was a project running from 2021 to 2024 that aimed to introduce machine learning into university-level dance education by establishing a processing pipeline integrated with a digital annotation tool embedded in the dance education curriculum at DSHS Cologne, HZT Berlin, and HfMdK Frankfurt.

        "KITeGG", started in 2021 and ongoing until late 2025, is a joint project of five German universities exploring the integration of AI in design teaching. It provides open and interactive access to GPU resources to the project partners while developing dedicated learning software integrated with the cluster architecture.

        While the former project's cluster hardware is sourced from up-cycled old university equipment, consumer hardware and modified mining-rig cases, the latter uses five NVIDIA HGX nodes with eight GPUs each and provides additional storage and CPU resources. Both use very similar software setups running Kubernetes. This allows for an interesting case study comparing two different approaches to bare-metal hardware with cloud deployment and critically examining the decisions made when providing HPC resources in rather atypical project setups.

        Using this juxtaposition, the presentation aims to highlight the political and ethical dimensions of these decisions that warrant critical examination. It is a rejection of "cloud-sourcing" in the context of publicly funded projects at higher education institutions. Furthermore, it offers a reflection on the engineering culture within the institution, as well as on broader tropes and paradigms that influence and inform decisions in university-based RSE, and it speculates on alternative ways of approaching and exploring technological trends and mythologies.

        Speaker: Anton Koch (Hochschule Mainz)
      • 14:40
        Incremental MPI Parallelization of a Julia Functional Renormalization Group code: a case study 20m

        HPC-oriented Research Software Engineers are often required to perform optimization and parallelization on unfamiliar codebases.
        This activity is of utmost importance, as it allows scientific research to make use of increasingly powerful (and increasingly complex) supercomputing infrastructure.
        In this talk I will share the experiences and lessons learned in the process of incrementally parallelizing a Functional Renormalization Group code written in Julia with MPI, including performance optimization, characterization testing, dealing with a CPU firmware update, and refactoring to improve programmer productivity.
        Finally I will report on my experience of learning Julia while also approaching a somewhat unfamiliar domain.
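
        The code discussed in the talk is written in Julia; purely as a language-neutral illustration of the incremental approach, the following mpi4py sketch distributes one expensive loop over MPI ranks while the surrounding program stays serial. The kernel is a placeholder, not the actual physics.

```python
# Language-neutral illustration (mpi4py) of incrementally parallelizing one
# expensive loop over MPI ranks while the rest of the program stays serial.
# Run with e.g.: mpirun -n 4 python sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_points = 1000
local_indices = range(rank, n_points, size)    # round-robin work split

def expensive_kernel(i):                       # placeholder for the real physics
    return np.sin(i * 0.01) ** 2

local_sum = sum(expensive_kernel(i) for i in local_indices)
total = comm.allreduce(local_sum, op=MPI.SUM)  # recombine partial results

if rank == 0:
    print("integrated result:", total)
```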

        Speaker: Michele Mesiti
      • 15:00
        The Dos and Don'ts When Building a Video Streamer for Research 20m

        High resolution video recordings at high frame rates are necessary for a variety of research projects. This can pose a challenge for systems in terms of hardware and software, particularly if multiple streams need to be recorded simultaneously. The aim of this project was to design a setup that would allow for the recording of multiple streams at high frame rates and various image resolutions, while still satisfying the given resource constraints. The specifics of the project made it necessary to build a custom processing pipeline that would bypass limitations from the camera vendor's default software suite. Our custom setup allows for the simultaneous recording and saving of the resulting videos directly to a file server in our on-site data center. We will discuss the connection of multiple cameras through a switched high-bandwidth network infrastructure for recording on a single compute node of a high performance computing (HPC) cluster. The details of the development and installation, including challenges faced, will also be presented. This includes the pros and cons of using the IBM POWER architecture, the setup of a specific Conda environment on an IBM POWER9 processor, and the building process for required packages including FFmpeg and OpenCV with GPU support. The GPU support is an important aspect in the setup, as it can reduce some of the high load on the CPU caused by the simultaneous recording of streams. We will present results obtained with a multi-camera setup, with recordings at a frame rate of 100 Hz.
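
        As a highly simplified illustration of such a recording pipeline (the actual setup uses vendor camera SDKs, a switched network, and GPU-enabled FFmpeg/OpenCV builds), a single-camera capture-and-write loop with OpenCV could look like this:

```python
# Simplified illustration of the recording pipeline described above:
# grab frames from one camera and write them to disk with OpenCV.
import cv2

FPS, WIDTH, HEIGHT = 100, 1920, 1080

cap = cv2.VideoCapture(0)                  # one camera; the real setup records several
cap.set(cv2.CAP_PROP_FPS, FPS)             # request the target rate (may not be honored)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)

fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("camera0.mp4", fourcc, FPS, (WIDTH, HEIGHT))

try:
    for _ in range(FPS * 10):              # record roughly ten seconds
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame)
finally:
    cap.release()
    writer.release()
```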

        Speaker: Stephanie Reilly (Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society)
    • 14:00 15:30
      SE Quality Assurance Vortragsraum 3. OG (Building 30.51 (Bibliothek))

      Vortragsraum 3. OG

      Building 30.51 (Bibliothek)

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Mario Frank
      • 14:00
        Unimocg: Modular Call-Graph Algorithms for Consistent Handling of Language Features 22m
        Speakers: Dominik Helm (Universität Duisburg-Essen, Technische Universität Darmstadt, National Research Center for Applied Cybersecurity ATHENE), Tobias Roth (Technische Universität Darmstadt, National Research Center for Applied Cybersecurity ATHENE), Sven Keidel (Technische Universität Darmstadt), Michael Reif (CQSE GmbH), Mira Mezini (Technische Universität Darmstadt, hessian.AI, National Research Center for Applied Cybersecurity ATHENE)
      • 14:22
        Actionable Light-weight Process Guidance: approach, prototype, and industrial user study 22m
        Speakers: Christoph Mayr-Dorn (Johannes Kepler University, Linz), Cosmina-Cristina Ratju (Johannes Kepler University Linz), Luciano Marchezan de Paula (Johannes Kepler University Linz), Felix Keplinger (Johannes Kepler University, Linz), Alexander Egyed (Johannes Kepler University), Gala Walden (formerly ACME-Automotive)
      • 14:45
        Role-based Modeling of Business Processes with RBPMN 22m
        Speakers: Tarek Skouti (Technische Universität Dresden), Ronny Sieger (University of St.Gallen), Frank J. Furrer (Technische Universität Dresden), Susanne Strahringer (Technische Universität Dresden)
      • 15:07
        Applying Concept-Based Models for Enhanced Safety Argumentation 22m
        Speakers: João Paulo Costa de Araujo, Balahari Vignesh Balu, Eik Reichmann, Jessica Kelly, Stefan Kuegele, Núria Mata, Lars Grunske
    • 14:00 15:30
      SE Testing Audimax B

      Audimax B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe, Germany
      Convener: Timo Kehrer (University of Bern)
      • 14:00
        Advanced Mutation Testing of Java Bytecode Using Model Transformation 22m Audimax-2 (Building 30.95 )

        Audimax-2

        Building 30.95

        Straße am Forum 1, 76131 Karlsruhe
        Speakers: Christoph Bockisch (Philipps-Universität Marburg), Freya Dorn (Philipps-Universität Marburg), Deniz Eren (Philipps-Universität Marburg), Sascha Lehmann (Philipps-Universität Marburg), Daniel Neufeld (Philipps-Universität Marburg), Gabriele Taentzer (Philipps-Universität Marburg)
      • 14:22
        Efficient Detection of Test Interference in C Projects 22m Audimax-2 (Building 30.95 )

        Audimax-2

        Building 30.95

        Straße am Forum 1, 76131 Karlsruhe
        Speakers: Florian Eder (LMU Munich), Stefan Winter (Ulm University and LMU Munich)
      • 14:45
        Single and Multi-objective Test Cases Prioritization for Self-driving Cars in Virtual Environments 22m Audimax-2 (Building 30.95 )

        Audimax-2

        Building 30.95

        Straße am Forum 1, 76131 Karlsruhe
        Speakers: Christian Birchler (Zurich University of Applied Sciences & University of Bern), Sajad Khatiri (Zurich University of Applied Sciences & Università della Svizzera italiana), Pouria Derakhshanfar (JetBrains), Sebastiano Panichella (University of Bern), Annibale Panichella (Delft University of Technology)
      • 15:07
        Organizing Graphical User Interface tests from behavior‐driven development as videos to obtain stakeholders' feedback 22m Audimax-2 (Building 30.95 )

        Audimax-2

        Building 30.95

        Straße am Forum 1, 76131 Karlsruhe
        Speakers: Jianwei Shi (Leibniz Universität Hannover), Jonas Mönnich (imbus Niedersachsen GmbH), Jil Klünder (Leibniz Universität Hannover), Kurt Schneider (Leibniz Universität Hannover)
    • 15:30 16:00
      Coffee Break 30m Audimax Foyer

      Audimax Foyer

      Building 30.95

    • 16:00 17:30
      BoFs: Challenges for RSEs Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe
      • 16:00
        Communication Challenges in RSEs’ daily work 45m

        In the natESM sprint process, RSEs work closely with scientists to tackle technical challenges within a collaborative research environment, presenting unique interpersonal and communication challenges. In this meetup we want to highlight and discuss some hurdles we as natESM RSEs encounter, including bridging gaps in technical knowledge, adapting to diverse communication styles, and balancing technical priorities with scientific goals. Key challenges also involve managing differing expectations for the project outcomes and promoting essential software practices, such as documentation and maintainability, often within tight timelines.

        We want to invite fellow RSEs to discuss communication challenges and listen to experiences encountered in their daily work. In our work, we identified some of the following communication hurdles. Firstly, RSEs and scientists have different styles for communicating technical details and expectations. Secondly, different levels of technical knowledge between scientists and RSEs lead to the challenge of bridging gaps in technical understanding, as not all scientists are familiar with the intricacies of software development or coding. But since some scientists are already very familiar with programming topics and their code, they may know more on the topic than the RSE, especially at the beginning of the project.

        While RSEs focus on the sustainability, reusability, and maintainability of software, scientists often prioritize research outcomes. Therefore, when we face conflicting priorities in the project goals, we would discuss and realign the goals with scientists.

        With each project, we are working with different scientists who have diverse personalities—from highly collaborative and communicative to more reserved or independent workers. As a consequence we have to adapt to the individual personalities of scientists. Finally, scientists might have high expectations regarding the speed and outcomes of technical developments during the project, while RSEs are aware of the limitations and complexities of the implementation.

        Over the last two years, we have gathered some on-the-job lessons which could serve as good practices for effective communication during a collaborative project. Defining precise objectives and a detailed action plan at the outset enables mutual understanding of the project expectations and responsibilities. This clarity is encouraged through initial kick-off meetings, designed to establish these benchmarks at the project's inception. Regular project updates help keep all participants informed about the project's progress and foster a proactive communication environment.

        During our meetup session we plan to start with an input talk and then proceed with a discussion following some guiding questions. We aim to discuss with fellow RSEs, exchanging strategies and insights on how to effectively communicate with scientists and strengthen RSE-scientist collaborations across multidisciplinary projects. The outcome of this meetup should be a document that collects experiences and ways of handling problems, but also reiterates already well-established communication practices.

        Speakers: Aleksandar Mitic (DKRZ), Aparna Devulapalli (DKRZ), Joerg Benke
      • 16:45
        The End of RSEng? Challenges and Risks for RSEng 45m

        It is certainly too early to herald the end of RSEng, but the future is in constant flux, and we should openly discuss how RSEng will change in response to the upcoming challenges.
        Some questions to get the discussion started:

        • How will RSE change in the face of the ongoing digitalization of society, for example if people are better prepared in school?
        • How will AI impact our work?
        • What will our topics be after version control doesn’t need to be taught anymore? Digitalisation will also feature more heavily in the domain curricula.
        • What is in this future for RSEs?

        This BoF aims to bring together people interested in reflecting on the challenges and opportunities for RSE.

        Speaker: Florian Goth (Universität Würzburg)
    • 16:00 17:30
      Community in NFDI Audimax A

      Audimax A

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Julian Gethmann (KIT-IBPT)
      • 16:00
        User-centric development of Materials Knowledge Solutions in NFDI-MatWerk 20m

        The long-term preservation and accessibility of research data will accelerate future research. To reduce structural and financial risks in research data management (RDM) in Germany, the National Research Data Infrastructure (German acronym: NFDI) was established for "bundling expertise and creating universal access to services for research data management” (1).

        NFDI-MatWerk is one of 26 domain-specific consortia developing and providing software, workflows, ontologies, and metadata schemas, aiming at a materials knowledge system for materials scientists and engineers. Materials Science and Engineering (MSE) plays a key role in addressing global challenges such as climate change, resource scarcity, and the transition to renewable energy. The different scales and structures of materials, as well as the manifold possible chemical compositions and treatments, make the development of research software solutions complex. Developing research software that helps make materials data FAIR will change the way research in MSE is conducted. Therefore, software solutions must meet the needs of MSE researchers.

        To ensure this quality, NFDI-MatWerk uses a user-centered approach to develop MSE-specific research data management software. The project consortium includes institutional computing centers, which not only bring in their know-how and infrastructures but also their already developed services to build further upon. At the same time, the MSE community is involved through pre-existing large MSE projects and working groups, which represent prototypical research data management tasks as Participant Projects (PP). Together with them, infrastructure usage profiles have been developed and consolidated into infrastructure use cases (IUC). The aim of each IUC is to make the solutions available to other researchers in the field who may be working on related research workflows (leading to more requirements). To enable the research data management and further digitalization for these IUCs, we established the following elements:
        - Specific teams are being formed for each IUC with members from the PPs and developers from the NFDI-MatWerk team.
        - Each IUC team is supported by an agile manager who helps define roles and work packages and supports user-centered development.
        - A product owner from a PP ensures requirements engineering and a general understanding of the researchers' challenges.
        - To maximize the usability and dissemination of the research software developed, a central roll-out working group is currently establishing roll-out mechanisms, including quality control, accessibility, understandability, documentation, user feedback, teaching and marketing. This allows the developers and domain experts to focus on their core tasks, while fostering a diversity of research software engineering skills within NFDI-MatWerk.

        The collaboration of software development specialists from IT centers and MSE domain experts from participating projects in interdisciplinary use case teams characterizes the software engineering process of NFDI-MatWerk. We will present our learnings about the NFDI-MatWerk user-centered development approach along concrete examples from MSE, emphasizing the impact of the specific development process on the research community.

        (1) Alliance of German Science Organisations. 2010. “Principles for the Handling of Research Data.” Publisher: Alliance of German Science Organisations.

        Speakers: Adina Hofmann (Fraunhofer IWM), Dr Julia Mohrbacher (Albert-Ludwigs-Universität Freiburg)
      • 16:20
        NFDIxCS: Guarantee Levels of the Research Data Management Container (RDMC) 20m

        Effective management of research data and software is essential for promoting open and trustworthy research. Structured methods are needed to ensure that research artifacts remain accessible and easy to locate, in line with the FAIR principles of making research data and software findable, accessible, interoperable, and reusable [1, 2]. However, fully implementing these principles remains challenging.
        Several research data management initiatives, such as the National Research Data Infrastructure (NFDI) and the European Open Science Cloud (EOSC), aim to support a cultural shift towards openness. The NFDIxCS consortium [3], part of the NFDI, has a core mission to develop infrastructure that supports operational services across the diverse Computer Science (CS) field and implement FAIR principles. A central concept of this project is the Research Data Management Container (RDMC) [4], which encapsulates research data, software, and contextual information into a 'time capsule' for long-term archiving and future use. After creating an RDMC, this container will be connected to a Reusable Execution Environment (REE), allowing the time capsule to be unpacked and executed within a predefined environment.
        Creating an RDMC requires a workflow to encapsulate research data, software, its external components, the context, and other related materials into a single container. Based on several personas [5], we have developed a workflow and designed a wizard to facilitate this process. This workflow enables the underlying management platform to create badges that indicate the expected quality of the content. Currently, these badges are referred to as guarantee levels, providing information on aspects such as metadata quality, long-term achievability and sustainability, and privacy of research artifacts.
        In this presentation, we give an introduction into the workflow for creating an RDMC, outline the concept of RDMC guarantee levels, and engage the community in discussing potential shortcomings and challenges in developing these guarantee levels.
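
        Purely as a hypothetical illustration of how a guarantee level could be derived from simple checks on an RDMC manifest, consider the following sketch; the criteria and field names are invented and are not part of the NFDIxCS specification.

```python
# Hypothetical sketch: derive a guarantee-level "badge" from simple checks
# on an RDMC manifest. Criteria and field names are invented for illustration.

def guarantee_level(manifest: dict) -> str:
    checks = {
        "has_metadata": bool(manifest.get("metadata")),
        "has_license": "license" in manifest.get("metadata", {}),
        "has_execution_environment": bool(manifest.get("ree_image")),
        "privacy_reviewed": manifest.get("privacy_reviewed", False),
    }
    passed = sum(checks.values())
    if passed == len(checks):
        return "gold"
    if passed >= 2:
        return "silver"
    return "bronze"

manifest = {
    "metadata": {"title": "My study", "license": "CC-BY-4.0"},
    "ree_image": "registry.example.org/ree:1.0",
    "privacy_reviewed": False,
}
print(guarantee_level(manifest))  # -> "silver"
```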

        References

        1. Wilkinson, M. D. et al.: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 1/3, p. 160018, 2016, DOI: 10.1038/sdata.2016.18
        2. Chue Hong, N. P. et al.: FAIR Principles for Research Software (FAIR4RS Principles). Research Data Alliance, 2022. DOI: 10.15497/RDA00068
        3. Goedicke, M. et al.: National Research Data Infrastructure for and with Computer Science (NFDIxCS). Zenodo, 2024. DOI: 10.5281/zenodo.10557968
        4. Goedicke, M.; Lucke, U.: Research Data Management in Computer Science - NFDIxCS Approach. Gesellschaft für Informatik, Bonn, 2022. DOI: 10.18420/inf2022_112
        5. Bernoth, J.; Al Laban, F.; Lucke, U.: Utilizing Personas to Create Infrastructures for Research Data and Software Management. Gesellschaft für Informatik e.V., 2024. DOI: 10.18420/INF2024_180
        Speakers: Safial Islam Ayon (Universität Potsdam), Dr Firas Al Laban (Universität Potsdam)
      • 16:40
        NFDIcore: A BFO Compliant Ontology for Cross-Domain Research Data and its Related Modular Domain Ontologies for NFDI4Culture and NFDI-MatWerk 20m

        Each NFDI consortium works on establishing research data infrastructures tailored to its specific domain. To facilitate interoperability across different domains and consortia, the NFDIcore ontology was developed and serves as a mid-level ontology for representing metadata about NFDI resources such as individuals, organizations, projects, and data portals [1]. The NFDIcore ontology has been created to provide a structured framework that enables efficient management, organization, and interconnection of research data across various disciplines. By adhering to established data standards, the ontology facilitates the accessibility, sharing, and reuse of research data in a consistent and sustainable manner. NFDIcore is built upon the Basic Formal Ontology (BFO) and contains mappings to further standards, e.g. schema.org. To address domain-specific research questions, NFDIcore serves as the basis for various application and domain ontologies, which extend its core structure in a modular fashion. Examples include the NFDI4Culture Ontology (CTO) [2], NFDI MatWerk Ontology (MWO) [3], NFDI4Memory Ontology (MEMO), and NFDI4DataScience Ontology (DSO) [4], each tailored to specific research fields while ensuring semantic interoperability.

        CTO is designed to represent and categorize resources within the NFDI4Culture domain, which encompasses five academic disciplines: Architecture, Musicology, Art History, Media Science, and the Performing Arts. CTO defines classes and properties that address domain-specific research questions, connect diverse cultural entities, and facilitate the efficient organization, retrieval, and analysis of cultural data. Also with regard to research data in the domain of Materials Science and Engineering (MSE), the MWO addresses several key aspects. It focuses on the NFDI-MatWerk community structure, encompassing task areas, infrastructure use cases, participating projects, researchers, and organizations. Additionally, it describes various NFDI resources, including software, workflows, ontologies, publications, datasets, metadata schemas, instruments, facilities, and educational resources. Furthermore, the MWO represents NFDI-MatWerk services and highlights related academic events, courses, and international collaborations.
        [1] https://ise-fizkarlsruhe.github.io/nfdicore/docs/
        [2] https://gitlab.rlp.net/adwmainz/nfdi4culture/knowledge-graph/culture-ontology
        [3] https://nfdi-matwerk.pages.rwth-aachen.de/ta-oms/mwo/docs/index.html
        [4] https://arxiv.org/abs/2408.08698

        Speakers: Hossein Beygi Nasrabadi (FIZ Karlsruhe – Leibniz-Institute for Information Infrastructure), Tabea Tietz (FIZ Karlsruhe – Leibniz-Institute for Information Infrastructure)
      • 17:00
        Advancing Digital Transformation in Material Science: The Role of Workflows within the MaterialDigital Initiative 20m

        The MaterialDigital initiative represents a major driver towards the digitalization of material science. Next to providing a prototypical infrastructure required for building a shared data space and working on semantic interoperability of data, a core focus area of the Platform MaterialDigital (PMD) is the utilisation of workflows to encapsulate data processing and simulation steps in accordance with FAIR principles. In collaboration with the funded projects of the initiative, the workflow working group strives to establish shared standards, enhancing the interoperability and reusability of scientific data processing steps. Central to this effort is the Workflow Store, a pivotal tool for sharing workflows with the community and facilitating the exchange and replication of scientific methodologies. This paper discusses the inherent challenges of adapting workflow concepts by presenting the perspectives of the various funded projects on developing and using workflows in their respective domains. Additionally, it introduces the Workflow Store's role within the initiative and outlines a future roadmap for the PMD workflow group, aiming to further refine and expand the role of scientific workflows as a means to advance digital transformation and foster collaborative research within material science.

        Speaker: Joerg Schaarschmidt (Karlsruher Institute of Technology)
    • 16:00 16:45
      How to compute a special function with near machine-precision accuracy 45m Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Based on my experience as developer and maintainer of some numerical open-source libraries (libcerf, libkww, libformfactor), I will explain key concepts for writing code that computes a special function or integral with high accuracy and high speed.

      • Choose different numerical algorithms for different argument regions.
      • Don't be afraid of divergent series or ill-conditioned recursions.
      • Confine Chebyshev fits to small subregions.
      • Use code instrumentation and bisection to ensure continuity where the algorithm changes.
      • Beware of literature that is only concerned with truncation. Near machine precision, cancellation is the bigger problem. Visualization may reveal the difference.
      • Never rely on non-standard facilities. "Long double" makes no sense if it is not longer than "double"? Tell Apple.
      • Generate test references and hard-coded coefficients with high-precision scripts (e.g. mpmath based); see the sketch below.
      • Be graceful with relative accuracy measures near zeros and for real or imaginary parts of complex numbers.
      • Don't trust any performance measure you haven't tweaked yourself. Take caching into account.
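
      A minimal sketch of generating high-precision test references with mpmath, using the complementary error function as an arbitrary example, could look like this:

```python
# Minimal sketch: generate high-precision test references with mpmath
# (here for the complementary error function, as an arbitrary example).
from mpmath import mp, erfc

mp.dps = 50                          # 50 decimal digits of working precision

test_arguments = [0.1, 1.0, 5.0, 10.0]
for x in test_arguments:
    reference = erfc(x)
    # Emit a hard-coded reference value that a C/C++ unit test can compare
    # against at double precision.
    print(f"{{ {x!r}, {mp.nstr(reference, 25)} }},")
```
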
      Speaker: Joachim Wuttke (Forschungszentrum Jülich GmbH - JCNS at MLZ Garching)
    • 16:00 17:30
      Open Source Community Building SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Inga Ulusoy (University of Heidelberg)
      • 16:00
        Effective workflows for community engagement of users and developers in open-source software development 20m

        Open-source software development has become a fundamental driver for innovation in both academia and industry, fostering transparency and enabling collaboration among individuals who may not have formal training in computer science. Academic researchers benefit from open-source collaboration in several aspects: (1) User engagement in feature development and interfaces, enhancing the (re-)usability of the software; (2) developer engagement in software maintenance and feature development, thus gaining contributors and improving developer workload balance; (3) promotion of academic and scientific output, hence boosting visibility and impact; (4) higher research excellence in harnessing collective intelligence and diverse contributors. However, despite these numerous advantages, efficient collaboration remains challenging—particularly for new contributors who often encounter incomplete documentation, undefined requirements, or lack of a structured workflow that keeps the projects alive in the long term.

        With this in mind, this contribution presents a structured methodology to optimize the feedback workflows used and to foster successful community engagement in open source projects. The main goal is to maintain a balance between software engineering good practices, flexibility in collaboration, and efficient use of resources (people, open-source tools, documentation, etc.). The primary example for this methodology is the BioCypher framework, an open-source initiative designed to unify biomedical data and provide insightful knowledge graphs for researchers in biomedical science and beyond.

        Our methodology outlines three primary phases: onboarding and knowledge transfer, structured contribution, and ongoing community engagement.

        The onboarding and knowledge transfer phase is essential in familiarizing new contributors with the existing project ecosystem. This phase includes communicating the project’s goals, reviewing its roadmap (milestones), and familiarization with coding and documentation standards. We propose structured templates for tutorials, documentation, issues, and discussions, allowing all the participants to talk in a clear and simple language while contributing.
        In the structured contribution phase, the focus shifts to managing the technical aspects of contributions and adhering to GitHub-specific workflows, such as branching strategies, commit messaging conventions, and pull request (PR) protocols. Additionally, automated tools, such as GitHub Actions, are used to automatically validate and test code before it is reviewed, ensuring contributions meet project standards.

        The ongoing community engagement phase focuses on sustaining long-term involvement in the project, meaning that documentation should be sustainable and maintained to engage all new contributors and users. At the same time, outreach is a key activity to ensure that the tool is known by the community and remains useful.

        Speaker: Edwin Carreño (Scientific Software Center - University of Heidelberg)
      • 16:20
        Lessons Learned from Organizing Community Workshops for Open-Source Tools 20m

        We'll explore the key lessons learned from planning and running several community workshops aimed at fostering the adoption and use of open-source tools, in this case the models REMIND and MAgPIE and their encompassing open source ecosystem. Our goals were to educate interested people who had no prior experience, deepen the understanding and capabilities of users already familiar with the tools, gather feedback, and build a stronger community around our tools. Throughout the planning and the event itself, we encountered numerous logistical and technical challenges, as well as difficulties related to tailored knowledge transfer in heterogeneous groups, which led to valuable insights that can help improve future similar workshops. Furthermore, we provide some guidance on how others can organize a community workshop centered around their open source tool, and on what key questions have to be considered.

        Speakers: Alex Hagen (Potsdam Institute for Climate Impact Research), Pascal Sauer (Potsdam Institute for Climate Impact Research)
      • 16:40
        Organizing successful software community workshops 20m

        Have you developed an open source scientific software package that has now become popular? Congratulations! Your software has entered a new phase of its life cycle, and you are now a community manager. Your new role includes training the next generation of users, identifying and converting power users into contributors, fostering networking opportunities, and making your software visible to a wider audience.

        You are considering organizing a 3-day workshop, summer school or user meeting to gather your community in a physical location where people can exchange ideas, join scientific collaborations, discover new applications for their favorite software, and play a role in the governance of your software project. But how much effort is it? How do you fund this event? How do you advertise it? How do you provide incentives for people to attend? Whom to invite as speakers? How to strike the right balance between talks, hands-on sessions and hackathons? Is online/hybrid even an option?

        We will answer these questions through two success stories: the ESPResSo summer school [1] and the preCICE workshop [2], organized annually since 2006 and 2020, respectively. Both events attract ~50 people every year with a budget under €10,000. They combine lectures, hands-on sessions, poster sessions and user support sessions to train newcomers and seasoned users alike.

        ESPResSo summer schools are organized as CECAM Flagship Schools and yield ECTS points as part of the University of Stuttgart curriculum. Participation fees are waived thanks to RSE grants [3] and SimTech. Core lessons teach algorithms for soft matter physics using the ESPResSo software, while posters and scientific talks help connect with scientists from other software communities. Teaching material is hosted on the CECAM platform, recorded lectures are available on YouTube, and Jupyter notebooks [4] are remotely executable on the Binder platform [5].

        preCICE workshops cover parts of their costs via project funding and get support from local scientific organizations for managing registration and finances. The schedule encourages partial attendance, allowing seasoned users to focus on project updates. A structured course offers new users a starting point, while user support sessions help plan next steps together with the developers. A world café collects user feedback and discusses future directions, while posters and scientific talks allow users to present their applications and new methods. Recorded talks are available on YouTube and the community engages on the preCICE forum.

        References

        1. Weik et al., ESPResSo 4.0 – An extensible software package for simulating soft matter systems, European Physical Journal Special Topics, 2019, doi:10.1140/epjst/e2019-800186-9.
        2. Chourdakis et al., preCICE v2: A sustainable and user-friendly coupling library [version 2; peer review: 2 approved], Open Research Europe, 2022, doi:10.12688/openreseurope.14445.2.
        3. Katerbow et al., Handling of research software in the DFG’s funding activities, German Research Foundation, 2024, doi:10.5281/zenodo.13919790.
        4. Kluyver et al., Jupyter Notebooks–A publishing format for reproducible computational workflows, Positioning and Power in Academic Publishing: Players, Agents and Agendas, 2016, doi:10.3233/978-1-61499-649-1-87.
        5. Project Jupyter et al., Binder 2.0 - Reproducible, interactive, sharable environments for science at scale, Proceedings of the 17th Python in Science Conference, 2018, doi:10.25080/Majora-4af1f417-011.
        Speaker: Dr Jean-Noël Grad (University of Stuttgart)
      • 17:00
        Building a community around your Open Source research software - collected thoughts from deRSE24 20m

        After a lively and productive meet-up, “Building a community around your Open Source research software”, at deRSE2024, we summarized and clustered all your input. Now, at deRSE2025, we will present the key findings regarding the following questions:

        1. How to prepare research software for third-party users/developers?
        2. How to attract new third-party users/developers?
        3. How to get third-party users into collaborative development?
        4. How to balance growing demand for support with obligations for own projects?

        In our presentation we will look at the role of documentation, contribution guidelines and release management, as well as accessibility, demonstrations and training. The collected experiences on establishing communication channels, facilitating collaboration platforms and creating ownership will provide ideas and proposals for other researchers building a community around their Open Source research software. In addition, we will discuss different user groups and the product vs. project perspective.
        After the presentation we will be happy to answer your questions, get your feedback and discuss with you the next steps to take with this collaboratively collected knowledge.

        Speakers: Jan Philipp Dietrich (Potsdam Institute for Climate Impact Research (PIK)), Lavinia Baumstark
    • 16:00 17:30
      Reproducibility and Discovery of Research Software Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131 Karlsruhe
      Convener: Bernadette Fritzsch
      • 16:00
        Fine-grained exploration of the reproducibility of research-related Jupyter notebooks at scale 20m

        Jupyter notebooks have revolutionized the way researchers share code, results, and documentation, all within an interactive environment, promising to make science more transparent and reproducible. In research contexts, Jupyter notebooks often coexist with other software and various resources such as data, instruments, and mathematical models, all of which may affect scientific reproducibility. Here, we present a study that analyzed the computational reproducibility of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 biomedical publications (https://doi.org/10.1093/gigascience/giad113). The resulting reproducibility data were loaded into a knowledge graph, FAIR Jupyter, that allows for highly granular exploration and interrogation.

        The FAIR Jupyter graph is accessible via https://w3id.org/fairjupyter and described in a preprint available at https://doi.org/10.48550/arXiv.2404.12935. It contains rich metadata about the publications, associated GitHub repositories and Jupyter notebooks, and the notebooks' dependencies and reproducibility. Through a public SPARQL endpoint, it enables detailed data exploration and analysis by way of queries that can be tailored to specific use cases. Such queries may provide details about any of the variables from the original dataset, highlight relationships between them or combine some of the graph's content with materials from corresponding external resources.

        We provide a collection of example queries addressing a range of use cases in research software engineering and education. We also outline how sets of such queries can be used to profile specific content types, either individually or by class. We conclude by discussing how such a semantically enhanced sharing of complex datasets can both enhance their FAIRness, i.e. their findability, accessibility, interoperability, and reusability, and help identify and communicate best practices, particularly with regard to the quality, standardization and reproducibility of research-related software and scripts.
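
        As a rough illustration of how such a SPARQL endpoint can be queried programmatically (the endpoint URL below is a placeholder and the query is a generic per-class count, not one of the tailored example queries mentioned above), a minimal Python sketch might look like this:

        import requests

        ENDPOINT = "https://example.org/fairjupyter/sparql"  # placeholder; see https://w3id.org/fairjupyter for the real entry point
        QUERY = """
        SELECT ?class (COUNT(?s) AS ?count)
        WHERE { ?s a ?class }
        GROUP BY ?class
        ORDER BY DESC(?count)
        LIMIT 10
        """

        # Standard SPARQL protocol: send the query and ask for JSON results.
        response = requests.get(ENDPOINT, params={"query": QUERY},
                                headers={"Accept": "application/sparql-results+json"})
        response.raise_for_status()
        for row in response.json()["results"]["bindings"]:
            print(row["class"]["value"], row["count"]["value"])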

        Speaker: Dr Daniel Mietchen (FIZ Karlsruhe — Leibniz Institute for Information Infrastructure, Germany)
      • 16:20
        Seek and You Shall Find - Or Not! Why Can't We Find the Research Software We Really Need? 20m

        Software discovery is a crucial aspect of research, yet it remains a challenging process for various reasons: the lack of a centralized or domain-tailored search and publication infrastructure, insufficient software citations, the frequent unavailability of software (versions), and many others. Researchers tend to utilize general search engines and their social network before considering code repositories, (text + data) repositories, and package management platforms, among other locations, to find the software they need. The resulting fragmented ecosystem is characterized by parallel developments from different, yet partially overlapping, redundant and non-interoperable infrastructure providers and research communities. Moreover, the discovery process is further complicated by missing or imperfect metadata, which can lead to limited search results. To address these challenges, it is essential to gain a deeper understanding of the different software publication and discovery systems.

        In our talk, we will describe available discovery options, including code and publication repositories, domain, geographic, or institution-specific catalogs, classical search engines, curated lists, knowledge graphs, social networks, and all of these in various combinations, with and without the use of artificial intelligence. Besides characterizing each option, we will present examples, challenges and recommendations for an improved software discovery process. In addition, we will discuss the role of different stakeholders (e.g. developers, users, funders, publishers) and what they could do for better findability.

        Our talk contributes to a systematic understanding of the software discovery landscape, technical shortcomings and their potential solutions. We envision valuable insights for researchers, infrastructure providers, and policymakers by identifying and comparing the different options for research software discovery.

        Speaker: Ronny Gey (Helmholtz Centre for Environmental Research - UFZ)
      • 16:40
        Reproducible scientific simulations on the blockchain 20m

        The reproducibility of scientific simulations is one of the key challenges of scientific research.

        Current best practices involve version-controlled code, tracking dependencies, specifying hardware configurations, and sometimes using Docker containers to enable one-click simulation setups. However, these approaches still fall short of achieving true reproducibility. For example, Docker depends on the underlying host kernel, and high-performance computing (HPC) codes often link with specific kernel modules and headers. Over time, changes in host kernel versions can render Dockerized simulations unusable. Furthermore, non-deterministic simulations, such as Monte Carlo methods, may not yield identical results even when rerun on the same hardware with the same code.
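
        To make the non-determinism point concrete, here is a toy example (not taken from the talk): a Monte Carlo estimate of pi differs between unseeded runs, while explicitly seeded runs match on the same software stack, though even seeded results may still differ across library versions or hardware.

        import random

        def estimate_pi(n, seed=None):
            rng = random.Random(seed)  # unseeded: OS entropy; seeded: deterministic stream
            hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))
            return 4 * hits / n

        print(estimate_pi(100_000), estimate_pi(100_000))                    # two unseeded runs usually differ
        print(estimate_pi(100_000, seed=42), estimate_pi(100_000, seed=42))  # seeded runs agree on the same stack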

        This talk explores the potential of blockchain technology to address these challenges. By running simulations natively on-chain (via smart contracts) and emitting logs of each state transition, we can achieve reproducibility while also verifying the simulation's authenticity (linking the simulation to its original author and to the reporting author).
        Other potential ideas include using zero-knowledge proofs to hash the call stack and the stack memory into a Merkle tree, or tokenising compute.
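
        As a conceptual sketch of the Merkle-tree idea (illustrative only, not the implementation discussed in the talk), state-transition logs can be folded into a single root hash, so that any tampering with the logs changes the root:

        import hashlib

        def sha256(data: bytes) -> bytes:
            return hashlib.sha256(data).digest()

        def merkle_root(leaves):
            level = [sha256(leaf) for leaf in leaves]
            while len(level) > 1:
                if len(level) % 2:  # duplicate the last node on odd-sized levels
                    level.append(level[-1])
                level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            return level[0]

        logs = [b"step=1 state=a", b"step=2 state=b", b"step=3 state=c"]
        print(merkle_root(logs).hex())  # changing any log entry changes this root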

        We will delve into the technical feasibility and potential benefits of this approach, including its implications for trust, transparency, and the future of scientific research.

        Speaker: Ashwin Kumar Karnad
      • 17:00
        The German Reproducibility Network - a strong network to implement Open Science practices in Germany 20m

        In this contribution, we introduce the German Reproducibility Network (GRN) to the Open Source and Research Software Engineering community. The GRN aims to increase trustworthiness, transparency, and reproducibility in scientific research in Germany and beyond, as part of a broader international network of reproducibility initiatives. Since its founding in 2020, the GRN has grown into a cross-disciplinary network comprising over 42 members, including local reproducibility initiatives led by early career researchers, academic institutions, and scholarly societies.
        Recognizing the integral role of open source software in open science and reproducible research, the GRN advocates for the development, adoption, and dissemination of open source tools and practices. Advancing research improvement strategies, such as sustainable software development, requires collaboration and knowledge exchange among those committed to transforming the research ecosystem.
        We highlight how members of the Open Science community can engage with the GRN and benefit from shared experiences in driving academic reform. The GRN aims to serve as a key player in the German academic system, rallying support for initiatives that enhance research practices. Activities include sharing best practices in training and education, incentivizing reproducible research, and fostering open science at academic institutions. For example, in 2023, the GRN issued a press statement advocating for better working conditions in academia and stronger support for open science in the context of the WissZeitVG reform. Recognizing the critical role of open source and reproducible code in research, we organized a session titled "FAIR and Reproducible Code" at last year's deRSE24, fostering discussions on best practices for transparent research software. Our commitment to supporting early career researchers (ECRs) led us to conduct a Summer School, complemented by an accompanying webinar series, equipping participants with essential skills for adopting reproducible methods and developing their own software tools for a transparent workflow. Furthermore, we authored a community paper providing a practical framework for establishing open science practices, aiming to empower researchers and institutions to adopt transparency and reproducibility as standard practices.
        Since deRSE25 attracts scientists from many different disciplines, we believe that the introduction of a cross-disciplinary network such as the GRN is a valuable contribution to this conference.

        Speaker: Maximilian Frank (LMU)
    • 16:00 17:30
      SE Requirements Vortragsraum (Building 30.51 (Bibliothek))

      Vortragsraum

      Building 30.51 (Bibliothek)

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Andreas Vogelsang
      • 16:00
        Benchmarking Requirement Template Systems 22m
        Speakers: Katharina Großer (Universität Koblenz), Amir Shayan Ahmadian (Universität Koblenz), Marina Rukavitsyna (Universität Koblenz), Qusai Ramadan (Universität Koblenz), Jan Jürjens (Fraunhofer Institute for Software & Systems Engineering ISST and University of Koblenz)
      • 16:45
        Requirements Classification for Traceability Link Recovery 22m

        This contribution is an extended abstract of the paper originally published in the proceedings of the 2024 IEEE 32nd International Requirements Engineering Conference (RE). The paper assesses the potential of requirements classification approaches to identify parts of requirements that are irrelevant for automated traceability link recovery between requirements and code. We were able to show that automatic identification of parts of requirements that do not describe functional aspects can significantly improve the recovery performance, and that these parts can be identified with an F1-score of 84%.

        Speakers: Tobias Hey (Karlsruhe Institute of Technology (KIT)), Jan Keim (Karlsruher Institut für Technologie (KIT)), Sophie Corallo (Karlsruhe Institute of Technology (KIT))
      • 17:07
        Explanations in Everyday Software Systems: Towards a Taxonomy for Explainability Needs 22m
        Speakers: Jakob Droste (Leibniz Universität Hannover), Hannah Deters (Leibniz Universität Hannover), Martin Obaidi (Leibniz Universität Hannover), Kurt Schneider (Leibniz Universität Hannover)
    • 16:00 17:30
      SE Software Evolution Audimax B

      Audimax B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Wilhelm Hasselbring (Kiel University)
      • 16:00
        Testability Refactoring in Pull Requests: Patterns and Trends 22m
        Speakers: Pavel Reich (University of Hamburg), Walid Maalej (University of Hamburg)
      • 16:22
        Towards Semi-Automated Merge Conflict Resolution: Is It Easier Than We Expected? -- Summary 22m
        Speakers: Alexander Boll (University of Bern), Yael van Dok (University of Bern), Manuel Ohrndorf (University of Bern), Alexander Schultheiß (Paderborn University), Timo Kehrer (University of Bern)
      • 16:45
        Mining Domain-Specific Edit Operations from Model Repositories with Applications to Semantic Lifting of Model Differences and Change Profiling -- Summary 22m
        Speakers: Christof Tinnes (Siemens AG), Timo Kehrer (University of Bern), Mitchell Joblin (Saarland Informatics Campus), Uwe Hohenstein (Siemens AG), Andreas Biesdorf (Siemens AG), Sven Apel (Saarland Informatics Campus)
      • 17:07
        EditQL: A Textual Query Language for Evolving Models 22m
        Speakers: Jakob Pietron (Ulm University), Benedikt Jutz (University of Ulm), Alexander Raschke (Ulm University), Matthias Tichy (Ulm University)
    • 16:00 17:30
      SE Student Research Competition: Presentations Seminarraum 17 (Building 30.48 (MZE) )

      Seminarraum 17

      Building 30.48 (MZE)

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Hamideh Hajiabadi (Karlsruher Institut für Technologie)
      • 16:00
        Gamification of Student Development Projects in Software Engineering Education 20m
        Speaker: Paul Bredl
      • 16:20
        Error categorization in novice code 20m
        Speaker: Nadja Just
      • 16:40
        Requirements Classification for Requirements Reuse 20m
        Speaker: Julia Märdian
      • 17:00
        A Graph-Based Query Language For Breakpoints 20m
        Speaker: Freya Dorn
    • 16:45 17:30
      How to achieve FAIR software publications with HERMES 45m Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      RSEs are required to publish reproducible software to satisfy the FAIR Principles for Research Software. To spare them the arduous labor of manually publishing each version, they can use the tools developed in the HERMES project. HERMES (HElmholtz Rich MEtadata Software Publication) is an open source project funded by the Helmholtz Metadata Collaboration. The HERMES tools help users automate the publication of their software projects and versions together with rich metadata. They can automatically harvest and process quality metadata, and submit them to tool-based curation, approval and reporting processes. Software versions can be deposited on publication repositories that provide PIDs (e.g. DOIs).

      In this SkillUp, we explore the publication workflow. We will guide participants through the requirements of FAIR software with an example and best practices, and demonstrate HERMES as a tool to simplify these processes. We teach RSE participants to set up the HERMES publication workflow for their own software projects. At deRSE24 we already held a workshop to introduce HERMES and its possibilities. This time we can present the new feature "hermes init", which reduces the workload and the proneness to errors, so that participants can easily follow along and independently integrate HERMES into their own projects in the future.

      The workflow follows a push-based model and runs in continuous integration (CI) infrastructures such as GitHub Actions or GitLab CI. This gives users more control over the publication workflow compared to pull-based workflows (e.g. the Zenodo-GitHub integration) and makes them less dependent on third-party services. Rich descriptive metadata is the key element of useful software publications. The workflow harvests existing metadata from source code repositories and connected platforms. Structured metadata could, for example, come from a Citation File Format file or a CodeMeta file; unstructured data can be found everywhere, especially in the code or the README file. HERMES processes, collates and optionally presents the gathered data for curation to keep a human in the loop. In curation, output can be controlled and errors reduced. After approval, HERMES prepares the metadata and software artifacts for automatic submission to FAIR publication repositories.
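
      As a small illustration of the kind of structured metadata such a workflow can pick up (a sketch, not the HERMES implementation), a codemeta.json file can be read like this:

      import json

      with open("codemeta.json", encoding="utf-8") as fh:
          meta = json.load(fh)

      # CodeMeta uses schema.org terms such as name, version and author.
      authors = [
          f"{a.get('givenName', '')} {a.get('familyName', '')}".strip()
          for a in meta.get("author", [])
      ]
      print(meta.get("name"), meta.get("version"), "by", ", ".join(authors))
      print("license:", meta.get("license"))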

      In the course of the SkillUp, RSEs are enabled to employ HERMES for their own projects by following a live coding session on an example project. We will address any problems that arise along the way and help participants solve them. Finally, we will discuss potential improvements of the HERMES workflow based on the hands-on experience participants have gained.

      The SkillUp should last about 60 min. The target audience is everyone who deals with research software. Researchers, developers, curators and supervisors are welcome as well as everyone interested. No specific expertise or previous experience is needed. We work with GitHub or GitLab, and use their continuous integration tools, so some previous experience with these platforms may be helpful.

      Speakers: Nitai Heeb (Forschungszentrum Jülich), Sophie Kernchen
    • 17:30 18:00
      Break 30m
    • 17:30 18:15
      Fachgruppe Architekturen: Fachgruppensitzung Seminarraum 17 (Building 30.48 (MZE))

      Seminarraum 17

      Building 30.48 (MZE)

      Straße am Forum 7, 76131 Karlsruhe
    • 18:00 20:00
      Poster and Demo Session together with Reception Audimax Foyer

      Audimax Foyer

      Building 30.95

      Str. am Forum 1, 76131 Karlsruhe

      Our poster and demo session will take place in conjunction with the evening reception of SE25.

      • 18:00
        de-RSE the society 20m

        The society will present a poster. Note that the society also has a 10-minute talk around the time of the first keynote.

        Speaker: Jan Linxweiler (Technische Universität Braunschweig)
      • 18:20
        Retrieval-Augmented Generation application with scientific material science papers 20m

        Rapid and precise knowledge retrieval is essential to support research in exact sciences like materials science, optimising time management and enhancing research efficiency. With a database of over 2,500 materials science research papers, an automated method for reliably and effectively accessing and querying this repository is necessary.
        Here we show a Retrieval-Augmented Generation (RAG) application which can be used to query this database and provide the output in the form of an answer in natural language. The application features a top-performing retriever sourced from the MTEB leaderboard for retrievers on Hugging Face, further fine-tuned to gain domain knowledge with the GPL algorithm using materials science literature. The generation component supports GGUF models via llama.cpp and integrates Hugging Face-compatible models, including Meta’s Llama-2-7b-chat. The Haystack framework is used to build robust pipelines for query handling, while for effective PDF parsing the system uses Unstructured.io to guarantee thorough data extraction. The application's three main features are searching the database of publications, querying documents that users have uploaded to the application, and performing web search by extending queries to Google Scholar. Users can engage with the application via a web interface or command-line tools.
        By employing Retrieval-Augmented Generation, the application enables users to query the database in natural language and obtain factual, focused and contextually grounded responses based on the original study papers. It speeds up the process of finding information by quickly pinpointing the context the user is looking for, thus greatly accelerating knowledge retrieval as a whole. This method increases research productivity, helps researchers save time and facilitates more efficient knowledge discovery in the materials science domain.
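
        The retrieval step can be pictured with a much-simplified sketch (the actual application uses Haystack pipelines and a domain-finetuned retriever; the model name and documents below are generic examples):

        import numpy as np
        from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

        docs = [
            "Perovskite solar cells degrade under humidity.",
            "Graphene exhibits exceptional thermal conductivity.",
        ]
        model = SentenceTransformer("all-MiniLM-L6-v2")          # generic example model
        doc_emb = model.encode(docs, normalize_embeddings=True)

        query = "Which material conducts heat well?"
        q_emb = model.encode([query], normalize_embeddings=True)[0]

        scores = doc_emb @ q_emb                                  # cosine similarity on normalized embeddings
        best = int(np.argmax(scores))
        print(docs[best])  # in a full RAG setup this context is passed on to the generator model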

        Speaker: Jehona Kryeziu
      • 18:40
        Nano Energy System Simulator NESSI 20m

        Increasing energy demand and the need for sustainable energy systems have initiated the global and German energy transition. The building and mobility sectors promise high potential for savings in final energy and greenhouse gas emissions through renewable energy technologies. NESSI was developed to reduce the complexity of decisions for an efficient, resilient, affordable, and low-emission energy system. The flexible simulation and analysis software for decentralized energy systems in buildings and neighborhoods simulates thermal and electrical energy flows.
        NESSI is a free online web tool for developing baseline and comparative scenarios for transforming an ecologically, economically, and socially sustainable energy system. NESSI is open access and intuitive to use, even for non-energy experts. Scenarios can be saved, loaded, and compared. Our Nano Energy System Simulator NESSI extends energy research software to provide a decision support system for buildings and neighborhoods. NESSI enables the quantification of the environmental, economic, and social impacts of an individual energy system, helping to identify the right energy system for a location. This applies to new buildings and the transformation of existing buildings. Based on this, transformation strategies can also be formulated, thus supporting the local, German, and global energy transition.
        With a rule-based energy management system, NESSI simulates hourly electrical and thermal energy flows in buildings and neighborhoods. The energy system components include various technologies for generating and consuming thermal and electrical energy. The loads are aggregated at the building or neighborhood level and covered by the selected components of this infrastructure in a predetermined order. NASA Merra 2 weather data is used to calculate the yields of photovoltaic and wind power plants. In general, user-friendliness is at the center of NESSI's development. This is facilitated by a coherent, straightforward user interface design. Predefined scenarios are offered as templates, and a progress bar guides users through the simulation steps. NESSI is adaptable to all screen sizes, and offers help texts for all input fields. Users can access pre-generated load profiles for households and businesses when simulating energy system scenarios. Demand data can be uploaded if no load profile meets the user's requirements.
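
        A toy version of such a rule-based dispatch (purely illustrative, not NESSI's actual model) covers each hour's load from PV first, then from a battery charged with PV surplus, and finally from the grid:

        def dispatch(load_kwh, pv_kwh, battery_kwh=5.0, battery_capacity=10.0):
            log = []
            for hour, (load, pv) in enumerate(zip(load_kwh, pv_kwh)):
                from_pv = min(load, pv)
                battery_kwh = min(battery_capacity, battery_kwh + (pv - from_pv))  # charge with PV surplus
                from_battery = min(load - from_pv, battery_kwh)
                battery_kwh -= from_battery
                from_grid = load - from_pv - from_battery
                log.append((hour, from_pv, from_battery, from_grid))
            return log

        load = [0.5] * 24                          # flat 0.5 kWh demand per hour
        pv = [0.0] * 6 + [1.2] * 12 + [0.0] * 6    # simple daytime PV profile
        for hour, pv_used, bat_used, grid_used in dispatch(load, pv):
            print(f"h{hour:02d}  pv={pv_used:.2f}  battery={bat_used:.2f}  grid={grid_used:.2f}")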
        NESSI has been evaluated and adapted several times in recent years, so that constant iteration delivers the best possible result. Publications at scientific conferences and in journals ensure that our simulator remains user-oriented and scientifically valuable.

        Sarah Eckhoff, Maria C.G. Hart, Tim Brauner, Tobias Kraschewski, Maximilian Heumann, Michael H. Breitner (2023): Open Access Decision Support for Sustainable Buildings and Neighborhoods: The Nano Energy System Simulator NESSI, Building and Environment 110296
        Maria C.G. Hart, Sarah Eckhoff, Michael H. Breitner (2023): Sustainable Energy System Planning in Developing Countries: Facilitating Load Profile Generation in Energy System Simulations, Proceedings of the Hawaii International Conference on System Sciences (HICSS), Maui 2023
        Maria C.G. Hart, Sarah Eckhoff, Ann-Kristin Schäl, Michael H. Breitner (2023): Threefold Sustainable Neighborhood Energy Systems: Depicting Social Criteria in Decision Support Systems, Proceedings of the American Conference on Information Systems (AMCIS), Panama City 2023, Best Complete Paper Award Winner

        Speaker: Ms Sarah K. Lier (Institut für Wirtschaftsinformatik, Leibniz Universität Hannover)
      • 19:00
        Bridging Repositories, ELNs and Semantic Data Management: A LinkAhead-based use case for 3D Additive Manufacturing 20m

        Interdisciplinary collaborative scientific networks often rely on a multitude of different software systems for data storage and data exchange. Keeping data findable and in sync between different sites, working groups and institutes can be challenging. We developed a solution based on the open source software LinkAhead that combines metadata from different repositories into a single research data management system (RDMS).
        The metadata import tool was created using the extensible crawler framework provided by LinkAhead. This enables us to import metadata from four different repositories and ELN systems used by the Cluster of Excellence 3D Matter Made to Order (3DMM2O).
        In addition, the RDMS is built in a way that data models and crawler definitions can be extended and adapted to future requirements by the researchers at any time. Its basic functionality comprises a graphical web interface as well as an API for automated queries, providing intuitive searching, querying and use of metadata from all linked systems.

        Speakers: Dr Alexander Schlemmer (IndiScale GmbH, Göttingen), Dr Florian Spreckelsen (IndiScale GmbH, Göttingen)
      • 19:20
        COPO: A Collaborative Platform for FAIR Metadata in Omics Research 20m

        The Collaborative OPen Omics (COPO) is a data and metadata broker that advances open science by supporting the principles of Findability, Accessibility, Interoperability, and Reuse (FAIR). As reliance on shared data grows, COPO addresses metadata management challenges by using community-sanctioned standards, specifically Darwin Core (DwC) and Minimum Information about any Sequence (MIxS). These standards enable discoverability and reusability of diverse omics data across platforms, which is especially relevant for the life sciences.

        COPO streamlines and validates metadata submissions (e.g. samples, reads, assemblies and sequence annotations) through user-friendly interfaces, ensuring consistency and high data quality. Data can be accessed through well-defined application programming interface (API) endpoints, with outputs available in Comma-separated values (CSV), Research Object Crate (RO-Crate) and JavaScript Object Notation (JSON) formats, supporting versatile data use and integration. COPO also safeguards personal data, excluding information such as ORCID identifiers and email addresses from API results, which supports compliance with General Data Protection Regulation (GDPR) and ensures researcher privacy. Leveraging the European Nucleotide Archive (ENA) as a primary data repository, COPO enhances interoperability with databases like BioSample at National Centre for Biotechnology Information (NCBI), fostering scientific collaboration and accessible research findings.

        Through the integration of DwC and MIxS standards, COPO enhances metadata structure and context, aiding in data discoverability. The platform incorporates Tree of Life (ToL) projects, enabling users to specify details such as sample locations, collection dates and taxonomy in well-defined spreadsheets or web forms. This information is then mapped to DwC standards for API outputs, ensuring interoperability and consistency. Similarly, MIxS standards can be used to output the minimum sample information, including environmental context and experimental conditions, ensuring that metadata aligns with community norms.

        To improve scalability, reproducibility, and usability, COPO employs modern deployment tools like Docker, whose containerisation enables consistent deployment of its API endpoints and metadata management tools across various environments, reducing complexities in installation and version control.

        In summary, COPO represents a substantial advancement in omics data management and dissemination. By adhering to FAIR principles, implementing recognised standards, protecting sensitive information, and utilising advanced technologies, COPO strengthens research interoperability and supports a collaborative open science culture. This platform empowers researchers to document and share findings effectively, advancing biological sciences and facilitating future discoveries.

        Speakers: Ms Aaliyah Providence (Earlham Institute), Ms Debby Ku (Earlham Institute)
      • 19:40
        3-D weather forecast visualizations generated with open-source research software and based on open data 20m

        Recent developments in the open data policies of meteorological agencies have greatly expanded the set of up-to-date weather observation and forecast data that is publicly available to meteorological research and education. To improve the use of this open data, we have developed 3-D visualization products that extract and display meteorological information in novel ways. In this demo, we present visualization products derived from publicly available data from operational agencies including the German Weather Service (DWD) and the European Centre for Medium-Range Weather Forecasts (ECMWF). Visualizations are created with the open-source, interactive, 3-D visualization research software “Met.3D” (https://met3d.readthedocs.org). Met.3D has primarily been developed for rapid exploration of gridded atmospheric data by interactive means and has recently been extended with capabilities for batch-creation of visualizations and animations. We show how we generate daily 3-D movies of current weather data for use in teaching and research, and how the Met.3D research software can be used to further explore data of interest in an interactive way.

        Speaker: Christoph Fischer (Visual Data Analysis Group, Hub of Computing and Data Science, Universität Hamburg)
      • 19:40
        A Description Framework for Research Software and Metadata Publication Policies 20m

        The curation of software metadata safeguards its quality and compliance with institutional software policies. Moreover, metadata that has been enriched with development and usage information can be used for evaluation and reporting of academic KPIs. Software CaRD ("Software Curation and Reporting Dashboard"; ZT-I-PF-3-080), a project funded by the Helmholtz Metadata Collaboration (HMC), develops tools to support the curation and reporting steps of the research software publication process. The dashboard will present metadata collected by the HERMES workflow in a graphical user interface, assess compliance with a configurable set of policies, and highlight issues and breaches. It will be usable both standalone and in a CI/CD context.

        As a first step in the project, and as a foundation for the curation dashboard, a description format for software publication policies had to be developed. Our solution takes an approach that allows for configuration at different levels of abstraction: low-level building blocks describe metadata (e.g., CodeMeta) validation in terms of the Shapes Constraint Language (SHACL), while a higher-level configuration language allows users to reuse and parameterize these components. This makes Software CaRD usable for RSEs, management, and policy makers, and it allows for customization that facilitates usage in different research institutions.
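
        To make the low-level building blocks more tangible, a minimal sketch (assuming rdflib and pyshacl; the shape below is a made-up example policy, not one of Software CaRD's actual rules) could validate a codemeta.json file against a SHACL shape like this:

        from rdflib import Graph
        from pyshacl import validate  # pip install pyshacl rdflib

        SHAPE_TTL = """
        @prefix sh:     <http://www.w3.org/ns/shacl#> .
        @prefix schema: <http://schema.org/> .
        @prefix ex:     <https://example.org/shapes#> .

        ex:SoftwarePolicyShape
            a sh:NodeShape ;
            sh:targetClass schema:SoftwareSourceCode ;
            sh:property [ sh:path schema:name ;    sh:minCount 1 ] ,
                        [ sh:path schema:license ; sh:minCount 1 ] .
        """

        data = Graph().parse("codemeta.json", format="json-ld")  # CodeMeta files are JSON-LD
        shapes = Graph().parse(data=SHAPE_TTL, format="turtle")

        conforms, _, report = validate(data, shacl_graph=shapes)
        print("policy satisfied:", conforms)
        print(report)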

        This poster submission presents our approach, showcases example policies, and gives guidance to users of the application.

        Speaker: David Pape (Helmholtz-Zentrum Dresden - Rossendorf)
      • 19:40
        A year of Progress in Machine-assisted Refactoring: what's new in Coccinelle 20m

        The Coccinelle project was established to ease maintenance of the Linux kernel driver code, written in the C programming language.
        Nowadays Coccinelle belongs to the toolkit of the Linux kernel maintainers.
        We are working to enable another ambitious goal -- that of large-scale code refactoring, with HPC and C++ in mind.
        This poster reports on the past year's progress of our collaboration, highlighting new features and new uses of Coccinelle.
        It may interest users of C and C++, but also of any other language that interacts with them.

        Speaker: Michele Martone (Leibniz Supercomputing Centre)
      • 19:40
        Agile and optimized methodology for VENQS - an automated, cross-platform, modules versioning management system for RSE 20m

        Code development and maintenance in a team can be a daunting process, especially when multiple modules are interconnected with varied dependencies, dispersed over several Git repositories and/or developed in different versions of the software. Consequently, VENQS was established to set up an infrastructure and workflow for semi-automated version and dependency management. This is achieved with a desktop application with a simple GUI and a repository system in GitLab that allows users to build and download a package of selected modules, in which each module is inter-compatible and free of dependency issues. This enables users to share their configuration and initialize a software project for new members in a drastically shorter time.

        Erroneous pull requests from the wrong branch or the misplacement of modules in the directory are a few common problems in research software development, all resulting in unexpected behavior such as implausible results, code that does not work, or code that behaves differently on other systems. VENQS was developed to solve such problems, which increase with the project's size, operating system changes and software updates. Additionally, users might become contributors by creating their own modules, which need to be incorporated. Therefore, storing information about module dependencies and compatibilities becomes significant, requiring a versioning methodology and, in turn, making standardization of module structure and documentation essential. Such issues can partially be solved with Git repositories by using tags and maintaining a list of compatible module versions. However, this does not resolve the tedious and error-prone task of setting up a project. In our use case, the modules are developed in Simulink, a MATLAB-based graphical programming environment, to provide a library for simulating the orbit propagation of satellites. VENQS is developed with other languages in mind, such as C++ and C, which are also applicable in Simulink as S-functions.

        Standardization of modules makes metadata easily accessible for VENQS, allowing for automation of tests and source code compilation. Furthermore, it offers the advantage of preserving legacy code in a well-documented and structured environment, allowing for smoother development in the future.

        Versioning is managed for modules and for packages of modules. While modules get stable releases after a development cycle, packages are versioned to handle compatibility between modules and between two packages from differing repositories. Each package contains metadata regarding software version and operating system, which VENQS uses to display the packages available for a given system. This enables users to select inter-compatible modules, while VENQS handles all necessary dependency selections automatically using the metadata. VENQS then loads and constructs the necessary folder structure and module setups on a given system and leaves a YAML file containing information on the module configuration; this file is sharable, allowing set-up on another system by providing only the YAML file.
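
        To give an idea of how such a shareable configuration file could be consumed, here is an illustrative sketch; the file layout and field names are invented for this example and are not VENQS's actual format:

        import platform
        import yaml  # pip install pyyaml

        # Invented example content of venqs-package.yaml:
        #   package_version: "1.4.0"
        #   operating_system: "Windows"
        #   modules:
        #     - {name: orbit_propagation, version: "2.1.0", depends_on: [gravity_model]}
        #     - {name: gravity_model,     version: "1.3.2", depends_on: []}
        with open("venqs-package.yaml", encoding="utf-8") as fh:
            package = yaml.safe_load(fh)

        if package["operating_system"] != platform.system():
            raise SystemExit(f"Package was built for {package['operating_system']}, not {platform.system()}")

        known = {m["name"] for m in package["modules"]}
        for module in package["modules"]:
            missing = [dep for dep in module["depends_on"] if dep not in known]
            if missing:
                raise SystemExit(f"{module['name']} is missing dependencies: {missing}")
            print(f"setting up {module['name']} {module['version']}")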

        Speaker: Mike Pfannenstiel (DLR - Institute for Satellite Geodesy and Inertial Sensing)
      • 19:40
        Assembling catalogs of music metadata: another use case of LOD? 20m

        Music-related projects dealing with complex metadata have a very long tradition in musicology and have produced a great variety of project-specific data formats and structures. This, however, hinders interoperability between data corpora and, ultimately, the full exploitation of the unprecedented potential of cutting-edge computer science. In this context, the schema defined within the Music Encoding Initiative (MEI) represents a significant step towards standardized music metadata. The MEI schema is the product of a collaborative, community-driven effort and is the de facto standard on international level for encoding music-related data (and notes) in a machine-readable structure.

        The Metadata Editor and Repository for MEI Data (MerMEId, https://github.com/Edirom/MerMEId) is the only software tool for creating and editing metadata files in MEI. The editor was originally developed (starting in 2009) and widely adopted in the context of catalogs of composers' works (e.g. the Catalogue of Carl Nielsen's Works, https://www.kb.dk/dcm/cnw/navigation.xq), with the musical “work” being the central starting point of the encoding process. The current tool layout reflects this work-centered approach, guiding the user through a series of forms to capture various types of information related to a specific musical work, such as music notes, performance history, sources, bibliography and more. The output is an XML file that conforms to the MEI schema, albeit with limitations due to the lack of flexibility offered by the tool.

        The musicological community needs a more flexible way of capturing metadata that goes beyond the work-centric approach described above. In addition, MerMEId requires codebase modernization to overcome the technical debt accumulated over the years and to become more sustainable and adaptable to new technologies. Against this background, the MerMEId community and the Centre for Digital Music Documentation (CDMD) of the Academy of Sciences and Literature | Mainz are working to further develop MerMEId. Here we present ideas and considerations on which functionalities the new MerMEId should have and how these could be technically implemented. A key improvement will be a modularized approach based on LOD. The user will be able to encode a wider range of freestanding entities, in addition to musical works, such as sources, persons, places, events, and bibliographic items. It will be possible to link the entities with each other through defined relationships and to create different types of catalogs according to project-specific requirements. We plan to achieve this by representing our data as a Resource Description Framework (RDF) graph. Each freestanding entity will be modeled according to the MerMEId ontology, and the corresponding user interface for editing it will be described using the Shapes Constraint Language (SHACL). MerMEId will be able to import data from external triple stores and provide the possibility to enrich them. Albeit not strictly necessary for storing the data, it will be possible to export data as XML files according to the MEI or TEI schema. Future developments will also lower the technical barrier for setting up a project-specific online MerMEId instance, in particular for users or institutions with less technical background. These are fundamental steps towards state-of-the-art Digital Musicology.

        Speaker: Carlo Licciulli (Akademie der Wissenschaften und der Literatur | Mainz)
      • 19:40
        CFF2Pages: Expanding Workflows with Markdown Export 20m

        cff2pages is a tool that generates HTML files from metadata collected in the Citation File Format (CFF). It can be used to create a static page on GitHub or GitLab Pages to showcase a project. This is particularly useful for small research software projects, offering an easy-to-use workflow that converts machine-readable metadata into human-readable formats for several purposes:
        - Enhancing Metadata Quality: Displaying metadata in a user-friendly format can encourage contributors to improve metadata quality by allowing them to verify how it appears to others.
        - Creating a Project Landing Page: This provides a welcoming and accessible entry point for those not directly involved in development, addressing the challenge of navigating a repository.
        - Centralizing Metadata: Having one place for all metadata reduces the risk of discrepancies across different outputs when multiple pages are generated.
        As highlighted at deRSE24 (https://zenodo.org/records/7767509), cff2pages requires further development to broaden its appeal for various use cases. For example, The Carpentries are interested in using it with their Workbench for their Open Source training material. To support such use cases, we have begun developing an export option to Markdown files. This new workflow and future development plans will be presented in this poster session.
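
        The new export path can be pictured with a minimal sketch (illustrative only, not cff2pages' actual implementation) that turns a CITATION.cff file into a Markdown page:

        import yaml  # pip install pyyaml

        with open("CITATION.cff", encoding="utf-8") as fh:
            cff = yaml.safe_load(fh)

        authors = ", ".join(
            f"{a.get('given-names', '')} {a.get('family-names', '')}".strip()
            for a in cff.get("authors", [])
        )
        lines = [
            f"# {cff.get('title', 'Untitled project')}",
            "",
            cff.get("abstract", ""),
            "",
            f"Version: {cff.get('version', 'n/a')}",
            f"Authors: {authors}",
            f"License: {cff.get('license', 'n/a')}",
        ]
        with open("index.md", "w", encoding="utf-8") as fh:
            fh.write("\n".join(lines) + "\n")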

        Speaker: Jan Bernoth (Universität Potsdam)
      • 19:40
        Collaboration without data sharing: the Federated Secure Computing architecture 20m

        In domains with relevant security or privacy concerns, open data sharing among cooperation partners is often not an option. Here, cryptography offers alternative solutions to reconcile cooperation and data protection. Participants engage in peer-to-peer computation on encrypted data, arriving jointly at the intended result, without ever having access to each other’s input data. While elegant in theory, this approach has its own challenges in terms of complexity, DevSecOps and cloud federation.

        Federated Secure Computing is a free and open source project hosted by LMU Munich and financed by Stifterverband. The middleware between the client-side business logic and the server-side cryptography backend is designed to let research software engineering practitioners use secure computing with ease. It lets students write simple secure logic in as little as ten lines of Python code and can run on IoT hardware such as Raspberry Pi Zeros.

        The Federated Secure Computing project offers real-world use cases and learnings in terms of state-of-the-art administrative and technical data protection measures.
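
        The underlying idea can be illustrated with plain additive secret sharing (a conceptual toy example, not the Federated Secure Computing API): each party splits its input into random shares, so the joint sum can be computed without anyone revealing a raw value.

        import random

        PRIME = 2**61 - 1  # all arithmetic happens modulo a large prime

        def share(secret, n_parties):
            """Split a secret into n random shares that sum to the secret (mod PRIME)."""
            shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
            shares.append((secret - sum(shares)) % PRIME)
            return shares

        # Three sites want their total case count without disclosing individual counts.
        inputs = {"site_a": 120, "site_b": 75, "site_c": 240}
        all_shares = {party: share(value, 3) for party, value in inputs.items()}

        # Each party holds one share per input; partial sums reveal nothing on their own.
        partial_sums = [sum(all_shares[p][i] for p in inputs) % PRIME for i in range(3)]
        print(sum(partial_sums) % PRIME)  # 435, computed without exposing any single input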

        Speaker: Christian Johannes Goelz
      • 19:40
        DataLad: 10+ years of academic software development 20m

        DataLad (Halchenko et al., 2021 [1]) is free and open source software for managing digital objects and their relationships, built on top of Git and git-annex. Its initial commit in 2013 marked the beginning of an academic software history of more than ten years so far, supported by various grants, institutions, and underlying research endeavors. Over time, the software became an extendable ecosystem, addressing a broad range of data logistics challenges in a core library and many extension packages, growing both in features and in its contributor community. In turn, it also sparked development and grant support in git-annex, a crucial piece of software with a bus factor of 1. Navigating the research software waters of changing affiliations, developer churn, research obligations, and a modular architecture that offers flexibility but also bears a potential for complexity and fragility has never been easy.
        In this contribution, we want to give a case-study-like overview of the lifetime of this research software so far, reflect on the design and development decisions we have made over the years and their advantages or shortcomings, share lessons learned, and give an outlook into the future of the software ecosystem.

        [1] https://joss.theoj.org/papers/10.21105/joss.03262

        Speaker: Michael Hanke
      • 19:40
        Deploying Infrastructure-as-a-Service at GSI 20m

        With the ever-increasing data sizes employed at large experiments and their associated computing needs, many applications can benefit from access to dedicated cluster resources, in particular server-grade GPUs for machine learning applications. However, computing clusters are more often tailored to batch job submission and not to online data visualisation. Infrastructure-as-a-Service (IaaS) applications offer a route for users to access these resources using graphical applications through their web browser. Reliability and simplicity of use are key for these to be used effectively. At the same time, the security of the cluster resources must be maintained and so these must be configured in such a way that resources are not exposed to unauthorised users over the internet.

        In this contribution I will discuss the planned deployment of IaaS applications at GSI, including a centrally managed JupyterHub instance for launching Python notebooks and a noVNC system for launching desktop applications on the computing cluster.

        Speaker: Jeremy Wilkinson (GSI Helmholtzzentrum für Schwerionenforschung GmbH, Darmstadt)
      • 19:40
        Designing Intuitive and Flexible Software for a Novel Image-Based Cell Sorting Method 20m

        Image-based cell sorting is a key technology in molecular and cellular biology, as well as in medicine, enabling the isolation of desired cells based on spatial and temporal information extracted from live microscopy. Beyond the extensive application of sorting methods in the fields of immunology and oncology, growing interest from other disciplines like personalized medicine underscores the need for user-friendly and versatile sorting platforms. Here, we present a novel, automated image-based sorting method that leverages a microscope to selectively target and isolate cells. In this method, cells are resuspended in photoresist, and undesired cells are autonomously identified and encased in hardened structures via selective photopolymerization, allowing for subsequent filtration. The experimental set-up is adaptable to microscopes commonly found in life science laboratories, which eliminates the need for expensive sorting equipment or extensive user training.

        Our method includes custom-designed software that manages microscope communication and image analysis for high-throughput sorting decisions. The software is currently being developed as a Python package and designed for flexibility, aiming for easy adaptation to specific research needs and use cases. The software provides an interface to control the microscope through an API and to configure sorting procedures – such as selecting scan patterns, magnification levels, and cell classifiers. It further includes an adaptation protocol for researchers to develop or integrate their own image analysis pipeline for classification. An important consideration in the software design is the efficient scheduling of hardware configuration changes in coordination with fast and reliable image analysis, to ensure a high accuracy of the experimental outcome. To ensure accessibility, we are also developing an intuitive graphical user interface to allow users with no programming experience to effortlessly set up a sorting process and fine-tune analysis parameters.

        By providing intuitive and flexible software for our platform, we hope to present a versatile, accessible and cost-effective sorting solution for researchers across disciplines.

        Speaker: Stefan Josef Maurer
      • 19:40
        Digital Edition of the Levezow Album: Interactive Visualization of 17th-Century Drawings 20m

        The "Digital Edition Levezow Album" project is an interdisciplinary collaboration between the Hub of Computing and Data Science (HCDS), the Department of Art History at the University of Hamburg, and the State and University Library Hamburg. The project aims to digitally process and interactively visualize a previously unexplored sketchbook from the late 17th century, containing drawings on anatomy, antiquity, proportion studies, and natural history.

        By leveraging modern technologies such as digital editing techniques and advanced image processing, the Levezow Album is made accessible to a broad audience. Each page of the album is accompanied by detailed explanations authored by students of the Department of Art History. These texts provide context regarding the significance, origins, and intricacies of the drawings. Additionally, an interactive commenting feature allows users to suggest alternative sources and engage in a dialogue about the artworks.

        This project demonstrates how digital methods can be used in the humanities to reinterpret and make historical artifacts accessible. It serves as an example of the successful integration of research, education, and digital technology to promote cultural heritage.

        Speaker: Amy Chaya Isard (Universität Hamburg)
      • 19:40
        Yet another tool registry for the DH?! But this time open and community-centered on Wikidata 20m

        The relationship between methods, the tools (software) that implement them, and their usefulness for investigating a research question and object of study is of inherent interest to the computational humanities. As a consequence, the tool registry has established itself as a genre of its own in the digital humanities: from TAPoR (3.0) [1] (Grant et al. 2020) to large EU projects such as the Social Sciences and Humanities Open Marketplace [2] and the consortia of the German National Research Data Infrastructure (NFDI).

        All approaches known to us
        - lack permanent funding;
        - rely primarily on curation by (unpaid) expert panels or on crowdsourcing, but cannot guarantee this process permanently and sustainably;
        - create data silos with proprietary infrastructures (data models, backends and frontends);
        - and offer documented APIs only to a limited extent.

        The claim of providing a representative, let alone comprehensive, picture of the tools currently available for computational research therefore cannot be met and must, where it has been made, be considered to have failed (cf. Dombrowski 2021).

        Since the need for this information persists, this contribution presents our approach to an open base infrastructure for tool registries. Wikidata [3], as a distributed, community-curated knowledge graph and open software platform, is at the center of our approach and addresses the weaknesses of other approaches. Wikidata makes it possible to develop data models iteratively, to maintain records, and to assemble them into curated collections in wiki projects. At the data level, Wikidata allows all information to be used directly as Linked Open Data via SPARQL, APIs and the established web interface. Records on Wikidata are highly visible on the global internet and serve, for example, as one of the sources for the summary information shown in search engine result lists. In addition, Wikidata offers an established governance structure for user-generated and user-curated content. Everyone can contribute and maintain the entries that are relevant to their own concrete research (stakeholder principle).

        The foundation of these distributed records is a reduced base data model for DH tools that defines minimal bibliographic and technical properties. This makes it compatible with existing data models, such as those of the Software Preservation Network or of RIDE (Christopherson et al. 2022; Sichani and Spadini 2022), and allows it to contribute to a shared reference model. The base data model guarantees access to the jointly agreed core data for the reuse of entries in one's own curated collections with extended data models. For the DH context, we have extended this base model, for example, with a classification based on the TaDiRAH taxonomy [4] (Borek et al. 2021) and with application examples, publications and tutorials stored in the enriched record.

        As a platform, Wikidata allows the approach to be implemented without any additional software. It is also conceivable, however, to use Wikidata solely as an authority file and data provider for custom frontends, as Scholia [5] (Nielsen, Mietchen, and Willighagen 2017) does, for example.

        Finally, our approach addresses the sustainability of project funding through the continuous contribution of data to the digital commons (Wittel 2013), in the form of Wikidata, during a project's runtime and through the continued use of these data afterwards. Our proposal is thus part of a movement to anchor Wikidata in academia and in GLAM institutions (cf. Zhao 2022; Fischer and Ohlig 2019).
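
        To illustrate how such entries can be consumed directly as Linked Open Data, the public Wikidata SPARQL endpoint can be queried from a few lines of Python; the filter below (software items with an official website) is only a generic example, not the project's base data model:

        import requests

        ENDPOINT = "https://query.wikidata.org/sparql"
        QUERY = """
        PREFIX wd:   <http://www.wikidata.org/entity/>
        PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?tool ?toolLabel ?website WHERE {
          ?tool wdt:P31 wd:Q7397 ;        # instance of: software
                wdt:P856 ?website ;       # official website
                rdfs:label ?toolLabel .
          FILTER(LANG(?toolLabel) = "en")
        }
        LIMIT 10
        """

        response = requests.get(ENDPOINT, params={"query": QUERY},
                                headers={"Accept": "application/sparql-results+json",
                                         "User-Agent": "dh-tool-registry-example/0.1"})
        for row in response.json()["results"]["bindings"]:
            print(row["toolLabel"]["value"], row["website"]["value"])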

        Speakers: Till Grallert (Humboldt-Universität zu Berlin), Nicole Elisabeth Hitomi Dresselhaus (Humboldt-Universität zu Berlin)
      • 19:40
        Empirical Analysis of Software Quality Assurance Practices in Research Software 20m

        As scientific research increasingly relies on software to handle complex data, limited formal training in software development among researchers often leads to issues with documentation, code reliability, and reproducibility. In this study, we conducted an empirical analysis of 5,300 open-source research repositories, focusing on practices aligned with FAIR4RS recommendations. Python was the most common language (36%), followed by R (17%) and C++ (8%).

        Our findings reveal that around 75% of repositories included a license, 20% were registered in community registries, and 25% offered citation information, reflecting initial adoption of FAIR recommendations. Basic documentation was inconsistent, with installation instructions included in 47% of repositories and usage guides in 45%. Software quality practices varied: only 0.3% used a software quality checklist badge, test folders were found in 35%, and 41% implemented continuous integration (CI). Additionally, for Python, R, and C++ repositories, 69% explicitly defined dependencies through configuration files.

        In further analysis, we will examine adherence to community-specific guidelines for test case organization, explicit dependency requirements, and automated tasks within CI tools, as well as assess the influence of contributor count on these practices. This poster invites the research software community to discuss the benefits and challenges of adopting FAIR4RS recommendations and quality assurance practices to enhance software quality and reproducibility.
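
        The kind of per-repository check behind such numbers can be sketched in a few lines (a simplified illustration with crude heuristics, not the study's actual analysis pipeline):

        from pathlib import Path

        def check_repo(repo: Path) -> dict:
            files = {p.name.lower() for p in repo.iterdir() if p.is_file()}
            dirs = {p.name.lower() for p in repo.iterdir() if p.is_dir()}
            return {
                "license":      any(name.startswith(("license", "copying")) for name in files),
                "citation":     "citation.cff" in files,
                "tests":        bool({"test", "tests"} & dirs),
                "ci":           (repo / ".github" / "workflows").is_dir() or ".gitlab-ci.yml" in files,
                "dependencies": bool({"requirements.txt", "pyproject.toml", "environment.yml", "setup.py"} & files),
            }

        print(check_repo(Path("path/to/cloned/repo")))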

        Speaker: Akshay Devkate (Universität Potsdam, Institut für Informatik und Computational Science)
      • 19:40
        Enhancing RSE skills: a gamified approach 20m

        Students, postdocs, and other researchers continuously seek to develop beneficial skills for their work.
        One traditional way to up-skill is through workshops, but scheduling conflicts and varied learning styles can be barriers to effective learning. To address these challenges, we propose a learning framework that leverages GitHub’s capabilities. The idea is modeled on a digital version of a “scavenger hunt” game, offering self-paced learning and a token/badge collection system for tracking progress. Our plan is to build upon existing open educational resources, for example those provided by the Carpentries and CodeRefinery, to guide students through different themes and exercises. Our proposed framework keeps track of exercises via GitHub issues and grades the exercises automatically with GitHub Actions. Users are allowed to skip exercises and focus on the ones they feel the need to tackle, while getting immediate feedback. This approach can be extended to various modules, including general topics such as version control, testing, and CI, as well as more specific domains like basic numerical methods and domain science subjects. This framework aims to foster skill development and promote good practices in the use of version control platforms. In this demo, I would like to show the current status of the project, provide first examples of workflows and usage, and outline future plans. I am particularly interested in collecting feedback from the community at this early stage and in finding potential testers.

        Speaker: Maria Guadalupe Barrios Sazo (Forschungszentrum Juelich)
      • 19:40
        Ensuring Reproducibility in OntoClue – Vector-Based Document Similarity for Biomedical Literature – Using Docker 20m

        In the realm of biomedical research, the ability to accurately assess document-to-document similarity is crucial for efficiently navigating vast amounts of literature. OntoClue is a comprehensive framework designed to evaluate and implement a variety of vector-based approaches to enhance document-to-document recommendations based on similarity, using the RELISH corpus as reference. RELISH is an expert curated biomedical literature database comprising PubMed IDs (PMIDs) and document-to-document relevance assessments categorized as "relevant," "partial," or "irrelevant." The dataset includes titles and abstracts of associated articles, which are preprocessed to remove stop words and structural words, convert text to lowercase, and tokenize the content.
        OntoClue integrates various natural language processing (NLP) models, including Word2Vec, Doc2Vec, fastText, and state-of-the-art BERT-based models like SciBERT, BioBERT, and SPECTER, as well as in-house hybrid approaches that leverage annotated text through Named Entity Recognition (NER) to incorporate semantic understanding into plain text. The framework assesses document similarity using evaluation metrics that consider relevance judgment, search efficiency, and re-ranking in the context of biomedical research.
        Recognizing the complexity of managing various repositories for each vector-based approach and their dependencies, OntoClue employs Docker containerization to mitigate potential conflicts and ensure seamless execution across platforms. Our methodology involves splitting the dataset into training, validation, and test sets. The training set facilitates the model training, while the validation set employs Optuna for hyperparameter optimization, using Precision@5 as the objective function. The test set is used for final evaluation, using metrics like precision@N and nDCG@N to ensure relevance and efficiency in document retrieval.
        Despite rigorous testing—such as setting the random seed for model training to ensure consistent initialization, using a single worker to manage parallel processing, and configuring Optuna to run with a single job for stability—we encountered occasional inconsistencies in our results. To address this, we used Docker to standardize the Python environment and set the Python Hash seed (in the Dockerfile), which not only enhances reproducibility but also ensures that any user can replicate the results without being affected by local environmental discrepancies.
        Docker-based containerization is integral to OntoClue, ensuring that code dependencies, datasets, and execution environments are fully portable and reproducible. This approach not only simplifies model training but also guarantees version control and resolves dependency conflicts, thereby enhancing ease of use and consistency in performance. Furthermore, we conduct reproducibility tests to compare results from identical runs using the same embedding models and hyperparameters. These tests require consistent hyperparameter configurations in the same order across multiple runs for the same number of iterations, and demand that Precision@N values match exactly to four decimal points. When these conditions are met, we confirm the reliability of the pipeline, reinforcing the integrity of our research.
        The OntoClue Docker image features a user-friendly command-line interface that allows researchers to select from 18 different embedding approaches. Upon selection, the Docker container automates essential processes, including repository cloning, dataset downloading, and class distribution selection for training. Additionally, the framework includes options for dataset integrity checks and model reproducibility tests, ensuring that the pipeline delivers consistent, reliable results.
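
        As a minimal illustration of the reproducibility test described above (not the actual OntoClue code), the sketch below fixes the random seeds and accepts a pipeline only if the Precision@N values of repeated runs match to four decimal places; the function names and placeholder scores are assumptions.

        ```python
        import random

        import numpy as np

        def run_pipeline(seed: int = 42) -> list[float]:
            """Hypothetical stand-in for one training/evaluation run.

            In the real pipeline an embedding model would be trained and
            Precision@N computed on the test set; here we only illustrate
            deterministic seeding and rounding to four decimal places.
            """
            random.seed(seed)           # Python's RNG
            np.random.seed(seed)        # NumPy's RNG
            # PYTHONHASHSEED must be fixed *before* the interpreter starts,
            # e.g. via an ENV statement in the Dockerfile.
            scores = np.random.rand(5)  # placeholder for Precision@1..5
            return [round(float(s), 4) for s in scores]

        def reproducibility_test(n_runs: int = 3) -> bool:
            """Accept the pipeline only if all runs agree to four decimals."""
            reference = run_pipeline()
            return all(run_pipeline() == reference for _ in range(n_runs))

        if __name__ == "__main__":
            print("pipeline reproducible:", reproducibility_test())
        ```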

        Speaker: Rohitha Ravinder (ZB MED - Information Centre for Life Sciences, Cologne, Germany)
      • 19:40
        EVERSE: European Virtual Institute for Research Software Excellence 20m

        The European Virtual Institute for Research Software Excellence (EVERSE) is an EC-funded project that aims to establish a framework for research software excellence. The project brings together a consortium of European research institutions, universities, and infrastructure providers to collaboratively design and champion good practices for high-quality, sustainable research software. You can learn more about the project at https://everse.software/.

        The primary objective of EVERSE is to create a community-driven framework that empowers researchers and developers to produce, maintain, and share high-quality research software. The project will achieve this by consolidating existing good practices, developing the Research Software Quality Kit (RSQkit), integrating tools and services for software quality assessment, and engaging with the research community through pilots, training, and recognition initiatives.

        EVERSE will establish a European network for research software quality, fostering collaboration among researchers, developers, and infrastructure providers. The project will actively engage with five European science clusters, ensuring that the developed framework is tailored to the specific needs of each scientific domain. These are: ENVRI-FAIR (environmental research), EOSC-Life (life sciences), ESCAPE (astronomy and particle physics), PaNOSC (photon and neutron sciences), and SSHOC (social sciences and humanities).

        By championing research software quality and providing a supportive ecosystem for sustainable software development, EVERSE will contribute to the advancement of open science and reproducible research across various scientific domains. The project's outcomes will benefit researchers, research software engineers, and the broader scientific community, ultimately enhancing the quality, reliability, and impact of research software in Europe and beyond.

        Speaker: Carlos Martinez (Netherlands eScience Center)
      • 19:40
        Exploring the TIDO Viewer: A Generic, Interactive, and Research-Driven Solution for Digital Texts and Objects 20m

        In this demo, we present the TIDO Viewer, a flexible application developed by SUB Göttingen, specifically designed for the interactive presentation of digital texts and objects. In combination with the TextAPI, the TIDO Viewer enables the dynamic integration and visualization of digitized content. This synergy supports various use cases in research and library environments, offering modular, customizable display and interaction options. TIDO is already successfully applied in several digital editions, providing researchers and educators with deeper insights into historical texts and other digitized documents.
        The TIDO Viewer was developed according to the principles of Research Software Engineering (RSE), with the aim of creating a sustainable and easily maintainable solution for digital editions. By loosely coupling technology components through an API-driven architecture (notably the TextAPI), the system ensures flexibility and scalability while adhering to the principles of long-term usability and continued development.
        This demo offers a live exploration of the TIDO Viewer, showcasing its configuration and application possibilities while discussing its potential added value for research, teaching, and library infrastructures.
        Participants will experience a fully integrated workflow, where scientific material is seamlessly delivered to a web platform. Specific presentation scenarios can be explored on the front-end, or existing configurations can be demonstrated.
        Discover how the components of your stack work together: a web server delivers TextAPI resources from your scientific material, while a client application transforms those resources into interactive, user-friendly interface layers.
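
        As a rough sketch of this client/server interplay, the snippet below fetches a manifest-style resource that a viewer such as TIDO could render; the endpoint and field names are purely illustrative assumptions and not the actual TextAPI specification.

        ```python
        import requests

        # Hypothetical TextAPI manifest URL; real endpoints are provided by the
        # hosting institution's web server.
        MANIFEST_URL = "https://example.org/textapi/edition-1/manifest.json"

        def load_manifest(url: str = MANIFEST_URL) -> dict:
            """Fetch a JSON resource that a client-side viewer would render."""
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            return resp.json()

        if __name__ == "__main__":
            manifest = load_manifest()
            # The field names below are assumptions for illustration only.
            print(manifest.get("label"), "-", len(manifest.get("sequence", [])), "items")
        ```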

        Speakers: Mr Orlin Malkja (Niedersächsische Staats- und Universitätsbibliothek Göttingen), Mr Paul Pestov (Niedersächsische Staats- und Universitätsbibliothek Göttingen)
      • 19:40
        FACILE-RS: Automated Metadata Conversion and Software Publication Based on CodeMeta 20m

        Research software development is a fundamental aspect of academic research, and it is now acknowledged that the FAIR (Findable, Accessible, Interoperable, Reusable) principles, historically established to improve the reusability of research data, should also be applied to research software. However, specific aspects of research software, such as executability or evolution over time, require these guidelines to be adapted, and the FAIR principles for Research Software (FAIR4RS) were introduced in 2021.

        An important aspect of FAIR research software is the ability to find and retrieve software and its metadata through standardized protocols, both by machines and humans. In this context, several metadata standards are used across the scientific community:
        - The Citation File Format (CFF) is a human- and machine-readable format that indicates how to cite software.
        - The DataCite Metadata Schema is one of the established standards for archiving.
        - The CodeMeta standard is specifically tailored to research software and aims to standardize the exchange of software metadata across repositories and organizations.

        All of these standards serve specific purposes, and several are required to cover the whole software lifecycle. However, maintaining multiple metadata files in different formats can be a significant burden for research software developers and an obstacle to the adoption of good software publication practices. In addition, as the content of the different metadata files is largely overlapping, maintaining these files manually can pose a risk to data consistency.
        Another requirement for FAIR software is that every software release is published and assigned a persistent identifier. This can be tedious and prone to errors without an automated process.

        To address these challenges, we have developed the Python package FACILE-RS (Findability and Accessibility through Continuous Integration with Less Effort for Research Software), which facilitates the archival and long-term preservation of research software repositories.
        On the one hand, FACILE-RS simplifies the maintenance of software metadata by offering tools to generate metadata files in various formats, based on a single CodeMeta metadata file that is maintained manually. On the other hand, FACILE-RS provides scripts which automate the creation of software releases on GitLab, as well as on the persistent research data repositories Zenodo and RADAR.
        FACILE-RS also provides a set of GitLab CI/CD (Continuous Integration/Continuous Delivery) pipelines to automate the processes of metadata conversion and software publication.

        We believe the automated metadata conversion based on CodeMeta and the automated software release pipelines can help research software developers to make their publication workflows more efficient and can facilitate the adoption of good software publication practices by reducing the effort required to make research software FAIR.
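
        To illustrate the underlying idea of deriving other metadata formats from a single codemeta.json (this is not the FACILE-RS API, only a hedged sketch mapping a small subset of fields to CITATION.cff):

        ```python
        import json

        import yaml  # PyYAML

        def codemeta_to_cff(codemeta_path: str = "codemeta.json") -> str:
            """Derive a minimal CITATION.cff from a CodeMeta file.

            Only a few fields are mapped here, and `author` is assumed to be a
            list of persons; FACILE-RS itself covers far more of the CodeMeta,
            CFF and DataCite schemas.
            """
            with open(codemeta_path) as fh:
                meta = json.load(fh)

            cff = {
                "cff-version": "1.2.0",
                "message": "If you use this software, please cite it as below.",
                "title": meta.get("name"),
                "version": meta.get("version"),
                "date-released": meta.get("datePublished"),
                "authors": [
                    {
                        "given-names": person.get("givenName"),
                        "family-names": person.get("familyName"),
                    }
                    for person in meta.get("author", [])
                ],
            }
            return yaml.safe_dump(cff, sort_keys=False)

        if __name__ == "__main__":
            print(codemeta_to_cff())
        ```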

        Speaker: Marie Houillon (Karlsruhe Institute of Technology)
      • 19:40
        From Idea to Prototype: Using BITS and LLMs to automate the annotation process for SGN Collection Data 20m

        Being cross-disciplinary at its core, research in Earth System Science comprises divergent domains such as Climate, Marine and Atmospheric Sciences and Geology. Within the various disciplines, distinct methods and terms for indexing, cataloguing, describing and finding scientific data have been developed, resulting in a large number of controlled vocabularies, taxonomies and thesauri. However, given the semantic heterogeneity across scientific domains (even within the Earth System Sciences), effective utilisation and (re)use of data is impeded, while the importance of enhanced and improved interoperability across research areas will increase even further. The BITS project (BluePrints for the Integration of Terminology Services in Earth System Sciences) aims to address the inadequate implementation of encoding semantics by establishing a Terminology Service (TS) that may serve the whole ESS community at the national, European and international levels. It will be developed based on the existing TS of the TIB, supplemented by an ESS collection that already contains relevant terminologies for the Earth and Environmental Sciences and to which further relevant terminologies will be added. The implementation of this TS within two data repositories (WDCC at the German Climate Computing Center and a data collection at Senckenberg) will showcase the benefits for such different data, e.g. regarding enhanced and improved discoverability of research products or automated metadata annotation.

        We will present a workflow at SGN that combines the BITS outcome (i.e. the ESS collection of the TIB TS) with GPT4all in order to identify gaps in terminologies on the one hand and to assist scientists working on new collections on the other. Based on two major data management challenges facing SGN, Legacy Data Digitisation (historically grown data require systematic transformation into machine-readable formats) and Data Proliferation Management (continuous input of data generated by ongoing collection efforts and research activities), our prototyping process can be divided into several areas:
        - Identifying nominal phrases (NPs) in the collection data and annotating them using BITS TS. Our primary goal was to achieve reliable detection, with a focus on minimising false negatives, while accepting some false positives during annotation.
        - During the prototyping phase, several obstacles were encountered, including poor NP detection quality in scientific texts and a lack of reliability in conjunction splitting and singularization with common tools. It is also not always possible to determine the correct language of a text, especially with mixed-language content.
        - Revising our requirements led us to choose GPT4all as our preferred solution, specifically the Meta-Llama-3-8B-Instruct.Q4_0.gguf model.
        - This allows us to perform high-quality NP detection and transformation, but with very high computational and time requirements. To optimise resource utilisation, GPT4all is employed only for high-level operations; other operations can be performed by tools with lower hardware requirements.
        - Statistical logging allows us to gather significant information about NP detection and usage, which we can reuse in later development steps.

        By leveraging the strengths of BITS and GPT4all, SGN is paving the way for more accurate processing of complex scientific data to improve research outcomes.
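
        As a rough illustration of LLM-assisted NP extraction with a local model (assuming the gpt4all Python bindings; the prompt and post-processing are simplified assumptions and not the actual SGN workflow):

        ```python
        from gpt4all import GPT4All

        # The model file named above; gpt4all downloads it on first use.
        model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

        # Hypothetical prompt; the production prompts and validation steps differ.
        PROMPT = (
            "Extract all nominal phrases from the following collection record and "
            "return them as a comma-separated list, singularized:\n\n{text}"
        )

        def extract_nps(text: str) -> list[str]:
            """Ask the local LLM for nominal phrases in a piece of collection data."""
            with model.chat_session():
                answer = model.generate(PROMPT.format(text=text), max_tokens=128, temp=0.0)
            return [phrase.strip() for phrase in answer.split(",") if phrase.strip()]

        if __name__ == "__main__":
            print(extract_nps("Fossil shells collected from Jurassic limestones near Frankfurt."))
        ```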

        Speaker: Alexander Wolodkin (Senckenberg – Leibniz Institution for Biodiversity and Earth System Research)
      • 19:40
        From Start to Finish- The Ideal Process of Using Simulation Software in Energy Research Projects 20m

        The literature describes many methods for conducting research. The research and transfer cycle within energy system research projects by Stephan Ferenz describes how to carry out a FAIR research project in six steps. However, these steps are very general and do not focus on research software. In energy research, simulation software is an especially vital research artifact. Therefore, we are developing services to support researchers using simulations in their research within the National Research Data Infrastructure for Interdisciplinary Energy System Research (NFDI4Energy).
        To better analyze the needs of researchers, we have collected multiple use cases that describe the different steps of energy system research with simulation software. Subsequently, we organized these use cases and designed a process for using simulation software in energy research. The process begins with providing teaching materials on simulation in energy system research and ends with providing the simulation scenarios and the research results in a suitable data repository and a software registry. The process links the use cases in a relatively simple way, yet researchers with different levels of prior knowledge can apply it. To this end, we have defined various entry points so that experienced researchers can skip the first steps and, for example, start directly with creating a simulation scenario. The aim of the process is to support the research process and to share the results of the simulations FAIRly in the end. In research, not only the data generated in a simulation is interesting, but also the developed models and software, so we encourage researchers to share these in a software registry. Our aim within NFDI4Energy is to develop services and tools from scratch as well as to offer existing services by linking them to our platform.
        We would like to present our process for using simulation software in energy research as a poster to get some feedback from the RSE community. This overview fits perfectly with the scope of the deRSE conference 2025.

        Speaker: Corinna Seiwerth (Friedrich-Alexander-Universität Erlangen-Nürnberg)
      • 19:40
        HPC and AI in Germany: Resources and Support 20m

        Researchers from a broad spectrum of scientific fields use computers to aid their research, often starting at their own laptop or institutional workstation. At some point in the research, additional help in the form of algorithmic or software engineering consultancy, or even additional computational resources in the form of access to high-performance computing (HPC) systems, may become necessary. Furthermore, on the AI side, it may be unclear how machine learning and artificial intelligence practices could be employed for their specific research.

        This poster aims to enable researchers to identify resources for computing time, application support and training on topics of HPC and AI by presenting a collection of contacts and further information on initiatives in Germany that aid them in their work. It ties together resources both on the state level, with information on HPC competence networks, and on the national level, with information on the NHR initiative (Nationales Hochleistungs-Rechnen), the National AI Service Centers, and national resources of the Helmholtz Association. As such, it enables participants at the conference, as well as anyone viewing the poster later, to quickly identify contacts close to their institution or domain to receive assistance with HPC and AI services.

        Speaker: Marc-Andre Hermanns (RWTH Aachen University)
      • 19:40
        Impact of research software engineering by natESM in climate and weather domain 20m

        Earth System Modeling (ESM) involves a high variety and complexity of processes to be simulated, which has resulted in the development of numerous models, each aiming at simulating different aspects of the system. These components are written in various languages, use different High-Performance Computing (HPC) techniques and tools, and overlap in or lack functionalities.

        To use the national HPC resources and the scientific expertise more efficiently, the national Earth System Modeling strategy (natESM) project aims to establish a coupled, seamless ESM system by providing so-called technical-support sprints. A sprint consists of a goal-oriented package of work executed by a dedicated RSE on a selected ESM model over a defined period of time. Here we present the results achieved during the project so far in terms of technical improvements to the community code and the community's perception of the project.

        Since April 2022, 15 sprints have been conducted by the project team, working on different subjects like GPU porting, coupling, parallelization and general software engineering tasks. By far, the largest interest of the community has been in GPU porting, which was the focus of 8 of the 15 sprints, followed by coupling and model integration with 5 sprints. This is in line with the natESM vision, and reinforces both the power of such a project in shaping the community codes and the importance of a clear strategy and communication.

        These sprints focused on 13 models from the community, including ocean modelling, atmospheric chemistry, land and urban surface, frameworks and more. Out of these 13 models, 10 are written mostly in Fortran – still indicating a community preference for this language – while 2 are written in C/C++ and 1 in Python.

        natESM's positive perceived impact on the ESM community resulted in the preparation of a second phase of the project. An objective survey is planned before February 2025 to gather feedback from the scientists who have engaged in natESM sprints. The survey results will be included in the poster for the entire RSE community to see. The community workshop, which is planned for February 2025, will further serve as a platform to take the pulse of the community regarding the services provided by natESM.

        We aim to show the impact that a project such as natESM can have on the scientific community it pertains to. It is an effective way to help scientists overcome technical challenges, ultimately enabling the models to support more and better science. Due to its governance structure, it can also act as the executive entity responsible for bringing to reality a vision shared by the community. We believe this is a model that can be replicated by other institutions and for different fields to provide technical support for a broad group of scientists.

        Speakers: Aleksandar Mitic (DKRZ), Aparna Devulapalli (DKRZ)
      • 19:40
        Inbound licensing with ease 20m

        Managing projects with external collaborators sometimes comes with the burden of ensuring that inbound contributions respect legal obligations. Where a low-level 'Developer Certificate of Origin (DCO)' approach only introduces certain checks, a 'Contributor License Agreement (CLA)' approach, on the other hand, relies on documenting signed CLAs and thus dedicated book-keeping.
        In this poster, we showcase our initial approach to a 'CLA bot' that checks merge requests for compliance with either a DCO or a CLA. While this is work in progress, our goal is to provide functionality similar to what is already available on GitHub also for community instances of GitLab. We show the interaction between the bot and users, its limitations, and list the steps taken for the automation via CI pipelines. Here, the somewhat simpler approach to pipelines in GitLab compared to GitHub necessitates working with webhooks that act on events within GitLab.
        Our setup does not rely on a central server (in fact, on any additional server) and can be used by individual projects without having to share data. By using webhooks and CI pipelines, our approach can be used for similar automation tasks, offering the potential to interact with users.
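
        The sketch below shows one possible shape of such a bot, assuming a GitLab merge request webhook and the GitLab REST API: it checks every commit of a merge request for a DCO sign-off trailer. It is a simplified assumption for illustration, not the implementation presented on the poster.

        ```python
        import os

        import requests
        from flask import Flask, jsonify, request

        app = Flask(__name__)

        GITLAB_API = os.environ.get("GITLAB_API", "https://gitlab.example.com/api/v4")
        TOKEN = os.environ["GITLAB_TOKEN"]  # token of a bot account (assumption)

        def commits_signed_off(project_id: int, mr_iid: int) -> bool:
            """Check that every commit in the merge request carries a DCO sign-off."""
            url = f"{GITLAB_API}/projects/{project_id}/merge_requests/{mr_iid}/commits"
            resp = requests.get(url, headers={"PRIVATE-TOKEN": TOKEN})
            resp.raise_for_status()
            return all("Signed-off-by:" in commit["message"] for commit in resp.json())

        @app.route("/webhook", methods=["POST"])
        def webhook():
            event = request.get_json()
            if event.get("object_kind") != "merge_request":
                return jsonify(status="ignored")
            ok = commits_signed_off(event["project"]["id"], event["object_attributes"]["iid"])
            # A real bot would now comment on the merge request or set a status.
            return jsonify(status="dco-ok" if ok else "dco-missing")

        if __name__ == "__main__":
            app.run(port=8080)
        ```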

        Speaker: Dirk Brömmel (JSC, Forschungszentrum Jülich GmbH)
      • 19:40
        JuRSE: a RSE community of practice at FZJ 20m

        The poster will show what actions we have taken to create and engage an RSE community at FZJ so that other centres might be encouraged to do the same at theirs. It will present the initiatives and tools that we have created, such as a publication monitor, Code of the Month, Open Hours, and newsletters. We will also show how we are encouraging good practice through our 'Resources' website, which helps scientists adopt the FZJ software guidelines, and how we are spreading the word about what good scientific practice means when doing research software engineering.

        Speaker: Claire Wyatt (JSC/FZJ)
      • 19:40
        Keeping it REAL 20m

        The relevance of Open Science and Open Data is becoming increasingly obvious in modern-day publications. Frequently, scientists write their own analysis code, as the complexity of analysis increases and the combination of methods becomes more relevant – from code conversion to measuring and comparing. These functions and methods are not stable; they are subject to change and constrained to the use case and data at hand.

        Opening and maintaining data already poses a substantial number of issues, including versioning, provenance, format, and generator-specific constraints such as precision and resolution. These problems intensify as we regard the code that generates such data. In this presentation we will talk about the problems associated with maintaining research code.

        For example, code executability is even more difficult to maintain than data readability. This is mostly due to the strong dependencies of code on libraries, drivers, hardware and operating systems. All of these dependencies are subject to frequent changes, which may cause the code to no longer execute properly or – in the worst case – to still execute but deliver different results. Maintaining the code is effort-intensive and therefore basically impossible in the context of a research publication. Obviously, if the code is adapted, we need to maintain all previous versions and refer to the right versions used in the publication to ensure that, in principle, the same results can be reproduced should a divergence arise as a consequence of maintenance.

        Due to the specificity of the code, i.e. it being originally developed for a very specific use case and data format, it requires even more effort to adapt that code to another context, e.g. if the data format or resolution changes, even if the type of analysis and the research question remain the same. Where possible, therefore, the algorithm behind the analysis code may be of more importance than the code itself, provided that it is numerically correct. As an implication, implementations may diverge in their numerical results due to platform accuracy – ideally only minimally, however. As noted, though, this is not appropriate for overly complex code where the algorithm would be too difficult to represent and explain, or for AI-based and related methods that depend on additional data, i.e. the learning context. Implicitly, such methods would have to be treated differently.

        With respect to ensuring that data is not only FAIR, but also reproducible under any circumstances, we follow the suggestion that code must be treated in the same fashion, by making sure that all algorithmic processes published are
        - Reproducible in the sense that the results can be achieved again with the same process and context
        - Executable at any point in time (though not necessarily on any machine)
        - Attributable to the data and author at the stage of publication and
        - Literal insofar as the algorithm is a sound and correct representation of the mathematical methods to be applied.

        Speakers: Lutz Krister Schubert (University of Cologne), Florian Thiery (Research Squirrel Engineers Network, CAA e.V.)
      • 19:40
        Knowledge graph embedding based missing metadata prediction 20m

        In scientific research, effective data management is crucial, especially when handling experimental data. The increasing volume and complexity of data collected in experimental settings necessitate rigorous methodologies to ensure that such data remains findable, accessible, interoperable, and reusable (FAIR). These requirements are well met by RDF graphs, a type of knowledge graph. For example, the Chair of Fluid Systems at TU Darmstadt has developed a metadata database for sensors based on a sensor information model. Physical properties like sensitivity, bias, measurement range and sensor actuation range, as well as other attributes such as identifier, manufacturer, and location, are stored in RDF graphs.

        However, metadata records accompanying legacy data may be incomplete for various reasons, such as adherence to outdated standards, omission of essential parameters, redactions for confidentiality, and errors. Consequently, measure 5 of the NFDI4Ing Task Area “Alex” initiative focuses on reconstructing incomplete metadata to ensure its continued utility. This issue has been a topic of discussion and research in the biomedical field for several years. Numerous methods, including natural language processing techniques like Named Entity Recognition, are being explored to extract metadata from document abstracts or titles. However, challenges remain. For instance, metadata may be dispersed across multiple documents, making it difficult to locate, and some metadata may not be recorded at all. Moreover, the semantic relationships between different data samples are overlooked. In response to these challenges and the growing trend of using RDF graphs to store metadata, we are employing knowledge graph embedding methods to predict the missing metadata.
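
        A hedged sketch of this idea, assuming the sensor metadata is available as an RDF export and using rdflib together with a knowledge graph embedding library (PyKEEN's TransE is shown as one possible choice; file names and training settings are assumptions):

        ```python
        import numpy as np
        from rdflib import Graph

        from pykeen.pipeline import pipeline
        from pykeen.triples import TriplesFactory

        # Hypothetical export of the sensor metadata database.
        g = Graph()
        g.parse("sensor_metadata.ttl", format="turtle")

        # Flatten the RDF graph into labelled (subject, predicate, object) triples.
        triples = np.array([[str(s), str(p), str(o)] for s, p, o in g], dtype=str)
        tf = TriplesFactory.from_labeled_triples(triples)
        training, testing = tf.split([0.9, 0.1])

        # Train a simple embedding model; link prediction on the trained model can
        # then rank candidate objects for incomplete (sensor, property, ?) statements.
        result = pipeline(
            training=training,
            testing=testing,
            model="TransE",
            training_kwargs=dict(num_epochs=50),
        )
        result.save_to_directory("transe_sensor_metadata")
        ```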

        Speaker: Longwei Cong
      • 19:40
        Lessons learned from building twins for particle accelerators: the value of architecture and patterns 20m

        Particle accelerators are complex machines consisting of hundreds of devices. Control systems and commissioning applications are used to steer, control and optimise them. Online models allow deriving characteristic parameters during operation.

        These online models need to combine components that use different views of the same physical quantity. Therefore, appropriate support has to be provided to connect to the models. Similar tooling is required to connect to the real machine. Appropriate design of this glue facilitates constructing these twins.

        The authors report on their experience with available tools, architecture concepts and patterns which simplify setting up and operating these twins in the accelerator world.

        Speaker: Waheedullah Sulaiman Khail (Helmholtz-Zentrum Berlin)
      • 19:40
        Linguistic corpus research software at the Leibniz-Institute for the German Language (IDS) 20m

        Research in linguistics is increasingly data-driven and requires access to language corpora, i.e. “collection[s] of linguistic data, either written texts or a transcription of recorded speech, which can be used as a starting-point of linguistic description or as a means of verifying hypotheses about a language” (Crystal 2003). Here, language itself is the object of study, and not just an obstacle on the way to retrieve information.
        Building large corpora for scientifically valid research is a labour-intensive process. This is true for written language, and even more so for spoken language, which not only needs to be converted into written form, but which also contains multiple, potentially overlapping speakers. Corpora also have to be enriched with relevant meta-information and linguistic annotations. What is more, the data that goes into a corpus can be subject to copyright (e.g. newspapers) or personal/privacy rights limitations (e.g. recorded and transcribed private conversation), which need to be sorted out before the corpus can be compiled and used. Thus, high-quality language corpora and the research software for querying, exploring, analysing and visualizing them are valuable assets for the linguistic research community. The Leibniz-Institute for the German Language (IDS) provides collections of both written and spoken language corpora and the specialized corpus research software for accessing them.
        Corpus-based linguistic research ranges from simple corpus search for retrieving instances of certain language phenomena to large-scale training of language models, whereby a concurrent reference to metadata (external) and content data (internal) is possible (Sinclair 1996). Other research tasks include the creation of statistics about the frequency of words or word combinations, or quantitative analyses of more complex linguistic structures. In order to be scientifically valid, the respective results need to be reproducible, ideally over a longer time.
        Belonging to the humanities, linguistics has a higher share of practitioners with little technical literacy, which limits how demanding the use of corpus research software can be. On the other hand, some advanced research questions simply require more powerful and thus more technically demanding methods, and many researchers, in particular from computational linguistics, do have the necessary skills.
        At the IDS, corpus creation and software development and operation take place in projects with permanent funding, which ensures long-term availability. Access for registered users is only provided via web UIs or APIs, thus protecting the integrity of the data. Our aim is to provide as large a database as possible from which users can compile sub-corpora according to their research question by applying metadata criteria. Using the exact same data basis, the software offers access via easy-to-use form-based query templates or graphical assistants, but also via specialized corpus query languages for advanced users. While the corpora are continuously expanded, changes between versions are tracked in the corpus metadata, allowing results from earlier versions to be reproduced as required.
        Our poster outlines how the IDS approaches the various conceptual, legal, linguistic, and technical challenges of research software for written and spoken corpora.

        Speaker: Dr Mark-Christoph Müller (Leibniz-Institut für deutsche Sprache)
      • 19:40
        Manual data review and quality control – An add-on to SaQC 20m

        The growing volume of high-resolution time series data in Earth system science requires the implementation of standardised and reproducible quality control workflows to ensure compliance with the FAIR data standards. Automated tools such as SaQC[1] address this need, but lack the capacity for manual data review and flagging. It is therefore the intention of this project to develop a Python-based tool with an intuitive graphical user interface (GUI) for local machines, thereby enhancing the functionality of SaQC. It is anticipated that the tool will be user-friendly, even for those with limited experience of Python. The GUI will therefore be capable of interactively visualising the time series data, highlighting the data that has already been automatically flagged. The selection of data points may be accomplished by clicking on them, and a flag may be assigned via a dropdown menu. An optional comment field may be utilised to record supplementary information, such as details of pollution events. Moreover, the option to unflag data that has failed the automated quality control process, but which is considered valid by the scientist, will be available.

        The manual flagging tool will be based on SaQC, thereby facilitating future integration. Consequently, integration into an existing SaQC workflow will be straightforward. It should be noted, however, that this is not exclusive to SaQC users; it can easily be applied to data created by another tool for automatic quality control. A simple conversion of the data via the pandas library will be sufficient for utilisation of the manual flagging tool. The flagging schemes can either be adopted from SaQC or custom schemes can be integrated. Following the flagging process, the user is able to decide how to export the data set.

        The manual flagging tool represents a valuable addition to existing toolkits for all scientists handling time-series datasets, effectively completing the data quality control process. From a scientific perspective, the benefits of this tool include increased efficiency and traceability in the data flow, as well as improved data quality through the fine-tuning of automatic controls based on experience and contextual knowledge.
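
        As a minimal sketch of the flag-and-comment data model described above (not the tool itself, which is still under development), a pandas DataFrame could carry the automatic flags alongside manual overrides and comments; the column names and values are assumptions.

        ```python
        import pandas as pd

        # Hypothetical time series with flags from an automated QC run
        # (e.g. SaQC output converted to a plain DataFrame).
        data = pd.DataFrame(
            {
                "timestamp": pd.date_range("2024-06-01", periods=5, freq="10min"),
                "value": [4.1, 4.3, 250.0, 4.2, 4.0],
                "flag": ["OK", "OK", "BAD", "OK", "OK"],  # automatic flags
                "comment": [""] * 5,
            }
        ).set_index("timestamp")

        def set_flag(ts: str, flag: str, comment: str = "") -> None:
            """Manually (re)flag a single data point, e.g. after visual inspection."""
            data.loc[ts, ["flag", "comment"]] = [flag, comment]

        # A scientist overrides an automatic flag and documents the reason,
        # here for a known but plausible pollution event.
        set_flag("2024-06-01 00:20:00", flag="OK", comment="local pollution event, value plausible")

        data.to_csv("quality_controlled.csv")  # export after manual review
        ```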

        [1] Schäfer, David, Palm, Bert, Lünenschloß, Peter, Schmidt, Lennart, & Bumberger, Jan. (2023). System for automated Quality Control - SaQC (2.3.0). Zenodo. https://doi.org/10.5281/zenodo.5888547

        Speaker: Nicole Büttner (Institute of Meteorology and Climate Research – Atmospheric Aerosol Research, Karlsruhe Institute of Technology, Karlsruhe, Germany)
      • 19:40
        Modernizing Legacy Infrastructure Monitoring: Enhancing Performance with Prometheus and GitLab CI/CD 20m

        Effective monitoring of (computing) infrastructure, especially in complex systems with various dependencies, is crucial for ensuring high availability and early detection of performance issues. This poster demonstrates the integration of Prometheus and GitLab CI/CD to modernize our existing infrastructure monitoring methods. As infrastructure checks increase, our legacy monitoring system faces growing challenges such as performance bottlenecks, limited scalability, and maintenance difficulties. Prometheus, with its real-time monitoring and alerting capabilities, offers a scalable and flexible solution. It supports both horizontal and vertical scaling, efficient data storage, and a modular architecture that facilitates the seamless integration of various existing monitoring tools, such as specialized exporters.
        Using Prometheus as our backend involves setting up a containerized system, creating data sources and targets, and configuring (custom) metrics and alerts. The use of GitLab’s CI/CD pipeline further automates the building, deployment and testing processes. Additionally, Grafana, when used alongside Prometheus, provides a robust visualization tool to display statistics and reports, such as CPU and GPU usage or file quotas. This approach not only enhances efficiency and ensures timely alerts for potential issues but also keeps the monitoring system up-to-date and resilient. It also provides users with valuable statistics through a modern and flexible backend. Furthermore, containerizing the new monitoring system offers significant advantages, including portability, scalability, and modularization.
        The poster presents selected infrastructure systems, directly comparing the usability and performance of our legacy script-based monitoring system and the new Prometheus-based monitoring system.
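
        For illustration, a single legacy check could be converted into a custom Prometheus exporter roughly as follows, using the prometheus_client library; the metric names and threshold are assumptions, not the actual setup.

        ```python
        import shutil
        import time

        from prometheus_client import Gauge, start_http_server

        # Custom metrics mirroring checks from the legacy monitoring scripts
        # (metric names are illustrative).
        DISK_FREE = Gauge("infra_disk_free_bytes", "Free disk space", ["mountpoint"])
        CHECK_UP = Gauge("infra_check_up", "1 if the check passed, 0 otherwise", ["check"])

        def collect() -> None:
            """Run one collection cycle of the converted legacy checks."""
            usage = shutil.disk_usage("/")
            DISK_FREE.labels(mountpoint="/").set(usage.free)
            CHECK_UP.labels(check="disk").set(1 if usage.free > 10 * 1024**3 else 0)

        if __name__ == "__main__":
            start_http_server(9100)  # endpoint scraped by Prometheus
            while True:
                collect()
                time.sleep(30)
        ```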

        Speaker: Benjamin Bruns (FZJ)
      • 19:40
        Nebula, The birthplace of Open Science stars: NASA’s open science 101 curriculum 20m

        We present a new cohort-based training program by OLS (formerly Open Life Science). OLS is a non-profit organisation dedicated to capacity building and diversifying leadership in research worldwide (https://we-are-ols.org/). Since 2020, we have trained 380+ participants across 50+ countries in Open Science practices, with the help of 300+ mentors and experts.

        The Nebula program, a collaboration between OLS and the National Aeronautics and Space Administration (NASA), is a six-week intensive course, covering topics including open data, open code, open access publishing, and collaborative research tools. The training is targeted at individuals or teams interested in integrating best practices for open and reproducible research in new or ongoing projects. It is designed to be informative regardless of prior experience with open science. In our first year, participants have joined from 28 countries, from interdisciplinary domains and diverse career backgrounds: undergraduates to senior scientists and policy makers.

        The sessions, designed for accessibility, are delivered in English with automatic live captions, and recordings with corrected captions are made available after the calls. Participants receive personalised feedback on their work from an expert.

        By participating in this training program, researchers gain:

        • A comprehensive understanding of open science principles and best
          practices.
        • The ability to identify and utilise open science tools and
          resources.
        • The skills necessary to effectively collaborate with other
          researchers in a transparent manner.

        Nebula has the potential to significantly promote the adoption of open science practices within the research community, by encouraging a culture of collaboration and transparency.

        Speaker: Deborah Udoh (OLS (formerly Open Life Science))
      • 19:40
        neuro-conda: A Python Distribution For Neuroscience 20m

        Neuroscience is a multi-disciplinary field that involves scientists from diverse backgrounds such as biology, computer science, engineering, and medicine. These scientists work together to understand how the brain operates in health and disease. The areas of application in neuroscience that require software are as diverse as the scientific backgrounds and programming skills of the scientists, ranging from experimental control and data collection to simulations, data analysis, and management. Python has established itself as the de-facto standard in modern neuroscience due to its accessibility and broad scope of applicability.

        However, the software tooling supporting Python workflows has to be handled by often inexperienced end-users leveraging well-established scientific libraries shipped across dozens, sometimes hundreds of dependent packages. Setting these up in a robust and reproducible manner is crucial for the quality of the research but oftentimes not trivial to accomplish. For example, the dependencies of one package may be incompatible with another, resulting in a conflict that has to be resolved manually. Python’s lack of a standardized package manager spurred the emergence of several third-party solutions, such as pip, conda, and poetry, making this task even more complex.

        To ease the initial burden of dependency management, we built the Python distribution neuro-conda as an accessible entry point into the existing universe of software tools for neuroscience. It provides an easy-to-install, ready-to-use computational working environment for neuroscience supporting all major desktop operating systems (Windows, macOS, and Linux). Installation from scratch can be done with a single one-liner from the command line. Adding neuro-conda to existing conda installations is also possible. Through curation of the included packages and providing explanatory package lists, neuro-conda simplifies the setup process and ensures reproducibility of the research. It is available from https://github.com/neuro-conda.

        We provide bi-annual releases that bring new feature updates of included libraries to end-users while previous releases remain available. The neuro-conda version provides a unique identifier of a complete environment, making it citable and reproducible. Each release is tested automatically in a continuous integration pipeline to ensure support for multiple Python versions and operating systems.

        In summary, the neuro-conda distribution bundles commonly used neuroscience packages into curated conda environments, which are rigorously tested and validated for consistency and reliability.

        Speaker: Stefan Fuertinger (Ernst Strüngmann Institute (ESI) gGmbH for Neuroscience in Cooperation with Max Planck Society)
      • 19:40
        NFDIxCS Creating a Research Data Management Container (RDMC) 20m

        Effective management of research data and software is essential for promoting open and trustworthy research. Structured methods are needed to ensure that research artifacts remain accessible and easy to locate, in line with the FAIR principles of making research data and software findable, accessible, interoperable, and reusable [1, 2]. However, fully implementing these principles remains challenging.
        Several research data management initiatives, such as the National Research Data Infrastructure (NFDI) and the European Open Science Cloud (EOSC), aim to support a cultural shift towards openness. The NFDIxCS consortium [3], part of the NFDI, has the core mission to develop infrastructure that supports operational services across the diverse Computer Science (CS) field and to implement FAIR principles. A central concept of this project is the Research Data Management Container (RDMC) [4], which encapsulates research data, software, and contextual information into a 'time capsule' for long-term archiving and future use. After creation, the RDMC is connected to a Reusable Execution Environment (REE), allowing the time capsule to be unpacked and executed within a predefined environment.
        Creating an RDMC requires a workflow to encapsulate research data, software, its external components, the context, and other related materials into a single container. Based on several personas [5], we have developed this workflow and designed a wizard to facilitate this process. In this demo, we will showcase the creation process of the RDMC, explain its features, discuss the challenges encountered during development, and outline plans for future work.

        References

        1. Wilkinson, M. D. et al.: The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 1/3, p. 160018, 2016. DOI: 10.1038/sdata.2016.18
        2. Chue Hong, N. P. et al.: FAIR Principles for Research Software (FAIR4RS Principles). Research Data Alliance, 2022. DOI: 10.15497/RDA00068
        3. Goedicke, M. et al.: National Research Data Infrastructure for and with Computer Science (NFDIxCS). Zenodo, 2024. DOI: 10.5281/zenodo.10557968
        4. Goedicke, M.; Lucke, U.: Research Data Management in Computer Science - NFDIxCS Approach. Gesellschaft für Informatik, Bonn, 2022. DOI: 10.18420/inf2022_112
        5. Bernoth, J.; Al Laban, F.; Lucke, U.: Utilizing Personas to Create Infrastructures for Research Data and Software Management. Gesellschaft für Informatik e.V, 2024. DOI: 10.18420/INF2024_180
        Speaker: Safial Islam Ayon (Universität Potsdam)
      • 19:40
        Open research software infrastructure in Neuro-Medicine 20m

        The Institute of Neuroscience and Medicine: Brain and Behavior (INM-7) at the research center Jülich combines clinical science with open source software development in different areas: Individual groups independently develop open software tools for data and reproducibility management (DataLad; https://datalad.org; Halchenko et al. 2021), mobile health applications (JTrack; https://jtrack.readthedocs.io; Sahandi Far et al., 2021), and machine-learning libraries (JuLearn; https://juaml.github.io/julearn; Hamdan et al., 2024). In a collaborative platform for digital medicine in North-Rhine-Westphalia, we now connect the distinct software tools with the aim of establishing an integrated, user-friendly, and FAIR infrastructure for digital biomarker collection, storage, and exchange for clinical scientists. In this contribution, I want to map out the different challenges and opportunities in plugging together open research infrastructure from several unrelated but open source software components. In addition, beyond an overview of our tools and projects, I also aim to spark discussions around synergies and interoperability with related software projects in medical contexts.

        Halchenko, Yaroslav, et al. "DataLad: distributed system for joint management of code, data, and their relationship." Journal of Open Source Software 6.63 (2021).
        Sahandi Far M, Stolz M, Fischer JM, Eickhoff SB, Dukart J. "JTrack: A Digital Biomarker Platform for Remote Monitoring of Daily-Life Behaviour in Health and Disease." Front Public Health 9:763621 (2021). doi: 10.3389/fpubh.2021.763621. eCollection 2021.
        Hamdan, Sami, et al. "Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models." Gigabyte 2024 (2024).

        Speaker: Dr Adina Svenja Wagner (INM-7)
      • 19:40
        OpenLB – Open Source Lattice Boltzmann Code 20m

        OpenLB is one of the leading open source software projects for Lattice Boltzmann Method (LBM) based simulations in computational fluid dynamics and beyond. Developed since 2007 by an interdisciplinary and international community, it not only provides a flexible framework for implementing novel LBM schemes but also contains a large collection of academic and advanced engineering examples. It runs efficiently on target platforms ranging from smartphones through multi-GPU workstations to supercomputers.

        This poster will give an overview of the project and its community, its many multi-physics applications as well as performance benchmarks and continuous development process.

        Speaker: Adrian Kummerlander (KIT)
      • 19:40
        Porting the hydrologic model ParFlow to different accelerator architectures using eDSL and Kokkos 20m

        The ParFlow hydrologic model is an integrated, variably saturated groundwater and surface water flow simulator that incorporates subsurface energy transport and land surface processes through the integration of the Common Land Model (CLM) as a module. In addition, ParFlow has been coupled to atmospheric models such as WRF, COSMO and ICON. ParFlow is also integrated in the German climate and weather prediction ICON(-Land) software ecosystem as part of the WarmWorld project, and it is an important component of the Terrestrial Systems Modeling Platform (TSMP), which enables integrated simulations from the bedrock, across the land surface, to the top of the atmosphere with the coupled ICON/COSMO-CLM-ParFlow modeling system.

        ParFlow is written in C (with additional Fortran 90 parts, especially if CLM is enabled) and uses the parallelization methods MPI, OpenMP, native CUDA support and the programming model Kokkos (with the CUDA, HIP or OpenMP backends). In fact, the parallelism in ParFlow was abstracted early on in what is called an embedded Domain Specific Language (eDSL), which leads to a best-practice separation of concerns: the domain scientist/developer does not see, e.g., a single MPI call when programming in ParFlow.

        Since future hardware will be characterized by varying architectures, there was a demand to also enable HIP for AMD architectures. For example, the pre-exascale HPC system LUMI of EuroHPC is based on an accelerator consisting of AMD GPUs. To implement HIP for ParFlow via the eDSL and Kokkos (CUDA was already implemented), the porting work started in December 2022 in a sprint of the national Earth System Modelling initiative (natESM) and was continued and finished at IBG3 (Forschungszentrum Jülich). Based on the eDSL, the AMD porting was done using Kokkos, because the Kokkos ecosystem already includes an implementation of the HIP programming model for AMD GPUs, which resulted in a high degree of performance portability for ParFlow. Performance and scalability have been demonstrated on JUWELS Booster (Jülich Supercomputing Centre) with Nvidia A100 GPUs and on the LUMI supercomputer at CSC (Finland) with AMD MI250X accelerators.

        In this poster we present the eDSL of ParFlow and show how Kokkos (with the CUDA and HIP backends) is included in it, allowing ParFlow to reach performance portability on Nvidia (CUDA) and AMD (HIP) platforms with changes to only a limited number of lines. We also present scaling plots for different machines and accelerators and show how to enable HIP as a Kokkos backend of ParFlow using its eDSL.

        Speaker: Joerg Benke
      • 19:40
        Poster to MAUS, #36, Machine-AUtomated Support for Software Management Plans 20m

        an abstract for the poster to their demo.

        Speaker: David Walter
      • 19:40
        Pydidas - A modular framework for diffraction data analysis: The road towards FAIR software 20m

        Helmholtz-Zentrum Hereon operates multiple X-ray diffraction (XRD) experiments for external users and, while the experiments are very similar, their analysis is not. The variety in data analysis workflows makes creating FAIR analysis workflows challenging, because a lot of the analysis is traditionally done with small scripts and is not necessarily easily reproducible.
        Pydidas [1, 2] is a software package developed for the batch analysis of X-ray diffraction data. It is published as open source and intended to be widely reusable. Because of the wide range of scientific questions tackled with the technique of XRD, a limited number of generic tools will not be sufficient to allow all possible analysis workflows; easy extensibility of the core analysis routines is a key requirement. A framework for creating plugin-based workflows was developed and integrated in the pydidas software package to accommodate different analytical workflows in one software tool.
        Plugins are straightforward in their design to allow users/collaborators to extend the standard pydidas plugin library with tailor-made solutions for their analysis requirements. Access to plugins is handled through a registry which automatically finds plugins in specified locations to allow for easy integration of custom plugins. Pydidas also includes (graphical) tools for creating and modifying workflows and for configuring plugins, as well as for running the resulting workflows.
        While pydidas was developed with the analysis of X-ray diffraction data in mind and the existing generic analysis plugins reflect this field, the architecture itself is very versatile and can easily be re-used for different research techniques.
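
        The plugin mechanism can be pictured roughly as follows; this is a generic sketch of a registry with automatic discovery, not the actual pydidas API, and the package name in the usage comment is hypothetical.

        ```python
        import importlib
        import pkgutil

        class BasePlugin:
            """Minimal plugin interface: subclasses implement execute()."""

            name = "base"

            def execute(self, data):
                raise NotImplementedError

        class PluginRegistry:
            """Collects all BasePlugin subclasses found in a given package."""

            def __init__(self):
                self._plugins = {}

            def discover(self, package_name: str) -> None:
                """Import every module in a package so plugin classes register themselves."""
                package = importlib.import_module(package_name)
                for info in pkgutil.iter_modules(package.__path__):
                    importlib.import_module(f"{package_name}.{info.name}")
                for cls in BasePlugin.__subclasses__():
                    self._plugins[cls.name] = cls

            def create(self, name: str) -> BasePlugin:
                return self._plugins[name]()

        # Usage: registry.discover("my_custom_plugins") would pick up any tailor-made
        # plugin modules placed in that (hypothetical) package.
        registry = PluginRegistry()
        ```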

        [1] https://pydidas.hereon.de
        [2] https://github.com/hereon-GEMS/pydidas

        Speaker: Malte Storm (Helmholtz-Zentrum Hereon)
      • 19:40
        Python Interface for Particle Accelerator Control and Modelling 20m

        Particle accelerators are widely used around the world for both research and industrial purposes. The largest facilities consist of synchrotron light sources, high energy physics colliders and nuclear physics research facilities. These are essential tools for scientists in a broad range of fields from life sciences to cultural heritage and engineering, and use significant national or international investments for their construction and operation. To ensure reliable operation and that the accelerators meet the performance goals set by the users, it is crucial to have up-to-date, well-maintained software which allows the accelerator physicists to easily interact with the machine and relate its behaviour to a machine model.

        Presently, the Matlab Middle Layer (MML), initiated in the 1990s, is a key tool at many synchrotron light sources for commissioning, operation and accelerator tuning. While widely adopted, MML has over time become outdated and fragmented, making it both difficult to extend and to maintain. In addition, Matlab is proprietary software with a decreasing user base among students and young professionals in the accelerator physics field, highlighting the need for a new, modern, open-source solution which can meet the requirements of the future.

        Therefore, an initiative for a world-wide collaboration is underway to develop a Python Accelerator Middle Layer (pyAML). Some key requirements for the pyAML are control system agnosticism, machine independence, easy integration with already existing Python packages allowing for use of high performance computing, modern optimisation algorithms and machine learning, connection to a digital twin, FAIR data management, and a software architecture which makes it possible to maintain the code in a collaborative way while individual facilities can contribute developments that are directly usable at other facilities. In addition, the pyAML collaboration aims to strengthen collaboration, offer training and in general improve the software development skills in the community.

        The work presented here details the contribution of Helmholtz-Zentrum Berlin (HZB) and Karlsruhe Institute of Technology (KIT) to the broader effort. Our goal is to make use of the experience we have gained from software development for our two facilities and to be active participants in building the foundation for pyAML together with other facilities around the world.

        Speaker: Teresia Olsson (Helmholtz-Zentrum Berlin)
      • 19:40
        Research Software Engineering in the NFDI 20m

        Research Software Engineering is fundamental to the German National Research Data Infrastructure (NFDI). Following that, a "deRSE Arbeitskreis NFDI" serves as a connection point for RSEs in the NFDI inside deRSE e.V.

        Within the NFDI e.V., several "sections" are dealing with overarching topics, e.g., the "Sektion Common Infrastructures" with its working groups on "Data Integration (DI)", "Data Management Planning (DMP)", "Data Science and Artificial Intelligence (DSAI)", "Electronic Lab Notebooks (ELN)", "Persistent Identifiers (PID)" and "Research Software Engineering (RSE)".

        The RSE working group connects the NFDI consortia in software-related aspects. It focuses on three areas: Research software, software communities and software infrastructure at NFDI. The working group operates a central forum in an advisory and supportive capacity. It establishes the necessary software ecosystem within NFDI for the professional development of software infrastructure components, which represent an integral part of the NFDI. In addition, the working group serves as an interface for the NFDI to compare European and international initiatives to promote the connectivity of the NFDI with other infrastructures.

        This poster serves as a meeting point: it aims to bring together RSEs who already work in the NFDI or are interested in it, to create an active network, and to encourage them to join the RSE working group in the NFDI Section Infra.

        Speakers: Florian Thiery (CAA e.V.), Bernd Flemisch (University of Stuttgart)
      • 19:40
        Software Development Processes for Optimizing Academic Research Software Through CI/CD and Web Applications 20m

        Background:
        Research associates at our institute frequently develop methods for investigating building systems and indoor climate technology. While these researchers excel in their domains and create valuable computational methods, they often lack formal software development training. This leads to challenges in code maintainability and accessibility, particularly when sharing research outputs with stakeholders outside academia or attempting cross-institutional collaboration.

        Challenges:
        Two primary challenges emerge: First, the complexity of research code makes it difficult for decision-makers and practitioners to utilize the developed methods directly. Here, web-based frontends are a promising option to let users understand the research in an interactive manner. Second, the varying programming expertise among researchers often results in code that doesn't meet the quality standards required for open-source development and collaboration with other institutes. To address this, CI/CD pipelines are helpful.

        Approach:
        Our institute's software development team addresses these challenges through a dual approach. They develop web applications to make research methodologies and results accessible to the public while simultaneously reviewing and improving researchers' code bases. This includes implementing better development practices and establishing proper software engineering processes.

        Implementation:
        To streamline collaboration between software developers and researchers, we developed a requirements web application that helps researchers define technical specifications at project inception. This tool bridges the knowledge gap between domain experts and software developers, reducing iterative cycles in application development. Additionally, once researchers start developing their methods, we support this development with extensive CI/CD pipelines. Herein, we established a Kubernetes cluster hosting a scalable GitLab runner, providing centralized continuous integration capabilities for all software projects.

        Results:
        This structured approach has significantly improved both the accessibility of research outputs and the quality of research software. The requirements web application has streamlined the development process, while our CI/CD infrastructure ensures consistent code quality across projects. This framework enables effective collaboration between researchers and software developers, despite their different technical backgrounds.

        Speaker: David Jansen (RWTH Aachen University, E.ON Energy Research Center, Institute for Energy Efficient Buildings and Indoor Climate)
      • 19:40
        Squirrels and Unicorns - community-driven grassroots RSE solutions for FAIRification from the Humanities & Geosciences 20m

        The traceable and collaborative capture and FAIRification of research data is becoming ever more important in the citizen science community, for example in order to become part of an archaeological knowledge graph and to enrich the already interconnected data network with qualified data. Only then can these data be linked with other data and actively integrated into international initiatives such as NFDI4Objects and community hubs (e.g. Wikidata, OpenStreetMap). Unfortunately, open-source (FOSS) research and FAIRification tools are often not available for this. However, such tools, in combination with Linked Open Data projects serving as demonstrators, can be created and curated by community and volunteer initiatives such as the Research Squirrel Engineers Network.

        The Research Squirrel Engineers Network (founded in 2019 to implement the SPARQL Unicorn) is a loose association of Linked Open Data/Wikidata enthusiasts, Research Software Engineers and citizen scientists with a focus on archaeoinformatics, digital humanities and geoinformatics. Its members jointly develop and maintain research and FAIRification tools and apply them in concrete projects.

        One FAIRification tool for digital data management is the SPARQL Unicorn and its implementation for QGIS [1]. The SPARQLing Unicorn QGIS Plugin makes it possible to send Linked Data queries in (Geo)SPARQL to triple stores and prepares the results for the geo community in QGIS. It currently offers three main functions: (A) simplified querying of Semantic Web data sources, (B) transformation of QGIS vector layers to RDF, and (C) documentation of LOD as HTML pages. In addition, the SPARQL Unicorn Ontology Documentation Tool [2] enables, e.g. via GitHub Actions, the automated creation of HTML pages for Linked Open Data publications in other applications.
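
        To give a flavour of the kind of Linked Data request such a tool sends, here is a minimal Python sketch using the SPARQLWrapper library against Wikidata's public endpoint. It does not use the plugin's own API, and the chosen class (cities) is only a placeholder.

        from SPARQLWrapper import SPARQLWrapper, JSON

        # Generic example: fetch a few items with coordinates from Wikidata.
        # A plugin like SPARQLing Unicorn would turn such results into a QGIS
        # layer; here we simply print them.
        endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
        endpoint.setQuery("""
            SELECT ?item ?itemLabel ?coord WHERE {
              ?item wdt:P31 wd:Q515 ;      # instance of: city (placeholder class)
                    wdt:P625 ?coord .      # coordinate location
              SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
            }
            LIMIT 10
        """)
        endpoint.setReturnFormat(JSON)

        for row in endpoint.query().convert()["results"]["bindings"]:
            print(row["itemLabel"]["value"], row["coord"]["value"])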

        Examples include Irish Ogham stones/sites on the Dingle Peninsula, find spots of the Campanian Ignimbrite, and data from Sophie C. Schmidt's dissertation project on “Brandenburg 5,000 BC”, in which a CIDOC CRM data model was transferred to Linked Open Data and visualised as HTML with the help of the SPARQL Unicorn.

        This poster presents the Research Squirrel Engineers Network initiative, its research and FAIRification tools, and projects of the Research Squirrels. It shows the use of the SPARQLing Unicorn QGIS Plugin for spatial research data management with Linked Open Data from archaeology and the geosciences.

        [1] https://github.com/sparqlunicorn/sparqlunicornGoesGIS
        [2] https://github.com/sparqlunicorn/sparqlunicornGoesGIS-ontdoc

        Speaker: Florian Thiery (Research Squirrel Engineers Network)
      • 19:40
        Strengthening the Traceability and Transparency of the Software Development and Management Lifecycle Using Knowledge Pool 20m

        The model proposed in this study aims to prevent the loss of key elements within the Scrum framework, commonly used in software development and management processes, and to facilitate their reuse. Software developers handle numerous tasks, and over time, these tasks are completed. New tasks arise, while existing tasks accumulate issues (bugs) or performance improvements. When this historical information is forgotten or an employee leaves, it becomes challenging to accurately assess the duration and complexity of new tasks.

        To address this, this study proposes a large language model that learns task context and the Scrum process, leveraging past data on employees and tasks to accurately assign story points for new tasks and allocate them to the appropriate team members. By understanding both task context and process flow, the model aims to enhance the accuracy of task estimation and team member assignment, promoting effective and informed resource allocation within Scrum teams. For this work, the dataset was obtained from several open source projects and their software management tools.

        Speaker: Mr Oguzhan Oktay Buyuk (Niedersächsische Staats- und Universitätsbibliothek Göttingen)
      • 19:40
        Searching for and Finding Taxonomic Data in Digital Catalogues of Natural History Collections: Tools and Strategies 20m

        More and more natural history collections publish data about their collection objects in digital catalogues or portals. These data include, for example, taxonomic information such as species, genus, etc., as well as information on collection localities or persons. These data are of interest to scientists and the general public alike. However, taxonomic information in particular is often incomplete for various reasons. This can make searching the digital catalogues more difficult and, in the worst case, even prevent a record from being found.

        The Taxon Finder, part of a suite of web services, can solve this problem by searching relevant data holdings for taxonomic names and synonyms (also via fuzzy search) and providing the information available there, such as the complete taxonomic lineage, vernacular names and URIs to further information resources. If required, this information can be enriched with additional data from Wikidata.

        For example, from the single entry of the species Monodonta dama, the lineage Animalia, Mollusca, Gastropoda, Trochida, Trochidae, Monodonta, Monodonta nebulosa is determined. In addition, the information is now available that Monodonta dama is a synonym and that Trochidae are top snails. If this information is added to the digital catalogue, it can considerably improve the search experience and help to find the proverbial needle in the haystack. Moreover, these data are essential for further information services such as dashboards, e.g. for visualising the diversity of a collection.

        The tool suite can be operated on premises and offline. For this purpose, freely available datasets (Open Tree of Life, Catalogue of Life, GBIF Backbone Taxonomy) are imported into a common graph database, indexed for efficient searching and made available via a REST interface. This makes it ideal for scenarios in which many requests have to be processed quickly or where bandwidth is an issue. For exploratory queries, each web service provides a user interface.
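
        To illustrate how such a REST interface could be called from analysis code, here is a minimal Python sketch using the requests library. The endpoint URL, parameters and response structure are hypothetical and do not describe the actual API of the Taxon Finder.

        import requests

        # Hypothetical base URL and query parameters, for illustration only.
        BASE_URL = "https://taxon-finder.example.org/api/v1"

        def lookup(name: str) -> dict:
            """Resolve a (possibly outdated) taxonomic name via fuzzy search."""
            response = requests.get(
                f"{BASE_URL}/taxa",
                params={"q": name, "fuzzy": "true"},
                timeout=10,
            )
            response.raise_for_status()
            return response.json()

        # Expected to return e.g. the accepted name, full lineage, vernacular
        # names and URIs pointing to further information.
        print(lookup("Monodonta dama"))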

        We present the complete tool suite and two concrete use cases that illustrate its use and added value. The prototypical integration into the digital catalogue is also shown.

        Speaker: Jens Dobberthin (Museum für Naturkunde Berlin)
      • 19:40
        The emergence of a new support role: Service Stewards in Research Software Engineering 20m

        The German National Research Data Infrastructure (NFDI) and its Base4NFDI initiative have introduced the role of Service Stewards to drive the development and integration of NFDI-wide basic services. They support the service developer teams, acting as a crucial interface connecting the teams and NFDI consortia and ensuring basic services are known and meet the communities’ needs. Especially when generic research software is developed and multiple stakeholders are involved, the process benefits from their interdisciplinary role.

        Since March 2023, NFDI comprises 26 discipline-specific consortia. In general, these scientific communities operate their own service portfolios, i.e. software tools, technical services or workflows, and offer advice to enable and facilitate researchers to create and work with FAIR data. However, cross-disciplinary services are necessary to address overarching needs such as federated identity management (IAM4NFDI), persistent identifiers (PID4NFDI), and knowledge graphs (KGI4NFDI). This is where Base4NFDI comes into play: as a joint initiative of all NFDI consortia, the project promotes these basic services for a NFDI-wide service portfolio, providing financial funds for personnel and building the framework for accompanying the service development process.
        Basic services usually combine existing solutions and aim for sustainable, long-term technical and organisational service provision. They may offer new software, processes and workflows, computing and storage resources, and personnel for development and user support. As cross-disciplinary solutions, basic services can be quite generic. All basic services are required to add value to the entire NFDI and to ensure interoperability on an (inter)national level. They also need to be backed by a sustainable operating model, committing partner institutions and developing a financial plan.

        In the development process of these services, research software engineers (RSEs) as well as Service Stewards play a key role. Tasks of RSEs include adapting a service and associated software to the needs of the consortia, combining their technology with other basic services, and incorporating FAIR principles and the open science approach.
        Service Stewards support the work of the basic service teams by acting as a communication interface between all involved stakeholders. They support the bottom-up approach through requirements engineering and continuous analysis of community needs in order to ensure a high level of acceptance and usability among the various target groups of the services. Regarding coordination and implementation, they are a valuable resource especially in multi-stakeholder processes, as they can represent different interest groups and maintain the flow of information as well as outreach activities. RSEs can therefore benefit from their support and from networking with funding organisations and scientific communities. The emergence of NFDI basic services, combined with a requirement-driven development process that fosters communication among stakeholders as part of a broader cultural shift in science, has the potential to promote and strengthen Research Software Engineering as a scientific discipline, further enhancing its role in research.

        Join us to discuss the partnership between Service Stewards and RSEs and its potential to foster not only NFDI's basic service development but also other multi-stakeholder development processes.

        Speakers: Jana Tatschek (ZPID - Leibniz-Institut für Psychologie), Sandra Zaenkert (ZB Med - Informationszentrum Lebenswissenschaften | Base4NFDI)
      • 19:40
        The OpenQDA Project Live DEMO 20m

        The qualitative data analysis software OpenQDA¹ is already available as a free public beta for anyone to use. In this DEMO we will showcase the upcoming 1.0 release with a real-world live coding session, featuring an entirely redesigned user interface as well as a set of fundamental AI plugins for preparation, coding and data analysis.

        1 https://github.com/openqda

        Speakers: Jan Kuester (University of Bremen), Prof. Karsten D. Wolf (University of Bremen)
      • 19:40
        The Platform MaterialDigital Workflow Store - Sharing Scientific Workflows within the MSE community 20m

        The MaterialDigital Platform (PMD) project, launched in 2019, aims to advance digitalization in material science and engineering in Germany. The project focuses on creating a robust infrastructure for managing and sharing material-related data.

        The PMD Workflow Store is a key component of this initiative. It serves as a repository where scientists and engineers can access, collaborate on, and upload workflows for material simulation and analysis. The platform features centralized access and sharing, automated validation, a searchable index, and a commitment to continuous enhancement based on user feedback.

        Here, we introduce the PMD Workflow Store and demonstrate how it streamlines collaboration and knowledge sharing within the scientific community. We show that this platform simplifies access to diverse workflows, ensuring high standards of quality and consistency. The Workflow Store accelerates research and development by providing a single entry point to discover and share workflows and workflow modules from and with the MSE community.

        We highlight the transformative impact of the PMD Workflow Store on the material science community. This platform not only meets current needs but also aims to incorporate tools for future advancements. As digital workflows become indispensable in material science, the PMD Workflow Store aims to become a vital resource for researchers and engineers.

        Speakers: Mr Artem Buldin (Karlsruhe Institute of Technology (KIT)), Ms Jehona Kryeziu (Karlsruhe Institute of Technology (KIT))
      • 19:40
        Towards Defining Lifecycles and Categories of Research Software 20m

        There is a large variety of types of research software at different stages of evolution. Due to the nature of research and its software, existing models from software engineering often do not cover the unique needs of RSE projects. This lack of clear models can confuse potential software users, developers, funders, and other stakeholders who need to understand the state of a particular software project, such as when deciding to use it, contribute to it, or fund it. We present work performed by a group consisting of both software engineering researchers (SERs) and research software engineers (RSEs), who met at a Dagstuhl seminar, to collaborate on these ideas.

        Through our collaboration, we found that our terminologies and definitions often vary: for example, one person may consider a software project to be early-stage or in maintenance mode, whilst another might consider the same software to be inactive or failed. Because of this, we explored concepts such as software maturity, intended audience, and intended future use. In this poster, we will present a working categorization of research software types, as well as an abstract software lifecycle that can be applied and customized to suit a wide variety of research software types. Such a model can be used to make decisions and guide development standards that may vary by stage and by team. We are also seeking community input on improvements to these two artifacts for future iterations.

        Speakers: Michael Goedicke, Daniel S. Katz, Prof. Bernhard Rumpe (RWTH Aachen)
      • 19:40
        What makes computational communication science (ir)reproducible? 20m

        Computational methods are in full swing in communication science. Part of their promise is to make communication research more reproducible. However, how this plays out in practice has not been systematically studied. We verify the reproducibility of the entire cohort of 30 substantive and methods papers published in the journal Computational Communication Research (CCR), the official journal of the International Communication Association Computational Methods Division, which has a focus on transparency and hence a high rate of voluntary Open Science participation in the field. Among these CCR papers, we are not able to verify the computational reproducibility of 16 papers, as no data and/or code were shared. For the remaining 14 papers, we attempt to execute the code shared by the original authors in a standardized containerized computational environment. We encounter a variety of issues that preclude us from reproducing the original findings, with incomplete sharing of data or code being the most common issue. In the end, we could at least partially reproduce the findings of only 6 papers (20%). Based on our findings, we discuss strategies for researchers to correct this disheartening state of computational reproducibility. We emphasize that computational reproducibility is a socio-technical challenge, and at least in this case the social part was the main culprit. Speaking to research software engineers, we warn against a "Technological Solutionism" approach to supporting computational reproducibility.

        Speaker: Chung-hong Chan (GESIS)
      • 19:40
        Who Do You Think You Are? Identifying Research Software Engineering Personas From Developer / Repository Interaction Data 20m

        ‘Personas’ are widely used within traditional software contexts during design to represent groups of users or developer types by generating descriptive concepts from user data.

        ‘Social coding’ practices and version control ‘code forges’ including GitHub allow fine-grained exploration of developers’ coding behaviours through analysis of commit data and usage of repository and development management features such as issue tickets.

        By combining software repository mining techniques with persona concepts, we have generated a novel taxonomy of Research Software Engineering Personas (RSE Personas).

        This gives us insight into collaborative development best practices and helps represent developer-repository interactions. This work has initially been done on a dataset gathered from 10,000 research software GitHub repositories which have been deposited with Zenodo.

        RSE Personas identify distinct groupings of development behaviours by applying clustering analysis to data from larger collaborative research software project repositories. Correlations between groups of ‘best practice’ behaviours and common development and repository management activities are examined.

        This poster explains the RSE Personas methodology, describes important Personas and their properties, shows key emerging findings at developer / repository levels, and explores future applications and potential caveats for this novel method.

        Classification methods from Open Source Software research such as commit message keyword classification [1] and commit file type classification [2] are combined with factors such as review of pull requests, issue tickets, commit size and frequency data, commit ‘activity types’ (for example: coding new features, versus documenting or managing the repository) and other contributions information from GitHub to build a picture of developers’ engagement with their repositories.
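
        As an illustration of the clustering step described above, the sketch below groups per-developer activity features with scikit-learn. The toy feature set and the choice of k-means are assumptions made for the example and are not necessarily the exact features or algorithm used in this work.

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.cluster import KMeans

        # Toy per-developer feature matrix (one row per developer), e.g.
        # [commit share, median commit size, issues assigned, PR reviews, doc commits]
        features = np.array([
            [0.62, 120, 35, 28, 14],   # prolific, review-heavy contributor
            [0.05,  40,  2,  0,  1],   # occasional contributor
            [0.10,  15,  1,  0, 22],   # documentation-focused contributor
            [0.55, 200, 30, 25,  3],
            [0.08,  35,  3,  1,  0],
        ])

        # Standardise the features, then cluster developers into candidate personas.
        scaled = StandardScaler().fit_transform(features)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
        print(labels)  # cluster index per developer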

        An example RSE Persona identified is the “Active Leader”: responsible for a high proportion of commits, frequently assigned to issue tickets and pull request reviews; they modify files across the entire codebase, with strong contributions to activities such as developer documentation.

        Personas demystify subtle interactions between researchers and their code, unlocking insights into the day-to-day behaviours of RSEs and the different contributions they make to their projects, and how those managing such projects could identify ways to better support their teams towards effective research software development.

        Personas could allow RSEs to interact more effectively by understanding their current practices in relation to their teams and communities, helping them identify ‘next steps best practices’, boosting their professional - as well as software - development.

        We recognise that while RSEs are certainly far more than their code and the digital footprints they leave on their repositories, the Research Software Engineering Personas methodology now allows us to describe, explore and investigate the current real-world practices of contributors to research software, and we invite you to engage with our work.

        References:
        [1] L. P. Hattori and M. Lanza, ‘On the nature of commits’, in 2008 23rd IEEE/ACM International Conference on Automated Software Engineering - Workshops, Sep. 2008, pp. 63–71. DOI: 10.1109/ASEW.2008.4686322.

        [2] B. Vasilescu, A. Serebrenik, M. Goeminne, and T. Mens, ‘On the variation and specialisation of workload—A case study of the Gnome ecosystem community’, Empir Software Eng, vol. 19, no. 4, pp. 955–1008, Aug. 2014, DOI: 10.1007/s10664-013-9244-1.

        Speaker: Felicity Anderson (EPCC, University of Edinburgh)
      • 19:40
        WIAS-PDELib: A Julia PDE solver ecosystem in a GitHub organization 20m

        We present the GitHub organization WIAS-PDELib, which provides an ecosystem of free and open-source solvers for nonlinear systems of PDEs written in Julia. WIAS-PDELib is a collection of a finite element package (ExtendableFEM.jl), a finite volume package (VoronoiFVM.jl), grid managers (e.g., ExtendableGrids.jl) and other related tools for grid generation and visualization. The packages share a common data structure for grids, and the ecosystem strives to be compatible with the SciML package ecosystem with respect to solution objects and solver calls. The nonlinear PDE solvers rely on automatic differentiation for the calculation of Jacobians. A GitHub organization allows us to define fine-grained maintainer roles, so that each package can have its own experts responsible for its internals and integrity.
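
        The role of automatic differentiation in assembling Jacobians can be sketched briefly; the example below is written in Python with JAX rather than Julia, uses a made-up two-equation system, and is not code from WIAS-PDELib.

        import jax.numpy as jnp
        from jax import jacfwd

        # Residual of a small nonlinear system F(u) = 0, standing in for the kind
        # of system that arises from a discretized PDE (toy example).
        def residual(u):
            return jnp.array([
                u[0] ** 2 + u[1] - 2.0,
                u[0] + jnp.exp(u[1]) - 3.0,
            ])

        # Jacobian obtained via forward-mode automatic differentiation.
        jacobian = jacfwd(residual)

        # One Newton step: solve J(u) du = -F(u).
        u = jnp.array([1.0, 1.0])
        du = jnp.linalg.solve(jacobian(u), -residual(u))
        u = u + du
        print(u)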

        At WIAS Berlin, we have a long history of solver development for PDEs.
        The new WIAS-PDELib in Julia builds upon the experience from developing a feature-rich C++ library, PDELib2, which has been used in a number of WIAS application projects.
        However, deploying the code, managing the build process, and adapting it to user extensions proved difficult, owing to the maintenance burden of build systems and of interoperability with scripting languages such as Python.
        Julia's built-in package management overcomes most of these difficulties.
        In addition, it provides built-in reproducibility of exact versions of all code dependencies, as well as portability to all major operating systems.
        Another advantage of the Julia language is the easy way to write generic yet efficient code, allowing, for example, our PDE solvers to be composed with the ODE solvers available from the SciML organization.

        The PDE solvers are used in a number of scientific projects at WIAS and beyond in areas such as semiconductor modeling, (bio)electrochemistry, porous media flow and catalysis.

        Speaker: Dr Patrick Jaap (WIAS Berlin)
    • 18:00 22:00
      Reception: Joint Reception of SE and deRSE Audimax Foyer

      Audimax Foyer

      Building 30.95

      Str. am Forum 1, 76131 Karlsruhe
    • 18:00 19:00
      SE Student Research Competition: Posters Audimax Foyer

      Audimax Foyer

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      • 18:00
        Student Research Competition: Posters 1h
    • 09:00 10:30
      Local RSE Units or groups SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Florian Goth (Universität Würzburg)
      • 09:00
        Software Coupling shaped by Organisational Needs in Interdisciplinary Research 20m

        For interdisciplinary research, software engineering has to take into consideration the different scientific perspectives on interacting processes, non-matching terminologies and the coordination of research teams from multiple institutions. This contribution presents an example from the field of water quality modelling in rivers, which requires the coupling of a complex biological model to a physical transport model in flowing water in order to simulate the spatio-temporal evolution of concentrations of ecologically important substances.

        After 40 years of development of the water quality model QSim, including the transition from one-dimensional to multi-dimensional approaches and coupling to a range of physical transport models, the software design choices can be reviewed: while offline coupling turned out to be crucial for the success of the development due to limitations in both computing speed and development resources, non-standard interfaces and data formats, unsystematic data structures and a lack of modularity proved to be serious obstacles to further development of this legacy code. The integration into larger open-source software communities has been started and has yielded promising results.

        An example is given of why interdisciplinary collaboration needs to be organised not only during coding and testing but also in the application of coupled software. Software design must accommodate the fact that the necessary expertise is not always available in the same place at the same time. Different teams or different modes of collaboration might require different coupling options. Whether software development requires larger interdisciplinary teams is open for discussion.

        Speaker: Jens Wyrwa (Bundesanstalt für Gewässerkunde (BfG), Koblenz)
      • 09:20
        Establishing central RSE units in German research institutions 20m

        What defines an "RSE unit"? Where does it fit into the German academic research environment? What are typical tasks of such a unit? What could its structure look like? What are typical challenges the units face?
        These are only some of the questions that the paper of the same name (upcoming at the time of talk submission; https://github.com/DE-RSE/2023_paper-RSE-groups) addresses and which we present in this talk, focusing not on individual or specialised RSE groups but on central RSE service units.

        Speaker: Frank Loeffler (Friedrich Schiller University Jena)
      • 09:40
        Better research software through better research software competencies - a symposium report 20m

        In December 2024, roughly 35 members of the German RSE community will follow an invitation by the VolkswagenStiftung to Hannover for a symposium on "Code for Science or: Better Research Software through better research software competencies". The symposium is co-located with three other symposia as part of a larger event on "Digital Competencies in the Academic System". Our idea is to bring together a large variety of stakeholders in research software competencies: the German RSE community, libraries, computing centers, research data management facilities, HPC facilities, Open Science offices and training institutions. In a series of open discussion formats, this diverse group will then identify common problems in research software competencies, develop ideas to target these problems and make plans on how these ideas can be implemented. After the end of the symposium, our findings will be published in a report. We present those findings in our talk and hope for community approval and lively discussions.

        Speaker: Dominic Kempf (Scientific Software Center, Heidelberg University)
      • 10:00
        A research software engineering department at a German university: Four years and counting 20m

        The Scientific Software Center (SSC), established in fall 2020 at the Interdisciplinary Centre for Scientific Computing of the University of Heidelberg, provides institutional support to researchers of all faculties in software development and software engineering best practices. The SSC promotes reproducibility and sustainability of research software. The support offered by the SSC is based on the three pillars “Development and Sustainability”, “Teaching and Consultation”, and “Outreach and Communication”1.
        In this contribution, we want to share success stories and roadblocks from the first four years of the SSC. The presentation will touch upon the following main aspects: (1) aspects related to research software engineering; (2) management of RSE groups; (3) administrative aspects; (4) process and people; (5) training and transfer. The SSC employs predominantly “generalist” but also “embedded” research software engineers, where generalists work on RSE projects from all disciplines and embedded research software engineers focus on RSE projects in their home domain. Often, the research software engineers work on multiple projects, making effective project and time management essential. Being located at a German university brings additional administrative challenges related to the recognition of RSE as an academic discipline and to academic work contracts. Establishing processes and communication channels is an ongoing effort at the SSC, with the aim of managing time and quality effectively and transparently while considering mindset, ownership, mental health and personal growth. Transporting the ideals of software engineering best practices into research groups through training and collaboration is another challenge, one that is met with great enthusiasm on the one hand and with reluctance to deviate from established routines on the other. We will conclude with an outlook and a summary of (anticipated) differences between an RSE group and “traditional” research groups at academic institutions.

        1 Keegan, L., Kempf, D., Ulusoy, I. (2024). Scientific Software Center at Heidelberg University: White Paper (v2.0). Zenodo. [https://doi.org/10.5281/zenodo.10867903]

        Speaker: Inga Ulusoy (University of Heidelberg)
    • 09:00 10:30
      RSE research Audimax A

      Audimax A

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Anna-Lena Lamprecht (Universität Potsdam, Institut für Informatik und Computational Science)
      • 09:00
        Quality Characteristics for Software in HPC Environments: A Systematic Literature Review 20m

        Research outputs in general require certain qualities to facilitate reuse as described by the FAIR Principles. For research software specifically, software engineering methods can help realize these goals. However, the desired qualities may differ between commercial and research software or even software in HPC environments. The focus on performance introduces challenges such as additional complexity from parallelization and hardware-specific implementations, which influence software quality.

        This work aims to analyze the current research on software quality in HPC and, in particular, identify important quality characteristics. Therefore, we conducted a systematic literature review, which resulted in 29 relevant papers.

        We find that the topic has been actively researched, especially in the last ten years. The contributions can be categorized into three areas: the proposal of a tool or process for improving software quality in HPC, the presentation of an HPC software package including a description of how software quality is approached there, and the analysis of software quality, for example of the current state or of the impact of certain factors.

        Analyses of the quality characteristics indicate performance, portability and correctness as the most frequently discussed quality attributes, alongside various aspects of maintainability, usability and reliability. We will further refine these findings and compare them with the established ISO/IEC 25010 SQuaRE (Systems and software Quality Requirements and Evaluation) software quality model.

        The insights from this study can be used to provide research software engineers in HPC with a starting point on quality aspects to consider in their applications. Additionally, our findings identify gaps where suitable tools, practices or metrics for measuring or improving certain qualities are missing and offer directions for future research and tool development.

        Speaker: Ms Camilla Lummerzheim (RWTH Aachen University)
      • 09:20
        Towards Services to Enable FAIR Research Software in a Typical Research Project Cycle 20m

        In a digitalized world, the use and development of research software is fundamental for research. Reusing research software can improve the quality and efficiency of research. Therefore, Chue Hong et al. defined the FAIR principles for research software [1], which describe what FAIR research software looks like. Ideally, making research software FAIR is not the last step in the research process. But what does it take to consider the FAIR principles for research software during the whole research project? What standards should be adopted and what services are required to support researchers? To address these questions, and to distinguish between universal and domain-specific needs, we analyzed the life cycle of a typical research project and specifically the role of research software in it.
        Our analysis divides the cycle of a research project into five distinct phases: the analysis of existing research software, planning for new software developments, the actual software development, strategies to make the software findable, and, finally, ensuring that the software fulfills the FAIR principles to allow reuse. Each phase poses unique requirements and challenges that should be addressed while adhering to the FAIR principles.
        As a result, we identified three domain-specific standards and guidelines that could significantly contribute towards making research software FAIR in its respective community. These comprise a tailored domain-specific metadata scheme for research software (based on general standards like CodeMeta [2]), structured guidelines for Research Software Engineering (RSE) in the particular domain (e.g., [3]), and standards for the application programming interfaces (APIs) that typically manifest in the domain. From our perspective, the guidelines in particular are an important instrument for successfully integrating the FAIR principles into the routine exercise of software design and use in research projects.
        Additionally, we highlight seven services that could support researchers in achieving FAIR research software. These include a domain-specific registry, aiding researchers in cataloging and locating domain-specific research software; software repositories for developing and versioning research software; a service to create and organize software management plans to help plan and track software development [4]; a reproducibility checker to verify the consistent results of the research software; a metadata generation service to automate the creation of comprehensive and standardized metadata; software journals, which offer a legitimate publication medium ensuring peer review for research software; and a FAIRness evaluator, a service to confirm adherence to the FAIR principles.
        We want to showcase the different aspects required to support researchers to include the FAIR principles in their daily work when using and developing research software in research projects. By defining useful services, standards, and guidelines we want to contribute to a better understanding of what is needed to get closer to FAIR research software in all research projects.
        In our conference presentation, we would like to present and discuss these identified phases of the research project cycle, especially the envisaged services, standards, and guidelines linked to each step.
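
        To make the metadata aspect concrete, the following minimal Python sketch writes a small codemeta.json file. The field values are invented, and a domain-specific metadata scheme as proposed above would add further, community-agreed properties on top of such general CodeMeta terms.

        import json

        # Minimal CodeMeta record with invented example values.
        codemeta = {
            "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
            "@type": "SoftwareSourceCode",
            "name": "example-simulation-tool",
            "version": "1.2.0",
            "license": "https://spdx.org/licenses/MIT",
            "codeRepository": "https://example.org/git/example-simulation-tool",
            "programmingLanguage": "Python",
            "author": [{"@type": "Person", "givenName": "Jane", "familyName": "Doe"}],
        }

        with open("codemeta.json", "w", encoding="utf-8") as fh:
            json.dump(codemeta, fh, indent=2)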

        [1] https://doi.org/10.15497/RDA00068
        [2] http://ssi1.eprints-hosting.org/id/eprint/2/
        [3] https://doi.org/10.1371/journal.pcbi.1005265
        [4] https://doi.org/10.37044/osf.io/k8znb

        Speaker: Stephan Alexander Ferenz (Carl von Ossietzky Universität Oldenburg; OFFIS)
      • 09:40
        Research software versus mundane software: The gap between software usage and software mentioning 20m

        For the vast majority of researchers across disciplines, software use is an everyday practice, and data analysis is not the only way of scientific sensemaking with software. The talk presents survey results showing that research pipelines are populated with diverse types of software – among them software tailored for research purposes („research software“) as well as software covering broader tasks that are not specific to science („mundane software“). Comparing the results with recent literature about software mentions in science reveals a gap between using software for research and mentioning it in publications: mundane software is widely used in science but often goes unreported.

        The assumption that software use is epistemically relevant is embedded in the discourse on research software engineering. However, speaking from a socio-epistemological perspective, the role of software in general for research as a process has been overlooked so far: Depending on the field of research, the roles of research software versus mundane software in achieving results may vary. For example, while data driven research might depend more on the software used for analysis (e.g. an R package), non-empirical research might depend more on software for literature management (e.g. Citavi). These considerations led to the following research question: Which software do researchers use for which purpose?

        The research question is approached with data from a trend survey on research data infrastructures which was conducted in May 2024 in the context of Base4NFDI (Measure 4.3). The representatively invited survey yielded 1033 answers from German researchers across disciplines and fields. The participants were asked with open text fields about which software and platforms they use for the purposes of a) text production and literature management, b) data collection and analysis, c) knowledge organization and search engines and d) communication and project management. The answers were manually coded into ~ 800 software/platform names in ~3400 software/platform-purpose pairs.

        Besides the gap between software usage and software mentioning, the survey shows that in all but the data-related parts of the research pipeline, most researchers use proprietary software and platforms. As proprietary software has become more networked over the past decade, this poses a danger of lock-in and a challenge to the FAIRification of the research process in general that cannot be overestimated.

        Speaker: Judith Hartstein (Deutsches Zentrum für Hochschul- und Wissenschaftsforschung)
      • 10:00
        What researchers need vs what they ask for 20m

        Researchers often come to us for RSE help with quite specific technical questions, such as "how can I parallelise this Python code?" or "why does this Matlab function use all my RAM?", and it can seem natural to dive straight into directly answering their question.

        Sometimes this is the best approach to help them, but in many cases it is worth asking some more questions about what it is that they are trying to do, and how their code fits into accomplishing that goal.

        It may then become clear that there are alternative solutions that the researcher did not consider, often because they were not aware of their existence, such as using an existing library, or a better suited language.
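
        A deliberately simple, made-up example of such an alternative: the question arrives as "how can I parallelise this loop?", while an existing library call removes the need to parallelise at all.

        import numpy as np

        # The code the researcher wants to parallelise: a pure-Python loop.
        total = 0.0
        for v in range(1_000_000):
            total += v ** 0.5

        # An alternative they may not have considered: a vectorised library call,
        # which is typically much faster than parallelising the loop above.
        total_vectorised = np.sqrt(np.arange(1_000_000, dtype=float)).sum()
        print(total, total_vectorised)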

        In this talk I'll give some real world examples of this, and try to provide some suggestions for how best to help people who come to you with this kind of question.

        Speaker: Liam Roger George Keegan (Scientific Software Center, Heidelberg University)
    • 09:00 10:30
      Research Software Metadata and FAIR Assessment 1h 30m Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      This skill-up series, promoted by the EVERSE project, consists of two 45-minute sessions designed to help the audience understand how to improve their software's sustainability and FAIRness. The sessions combine presentations with demonstrations of key tools.
      Session 1: Software Metadata

      • Introduction to metadata for research software
      • Exploring key metadata standards (Citation File Format and CodeMeta)
      • Demonstration: Creating metadata using CFF Initializer and CodeMeta Generator

      Session 2: FAIR Assessment for Research Software

      • Understanding FAIR principles for software
      • Overview of assessment approaches and tools including:
      • An insight into the EVERSE dashboard
      • HowFAIRis
      • Fuji
      • Demonstration: Example software FAIRness assessment

      Format

      Two 45-minute sessions combining presentations with tool demonstrations.

      Target Audience

      Researchers and RSEs who want to learn about software metadata standards and FAIR assessment tools.

      Prerequisites

      None. While participants are welcome to bring their own software project (e.g. a GitHub repository), the sessions will focus on demonstrations using prepared examples.

      Expected Outcomes

      • Understanding of software metadata standards and creation tools
      • Familiarity with FAIR assessment approaches and available tools
      Speakers: Elena Breitmoser, Faruk Diblen, Giacomo Peru (University of Edinburgh)
    • 09:00 10:30
      Research Software in Digital Humanities Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131 Karlsruhe
      Convener: Dr Maria Guadalupe Barrios Sazo (Forschungszentrum Juelich)
      • 09:00
        Research Software Engineering for Musicological Research: Towards Feature-based Versioning for Music Analysis 20m

        Musicological research is challenged with the complexities of analyzing multiple revisions and variants of musical compositions [2]. The need for systematic tools to handle this variability has become increasingly important as musicologists rely more on computational methods for analysis. This talk presents an approach that introduces feature-based versioning known from software engineering to help manage revisions and variants of both music compositions and their analyses. The goal of our ongoing work is to develop software that can provide musicologists with the tools to streamline their research workflows in the domains of music philology and analysis.

        Musicologists use domain-specific languages (DSLs) such as the Music Encoding Initiative (MEI), MusicXML, and LilyPond for encoding scores and performing analyses. However, these tools lack specific support for managing the many revisions and variants of musical works. Current tools do not provide a structured way to track and compare these changes, or to manage the layers of analysis required for both the score and its musicological interpretation. Our research thus applies feature-based versioning, as used in software product lines [1], to create software that can map music features -- such as harmony, dynamics, and voice-leading patterns -- directly to different versions of a musical composition. This approach will allow musicologists to compose new variants of musical works or their analyses by selecting and combining specific features. Specifically, we aim to integrate version control capabilities using ECCO, a feature-based version control system that can manage fine-grained changes in both musical scores and their associated analytical annotations.

        This Research Software Engineering (RSE) project involves developing and customizing software that is specifically tailored for the needs of musicologists. By encoding musical features into DSLs like the Music Encoding Initiative (MEI), LilyPond, DCML, and combining this with feature-based versioning, we are enabling automatic tracking of revisions and variants [2,3]. Key challenges addressed in this RSE effort include managing the granularity of music features, dealing with scattered and highly interacting features, and modeling complex dependencies between various musical elements. This interdisciplinary research offers a novel application of feature-based techniques in the digital humanities. By creating specialized software tools that manage the complexity of musical works and their variations, this project exemplifies how RSE contributes to enhancing research capabilities in the humanities, supporting the integration of computational techniques into the research practices of musicology.

        [1] Lukas Linsbauer, Felix Schwägerl, Thorsten Berger, Paul Grünbacher: Concepts of variation control systems. J. Syst. Softw. 171: 110796 (2021)
        [2] Paul Grünbacher, Markus Neuwirth: Towards Feature-based Versioning for Musicological Research. VaMoS 2024: 77-82
        [3] Paul Grünbacher, Rudolf Hanl, Lukas Linsbauer; Using Music Features for Managing Revisions and Variants of Musical Scores. Computer Music Journal 2024; doi: 10.1162/comj_a_00691

        Speaker: Paul Grünbacher (Johannes Kepler University Linz)
      • 09:20
        Reuse and Bidirectional Further Development of Research Software 20m

        Between 2021 and 2024, around 700 letters were digitally edited within the DFG project "Henze-Digital". To cope with this immense workload, the premise from the start was not to create a bespoke software solution but to test the reuse of existing research software, a novelty in musicology. Since the project was based at the Detmold/Paderborn research site, it was an obvious choice to adapt the research software (WeGA-WebApp) and the data schemata (TEI-ODD) of the Carl Maria von Weber-Gesamtausgabe (WeGA), as this infrastructure has been developed at the same site for more than ten years (since 2011) and is regarded (in musicology) as state of the art.
        Since the WeGA-WebApp can already process the WeGA's research data, only the adaptations for Henze-Digital had to be implemented, in theory. The problem: the HenDi-WebApp resulting from this further development and its origin (the WeGA-WebApp) were to maintain a stable bidirectional connection while both products were developed further in parallel. The goal: the adaptations and enhancements required for Henze-Digital should create synergies between the two projects.
        A first approach was to modify the WebApp's source code as a git fork in order to preserve the dependency on the original code. This quickly proved unworkable, because deeper adaptations to the software's infrastructure made synchronisation with the origin impossible from a work-economy perspective. By including the source code of the WeGA-WebApp as a git submodule and using a complex build process (Apache Ant) that reused large parts of the original build scripts, a workflow emerged through which the methodological concept could be used productively. Integrating this process into a GitLab CI greatly simplified the delivery of the software.
        But how are changes to the code handled in practice? After all, a bug fix in the HenDi-WebApp is only of limited use if the same bug remains in the WeGA-WebApp. Cross-project coordination between the developers was indispensable, also with regard to feeding back changes and implementing new features.
        In the talk, the workflow sketched here in highly abbreviated and simplified form will be presented in detail. In addition, ideas, problems and solutions will be critically reflected upon and put up for discussion.

        Speaker: Dennis Ried
      • 09:40
        Supporting the lifecycle of place-based data 20m

        Places, as the main access to everyday environments (Cresswell 2004), are no fixed entities but are in constant change. Practice theory (Schatzki 2002) describes how places are composed of social and material arrangements influencing how people interact with them and alter them according to their needs. How people read places depends on a number of factors, including their current needs and their socio-cultural background (Cresswell 2004). Place-based information can thus help in a number of contexts to transcend 2D cartography by intermixing it with multi-modal data (images, audio, video, text) representing different perspectives on space: (1) How do platial storytellings differ according to different biographies? (2) What do place-based media reveal about social discourse, e.g. through the interpretation of urban street art in digital literature and media studies?
        Qualitative GIS (Schuurman 2006, Kwan 2002) and place-based GIS approaches (Purves et al. 2019, Gao et al. 2023, Kremer 2018) involving multi-modal geo-data (text, images, audio, videos) capturing the very moment of making sense of place are well established. Yet, collecting and providing high-quality place-based data for specific research questions poses an ongoing challenge to the domain of research software development. Which arrangements do visitors actually refer to on site? What photos and audio recordings do they choose to frame their view? How do they feel about those places? Following Critical Data Studies (Dalton/Thatcher 2014), such rich data sets even help to reveal underrepresented perspectives and act as counter-data (Kitchin/Lauriault 2014) contrasting normative or scientific views from nowhere (Jasanoff, 2017).
        Integrating the whole workflow cycle of (1) data collection, (2) data screening and (3) data presentation, I report on first experiences developing a suite of applications addressing those challenges. (1) The mobile app SpaceLog (Kremer et al. 2023) allows for recording multi-modal place-based data on site and thus digitises established workflows of accompanied walks and think-aloud protocols (Degen/Rose 2012). It can act as (a) a survey tool observing individual spatial behaviour as well as (b) a research diary. (2) Utilising the category system of SpaceLog as a filter, I show how the exploration of rich data sets can be assisted by an integrated dashboard allowing for early explorative data analysis. This resembles the earlier, qualitative comparative analysis of multi-modal place-based data (Psenner 2004). (3) Utilising the app GeoExplorer (Kremer/Wagner 2023, Verstegen/Kremer 2023), I show how place-based data can be used to stage digitally assisted excursions or public trails by (a) presenting place-related media ranging from photo and audio to AR experiences in a web-based mobile application. (b) To involve different stakeholder groups already at the stage of content creation (Glasze/Pütz/Weber 2021), we also provide a web-based input form guiding the users through the process of creating place-related experiences for their respective target groups.
        I report on initial success stories of supporting the lifecycle of working with place-based, individual data from different research disciplines, including health geographies, social studies, educational partners and digital archaeology.

        Speaker: Dominik Kremer (Friedrich-Alexander-Universität Erlangen-Nürnberg)
      • 10:00
        Infrastructures for a community-developed text processing library 20m

        The increasing demand for accessible Natural Language Processing (NLP) tools in the Digital Humanities (DH) community, together with the continuing growth of data and the associated processing time, has highlighted the need for platforms that lower the barrier to advanced textual analysis across various research fields in the humanities. MONAPipe, short for “Modes of Narration and Attribution Pipeline”, meets this need by offering a modular, open-source NLP pipeline that provides end-to-end integration of community-developed classifiers. MONAPipe was originally created in the project group MONA with a particular focus on Computational Literary Studies (CLS) and is now being further developed within Text+ as part of the German National Research Data Infrastructure (NFDI) for the needs of a broad user group in the humanities.

        MONAPipe is distributed as a Python library based on the NLP framework spaCy. Building on spaCy’s capability to include custom components, MONAPipe integrates its own components and additionally allows them to have several implementations; e.g., the component speech tagger has a neural and a rule-based implementation (see Brunner et al. (2020) and Dönicke et al. (2022)). Designed to make specific community-driven NLP components accessible, MONAPipe provides an intuitive, Python-based framework that fosters data literacy, helping DH researchers develop a deeper understanding of text analysis while requiring only a basic knowledge of Python. Additionally, we invite developers to participate by integrating their own components or implementations. For both use cases, using and developing MONAPipe, comprehensive documentation is provided.
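
        As a rough sketch of the underlying spaCy mechanism, the snippet below registers and adds a custom pipeline component. The component name and its behaviour are invented for illustration and do not reflect MONAPipe's actual components.

        import spacy
        from spacy.language import Language
        from spacy.tokens import Doc

        # Invented example component: count quotation-mark tokens per Doc.
        Doc.set_extension("quote_count", default=0)

        @Language.component("quote_counter")
        def quote_counter(doc: Doc) -> Doc:
            doc._.quote_count = sum(token.text in {'"', "„", "“"} for token in doc)
            return doc

        nlp = spacy.blank("de")      # minimal pipeline; MONAPipe ships richer ones
        nlp.add_pipe("quote_counter")
        doc = nlp('Sie sagte: "Ich komme morgen."')
        print(doc._.quote_count)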

        MONAPipe incorporates larger resources (e.g. custom models) from an external repository. The software leverages GRO.data, a long-term archive based on dataverse that provides versioning and persistent identifiers. Developers are free to use other common data repositories such as Hugging Face Hub.

        MONAPipe uses a containerisation strategy for managing the highly specific requirements of NLP components. NLP tools often have strict, conflicting library requirements, and dependency issues can disrupt workflows. Currently, we encapsulate specific implementations within Docker containers to isolate dependencies in self-contained environments, preventing compatibility conflicts and ensuring stable, reproducible operations. For users, these containers run locally, provide their results via REST API interfaces and integrate them into MONAPipe.

        In addition to local container usage, MONAPipe will offer online APIs through KISSKI Services. These APIs, running on the HPC cluster at the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), will provide scalable, high-performance access to MONAPipe’s components and implementations, enabling users to leverage powerful computational resources without managing containers locally. This combination of local and HPC-based access offers flexibility and ease of use for diverse research needs.

        Speakers: Florian Barth, Tillmann Dönicke, Mathias Göbel
    • 09:00 10:30
      Risk of technical debt, lock-ins, reliance on non-free software, hidden costs 1h 30m Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Which software tools do we rely on?
      Which tools do we take for granted?
      What are their properties?
      Are they legacy? Obsolete? Free? Freeware? Proprietary?
      Do we contribute to them? Is their cost justified?
      Are they maintainable, extensible, repairable?
      What is astroturfing? What is a monopoly and how does it affect us?
      Is open-source better than freeware? And better than free software?
      What's the difference between open-source software and free software?
      What common goals do they pursue? How has their different philosophy split the community?
      Are there freedom-limiting costs?
      What is customer lock-in?
      Were you affected by Anaconda's change of terms of service?
      Do you use GitHub? Do you rely on GitHub? Do you use VS Code? Do you know Microsoft Recall?

      In this workshop we are first introduced to four topics: free and libre/open source software (FLOSS); de-facto monopolies and lock-in; and the tragedy of the (digital) commons.

      After a motivating 10-minute introduction, we have four phases:
      1. FLOSS
      2. Lock-in
      3. Ways out
      4. Giving back

      Each phase takes 15 minutes and consists of a short intro and a pairwise activity.
      Pairs will change every 15 minutes and will be formed in a way that maximizes exchange.
      In the pairwise activity we exchange experiences and summarize the learnings.

      Bring your questions, answers, and open questions.

      The workshop ends with a 20-minute round table.

      No computer is required.
      A computer with a browser and a keyboard can be optionally used instead of paper and pen.

      The format of this event was (partially) tried out and appreciated by some 20 participants just two weeks ago, so we hope it will be appreciated here, too.

      Speakers: Michele Martone (Leibniz Supercomputing Centre), Tomas Stary (Karlsruhe Institute of Technology)
    • 09:00 10:30
      SE Static Analysis Audimax B

      Audimax B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe, Germany
      Convener: Ben Hermann (TU Dortmund)
      • 09:00
        On the Anatomy of Real-World R Code for Static Analysis 22m
        Speakers: Florian Sihler (Ulm University), Lukas Pietzschmann (Ulm University), Raphael Straub (Ulm University), Matthias Tichy (Ulm University), Andor Diera (Ulm University), Abdelhalim Dahou (GESIS - Institute for the Social Sciences)
      • 09:22
        Total Recall? How Good Are Static Call Graphs Really? 22m
        Speakers: Dominik Helm (Universität Duisburg-Essen, Technische Universität Darmstadt, National Research Center for Applied Cybersecurity ATHENE), Sven Keidel (Technische Universität Darmstadt), Anemone Kampkötter (Technische Universität Dortmund), Johannes Düsing (Technische Universität Dortmund), Tobias Roth (Technische Universität Darmstadt, National Research Center for Applied Cybersecurity ATHENE), Ben Hermann (TU Dortmund), Mira Mezini (Technische Universität Darmstadt, hessian.AI, National Research Center for Applied Cybersecurity ATHENE)
      • 09:45
        AXA: Cross-Language Analysis through Integration of Single-Language Analyses 22m
        Speakers: Tobias Roth (Technische Universität Darmstadt, National Research Center for Applied Cybersecurity ATHENE), Julius Näumann (Technische Universität Darmstadt, National Research Center for Applied Cybersecurity ATHENE), Dominik Helm (Universität Duisburg-Essen, Technische Universität Darmstadt, National Research Center for Applied Cybersecurity ATHENE), Sven Keidel (Technische Universität Darmstadt), Mira Mezini (Technische Universität Darmstadt, hessian.AI, National Research Center for Applied Cybersecurity ATHENE)
      • 10:07
        JPlag: Detecting Obfuscated Software Plagiarism using Token Normalization Graphs 22m
        Speakers: Larissa Schmid (Karlsruhe Institute of Technology (KIT)), Sebastian Hahner (Karlsruhe Institute of Technology (KIT)), Timur Sağlam (Karlsruhe Institute of Technology (KIT))
    • 09:00 10:30
      SE Talks: Final Round of the SWT-Preis Vortragsraum 3 (Building 30.51 (Bibliothek))

      Vortragsraum 3

      Building 30.51 (Bibliothek)

      Straße am Forum 1, 76131 Karlsruhe, Germany
      • 09:00
        Synchronous Stream Runtime Verification with Uncertainties and Assumptions 30m Vortragsraum 3 (Building 30.51 (Bibliothek))

        Vortragsraum 3

        Building 30.51 (Bibliothek)

        Straße am Forum 1, 76131 Karlsruhe, Germany
        Speaker: Hannes Kallwies (Universität Lübeck)
      • 09:30
        A Dynamic Service-oriented Software Architecture for the Automotive Domain 30m Vortragsraum 3

        Vortragsraum 3

        Building 30.51 (Bibliothek)

        Straße am Forum 1, 76131 Karlsruhe, Germany
        Speaker: Alexandru Kampmann (RWTH Aachen)
      • 10:00
        Facilitating Control Software Engineering with Executable Behavior Models 30m Vortragsraum 3 (Building 30.51 (Bibliothek))

        Vortragsraum 3

        Building 30.51 (Bibliothek)

        Straße am Forum 1, 76131 Karlsruhe, Germany
        Speaker: Bianca Wiesmayr (Johannes-Kepler-Universität Linz)
    • 09:00 12:30
      SE Co-Located: Meeting of the Arbeitskreis MABSS Seminarraum 18 (Building 30.48 (MZE))

      Seminarraum 18

      Building 30.48 (MZE)

    • 10:30 11:00
      Coffee Break 30m Audimax Foyer

      Audimax Foyer

      Building 30.95

    • 11:00 12:30
      Hands-on semantic data management with LinkAhead: Increased data findability and reusability 1h 30m Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      In this hands-on workshop, we introduce the open source software LinkAhead, which promotes agility in semantic data management: LinkAhead is a semantic research data management system that facilitates enhanced data findability and reusability by embedding data into context. Its flexible data model (the data structure can be changed without migrating existing data) allows users to leverage existing standard ontologies, promoting transparency, interoperability and collaboration across diverse research domains.

      Data management is essential for storing, searching, retrieving and analyzing data sets along with their contextual connections, ensuring their usability for current and future users. Effective data management not only ensures the reuse of valuable data but also enhances its discoverability ("Where can I find the training data for sensor X from setup Y?") and utility through contextual embedding ("What were the experimental settings for data collection in project P, and what were the associated challenges?"). Thus, data management solutions like LinkAhead support the preparation of FAIR open data and enable data collaboration and knowledge exchange.

      This workshop consists of a short live demonstration of the LinkAhead Python client, and participants are encouraged to follow along on their own machines. A Jupyter notebook will be made available online before the session.

      Workshop participants (researchers, research software engineers) will learn these LinkAhead skills:

      • Understanding, creating and editing data models
      • Formulating semantic queries, also for linked data sets
      • Adding and retrieving data

      Recommended prerequisites for participation:

      Basic Python knowledge

      Jupyter Notebook with these Python libraries:

      pip install linkahead caosadvancedtools
      

      Test beforehand with:

      # Import the LinkAhead Python client
      import linkahead
      
      # Connect to the public demo server (credentials are the demo defaults)
      linkahead.configure_connection(
          url="https://demo.indiscale.com",
          username="admin", password="caosdb",
          password_method="plain")
      
      # Print basic server information to verify that the connection works
      linkahead.Info()
      

      This should output a short info string like this:
      "Connection to LinkAhead with 84 Records."

      Speakers: Dr Alexander Schlemmer (IndiScale GmbH, Göttingen), Dr Florian Spreckelsen (IndiScale GmbH, Göttingen)
    • 11:00 12:30
      Legacy Research Software SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Julian Gethmann (KIT-IBPT)
      • 11:00
        Conserving Legacy Code: From handwritten Makefile to modern build system and activatable archiving 20m

        With the retirement of a colleague, we were handed the Fortran source code of a computational software package.

        At that stage the software was feature complete, offering a large variety of options for the simulation of semiconductors. Along with the implementation of advanced physical models for semiconductors and optoelectronic devices,
        key features at the time of development were a custom scripting language to steer simulations and built-in plotting capabilities through X11.
        While it does not have any tests, it was validated against various real-world experiments and is therefore trusted by many scientists, both at our institute and by collaborators.

        Recently, there was renewed interest in researching related and extended problems not supported by the software. Due to the infeasibility of extending the old codebase, the plan is to replace it with a new implementation in Julia based on our package ecosystem WIAS-PDELib.
        However, due to the high trust in its results, an executable running on modern operating systems was needed to perform validation of the new codebase against in silico experiments.

        In this talk we describe the steps we took to modernize the build system such that the code can be built from source on modern computers. In addition, we discuss the concept of archiving this type of legacy code such that it can be reactivated when needed.

        Speaker: Jan Philipp Thiele (Weierstrass Institute Berlin)
      • 11:20
        Recovering Knowledge from old Code 20m

        Imagine: a 30-year-old Fortran code. 10K lines of three-letter variables, sparse comments of varying correctness, and no one left who remembers how it works. Amazingly, it is still in use - even though it is unclear how exactly it calculates what it calculates…

        Somewhere buried in these dusty bits and bytes supposedly lies an algorithm that promises to be better than the tools the research group has available: faster and more precise.

        The HIFIS RSE-consulting team was approached to help with investigating this software, unlocking its hidden secrets and coming up with a way to deal with this kind of "inherited software", because we can be sure: There is a lot more where that came from.

        In this talk we will present how we approach this problem, the plan, the steps already taken, the challenges encountered, what worked (or at least looks promising) and what didn't.

        Speaker: Fredo Erxleben (Helmholtz-Zentrum Dresden-Rossendorf)
      • 11:40
        10 years of rio and readODS: Maintaining an I/O infrastructure of R 20m

        In this proposed talk, I will share my experience in maintaining a "boring", but arguably important, part of the R infrastructure: input and output (I/O). The foci will be two packages I currently maintain, both of which recently celebrated their tenth anniversaries: rio and readODS. I will briefly describe what the (chaotic) I/O infrastructure of R looked like ten years ago. Then I will show how the package rio simplifies I/O tasks with only two functions: import() and export(). I will also present the package readODS, which is designed as a silent family member of rio for reading and writing OpenDocument Spreadsheets (ODS), a truly open format that has been adopted by organisations such as NATO and the EU. Finally, I will discuss what has changed in rio and readODS over the last ten years. For example, readODS has seen a performance gain of over 1000x and is now a significantly faster and more usable option for reading and writing ODS than the offerings for Python, Julia, and JavaScript.

        Speaker: Chung-hong Chan (GESIS)
      • 12:00
        Developing a modern build system for the earth system modelling framework MESSy 20m

        The earth system modelling framework MESSy (Modular Earth Submodel System: https://messy-interface.org/) consists of around 3.5 million lines of code, most of it written in Fortran, and is mainly used on large HPC clusters. There, users as well as developers usually have to configure and build the software package on their own with the help of a build system, which is therefore an essential part of the research software. MESSy has been developed for 20 years and offers a lot of different configuration options and packages. All those configurations need to be represented within the internal build system. The historically grown, currently used version based on autoconf is rather complex and laborious to maintain, e.g. when it has to be extended to new computing architectures.
        As part of a supporting project, we therefore created a new and modern build system for this software within the past year. The main focus was to recreate the existing system and its functionalities with CMake, one of the most common tools, while improving flexibility, usability for developers, and integrability with various architectures. Moreover, recompilation times after code changes shrink, which simplifies development. Right now, the new CMake system is increasingly used by MESSy's developers instead of the autoconf build.
        In the following, we will describe the process of creating such a CMake system for large, historically grown software, based on our experience with MESSy. Further, general advantages and hurdles related to this specific example will be discussed.

        Speaker: Sven Goldberg (DLR - Institute of Software Technology)
    • 11:00 12:30
      Metrics, Indicators, and Assessment of Research Software Audimax A

      Audimax A

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Mario Frank
      • 11:00
        Multi-Dimensional Categorization of Research Software 20m

        Research software has been categorized in different contexts to serve different goals. We start with a look at what research software is, before we discuss the purpose of research software categories. We propose a multi-dimensional categorization of research software. We present a template for characterizing such categories. As selected dimensions, we present our proposed role-based, readiness-based, developer-based, and dissemination-based categories. Since our work has been inspired by various previous efforts to categorize research software, we discuss them as related works. We characterize all these categories via the previously introduced template, to enable a systematic comparison.

        The categorization has been produced in the context of a task force of the special interest group on Research Software Engineering, within the German Association of Computer Science (GI e.V.) and the German Society for Research Software (de-RSE e.V.).

        We envision the following benefits from using categories for research software:

        • as a basis for institutional guidelines and checklists for research software development;
        • to better understand the different types of research software and their specific quality requirements;
        • to recommend appropriate software engineering methods for the individual categories;
        • to design appropriate teaching / education programs for the individual categories;
        • to give stakeholders (especially research software engineers and their management) a better understanding of what kind of software they develop;
        • for a better assessment of existing software when deciding to reuse it;
        • for research funding agencies, to define appropriate funding schemes;
        • to define appropriate metadata labels for FAIR research software;
        • in RSE research, to provide a framework for classifying research.

        In the realm of RSE research, we hope that the categorization provides a framework for classifying research objects, supporting software corpus analyses, and enhancing our understanding of the different types of research software and their properties. This structured approach may aid in organizing and interpreting the vast landscape of research software, contributing to advancements in RSE methodologies and practices.

        We report on a systematic mapping study to evaluate our role-based categorization, and the multi-dimensional categorization of selected research software examples.

        Speaker: Wilhelm Hasselbring (Kiel University)
      • 11:20
        How to find and evaluate good research software - a field report 20m

        As part of the Incubator Initiative, the Helmholtz Association has promoted the field of research software engineering. Among other activities, the Helmholtz Research Software Directory (RSD) was developed and the Helmholtz Software Award was launched.
        But these great ideas have raised questions:
        • How exactly do you find the great software?
        • How do you encourage the development teams to publish it and describe it in such a way that not only insiders understand what it's all about?
        • How can international reviewers be recruited and how can they evaluate non-specialist software applications?
        • How do you compare and evaluate software that differs greatly not only in terms of technical aspects, but also in terms of maturity, user community and target group?

        The Helmholtz RSD has developed into a successful model and, with hundreds of software solutions and often thousands of “harvested” references to them, is increasingly bringing added value to software developers and scientists. The Helmholtz Software Award 2023 was presented in three categories, and the applications for the second call for proposals in 2024 have already been received and are being reviewed. At the same time, the topic of evaluating research results, including data and software, has recently become increasingly important. Here too, the evaluation of research software is playing an increasingly important role.

        In this presentation, the experiences and results of these processes will be presented in detail. The topics mentioned, which are still in flux, are of growing importance for universities, research institutions and also the NFDI consortia! These experiences in this still relatively new field are therefore valuable information and a basis for discussions in the RSE community!

        Speaker: Dr Uwe Konrad (HZDR)
      • 11:40
        Software as new indicator in research evaluation - first experiences in HGF 20m

        When evaluating research impact, it’s crucial to consider not only traditional publications but also other facets of scientific work. Data and research software are valuable products of research. High-quality software development enables reproducible science, data accessibility, and innovative methodologies, supporting a broader research ecosystem. However, software developers rarely receive recognition for this work in research assessment. Recognizing software contributions would encourage developers to prioritize functionality, reliability, and usability. On the international level, initiatives like CoARA (Coalition for Advancing Research Assessment, https://coara.eu/) or HiddenRef (https://hidden-ref.org/) aim to reshape research evaluation by recognizing diverse contributions, including software development. Similar discussions within HGF led to the establishment of a Task Group Indicators. The group developed a step-by-step plan for the introduction of new key performance indicators, which was then adopted by the general assembly of the Helmholtz Association. Beginning in 2022, a simple indicator for software was monitored as a first step. The talk describes some results of this attempt and also discusses problems with the approach.
        (This is part one of our report on our approach to introduce software into research assessment at HGF. The second part describes some new ideas about a more detailed quality indicator for research software and is submitted as #48)

        Speaker: Bernadette Fritzsch
      • 12:00
        A new approach for a quality indicator for software 20m

        Research software plays a pivotal role in the Helmholtz Association. HGF has therefore decided to also include software in its research evaluation. A dedicated task group has proposed a new evaluation approach that recognizes software quality through multiple dimensions rather than a single score. This multi-faceted framework considers different factors (like the FAIR4RS criteria), acknowledging the complexity and diverse contributions of research software. By using a maturity model, improvement of the software during its lifecycle can be documented. The presentation will show details of the proposed quality indicator for research software publications and initiate further discussions on how to include software in research assessment.
        (This is part two of our report on our approach to introduce software into research assessment at HGF. The first part describes our experiences with the first attempt of a simple "entry indicator" for research software and is submitted as #47)

        Speaker: Guido Juckeland (Helmholtz-Zentrum Dresden-Rossendorf)
    • 11:00 12:30
      SE Formal Methods Audimax B

      Audimax B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe, Germany
      Convener: Reiner Hähnle
      • 11:00
        Model-Based Testing of Quantum Computations 22m
        Speakers: Malte Lochau (University of Siegen), Ina Schaefer (Karlsruhe Institute of Technology)
      • 11:22
        Kind Controllers and Fast Heuristics for Non-Well-Separated GR(1) Specifications 22m
        Speakers: Ariel Gorenstein (Tel Aviv University), Shahar Maoz (Tel Aviv University), Jan Oliver Ringert (Bauhaus University Weimar)
      • 11:45
        Cost-Sensitive Precomputation of Real-Time-Aware Reconfiguration Strategies based on Stochastic Priced Timed Games 22m
        Speakers: Hendrik Göttmann (TU Darmstadt), Birte Caesar (Helmut Schmidt University Hamburg), Lasse Beers (Helmut Schmidt University Hamburg), Malte Lochau (University of Siegen), Andy Schürr (TU Darmstadt), Alexander Fay (Ruhr University Bochum)
      • 12:07
        Formal Synthesis of Uncertainty Reduction Controller 22m
        Speakers: Marc Carwehl, Calum Imrie, Thomas Vogel, Genaina Rodrigues, Radu Calinescu, Lars Grunske
    • 11:00 12:30
      SE Monitoring and Education Vortragsraum 3. OG (Building 30.51 (Bibliothek))

      Vortragsraum 3. OG

      Building 30.51 (Bibliothek)

      Convener: Marie Platenius-Mohr
      • 11:00
        Detecting Usage of Deprecated Web APIs via Tracing 22m
        Speakers: André van Hoorn (University of Hamburg), Leif Bonorden (University of Hamburg)
      • 11:22
        Cause-Effect Chain-Based Diagnosis of Automotive On-Board Energy Systems 22m
        Speakers: Stefan Kugele (Technische Hochschule Ingolstadt), Lorenz Schreyer (BMW Group), Martin Lamprecht (BMW Group)
      • 11:45
        UMLsecRT: Reactive Security Monitoring of Java Applications With Round-Trip Engineering 22m
        Speakers: Sven Peldszus (Ruhr University Bochum), Jens Bürger (Conciso GmbH), Jan Jürjens (Fraunhofer Institute for Software & Systems Engineering ISST and University of Koblenz)
      • 12:07
        Unveiling Hurdles in Software Engineering Education: The Role of Learning Management Systems 22m
        Speakers: Niklas Meißner (Institute of Software Engineering, University of Stuttgart), Nadine Nicole Koch (University of Stuttgart), Sandro Speth (Institute of Software Engineering, University of Stuttgart), Uwe Breitenbücher (Reutlingen University), Steffen Becker (University of Stuttgart)
    • 11:00 12:30
      Visualization with Research Software Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131
      Convener: Dr Maria Guadalupe Barrios Sazo (Forschungszentrum Juelich)
      • 11:00
        VITESS: Advancing Neutron Instrument Simulation and Virtual Experiments 20m

        Research software plays a crucial role in advancing science by enabling accurate modeling and simulation of experiments. One example is VITESS, a software tool that simulates how neutrons behave in scientific instruments. These simulations help researchers design and optimize experiments in fields ranging from materials science to energy research.

        Since its creation in 1999, VITESS has been used to simulate instruments at research facilities worldwide. The latest version, VITESS 3.6, introduces significant improvements that make the software easier to use, more reliable, and capable of addressing modern research challenges.

        Key updates include:

        • Improved Visualization: Enhanced graphics make it easier to see and understand simulation results.
        • New Features: Tools have been added to support a wider range of experiments, such as tracking neutron pathways and creating realistic models of experimental setups.
        • Real-Time Feedback: Users can now monitor simulations as they progress, allowing adjustments to be made without waiting for the entire process to finish.

        These advancements help make VITESS a versatile and user-friendly tool for researchers. Future versions will build on these features, offering new ways to collaborate and integrate with other scientific software.

        This presentation will introduce the audience to VITESS and its applications, demonstrating how it supports the development of cutting-edge research tools. By showcasing practical examples, such as designing instruments for next-generation neutron research facilities, we aim to inspire new connections between research software engineering and the neutron science community.

        Speaker: Nicolo Violini
      • 11:20
        Jupyter Python Minion: Simplifying SPARQL Queries and Visualisations for Archaeological Data 20m

        The Semantic Web is a treasure trove of interconnected knowledge graphs, providing access to datasets that are invaluable for research in cultural heritage and archaeology. Resources such as triplestores (e.g., the NFDI4Objects Knowledge Graph), Wikibase instances (e.g., Wikidata and FactGrid), and Solid Pods housing geoscientific data open new avenues for interdisciplinary exploration. However, researchers face significant challenges in utilising these resources effectively. Writing SPARQL queries requires a steep learning curve, and the data often returns in formats like sparql-results+xml or sparql-results+json, which are not user-friendly for immediate analysis or visualisation.

        Python has become a critical tool in addressing these challenges. As a versatile scripting language, Python enables researchers to automate workflows, ensure reproducibility, and integrate datasets seamlessly. However, many researchers lack the technical skills or frameworks needed to implement Python solutions in their work. This is where Jupyter Notebooks provide a critical advantage. Combining an intuitive, shareable environment with the computational power of Python, Jupyter Notebooks make it easy to share not just results but the entire workflow. This transparency enhances reproducibility, facilitates collaboration, and aligns with FAIR principles.

        The Jupyter Python Minion builds on this framework, offering an open-source solution to simplify SPARQL querying and data visualisation. By integrating widely used Python libraries such as pyplot, wordcloud, geopandas, and contextily, it transforms raw Linked Open Data (LOD) into actionable insights. Researchers can produce bar charts, pie charts, maps, and word clouds with minimal effort, bridging the gap between technical expertise and domain-specific knowledge. Importantly, the tool enables researchers to document their computational workflows within Jupyter Notebooks, creating reusable resources for the broader research community.
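
        A minimal sketch of the kind of workflow the tool streamlines is shown below: querying a public SPARQL endpoint with SPARQLWrapper and plotting the result with matplotlib. The query is a generic example against Wikidata, not one of the Minion's built-in recipes.

        import matplotlib.pyplot as plt
        from SPARQLWrapper import SPARQLWrapper, JSON

        # Count archaeological sites (wd:Q839954) per country on Wikidata.
        sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                               agent="example-notebook/0.1")
        sparql.setQuery("""
            SELECT ?countryLabel (COUNT(?item) AS ?count) WHERE {
              ?item wdt:P31 wd:Q839954 ;
                    wdt:P17 ?country .
              SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
            }
            GROUP BY ?country ?countryLabel
            ORDER BY DESC(?count)
            LIMIT 10
        """)
        sparql.setReturnFormat(JSON)
        rows = sparql.query().convert()["results"]["bindings"]

        # Turn the JSON bindings into a simple bar chart.
        labels = [row["countryLabel"]["value"] for row in rows]
        counts = [int(row["count"]["value"]) for row in rows]
        plt.bar(labels, counts)
        plt.xticks(rotation=45, ha="right")
        plt.title("Archaeological sites per country (Wikidata, top 10)")
        plt.tight_layout()
        plt.show()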

        The need for such tools in archaeological research software engineering is pressing. Computational archaeology increasingly relies on integrating diverse datasets—geospatial, semantic, and cultural—but the technical barriers often hinder broader adoption. By lowering these barriers, the Jupyter Python Minion empowers researchers to harness the power of Python scripts for reproducible and shareable analysis.

        In this talk, I will demonstrate the utility of the Jupyter Python Minion through five use cases:
        1. Playful exploration of Pokémon properties, demonstrating SPARQL queries and visualisation techniques like scatterplots and bar charts.
        2. Mapping the distribution of Samian ware kiln sites, showcasing the regional breakdown of archaeological production centres.
        3. Exploring Irish Holy Wells, revealing etymological patterns through word clouds and pie charts.
        4. Geospatial analysis of Irish Ogham Stones, including density maps and OpenStreetMap-based visualisations to highlight regional clusters.
        5. Integration of geoscientific findspots from Solid Pods, using SPARQL queries to categorise archaeological and geological locations affected by the Campanian Ignimbrite eruption.

        These examples highlight the transformative potential of integrating Python scripting with Jupyter Notebooks for reproducible research. The tool’s shareability fosters collaboration across disciplines, from archaeology to geosciences, and promotes a culture of openness and accessibility in research software engineering.

        This talk will contribute to key themes in RSE, including computational workflows, open-source tools, and software usability, while providing attendees with actionable insights to adopt and adapt these methods in their own research.

        Speakers: Florian Thiery (Research Squirrel Engineers Network), Lutz Krister Schubert (University of Cologne)
      • 11:40
        Multimodal Imaging in Neuropsychiatric Disorders dataset (MINDset) 20m

        Precision psychiatry faces significant challenges, including limited sample sizes and the generalizability of findings, variability in clinical phenotyping, and the need for robust biomarkers to guide personalized treatment approaches. Additionally, the integration of diverse data sources—such as multi-omics, electrophysiology, neuroimaging, clinical records, and cognitive-behavioural data—adds complexity to research efforts. To address these challenges, robust methodologies for data management, quality control, and interoperability are critical. In line with this, the Multimodal Imaging in Clinical Neuroscience research group has collected a comprehensive range of data from diverse studies, encompassing multimodal imaging (fMRI, EEG, fNIRS, PET), omics (microbiome, epigenetics), and cognitive and neuropsychiatric test scores from both psychiatric patients and healthy individuals. However, with approximately 800 datasets scattered across multiple storage systems, the lack of standardized and centralized data and storage poses a barrier to cross-modality analyses and large-scale studies. A unified approach to data management is essential to overcome these obstacles and advance precision psychiatry.

        The Multimodal Imaging in Neuropsychiatric Disorders dataset (MINDset) project addresses significant challenges in the integration and analysis of complex datasets within neuropsychiatric research by developing a comprehensive database housing multimodal neuroimaging and electrophysiology data with metadata, and pre-processed features, organized in accordance with the Brain Imaging Data Structure (BIDS) (Gorgolewski et al. 2016) format. Additionally, a user-friendly query tool to facilitate data access is also being developed. The MINDset web server and database are hosted on a secure, password-protected cloud environment managed by the IT services of FZ Jülich, utilizing the Kubernetes architecture (Burns et al. 2016). This setup leverages Kubernetes container orchestration capabilities to ensure efficient management, scaling, and high availability of the service.

        The metadata files containing demographic, cognitive, neuropsychological and psychopathological assessment results, originally in Microsoft Excel format, were converted into a JSON file structure using Python scripts (Python Software Foundation 2024). Each row is parsed into values corresponding to the keys from the first row, creating a dictionary, which is finally written to a JSON file. The nested JSON format preserves the hierarchy of the structured data, making it easy to access and ensuring accurate data representation.
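
        A hedged sketch of such a conversion, here using pandas, is shown below; the file and column names are placeholders rather than those of the actual MINDset metadata.

        import json
        import pandas as pd

        # Read the assessment spreadsheet; the first row supplies the keys.
        df = pd.read_excel("metadata.xlsx", sheet_name=0)

        # Parse each row into a dictionary keyed by the column headers,
        # nested under a (hypothetical) subject identifier column.
        records = {
            str(row["subject_id"]): row.drop(labels=["subject_id"]).to_dict()
            for _, row in df.iterrows()
        }

        # Write the nested structure to a JSON file.
        with open("metadata.json", "w", encoding="utf-8") as fh:
            json.dump(records, fh, indent=2, ensure_ascii=False, default=str)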

        The BEAVERDAM tool (More H and Denker M 2022) was used to build the database from the created JSON files. This tool facilitates the integration of multiple metadata files, structured in a JSON-like format, into a unified, browsable database repository. The BEAVERDAM tool utilizes MongoDB (Bradshaw et al. 2019) for storage, enabling greater data scalability and flexibility, as well as supporting flexible deployment in cloud-based environments. The query tool enables users to apply filters across a range of parameters, dynamically updating various visualization charts, statistical data, and tabular representations.

        Centralizing and standardizing our data via MINDset will greatly boost research capabilities in precision psychiatry, unlocking insights at the intersection of multiple modalities. Looking ahead, this unified approach to managing and integrating multimodal data can serve as a model for other researchers and facilities, fostering broader adoption and enhancing collaborative efforts across the field.

        Acknowledgements: We thank Heather More and Volker Hofmann for their invaluable support on the project.

        Speaker: Dr Ravichandran Rajkumar (Institute of Neuroscience and Medicine 4, INM-4, Forschungszentrum Jülich, Germany)
      • 12:00
        Met.3D: Rapid exploration of gridded atmospheric data with interactive 3-D visualization 20m

        Visualization is an important and ubiquitous tool in the daily work of weather forecasters and atmospheric researchers to analyse data from simulations and observations. The domain-specific meteorological research software Met.3D (documentation including installation instructions available at https://met3d.readthedocs.org) is an open-source effort to make interactive, 3-D, feature-based, and ensemble visualization techniques accessible to the meteorological community. Since the public release of version 1.0 in 2015, Met.3D has been used in multiple visualization research projects targeted at atmospheric science applications, and has evolved into a feature-rich visual analysis tool facilitating rapid exploration of gridded atmospheric data. The software is based on the concept of “building a bridge” between “traditional” 2-D visual analysis techniques and interactive 3-D techniques powered by modern graphics hardware. It allows users to analyse data using combinations of feature-based displays (e.g., atmospheric fronts and jet streams), “traditional” 2-D maps and cross-sections, meteorological diagrams, ensemble displays, and 3-D visualization including direct volume rendering, isosurfaces and trajectories, all combined in an interactive 3-D context. In the past year, we have been able to significantly advance the Met.3D code base (available at https://gitlab.com/wxmetvis/met.3d) to make the tool more stable, usable, and to integrate visualization techniques not commonly available in other visualization tools. In this presentation, we introduce our software to the RSE community, show some examples, and discuss challenges of developing the software in an atmospheric science research environment.

        Speaker: Marc Rautenhaus (Visual Data Analysis Group, Hub of Computing and Data Science, Universität Hamburg)
    • 11:00 12:30
      🚀Time to launch: Making the “Standard Template for the Efficient Development of Research Software” accessible 1h 30m Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      As research software is becoming increasingly fundamental in almost all scientific domains, its development and maintenance is of significant importance for scientists. Currently, scientists often lack the profound knowledge and tools to develop and maintain this software throughout its often long-lasting life cycle. To promote high-quality software, adequate support, and appropriate recognition of research software engineering (RSE) as an academic achievement, deRSE e. V. and GI e. V. developed a template guideline as a basis for discussion and adoption. This template serves as a foundation for the adoption of RSE best practices by German universities and research institutions, and for providing adequate support to researchers and research software engineers. At the time of writing, the template is in the approval process. The goals of the guidelines are (1) ensuring support and investments in RSE in Germany, and thus improving research software in several respects; and (2) providing a foundation for recognizing RSE and RSE contributions as academic achievements.
      It is now time for universities and research institutions to adopt and adapt these variant-rich guidelines, and for the community to put them into practice. In this session, we will develop strategies for (A) how to incentivize adoption at the institutional level, and (B) how to make the guidelines accessible to all involved stakeholders, including the researchers and research software engineers developing the research software.
      The BoF session will start with an introduction to these guidelines and other similar efforts. We will proceed with group work in breakout sessions on incentives (A) and accessibility (B). The goal is to develop two dissemination roadmaps, (A) top-down (at the institutional level) and (B) bottom-up (at the working level). The session will also discuss tooling such as an interactive web page for adapting the variant-rich template to institution-specific needs (A); a visual representation of specific chapters in terms of decision trees (A/B); a Carpentries-style short introduction to the guidelines (B); or a tool that provides feedback on whether your research software adheres to the guidelines (B). As such, the BoF session is meant to gather input and supporters for putting the guidelines into practice in a collaborative effort. Feedback mechanisms on the guidelines after they have been incorporated at research institutions will be discussed at the end of the session.

      Speakers: Bernadette Fritzsch, Ms Carina Haupt (German Aerospace Center (DLR)), Sebastian Nielebock (Otto-von-Guericke-University Magdeburg, Germany), Bernhard Rumpe (RWTH Aachen), Alexander Struck (Cluster of Excellence Matters of Activity), Inga Ulusoy (University of Heidelberg)
    • 11:00 12:30
      SE Co-Located: Software Engineering Division (FB Softwaretechnik), SE Steering Committee (Steuerkreis SE) Seminarraum 17 (Building 30.48 (MZE))

      Seminarraum 17

      Building 30.48 (MZE)

      Convener: Prof. Kurt Schneider
    • 12:30 14:00
      Break 1h 30m
    • 12:30 13:00
      deRSE Farewell and Outlook into 2025/2026 30m Audimax A

      Audimax A

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe, Germany
      Speaker: René Caspart (Karlsruhe Institute of Technology (KIT))
    • 13:30 16:30
      How to improve the visibility and added value of RSE(s) in NFDI 3h Audimax A

      Audimax A

      Building 30.95

      Research Software Engineering is fundamental to the German National Research Data Infrastructure (NFDI). Accordingly, a "deRSE Arbeitskreis NFDI" serves as a connection point for RSEs in the NFDI inside deRSE e.V. Within the NFDI e.V., several "sections" deal with overarching topics, e.g., the "Sektion Common Infrastructures" with its working groups on Data Integration (DI), Electronic Lab Notebooks (ELN) or Research Software Engineering (RSE).

      Within this session, we as the WG RSE in the Section Common Infrastructures will give an impulse talk on research data and software management in domain-specific NFDI consortia, base4NFDI, and other sections.

      Following that, we invite NFDI-involved RSEs to join a “Stand-Up-Science” part to briefly present their RSE challenges and solutions in spontaneous or not-so-spontaneous lightning talks (max 7-10 minutes) or to give short insights into research software such as code, architecture, etc.

      Questions to answer could be:

      1. Name / last academic degree / Affiliation / current task
      2. What are your connections to the NFDI? (existing or possible)
      3. What was the last thing I coded/worked on
      4. Which research data did I last reuse?
      5. Which research software did I last reuse?
      6. Favourite git command
      7. Give a reason why you were told that the software/data could not be published
      8. What advice (best/good practice) would you give to a new RSE?

      If you want to present something, please use the survey at https://umfragenup.uni-potsdam.de/nfdi-rse/.

      All the aspects mentioned will be part of a joint paper on the “RSE in NFDI” topic.

      Speakers: Florian Thiery (CAA e.V.), Bernd Flemisch (University of Stuttgart), Jan Bernoth (Universität Potsdam), Corinna Seiwerth (Friedrich-Alexander-Universität Erlangen-Nürnberg), Jan Linxweiler (Technische Universität Braunschweig), Linnaea Söhn (Academy of Sciences and Literature | Mainz)
    • 13:30 16:30
      Performance portability and high-performance computing with Julia 3h Seminarroom 104 (Building 30.96)

      Seminarroom 104

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Julia is a friendly, fast and flexible programming language for scientific (and beyond) computing. In this tutorial, we will introduce Julia for high-performance computing with a focus on performance portability.

      Using various examples, we will show how to write parallel programs for GPUs, shared-memory parallelism and distributed parallelism. We will use both a high-level array-based and a low-level kernel-based programming style. Our goal is to highlight the productivity of Julia, while also providing tools for experts to maximize performance.

      As a highlight, the tutorial will include a live demonstration of Julia running on a Graphcore IPU processor.

      Speakers: Dr Mosè Giordano (University College London), Valentin Churavy (Johannes-Gutenberg Universität Mainz & Universität Augsburg)
    • 13:30 15:30
      Research Software Discovery: How do we Want to Search Research Software and Where do we Want to Find it? 2h SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe

      How researchers discover new software, which systems they use and how these systems must be designed to improve the process of software discovery - these are driving questions in the area of software discovery.

      There is a wide range of options for software discovery, such as code and publication repositories, domain, geographic or institution specific catalogs, classical search engines, curated lists, knowledge graphs, social networks of colleagues or friends, and all of these in various combinations, with and without the use of artificial intelligence. This sheer abundance of options leads to the central question: How must a discovery system be designed to enable researchers to find research software that meets their needs?

      In this proposed interactive workshop, we will engage participants in a comprehensive discussion on the current landscape of research software discovery and how it can be optimized to better serve the research community. Our goal is to identify the key challenges faced by researchers and collaboratively propose actionable improvements. The workshop will employ a World Café format, facilitating dynamic and focused discussions across 2-4 tables, each dedicated to specific topics within the area of research software discovery. Potential discussion topics include:

      • Effectiveness of Current Discovery Systems: Evaluating the strengths and weaknesses of existing platforms and tools.
      • User Experience and Accessibility: Identifying barriers and ways to make discovery systems more user-friendly.
      • Integration of AI in Software Discovery: Exploring the potential and limitations of artificial intelligence in enhancing discovery processes.
      • Collaborative Networks and Social Platforms: Leveraging social networks and professional communities for software discovery.
      • Envisioning the Future: How would you like to discover relevant software and evaluate its trustworthiness and usefulness?

      The foundation of this interactive workshop is a presentation on the current state of research software discovery, which has been submitted in parallel to this proposal to the 5th Conference for Research Software Engineering 2025 (deRSE25). This presentation will provide a comprehensive overview and serve as a basis for our discussions. Participants will have access to the slides or the finished preprint to facilitate informed and productive conversations.

      By the end of the workshop, we aim to have a set of concrete recommendations and strategies that can be implemented to improve research software discovery. These insights will be invaluable for developers, researchers, and institutions looking to enhance their discovery infrastructure (e.g. catalogs, knowledge graphs), software discovery processes, and ultimately, their research productivity.

      Duration: 3h

      • Intro: 20 min (cumulative 20 min)
      • Grouping: 10 min (30 min)
      • Table sessions (4x15): 60 min (90 min)
      • Break (time for table chairs to sum up the discussions): 30 min (120 min)
      • Presentation plus discussion for each table/topic (10-12 min each): 50 min (170 min)
      • Outro (what are WE going to do with the results): 10 min (180 min)
      Speaker: Dr Oliver Karras (TIB – Leibniz Information Centre for Science and Technology and University Library)
    • 13:30 16:30
      SustainKieker Hackathon: Reverse Engineering of Research Software 3h Room 206 (Building 30.70)

      Room 206

      Building 30.70

      Straße am Forum 6, 76131 Karlsruhe

      SustainKieker is a software sustainability research project that aims to improve the reusability and maintainability of research software. Our project employs the Kieker Observability Framework, which started in 2006, to monitor and analyze software systems. The Kieker framework provides monitoring, analysis, and visualization support for performance evaluation and reverse engineering of the software architecture of existing software systems. We released preliminary support for Python with Kieker in 2022, and we achieved another milestone in September 2024: the 2.0.0 release of Kieker. We are furthering the effort on Kieker for OpenTelemetry (OTel) interoperability using Kieker's Instrumentation Record Language (IRL). Kieker IRL allows its users to define new record types, and we extend this feature to define a new OpenTelemetry export, which translates OTel traces into Kieker records.

      We propose a hackathon event for scientists interested in monitoring and analyzing research software written in Python. Participants will have the opportunity to utilize both Kieker and OpenTelemetry to understand this research software. The hackathon consists of two hands-on practices that require a personal laptop with an Internet connection. First, we ask participants to instrument an example Python application with OpenTelemetry. The software's execution will send all traces to a Kieker analysis pipeline. The Kieker analysis results allow for reverse engineering and program comprehension of the provided research software, which will (probably) be unknown to the participants before the hackathon. Afterwards, they should have an overview of the research software's architecture.
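
      For orientation, manual OpenTelemetry instrumentation in Python looks roughly like the following sketch; here the spans are simply printed to the console, whereas in the hackathon they would be forwarded to the Kieker analysis pipeline.

      from opentelemetry import trace
      from opentelemetry.sdk.trace import TracerProvider
      from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

      # Set up a tracer provider that prints finished spans to the console.
      provider = TracerProvider()
      provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
      trace.set_tracer_provider(provider)

      tracer = trace.get_tracer("example.research.software")

      def analyse(values):
          # Wrap a unit of work in a span so it shows up in the trace.
          with tracer.start_as_current_span("analyse") as span:
              span.set_attribute("n_values", len(values))
              return sum(values) / len(values)

      print(analyse([1.0, 2.0, 3.0]))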

      It is a guided practice, and we will provide materials to obviate unnecessary barriers. Second, we will provide another, real-world Python research software with increased challenges. This task will consist of several sub-questions so that participants can make step-by-step progress. Lastly, we will conduct an online survey as part of the SustainKieker action research.

      SustainKieker is funded by the Deutsche Forschungsgemeinschaft (DFG – German Research Foundation), grant no. 528713834.

      Speaker: Shinhyung Yang (Software Engineering Group, Department of Computer Science, Kiel University)
    • 14:00 17:00
      MATLAB Tools for Sustainable Research Software Development 3h Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe

      Sustainability in research software is crucial to ensure others can understand, reproduce, and build upon your work effectively, potentially extending its functionality with new algorithms and its applications to new domains. In this tutorial, you will learn about tools you can leverage in the MATLAB ecosystem to effectively enhance the maintainability and reusability of your research software.

      In this session designed for Research Software Engineers (RSEs) who are already familiar with clean code principles, you will deepen your understanding and practically implement these practices within the MATLAB environment.

      You will explore a range of tools and features that facilitate sustainable software development. Key topics include:
      • Code Refactoring Tools: Leveraging built-in functionalities to refactor and enhance code readability.
      • Testing Frameworks: Implementing tests using MATLAB testing frameworks to ensure robust, error-free code and using Continuous Integration (CI) for automatic testing.
      • MATLAB Project Management: Utilizing project files and dependencies to organize and manage development environments effectively.
      • Version Control Integration: Seamlessly integrating MATLAB with popular version control systems like Git for collaborative development and code tracking.

      Who Should Attend: This tutorial is designed for RSEs, researchers, and developers with a foundational understanding of clean code who wish to enhance their MATLAB skills for sustainable software development.

      Speaker: Dr Mihaela Jarema (MathWorks (Academia Group))
    • 14:00 15:30
      SE Keynote: Keynote by Mira Mezini, TU Darmstadt: AI-assisted Programming: From Intelligent Code Completion to Foundation Models - A Twenty-Year Journey Audimax B (Building 30.95 )

      Audimax B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Thomas Thüm
      • 14:00
        AI-assisted Programming: From Intelligent Code Completion to Foundation Models - A Twenty-Year Journey 1h

        Abstract
        From pioneering work on intelligent code completion to large language models, AI has had a significant impact on software engineering over the past two decades. This keynote presentation traces the evolution of AI-assisted programming, highlighting advancements and outlining future directions.

        The talk is structured in three parts. First, we’ll journey back to 2000-2010, exploring pioneering applications of machine learning methods to coding tasks, in particular the groundbreaking work from my lab on intelligent code completion, which was honored with the ACM SIGSOFT Impact Paper Award in 2024, showcasing the software engineering community’s early contributions. The second part examines the current landscape dominated by modern large language models (LLMs) in coding. While primarily driven by the ML community, these tools are being rapidly adopted by software engineers for various tasks. This part of the talk will highlight the pressing need for our community to actively engage in designing more reliable and specialized foundation models for software engineering tasks. Subsequently, I’ll present some ongoing work from our lab focused on developing robust foundation models for coding with the specific needs of software engineering in mind. This retrospective not only celebrates past achievements but also critically examines the present landscape, emphasizing the vital role of software engineering expertise in shaping the future of AI-assisted programming.

        Bio
        Mira Mezini is a Professor of Computer Science at TU Darmstadt, where she leads the Software Technology Lab. She serves as TUDa’s representative on the board of the National Research Center for Applied Cybersecurity ATHENE and co-directs hessian.AI, the Hessian Center for Artificial Intelligence. Mezini has held several roles in research funding and governing bodies. She has been elected to the Computer Science Panel of the German Research Foundation (DFG), appointed to the Computer Science Consolidator Grant Panel of the European Research Council (ERC), and elected to the Executive Committee of ACM SIGPLAN. Currently, she is a member of the ERC Scientific Council’s selection committee and the DFG Senate. Mezini’s research focuses on three main areas: programming systems for reliable distributed software and AI, automated software analysis, and foundational code models. With over 200 frequently cited peer-reviewed publications in top venues in software engineering and programming languages, her work has gained significant recognition. She has served or is serving as program chair for major conferences in software engineering and programming languages, including ECOOP, OOPSLA, FSE, and ICSE. Her awards include two IBM Eclipse Innovation Awards (2005 and 2006), a Google Research Award (2017), and the second prize in the Horst Görtz Foundation’s IT Security Award (2014). In 2012, Mezini received an ERC Advanced Grant, the EU’s most prestigious research funding award. A member of the German Academy of Engineering Sciences and the Academia Europaea, Mezini was recently named an ACM Fellow, further cementing her status as a leader in the field of computer science.

        Speaker: Prof. Mira Mezini (TU Darmstadt)
      • 15:00
        Information from the SE division (Fachbereich SWT), awards for best SWT dissertation and SRC, outlook to SE26 30m
        Speakers: Kurt Schneider (Leibniz Universität Hannover), Timo Kehrer (Universität Bern)
    • 15:30 16:00
      SE Social Event: Transfer to ZKM by tram (tickets included)


    • 16:00 23:00
      SE Social Event: Visit and Dinner at ZKM | Zentrum für Kunst und Medien -- book tour at bit.ly/se-zkm ZKM | Zentrum für Kunst und Medien

      ZKM | Zentrum für Kunst und Medien

      Lorenzstraße 19, 76135 Karlsruhe

      During our 2-hour visit to the ZKM, you can freely explore the exhibitions and/or join one of the 25-minute tours at 16:00, 16:30, 17:00, and 17:30.

      As a participant of SE, you can sign up for one of the tours here:
      https://terminplaner6.dfn.de/b/6929c2654c818b42711582002ec91c79-1056327

      More details about the ZKM and its current exhibitions are available at https://zkm.de

      At 18:00, the conference dinner will be held at ZKM, too.

    • 09:00 10:30
      SE Industry Day Keynote: Roland Weiss and Benedikt Schmidt, ABB Audimax B

      Audimax B

      Building 30.95

      • 09:00
        Welcome to the Industry Day 15m
      • 09:15
        Software Engineering in Practice and Research at ABB 1h 10m

        Building and maintaining industrial systems comes with a unique set of challenges. These systems have to live over extended periods of time, meet stringent reliability and safety standards, cope with tight budgets in low-margin industries, and adapt to modern expectations of ease of use and fast innovation.

        In this keynote, we will elaborate on how ABB deals with these competing forces. First, we introduce ABB and the systems we are delivering to our customers, including the typical lifecycle phases they go through. Second, we reflect on the last two decades of industrial research conducted to support the product development units. Third, we show the fundamental shifts these systems are going through right now and highlight the challenges we face as practitioners. Finally, we conclude with an example of an application that leverages ML/AI to elevate operator prowess, and a look under the hood of the development of this application.

        About the speakers

        Dr. Benedikt Schmidt is a key driver of ABB's process automation activities towards autonomous operations, working as product owner for the Augmented Operator portfolio and Senior Principal Engineer. He joined ABB in 2015, focusing on research and development of ABB's data analytics practices and bringing them into ABB's products. Before joining ABB, Benedikt worked at SAP's research center. He holds a PhD from the Technical University of Darmstadt.

        Dr. Roland Weiss globally leads ABB's process automation R&D activities, covering a portfolio of control and IO hardware, embedded and safety systems, DCS automation software, as well as IIoT middleware and applications. He joined ABB in 2005 and has held various R&D management positions, focusing on research and development of ABB's automation systems in markets including power generation as well as robot and industrial automation. As part of a dedicated task force, he contributed to kick-starting ABB's Digital unit. Before joining ABB, Roland Weiss headed a research team in the area of Formal Methods at the University of Tübingen. He holds a PhD in Computer Science from the University of Tübingen.

        Speakers: Dr Roland Weiss (ABB), Dr Benedikt Schmidt (ABB)
    • 10:30 11:00
      Coffee break 30m Audimax Foyer

      Audimax Foyer

      Building 30.95

    • 11:00 12:30
      SE Industrial Evidence Audimax B

      Audimax B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Jan Linxweiler (Technische Universität Braunschweig)
      • 11:00
        Processes, Methods, and Tools in Model-based Engineering -- A Qualitative Multiple-Case Study 22m
        Speakers: Jörg Holtmann (Digital Rail for Germany), Grischa Liebel (Reykjavik University), Jan-Philipp Steghöfer (XITASO GmbH IT & Software Solutions)
      • 11:45
        How Does Simulation-Based Testing for Self-Driving Cars Match Human Perception? 22m
        Speakers: Christian Birchler (Zurich University of Applied Sciences & University of Bern), Tanzil Kombarabettu Mohammed (University of Zurich), Pooja Rani (University of Zurich), Teodora Nechita (Zurich University of Applied Sciences), Timo Kehrer (University of Bern), Sebastiano Panichella (University of Bern)
      • 12:07
        A Survey on What Developers Think About Testing 22m
        Speakers: Philipp Straubinger (University of Passau), Gordon Fraser (University of Passau)
    • 11:00 12:30
      SE Industry Day Session 1 SR A+B

      SR A+B

      Building 30.95

      • 11:00
        Synthetic Data and Small Language Models: Privacy-Optimized AI for Electric Vehicles 45m

        As electric vehicles become more software-centric, AI-driven features increasingly shape the driving experience—from adaptive navigation to proactive diagnostics—yet they often rely on vast amounts of sensitive data. In this session, we will explore two complementary strategies to address these challenges: first, how synthetic data generated or augmented via Large Language Models and statistical methods empowers developers to train, fine-tune, and validate automotive systems without exposing real user information. Second, we will examine how Small Language Models (SLMs) can serve as function-calling agents in vehicles, offering a flexible and robust alternative to traditional rule-based systems. By applying compression techniques such as pruning, healing, and quantization to architectures like Microsoft’s Phi-3 mini, these compact models fit within hardware constraints yet retain the capacity to handle complex tasks efficiently. Together, these approaches pave the way for personalized yet privacy-compliant innovations that accelerate development in the evolving electric vehicle landscape.

        Speakers: Alexandra Wins (Mercedes-Benz), Benedikt Heidrich (Mercedes-Benz)
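
        A minimal, illustrative sketch of the second idea in the abstract above (a compressed small language model acting as an in-vehicle function-calling agent), assuming the Hugging Face transformers and bitsandbytes libraries and the publicly available microsoft/Phi-3-mini-4k-instruct checkpoint; the tool list, the prompt format, and the expectation that the model returns clean JSON are hypothetical and not taken from the talk:

        import json
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

        MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint; older transformers versions may need trust_remote_code=True

        # 4-bit quantization so the model fits tighter in-vehicle hardware budgets.
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        )

        tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID, quantization_config=bnb_config, device_map="auto"
        )

        # Hypothetical vehicle functions the agent may call; names are illustrative only.
        TOOLS = [
            {"name": "set_cabin_temperature", "parameters": {"celsius": "number"}},
            {"name": "plan_charging_stop", "parameters": {"min_range_km": "number"}},
        ]

        def call_agent(user_request: str) -> dict:
            """Ask the model to pick a tool and its arguments, returned as JSON."""
            messages = [
                {"role": "system",
                 "content": "You are an in-vehicle assistant. Reply ONLY with JSON of the form "
                            '{"tool": <name>, "arguments": {...}} using one of these tools: '
                            + json.dumps(TOOLS)},
                {"role": "user", "content": user_request},
            ]
            input_ids = tokenizer.apply_chat_template(
                messages, add_generation_prompt=True, return_tensors="pt"
            ).to(model.device)
            output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
            reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
            return json.loads(reply)  # in practice, validate/repair the reply before executing anything

        # Example: call_agent("I'm cold, and make sure I can still reach Stuttgart.")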
      • 11:45
        Moving Towards AI Operational Readiness: A Practical Enterprise Transformation 45m
        Speakers: Tobias Velke (DATEV), Wolfgang Frank (arconsis)
    • 11:00 12:30
      SE Variability Audimax A

      Audimax A

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      Convener: Andreas Metzger
      • 11:00
        Modeling Variability in Complex Software Systems 22m
        Speakers: Ferruccio Damiani (Dipartimento di Informatica, Università di Torino), Reiner Hähnle (TU Darmstadt), Eduard Kamburjan (University of Oslo), Michaël Lienhardt (ONERA), Luca Paolini (Dipartimento di Informatica, Università di Torino)
      • 11:22
        Variability Modeling of Products, Processes, and Resources in Cyber-Physical Production Systems Engineering 22m
        Speakers: Kristof Meixner (Vienna University of Technology), Kevin Feichtinger (Karlsruhe Institute of Technology), Hafiyyan Sayyid Fadhlillah (Johannes Kepler University Linz), Sandra Greiner (University of Southern Denmark), Hannes Marcher (TU Wien), Rick Rabiser (Johannes Kepler University Linz), Stefan Biffl (Vienna University of Technology)
      • 11:45
        Not Quite There Yet: Remaining Challenges in Systems and Software Product Line Engineering as Perceived by Industry Practitioners 22m
        Speakers: Martin Becker (Fraunhofer IESE Kaiserslautern, Germany), Rick Rabiser (Johannes Kepler University Linz), Goetz Botterweck (Trinity College Dublin, Lero)
      • 12:07
        Software Reconfiguration in Robotics 22m
        Speakers: Sven Peldszus (Ruhr University Bochum), Davide Brugali (University of Bergamo), Daniel Strüber (Chalmers | University of Gothenburg, Radboud University Nijmegen), Patrizio Pelliccione (Gran Sasso Science Institute (GSSI)), Thorsten Berger (Ruhr University Bochum and Chalmers | University of Gothenburg)
    • 11:00 12:30
      SE Co-Located: Meeting of the Working Group on Microservices and DevOps (Arbeitskreis Microservices und DevOps) Seminarroom 006 (Building 30.96)

      Seminarroom 006

      Building 30.96

      Straße am Forum 3, 76131 Karlsruhe
      Convener: Sandro Speth (Universität Stuttgart)
    • 12:30 14:00
      Lunch break 1h 30m Audimax Foyer

      Audimax Foyer

      Building 30.95

    • 14:00 15:30
      SE Industry Day Session 2 SR A+B

      SR A+B

      Building 30.95

      Straße am Forum 1, 76131 Karlsruhe
      • 14:00
        AI Survival Guide (Software Developer Edition): How Not to Let AI Take You for a Ride 45m

        Artificial intelligence (AI) has made immense progress in recent years that nobody foresaw, and it is lastingly changing the world of work, including software development. While this leap is essentially already behind us, it has triggered investments of unimaginable scale, so major advances will keep coming. An end is only in sight once the energy costs exceed what it would cost to pay people with comparable skills.

        The talk raises central questions: If AIs can already replace new talent at the very start of their careers, how can the next generation in the software industry be nurtured? Does it make sense to use terms such as "reasoning" to describe cognitive processes in AIs? Or does that anthropomorphize and overstate the technology, thereby creating new fears and insecurities? Must we not inevitably arrive at a society-wide reflection on our own cognitive abilities?

        Historically, our notion of intelligence has already shifted significantly several times, for instance when computers became able to calculate faster, play chess better, or, soon, drive better than humans. It is time to question our understanding of intelligence once again and adapt it to the new realities created by AI. The current phase of AI development is not the end of human creativity or of our ability to think outside the box. Rather, it is a call for reinvention and adaptation, and the talk addresses how young talent can be strengthened and the fear of the unknown overcome.

        Speaker: Rüdiger zu Dohna (codecentric)
      • 14:45
        Artificial Intelligence in Product Management - How AI Is Already Changing Product Development Today 45m

        Abstract:
        AI is revolutionizing product development, but what does that mean in concrete terms for product owners and product developers? In this talk, we explore how Large Language Models are changing the rules of the game: from the innovation process all the way to entirely new products. With practical insights, we show which opportunities and challenges AI brings for the future of our work.

        Vita:
        Mustafa Yilmaz is a site manager and head of consulting at the Mannheim office of andrena objects ag, a software development and consulting company. After studying computer science at the Karlsruhe Institute of Technology (KIT), he gained more than 15 years of experience advising companies on professionalizing their software product development and supporting organizational and transformation processes. Since 2023, his focus has been on the strategic introduction and targeted use of artificial intelligence, particularly in product management and the digital transformation of companies.

        Speaker: Mustafa Yilmaz (andrena objects)