In our HELIPORT workshop, we will provide insights into our project and share our results. In addition, we would like to provide a platform for the presentation of similar projects, as well as extensions or integrations from the surrounding research areas. The overall goal of the workshop is to bring together different institutions with similar challenges and to establish a community around our HELIPORT project.
We welcome submissions on related projects, on metadata in our scientific field in general, or on workflows, in the form of talks (10-20 min) or posters (A0). We also welcome first or future HELIPORT use cases from within our community!
We therefore encourage you to submit an abstract for a talk or poster. The submissions are dedicated to four thematic points:
Please submit your abstract here.
HELIPORT (Helmholtz ScIentific Project WORkflow PlaTform) is a project funded by the Helmholtz Metadata Collaboration and runs from July 2021 until June 2023. HELIPORT aims to make the entire life cycle of a scientific project findable, accessible, interoperable and reusable according to the FAIR principles. In particular, our data management solution covers the areas from data generation to the publication of primary research data, including the workflows carried out and the actual research results. For this purpose, a concept was developed which shows the various essential components and their connections.
The DMA ST1 synergy preparatory workshop takes place at the HZDR campus from 14 to 15 June, starting at 12:00, in building 114, room 203.
The HELIPORT project aims to make the components and steps of the entire life cycle of a research project at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR) and the Helmholtz Institute Jena (HIJ) discoverable, accessible, interoperable and reusable according to the FAIR principles. In particular, this data management solution deals with the entire lifecycle of research experiments, starting with the generation of the first digital objects, covering the workflows carried out, and ending with the actual publication of research results. For this purpose, a concept was developed that identifies the different systems involved and their connections. By integrating computational workflows, HELIPORT can automate calculations that work with metadata from different internal systems (application management, Labbook, GitLab, and others). This presentation will cover the first year of the project, its current status and the path taken so far in the project's life cycle.
The newly installed HELIPORT system in the POLARIS laboratory is interfaced with the POLARIS database, also known as SciCat. Using SciCat's generic Python library Pyscicat, both writing and reading via REST APIs are implemented. On the writing side, LabVIEW programs first collect data and metadata, e.g. from experimental diagnostics, and then call Pyscicat to transfer them to SciCat's database. On the reading side, users can manually store the URLs of data within the HELIPORT project for easy access later, and they can also view their data in table-like displays via a plug-in app we developed within the HELIPORT system. With the help of HELIPORT and SciCat, the data are now one step closer to being FAIR.
Data processing and analysis workflows are generally understood as processes that run without any user intervention, where usually only a small set of parameters is provided upon workflow submission; adjusting these parameters is further limited by the low turnaround rates of workflow runs caused by scheduling alone. Many types of experimental data analyses require manual experimentation with parameters to succeed, necessitating interactivity and fast iteration. In this talk we present examples of interactive workflow applications at HZDR, from data analysis and simulation, discuss challenges arising from the differences to completely automated workflows, and lay out the related data-provenance and project-resource management features we envision for the HELIPORT workflow platform.
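To illustrate the kind of ingestion step described here, the following is a minimal sketch that registers a dataset record with a SciCat backend over its REST API using the plain requests library rather than Pyscicat. The base URL, token, endpoint path and all metadata fields are illustrative assumptions, not the actual POLARIS configuration.

```python
# Sketch: pushing a dataset record with scientific metadata to a SciCat backend.
# URL, token and field values are illustrative assumptions.
import requests

SCICAT_URL = "https://scicat.example.org/api/v3"   # assumed backend URL
TOKEN = "replace-with-access-token"                # assumed access token

dataset = {
    "datasetName": "polaris-shot-2023-06-14-001",  # hypothetical shot name
    "owner": "POLARIS team",
    "contactEmail": "polaris@example.org",
    "sourceFolder": "/data/polaris/2023-06-14/shot001",
    "creationTime": "2023-06-14T12:00:00Z",
    "type": "raw",
    "ownerGroup": "polaris",
    "scientificMetadata": {                        # free-form metadata block
        "laser_energy_J": 3.2,
        "target_material": "Ti foil",
    },
}

resp = requests.post(
    f"{SCICAT_URL}/Datasets",
    json=dataset,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print("Created dataset with PID:", resp.json().get("pid"))
```

In the setup described above, Pyscicat would wrap such REST calls, so the LabVIEW programs only need to hand over the collected data and metadata.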
Modern Earth sciences produce a continuously increasing amount of data and metadata from observations, samples and analyses. In the HMC project ALAMEDA, we are developing a platform to manage, visualize and share metadata collected in laboratory and field experiments. We use HELIPORT in conjunction with Business Process Model and Notation (BPMN) to combine data from different software applications already used by the community to manage laboratory, sensor, sample and workflow data.
UNICORE (UNiform Interface to COmputing REsources) provides tools and services for building federated systems, making high-performance computing and data resources accessible in a seamless and secure way for a wide variety of applications in intranets and the internet. UNICORE offers comprehensive RESTful APIs for batch job management, data access, data movement and computational workflows.
This talk gives an overview of the current feature set and overall state of the UNICORE ecosystem, presents recent developments and discusses different workflow options (UNICORE-native and CWL).
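As a flavour of these RESTful APIs, the sketch below submits a simple batch job through the pyunicore client library. The site URL and credentials are placeholders, and the exact constructor and job-description keys should be checked against the pyunicore documentation for the installed version.

```python
# Sketch: submitting a batch job via the UNICORE REST API using pyunicore.
# Base URL and credentials are placeholders for a real UNICORE/X endpoint.
import pyunicore.client as uc_client
import pyunicore.credentials as uc_credentials

base_url = "https://unicore.example.org:8080/SITE/rest/core"   # assumed endpoint
credential = uc_credentials.UsernamePassword("demouser", "password")

site = uc_client.Client(credential, base_url)

job_description = {
    "Executable": "/bin/echo",
    "Arguments": ["Hello from UNICORE"],
}

job = site.new_job(job_description=job_description, inputs=[])
job.poll()                        # wait until the job has finished
print(job.properties["status"])   # e.g. SUCCESSFUL
print(job.working_dir.listdir())  # files in the job's working directory
```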
The aim of DAPHNE4NFDI is to create a comprehensive infrastructure to process research data from large scale photon and neutron infrastructures according to the FAIR principles. Broadly, we will provide the following infrastructure for the community:
1. Improve metadata capture through consistent workflows supported by user-driven online logbooks that are linked to the data collection;
2. Establish a community repository of processed data, new reference databases and analysis code for published results, linked, where possible, to raw data sources;
3. Develop, curate and deploy user-developed analysis software on facility computing infrastructure so that ordinary users can benefit from and repeat the analysis performed by leading power user groups through common data analysis portals.
Managing high-throughput data incoming from different sources is a major challenge in achieving FAIR data. Experimental data currently traverses a complicated web of machines and Python Jupyter notebooks that are used for on- and offline analyses. This level of complexity in the data pipeline makes it difficult to render the data FAIR. We present a prototype for the integration of electron experimental data, using the Apache Kafka messaging protocol, a MongoDB database, and Grafana + Plotly visualization, so that the data pipeline can be visualized in real time in a flexible fashion, thereby facilitating more complete knowledge handling.
A major challenge in enabling FAIRer data and metadata is developing and deploying user interfaces that encapsulate clear and consistent metadata schemata. The FWKT Team is building tools for capturing metadata from simulation and experimental datasets through the combined use of stylized user-input forms and scraping of information from existing data structures. Captured information is then uploaded to queryable databases, with the current trials focused on SciCat and MongoDB. Current sub-projects include metadata extraction toolkits for a variety of simulation codes, a SciCat extension to the laserdata-importer tool for uploading existing experimental data, and a Python-Flask-WTForms-MongoDB ShotSheet App for capturing and storing laser shot diagnostics.
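A minimal sketch of one stage of such a pipeline follows, assuming a local Kafka broker and MongoDB instance; the topic, database, collection and field names are invented for illustration.

```python
# Sketch: forwarding per-shot diagnostics from Kafka into MongoDB so that
# Grafana/Plotly dashboards can read them. Broker address, topic and
# collection names are illustrative assumptions.
import json

from kafka import KafkaConsumer, KafkaProducer
from pymongo import MongoClient

BROKER = "localhost:9092"
TOPIC = "diagnostics"

# Producer side: e.g. called from an online analysis notebook.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)
producer.send(TOPIC, {"shot_id": 42, "charge_pC": 118.5})
producer.flush()

# Consumer side: persists every message for later (offline) analysis.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
collection = MongoClient("mongodb://localhost:27017")["experiment"]["diagnostics"]

for message in consumer:                 # runs until interrupted
    collection.insert_one(message.value)  # dashboards read from this collection
```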
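To make the form-based capture concrete, here is a stripped-down sketch of a Flask + WTForms + MongoDB input form. The route, form fields and database layout are invented for illustration and do not reflect the actual ShotSheet App.

```python
# Sketch of a Flask + WTForms + MongoDB form for capturing shot metadata.
# Field names, routes and the database layout are illustrative only.
from flask import Flask, redirect, render_template_string, url_for
from flask_wtf import FlaskForm
from pymongo import MongoClient
from wtforms import FloatField, StringField
from wtforms.validators import DataRequired

app = Flask(__name__)
app.config["SECRET_KEY"] = "change-me"
shots = MongoClient("mongodb://localhost:27017")["shotsheet"]["shots"]

class ShotForm(FlaskForm):
    shot_id = StringField("Shot ID", validators=[DataRequired()])
    laser_energy = FloatField("Laser energy (J)", validators=[DataRequired()])
    target = StringField("Target description")

TEMPLATE = """
<form method="post">{{ form.hidden_tag() }}
  {{ form.shot_id.label }} {{ form.shot_id() }}
  {{ form.laser_energy.label }} {{ form.laser_energy() }}
  {{ form.target.label }} {{ form.target() }}
  <input type="submit" value="Save shot">
</form>"""

@app.route("/", methods=["GET", "POST"])
def new_shot():
    form = ShotForm()
    if form.validate_on_submit():
        # Store the validated metadata record in the queryable database.
        shots.insert_one({
            "shot_id": form.shot_id.data,
            "laser_energy_J": form.laser_energy.data,
            "target": form.target.data,
        })
        return redirect(url_for("new_shot"))
    return render_template_string(TEMPLATE, form=form)
```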
The documentation at our centre is very diverse, which makes it difficult to choose a lab documentation tool that fits the needs of most scientists. One key aspect for increasing the usage and acceptance of a shared tool, which depends heavily on the infrastructure landscape of the research centre, is the degree of automation, as it reduces errors and saves time. This poster provides some examples of how to connect different metadata sources to import and export specific information for better interaction. This provides a valuable basis of structured information for advanced searches, data processing and file exports, metadata catalogs or other publications. A feature-rich API is essential to collect and provide the specific metadata and to increase the FAIRness of the experiments.
HELIPORT is a data management solution that aims at making the components and steps of the entire research experiment’s life cycle discoverable, accessible, interoperable and reusable according to the FAIR principles.
Among other information, HELIPORT integrates documentation, scientific workflows, and the final publication of the research results - all via already established solutions for proposal management, electronic lab notebooks, software development and devops tools, and other additional data sources. The integration is accomplished by presenting the researchers with a high-level overview to keep all aspects of the experiment in mind, and automatically exchanging relevant metadata between the experiment’s life cycle steps.
Computational agents can interact with HELIPORT via a REST API that allows access to all components, and landing pages that allow for export of digital objects in various standardized formats and schemas. An overall digital object graph combining the metadata harvested from all sources provides scientists with a visual representation of interactions and relations between their digital objects, as well as their existence in the first place. Through the integrated computational workflow systems, HELIPORT can automate calculations using the collected metadata.
By visualizing all aspects of large-scale research experiments, HELIPORT enables deeper insights into a comprehensible data provenance with the chance of raising awareness for data management.
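To give a flavour of the agent interaction described above, the following sketch retrieves a digital object's landing-page metadata over HTTP. The base URL, endpoint layout and token are assumptions for illustration, not the actual HELIPORT API.

```python
# Sketch: a computational agent fetching landing-page metadata for a digital
# object from a HELIPORT instance. URL, path and token are assumptions.
import requests

HELIPORT_URL = "https://heliport.example.org"      # assumed instance
TOKEN = "replace-with-api-token"                   # assumed API token
handle = "some-digital-object-id"                  # hypothetical identifier

resp = requests.get(
    f"{HELIPORT_URL}/landing_page/{handle}",       # assumed endpoint layout
    headers={
        "Authorization": f"Token {TOKEN}",
        "Accept": "application/ld+json",           # request a standardized schema
    },
    timeout=30,
)
resp.raise_for_status()
metadata = resp.json()
print(metadata.get("name"), metadata.get("identifier"))
```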
Awareness of the need for FAIR data management has increased in recent years, but examples of how to achieve this are often missing. Focusing on the large-scale instrument A4 at the MAMI accelerator, we transfer findings of other projects to improve raw data, i.e. the primary output stored on a long-term basis, according to the FAIR principles. Here, the instrument control software plays a key role as the central authority that starts measurements and orchestrates the connected (meta)data-taking processes. In regular discussions we incorporate the experiences of a wider community and work to optimize the instrument output through various measures, ranging from conversion to machine-readable formats and metadata enrichment to additional files that create scientific context.
PUNCH4NFDI is the NFDI consortium for particle, astroparticle, astro-, nuclear and hadron physics.
The work of the consortium is organised in seven task areas: management & governance; data management; data transformations; data portal; data irreversibility; synergies & services; and education, training, outreach & citizen science.
Here, we give an update on current technical implementations and work-in-progress.
Building upon the analysis of our established scientific workflow, we present ongoing improvements and additions for data and metadata handling, in particular a concept for a system of inter-connected databases, which will help to record data and metadata and thereby provide input for other databases. One example is a parsing tool for raw data files that recovers metadata encoded in data file paths, facilitating data retrieval for analysis (see the sketch below). We also present a concept for additional software modules which connect databases, data/metadata sources and data processing and provide control interfaces for the scientists. As an example, we show the prototype of the Draco Laser Shot Counter tool, which forwards data from one subsystem to others and can also software-trigger other processes.
We present the analysis of our established scientific workflow during experiments in the context of laser-driven particle acceleration. We focus on the occurrences and types of data generation and associated metadata. The workflow schematics are kept abstract to allow for comparison with or application in other domains. We recognize that a substantial part of the information cannot be entered automatically but relies on human input. Beyond that, a lot of information, in particular metadata, is transferred manually by scientists, and cross-checks are not recorded. This leads to a high workload during experiments but allows for enormous flexibility.
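As an illustration of the path-parsing idea (not the actual tool), a regular expression can recover metadata that is only encoded in the directory and file naming convention. The layout assumed below, <date>/run<nn>/shot<nnnn>_<energy>mJ.dat, and the field names are made up for illustration.

```python
# Sketch: recovering metadata encoded in raw-data file paths.
# The path layout and field names are invented for illustration.
import re
from pathlib import Path

PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})/run(?P<run>\d+)/shot(?P<shot>\d+)_(?P<energy_mJ>\d+)mJ\.dat$"
)

def metadata_from_path(path: Path) -> dict:
    """Parse date, run, shot number and laser energy from a data file path."""
    match = PATTERN.search(path.as_posix())
    if match is None:
        raise ValueError(f"unrecognised path layout: {path}")
    fields = match.groupdict()
    return {
        "date": fields["date"],
        "run": int(fields["run"]),
        "shot": int(fields["shot"]),
        "laser_energy_mJ": int(fields["energy_mJ"]),
    }

print(metadata_from_path(Path("/raw/2023-06-14/run03/shot0042_800mJ.dat")))
```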
The terahertz (THz) facility TELBE at HZDR provides intense THz sources uniquely suited for the study of nonlinear light-matter interactions. The TELBE THz sources are driven by the ELBE electron accelerator and provide frequency-tunable THz pulses with field strengths of several hundred kV/cm, pulse durations of a few picoseconds and a repetition rate on the order of 100 kHz. These light pulses can be used to excite multiple low-energy degrees of freedom in matter, such as spins, lattice dynamics or collective quasiparticle oscillations. The resulting dynamics are typically probed with femtosecond time resolution using optical lasers. Achieving this time resolution requires strategies to overcome the intrinsic time jitter between the accelerator-based THz pulses and the optical pulses generated by table-top laser systems. We use a measure-and-sort approach that achieves the required time resolution, but requires the measurement of precise time stamps for each individual light pulse in the experiment. The corresponding high data rates of several GB/min require a fast network and computing infrastructure as well as sustainable concepts for data management and metadata generation. This is all the more important as TELBE is a user facility requiring rapid visualisation and sharing of data and associated metadata. A central hub for planning, monitoring, documenting and archiving the complex workflows at TELBE is therefore highly desirable. The implementation of HELIPORT at TELBE will therefore be a major improvement in terms of automation and meeting the requirements of the FAIR principles.
Within HELIPORT, which provides guidance for scientific projects and workflows according to the FAIR principles across the entire research experiment lifecycle, domain- and lab-specific workflows need to be embedded. To this end, we have analyzed the established scientific workflow during experiments in the context of laser-driven particle acceleration, with emphasis on data and metadata sources and their respective occurrences. Among the instances where either machines or humans generate or process data, we have identified where human input or interaction is mandatory and will prevail, as well as chances for automation – still under supervision and control by scientists. We present our strategy for a system of inter-connected databases and management software modules interfacing with HELIPORT.
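A highly simplified sketch of the measure-and-sort idea follows: per-pulse arrival-time stamps are used to re-sort the probe signal onto a jitter-corrected delay axis. The array names, noise model and binning are invented for illustration and do not correspond to the actual TELBE analysis code.

```python
# Sketch of pulse-resolved "measure-and-sort": each pulse gets a measured
# arrival-time stamp, and the probe signal is re-binned onto a jitter-corrected
# delay axis. Values and array names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_pulses = 100_000

nominal_delay_ps = np.repeat(np.linspace(-2.0, 2.0, 100), n_pulses // 100)
jitter_ps = rng.normal(0.0, 0.3, n_pulses)      # measured per-pulse time jitter
signal = np.cos(2 * np.pi * (nominal_delay_ps + jitter_ps)) + 0.1 * rng.normal(size=n_pulses)

# "Measure": the true delay of each pulse is the set delay plus the jitter.
true_delay_ps = nominal_delay_ps + jitter_ps

# "Sort": re-bin the signal by the corrected delay instead of the set delay.
bins = np.linspace(-3.0, 3.0, 121)
bin_index = np.digitize(true_delay_ps, bins)
sorted_signal = np.array([
    signal[bin_index == i].mean() if np.any(bin_index == i) else np.nan
    for i in range(1, len(bins))
])
bin_centers = 0.5 * (bins[:-1] + bins[1:])
print(bin_centers[:5], sorted_signal[:5])
```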
Data Management Plans (DMPs) are crucial for structured research data management and are often a mandatory part of research proposals. By using tools, DMPs can be developed and managed effectively. Our aim is to develop a quick and easy-to-use DMP service for members of Saxon research institutions. In order to evaluate 18 of the existing tools, we defined 31 requirement parameters covering aspects with regard to basic functions, technical aspects and user-friendliness. The highest total evaluation scores were reached by Data Stewardship Wizard, DMPTool and RDMO NFDI4Ing. In a next step, we will check the feasibility of adapting each of the three tools to our needs and estimate the respective workload. The most suitable DMP tool will then be customized for our requirements.
HELIPORT tracks the life cycle of scientific data by linking PIDs, DOIs, software repositories and many other things. If the sources are adequately described by metadata, good provenance of the results and their findability are achievable. However, there is a problem with the "I" and "R" in the FAIR principles when a domain does not have a metadata standard. As part of our role as a test use case for HELIPORT, we have started to define a metadata standard for High Intensity Lasers (HIL) and associated experiments. This effort will be continued in the HMC project HELPMI. Based on this, the openPMD and NeXus file formats will be extended and merged as far as possible.
One of the main enablers for Interoperability and Reproducibility of scientific research data could be the documentation and harmonization of semantics and data structures. We are developing a concept and prototype for an infrastructure and an end-user graph data editor that support these tasks.
Users of the editor will be able to enter their data and metadata in a graph while getting suggestions on existing semantics and structures. If necessary, they can also document their own semantics. The infrastructure provides a platform for publishing and subsequently harmonizing user-made semantics on both a local scale (e.g. within a project) and at a global scale (e.g. within the community of a scientific domain).
We look forward to defining interfaces to HELIPORT.
Simulations play a crucial role in instrument design as a digital precursor of a real-world object. To preserve the symbiosis of the simulated and the real-world instrument beyond commissioning, we connect the two worlds at the NeXus file level. The instrument section of the produced NeXus file is enriched with detailed simulation parameters reflecting the current state of the instrument. As a result, the enriched instrument description increases the reusability of the experimental data in the sense of the FAIR principles. The data are ready to be exploited by machine-learning techniques, for example for predictive-maintenance applications, as it is possible to perform simulations of a measurement directly from the NeXus file.
The GSI accelerator facility faces challenges for the institutional management of research data due to the diverse nature of the generated datasets, and these challenges will intensify when the FAIR facility comes online. The conceptualisation of RDM at GSI is advancing to achieve best practices and to manage the specific challenges. In addition to the internal RDM goals, GSI is involved in several external open science projects. This presentation shows the development path of RDM at GSI, focusing on some of the tools that will be on offer, for example: the RDMO DMP tool; an institutional data repository to facilitate dataset-record generation, smaller dataset uploads and connectivity to larger datasets on a data lake; and a new logbook option to complement the existing ELog.
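A minimal sketch of such an enrichment step with h5py follows, assuming an existing NeXus file with an entry/instrument group; the group names, simulation code and parameter values are illustrative assumptions.

```python
# Sketch: adding simulation parameters to the instrument section of a NeXus
# file so that the file reflects the simulated state of the instrument.
# File path, group names and parameter values are illustrative assumptions.
import h5py

with h5py.File("measurement.nxs", "a") as f:
    instrument = f.require_group("entry/instrument")
    instrument.attrs["NX_class"] = "NXinstrument"

    sim = instrument.require_group("simulation")   # assumed group name
    sim.attrs["NX_class"] = "NXcollection"         # generic container class
    sim.attrs["code"] = "McStas"                   # e.g. the simulation code used
    sim.create_dataset("source_divergence_mrad", data=0.45)
    sim.create_dataset("guide_m_value", data=4.0)
    sim.create_dataset("detector_distance_m", data=2.5)
```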
Currently, nanofabrication processes at the HZDR cleanroom are recorded on a manually compiled paper printout and later transferred to a wiki. This hinders search, automated information processing, and reuse of process parts. As an improvement, a paperless approach is being developed.
We integrate the sample ancestry and processing history stored in MediaWiki into HELIPORT by developing custom modules. This will allow HELIPORT to collect metadata while guiding users through the manual workflows that represent cleanroom processes.
At the current point of progress, we estimate that it may well be feasible and useful to generalize the approach and thereby contribute to possible future efforts of representing samples or manual workflows in HELIPORT.
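As a sketch of how the custom modules mentioned above could read a sample's processing history from MediaWiki, the following uses the standard MediaWiki action API; the wiki URL and page title are placeholders.

```python
# Sketch: pulling a sample's wiki page content through the MediaWiki action API
# so that HELIPORT modules can extract metadata from it. URL and page title
# are placeholders.
import requests

WIKI_API = "https://wiki.example.org/api.php"      # assumed MediaWiki endpoint

params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "rvslots": "main",
    "titles": "Sample:XYZ-001",                    # hypothetical page title
    "format": "json",
}

resp = requests.get(WIKI_API, params=params, timeout=30)
resp.raise_for_status()
pages = resp.json()["query"]["pages"]
for page in pages.values():
    wikitext = page["revisions"][0]["slots"]["main"]["*"]
    print(wikitext[:200])   # downstream code would parse process steps from this
```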
The Dresden High Magnetic Field Laboratory (Hochfeld-Magnetlabor Dresden, HLD) focuses on modern materials research in high magnetic fields. High magnetic field experiments are the ideal way to gain insights into the matter that surrounds us. Magnetic fields allow for the systematic manipulation and control of material properties – which is why these kinds of experiments are conducted on new materials so that their fundamental properties can be explored and so that they can be optimised for future application.
At the High-Field High-Repetition-Rate Terahertz facility @ ELBE (TELBE), ultrafast terahertz-induced dynamics can be probed in various states of matter with highest precision. The TELBE sources offer both stable and tunable narrowband THz radiation with pulse energies of several microjoules at high repetition rates and a synchronized coherent diffraction radiator that provides broadband single-cycle pulses. The measurements at TELBE are data-intensive, with data volumes of up to 20 GB per experiment, which can last up to several minutes. As a result, the current data acquisition and data analysis stages are decoupled: in a first step the primary data are processed and stored at HZDR, and in a later step restricted data access is made available to the user for post-processing.
In this presentation, we present an integrated workflow for post-processing of the experimental data at TELBE, with built-in exchange of metadata between the experiment control software (LabVIEW) and the workflow execution engine UNICORE. LabVIEW manages the data storage and exchanges metadata with the electronic lab notebook for automated documentation. We also present the guidance system HELIPORT, which manages the metadata of the associated project proposal and job information from UNICORE and integrates with the electronic lab notebook (MediaWiki), providing a user-friendly interface for monitoring the actively running experiments at TELBE.
AiiDA is an open-source Python infrastructure for devising complex workflows associated with modern computational science and streamlining the four core pillars of the ADES model: Automation, Data, Environment, and Sharing. In this contribution, we showcase features of AiiDA like workflow-forging, high-throughput capability and data provenance as implemented in the AiiDA-FLEUR plugin. Finally, we address the possibility of managing AiiDA-projects through HELIPORT.
In the initial HMC application, CWL workflows had a special role in HELIPORT. This contribution explains why this seemed necessary at the time and illustrates what has changed since then.
The role of UNICORE as a central component that provides access to our computing resources and also supports workflows has emerged over the course of the project.
It is challenging to support everything that modern workflow languages can do. Instead, we will focus in the future on working with provenance information (visualisation, ...). The execution of CWL will slowly be phased out and moved to stand-alone third-party tools (e.g. Rabix, Toil, UNICORE, ...) which are able to generate provenance metadata according to established standards (wf4ever wfdesc, W3C PROV, ...). This would also allow us to import workflows from other custom software stacks (e.g. spekNG, Eupraxia Notebooks, ...) and move the implementation into the tools themselves instead of creating more HELIPORT plugins.
One of our first experiments, which we more or less randomly supported in the summer of 2021 at the HZDR, was a detector test as part of our Mu2e collaboration with Fermilab. We evaluated the performance and data acquisition system of two detectors that will be used to monitor the stopping target for the upcoming Mu2e experiment at Fermilab: the high-purity germanium (HPGe) and lanthanum bromide (LaBr) detectors, in the presence of the pulsed gamma-ray beam at the gELBE beamline of the ELBE facility at HZDR. The documentation of the experiment in the different systems was a first field test for the deployment of HELIPORT, and everything started with the proposal 21102205-ST.
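As a small illustration of the kind of provenance metadata meant here, the prov Python package can express a workflow run in W3C PROV terms. The namespace, identifiers and relationships below are invented for illustration and do not describe an actual HELIPORT record.

```python
# Sketch: recording a workflow step as W3C PROV metadata with the `prov`
# package. Namespace, identifiers and relations are invented for illustration.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("hlp", "https://heliport.example.org/prov/")   # assumed namespace

raw = doc.entity("hlp:raw-data-run42")
result = doc.entity("hlp:analysis-result-run42")
analysis = doc.activity("hlp:cwl-analysis-run42")
scientist = doc.agent("hlp:scientist-jane-doe")

doc.used(analysis, raw)                     # the activity consumed the raw data
doc.wasGeneratedBy(result, analysis)        # ... and produced a result
doc.wasAssociatedWith(analysis, scientist)  # run under a scientist's control
doc.wasDerivedFrom(result, raw)

print(doc.get_provn())                      # PROV-N representation of the graph
```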
This contribution is a short demonstration of the documentation of the gELBE experiment as one of our first HELIPORT use cases.
In this talk, different approaches to keeping the HELIPORT code base maintainable will be presented. It will discuss both the tools used to automate various aspects of the development and operation of HELIPORT and how certain aspects of development are approached, as well as how the choice of libraries and tooling supports them.
The workshop contribution provides a comprehensive overview of how Docker can be used as a tool for containerizing the HELIPORT software. The aim is to offer developers a fast and easy way to launch a local instance of HELIPORT in development mode and start coding directly. Additionally, a container configuration is provided for productive deployment.
For this purpose, the individual components of HELIPORT are isolated in separate Docker containers. The software dependencies required for operation are already installed in the underlying container images, relieving developers of this task. The use of Docker also ensures software portability across various operating systems. Furthermore, the Docker configuration allows for individual customization for productive operation of HELIPORT, accommodating specific requirements such as certificates or secure access to specific areas of the application. Process isolation and targeted resource management provide advantages, particularly in production environments, for secure software operation.
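To illustrate the idea of a containerized development setup, the sketch below starts a database container and an application container with the Docker SDK for Python. The image names, ports, network and environment variables are assumptions and do not reproduce the project's actual container configuration, which uses Docker's own tooling.

```python
# Sketch: starting a local development stack (database + application image)
# with the Docker SDK for Python. Image names, ports and environment variables
# are assumptions, not the actual HELIPORT configuration.
import docker

client = docker.from_env()
client.networks.create("heliport-dev", driver="bridge")

db = client.containers.run(
    "postgres:15",
    name="heliport-dev-db",
    environment={"POSTGRES_PASSWORD": "devpassword", "POSTGRES_DB": "heliport"},
    network="heliport-dev",
    detach=True,
)

app = client.containers.run(
    "heliport:dev",                      # hypothetical locally built image
    name="heliport-dev-app",
    environment={"DATABASE_URL": "postgres://postgres:devpassword@heliport-dev-db/heliport"},
    network="heliport-dev",
    ports={"8000/tcp": 8000},            # expose the development web server
    detach=True,
)

print(app.logs(tail=10).decode())
```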
An interactive session to get people started with HELIPORT development.
We will show participants the way around the HELIPORT repository, its structure rooted in how Django applications work, where to find what you are looking for, and where to find the documentation. Then we will set up a local development environment that allows you to run the tests and build the documentation, and finally start the development of your own HELIPORT apps.
Please bring your laptop with a working Python (>= 3.8), Poetry¹ (>= 1.2) and Yarn² installation if you would like to participate.
¹) https://python-poetry.org/
²) https://yarnpkg.com/