Industry needs and how to address them in a collaborative data ecosystem
In recent years, industry needs have evolved from the simple delivery of materials and parts to a growing demand for reliable and easily accessible metadata. Drawing on concrete examples from the aerospace supply chain, the presentation will underline the importance of...
Advanced catalysts are key to sustainable energy, reducing emissions, and improving resource efficiency. However, the synthesis of novel catalysts usually involves a unique blend of scientific methods, precise catalyst formulations, and the empirical knowledge of scientists. Additionally, the wide variety of techniques performed at different beamlines in synchrotron radiation facilities, along...
Using RDF is a natural choice for modelling semantically linked metadata for FAIR research data. However, the learning curve for RDF is steep, and even for data stewards, becoming familiar with all the relevant technicalities can be a major barrier. Therefore, ULB Darmstadt is heavily involved in developing and providing services that facilitate the creation and use of semantic metadata,...
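As a minimal sketch of what such semantic metadata can look like in practice, the snippet below describes a dataset with the Python rdflib library using DCAT and Dublin Core terms. All URIs and values are hypothetical placeholders, not an actual ULB Darmstadt service or record.

```python
# Minimal RDF metadata sketch with rdflib; every URI and value below is a
# hypothetical placeholder used only for illustration.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
dataset = URIRef("https://example.org/datasets/xrd-run-42")  # hypothetical PID

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("XRD measurement, sample 42")))
g.add((dataset, DCTERMS.creator, URIRef("https://orcid.org/0000-0000-0000-0000")))
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))

print(g.serialize(format="turtle"))  # emit the graph as human-readable Turtle
```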
Enriching data with descriptive metadata is the key enabler for the reusability and interoperability of experimental results, and thus for further research in a scientific domain. However, in order to use data from former scientific work (both initial data and result data from experiments), a common understanding of the semantics of this data is essential. This understanding is typically...
The focus of this project is the development of a standardized metadata vocabulary, essential for creating interoperable and easily discoverable data products across various research groups. By examining space weather-specific data products and formats, the project addresses the need for consistent metadata standards that will enhance collaboration and data sharing on an international scale....
For data to be fully exploitable and re-usable in different contexts it needs to be annotated with rich metadata that uses commonly understood vocabularies and semantics [1]. Using terminology that is standardized and agreed upon within a community ensures unambiguous understanding of metadata.
In the field of EM, a number of application-level initiatives independently started developing...
Metadata is a key element of data management with regard to the FAIR (findable, accessible, interoperable, and reusable) principles, answering the need for better data integration and enrichment. In the field of high-intensity laser-plasma physics, numerical simulations and experiments go hand in hand, complementing each other. While simulation codes are well documented and output...
As the volume of omics single-cell data continues to grow, so too must our data management and processing capabilities to ensure its effective secondary use, particularly in research and diagnostics. While single-cell data holds immense potential for AI applications, current documentation standards fall short of being AI-ready. To address these challenges, we organized a Writathon, resulting...
The study of climate change and its impact on marine environments requires large-scale, multidisciplinary data that are often collected by various national and marine institutes, fishery associations, as well as by research groups. With the proliferation of underwater observatories, profilers, and autonomous underwater vehicles (AUVs), significant progress has been made in collecting...
Currently, social scientists use different and sometimes proprietary software to analyse data, which processes metadata in diverse ways. Data formats of statistical software packages are only partially compatible and pose an obstacle to replication studies. Proprietary data formats jeopardise the requirement for interoperability enshrined in the FAIR principles. As part of KonsortSWD, we...
The electronic structure determines many of the macroscopic physical properties of a material. Photoelectron momentum microscopy (MM) has matured into a powerful tool for the detailed characterization of the exciting electronic properties of novel quantum materials. By applying the principles of high-resolution imaging, modern instruments simultaneously capture hundreds of tomographic slices of...
The German Human Genome Phenome Archive (GHGA) is a national infrastructure that promotes the secure storage, exchange, and management of access-controlled human omics data. To facilitate user-friendly and comprehensive data submissions, we developed the GHGA metadata model. The standardized model aims at maximizing the amount of collected metadata on the submitter side, enabling reusable...
At the Helmholtz-Institute Freiberg for Resource Technology (HIF), researchers develop new technologies to improve the circular economy. In this context, different types of samples (e.g. rock samples, recycling material) play an important role. A sample passes through different states and labs, starting with sample preparation, through the analysis of the particular sample, to the final...
Biomolecules, such as DNA and RNA, provide a wealth of information about the distribution and function of marine organisms, and molecular sequencing data from the marine realm is generated across several Helmholtz Centers. Biomolecular (meta)data, i.e. DNA and RNA sequences and all steps involved in their creation, exhibit great internal diversity and complexity. However, high-quality...
The Nuclear, Astro, and Particle Metadata Integration for eXperiments (NAPMIX) project was recently awarded funding within the scope of the OSCARS call on open science and will start in December 2024. The project aims to facilitate data management and data publication under the FAIR principles on the European level by developing a cross-domain metadata schema and...
Introduction: The environment plays an important role in human health, and efficient linkage of epidemiological cohorts with environmental data is crucial to quantify human exposures. However, there are no harmonized standards for the automatic mapping of metadata across our three domains: Health (HMGU), Earth & Environment (UFZ), and Aeronautics, Space & Transport (DLR).
Objective: We aimed to...
There has been a substantial increase in the number of scientific publications across diverse disciplines. These publications often generate metadata, scholarly content, scientific models, source code, etc. Though such information is made available to research communities under open science initiatives, numerous scholarly repositories have emerged over the years to harvest metadata in various...
Here we present the latest updates of our data-driven approach to monitoring and assessing the state of open and FAIR data in the Helmholtz Association. The approach consists of two parts: a modular pipeline for data harvesting, validation, and assessment, and a dashboard with interactive statistics about the identified Helmholtz data publications. The dashboard provides insight into which data...
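As a rough sketch of what a single harvesting-and-validation step can look like (the actual pipeline's sources, filters, and checks are not shown here), the snippet below queries the public DataCite REST API and flags harvested records that lack a license or creator. The query string and page size are illustrative assumptions.

```python
# Sketch of one harvesting step against the public DataCite REST API.
# The query and page size are illustrative, not the pipeline's real settings.
import requests

resp = requests.get(
    "https://api.datacite.org/dois",
    params={"query": "publisher:Helmholtz*", "page[size]": 5},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["data"]:
    attrs = item["attributes"]
    # Basic completeness check: does the record carry a license and a creator?
    has_license = bool(attrs.get("rightsList"))
    has_creator = bool(attrs.get("creators"))
    print(attrs["doi"], "license:", has_license, "creator:", has_creator)
```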
Knowledge Graphs help to connect and organize information from different sources and entities. They can be used to apply advanced search and filtering techniques on very large datasets and reveal connections and dependencies across the data. To be useful, however, they require highly uniform and harmonized data sets. So far, most knowledge graphs on scientific data have used bibliographic data...
The vast amount of observations needed to train new generation AI models (Foundation Models) necessitates a strategy of combining data from multiple repositories in a semi-automatic way to minimize human involvement. However, many public data sources present challenges such as inhomogeneity, lack of machine-actionable data, and manual access barriers. These issues can be mitigated through the...
The aim of a cooperation between the DDI Alliance and QualidataNet - a network for qualitative data that is being created as part of the NFDI - is to describe qualitative data in a standardized way so that researchers can find it and use it for their own research, regardless of discipline and thematic location.
Since last year, QualidataNet has been involved in the metadata...
Scientists frequently need to get an overview of their experiments by summarizing information spread over multiple files and storage locations. This metadata may include items such as experimental conditions, subject details, and characteristics of the experimental data. It is common for researchers to spend time developing their own solutions tailored to their specific use case. However,...
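A generic sketch of such an ad-hoc solution follows below, assuming a directory tree of experiments with JSON sidecar files named metadata.json; both the layout and the field names are assumptions for illustration.

```python
# Walk a (hypothetical) "experiments" directory, collect JSON sidecar
# metadata, and print a plain-text overview table of all experiments found.
import json
from pathlib import Path

rows = []
for meta_file in Path("experiments").rglob("metadata.json"):
    with open(meta_file) as fh:
        meta = json.load(fh)
    rows.append({
        "path": str(meta_file.parent),
        "subject": meta.get("subject", "n/a"),      # assumed field name
        "condition": meta.get("condition", "n/a"),  # assumed field name
    })

for row in rows:
    print(f"{row['path']:40s} {row['subject']:15s} {row['condition']}")
```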
The CREATIVE project aims to make the generic repository RADAR4KIT easily accessible and attractive for the domain-specific communities organized in the Climate and Environment Centre (CEC) at the Karlsruhe Institute of Technology (KIT). This aim will be achieved with the help of customized templates and input masks for subject-specific metadata, which enhance the RADAR4KIT usability for the...
Embedding semantics within research metadata serves to standardize, refine and contextualize it, thereby improving interoperability between data sources and promoting the FAIR principles. Within the Helmholtz Association, we are committed to evaluating existing semantic resources and established practices and to developing guidelines for their handling and use in the field of earth and...
In agrosystem science, the transition to a FAIR (Findable, Accessible, Interoperable, Reusable) data future is essential for fostering innovation and collaboration. While technical developments provide the necessary infrastructure, the true challenge lies in changing ingrained habits and cultural practices. To address this, the FAIRagro initiative has developed a participation concept aimed at...
The rapid evolution of research software necessitates efficient and accurate metadata management to ensure software discoverability, reproducibility, and overall project quality. However, manually curating metadata can be time-consuming and prone to errors. This poster presents two innovative tools designed to streamline and improve metadata management: fair-python-cookiecutter and...
Enriching data with metadata is a key concept for the data output of scientific research to be FAIR. Data processing software and custom code often do not support metadata annotation out of the box, or the usage process does not mandate it. This confronts data creators and maintainers with the challenge of annotating their data. From a Human Machine Interface (HMI)...
Research data management (RDM) is an important aspect of modern scientific research, which relies heavily on interconnected data sets and corresponding metadata. For modeling and integrating these interconnections and metadata, the Resource Description Framework (RDF) has often been proposed as a standard, since it has been in use by search engines and knowledge management systems for...
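As a brief illustration of how RDF-modelled metadata can then be queried, the sketch below runs a SPARQL query with rdflib; the input file and the use of Dublin Core titles are hypothetical assumptions.

```python
# Query interconnected RDF metadata with SPARQL via rdflib.
# "metadata.ttl" is a hypothetical file of previously modelled metadata.
from rdflib import Graph

g = Graph()
g.parse("metadata.ttl", format="turtle")

query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
    ?dataset dcterms:title ?title .
}
"""
for dataset, title in g.query(query):
    print(dataset, "->", title)
```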
In our increasingly digital and interconnected world, the integration of Persistent Identifiers (PIDs) in metadata is essential for machine-readable and machine-understandable metadata, as also described in the FAIR Guiding Principles for research data management. PIDs provide unique, permanent and machine-readable references to various types of digital objects, including publications, datasets,...
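As a small illustration of what machine-actionable PIDs enable, the sketch below resolves a DOI to machine-readable metadata via standard content negotiation at doi.org; the DOI itself is a hypothetical placeholder.

```python
# Resolve a DOI to machine-readable metadata via content negotiation.
# The DOI below is a hypothetical placeholder, not a real record.
import requests

doi = "10.1234/example-dataset"
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
resp.raise_for_status()
record = resp.json()
print(record.get("title"), "-", record.get("publisher"))
```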
Software is an important research output. Therefore, funding agencies are interested in the value that software contributes to the overall results of a funded project. The Helmholtz Association is working towards a system to evaluate data and software publications. The "Task Group Helmholtz Quality Indicators for Data and Software Publications" has already published a vision paper about how...
The Sample Environment Communication Protocol (SECoP) provides a generalized way for controlling measurement equipment – with a special focus on sample environment (SE) equipment [1,2]. In addition, SECoP holds the possibility to transport SE metadata in a well-defined way.
SECoP is designed to be
- simple to use,
- inclusive concerning different control systems and control philosophies...
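As an illustration of the protocol's line-based, human-readable message style, here is a minimal client sketch in Python. The host, port, and the module/parameter name "t1:value" are assumptions for demonstration; the exact message and reply formats are defined by the SECoP specification and vary per node.

```python
# Minimal SECoP client sketch: SECoP exchanges plain-text, line-based
# messages over TCP. Host, port, and module "t1" are assumptions.
import socket

with socket.create_connection(("localhost", 10767)) as sock:
    fh = sock.makefile("rw", newline="\n")

    fh.write("*IDN?\n")              # ask the node to identify itself
    fh.flush()
    print(fh.readline().strip())     # identification string of the node

    fh.write("read t1:value\n")      # request the current value of module t1
    fh.flush()
    print(fh.readline().strip())     # e.g. a reply carrying value + qualifiers
```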
The complexity and diverse data requirements of energy system research demand a robust and adaptable metadata standard. The OEMetadata Standard, with its recent update to version 2.0, is designed to meet the needs of this transdisciplinary field. Illustrated through practical examples, the key features and enhancements of the standard are presented, followed by the introduction of an...
RSpace is an open-source platform that supports researchers in the active research phase to plan, conduct, and document their work, and thereby make their research more robust and FAIR (Findable, Accessible, Interoperable, Reusable). Interoperability with tools and services used by researchers throughout the research lifecycle is a fundamental element of RSpace's development philosophy....
Different roles interact with research data in very different ways: technicians, experimental scientists, data analysts, modellers, supervisors, infrastructure providers, data stewards, toolchain providers, project managers, administrative personnel, librarians, publishers, NFDI contact persons, indexing service providers, external data users, programmers,... None of them can establish an...
When gathering your analog research data and metadata, including experimental parameters that are challenging to digitize, with the aim of creating a knowledge graph, we suggest the following pipeline for achieving high data quality: agree on a shared vocabulary, expand it into an ontology, and eventually semantically annotate the recorded data.
To facilitate this pipeline we developed and use the...
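As a minimal sketch of the first pipeline step, the snippet below records an agreed vocabulary term as a SKOS concept with the Python rdflib library; the namespace and the term are hypothetical examples, not part of the actual tooling.

```python
# Capture an agreed vocabulary term as a SKOS concept with rdflib.
# The namespace and the term "annealingTemperature" are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

VOCAB = Namespace("https://example.org/vocab/")  # assumed shared namespace

g = Graph()
term = VOCAB["annealingTemperature"]
g.add((term, RDF.type, SKOS.Concept))
g.add((term, SKOS.prefLabel, Literal("annealing temperature", lang="en")))
g.add((term, SKOS.definition,
       Literal("Temperature at which a sample is annealed.", lang="en")))

# Later pipeline steps would promote such concepts into ontology classes
# and properties and use them to semantically annotate the recorded data.
print(g.serialize(format="turtle"))
```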
The PATOF project builds on work at the MAMI particle physics experiment A4. A4 produced a stream of valuable data for many years, which has already yielded scientific output of high quality and still provides a solid basis for future publications. The A4 data set consists of 100 TB and 300 million files of different types (hierarchical folder structure and file format with minimal metadata provided...
At GEOMAR, a multidisciplinary research centre, a large number of heterogeneous biological and geological samples need to be managed: among other requirements, their metadata and data need to be stored in a FAIR way, their provenance information as well as their physical location in the sample storage need to be available, and scientists need to be supported in organizing their sample...
The SEPIA project aims to improve the management and annotation of research data by providing a comprehensive sample database integrated with an open API. This initiative facilitates the capture and exchange of sample metadata, thereby enriching the research data collected at the Helmholtz-Zentrum Berlin (HZB). This presentation will explore the architecture and functionalities of the SEPIA...
A library is a super repository of digital and physical data archives, organized by metadata. This metadata, however, may be distributed across various databases due to, for example, grouping by topic or type. To provide a unified view or overview of all resources, the metadata needs to be aggregated, normalized, and potentially interconnected. DatAasee is such a metadata...
Interoperability is an ongoing challenge given the diverse nature of research and the tools and services researchers use. Addressing interoperability challenges and FAIRification of research at scale is therefore only possible with solid knowledge about the tools and services used in each stage of the research cycle and a forward-facing vision of how they might work together.
For the...
The DataPLANT consortium, part of the German National Research Data Infrastructure (NFDI), aims to provide plant researchers with a robust and sustainable infrastructure for managing research data. Since the complexity of research data continues to grow, effective methods for managing, annotating, and sharing this data become increasingly important. DataPLANT integrates different established concepts for...
The collection and use of sensor data is crucial for scientists monitoring and observing the Earth's environment. In particular, it enables the evaluation of real natural phenomena over time and is essential for the validation of experiments and numerical simulations. Assessment of data quality beyond statistics requires knowledge and consideration of the sensor state, including operation and...
Heritage science is an interdisciplinary field that involves the scientific study of cultural and natural heritage. It entails collecting and producing a wide variety of data, including descriptions of objects and sites, samples, sampling locations, scientific instrumentation, analytical methods, conservation and restoration records, environmental monitoring data, documentation, and digital...
Predicting the performance of aerospace and automotive structures requires detailed reflection of the actual manufacturing process of each produced part. This is especially the case for composite structures produced with additive manufacturing processes, in view of their process complexity and its influence on product reliability. For high-fidelity numerical models to reflect the actual...
The microstructure of materials is characterized by crystallographic defects, which ultimately determine the material properties. In computational materials science, methods and tools are used to predict and analyze defect structures. The increase of computational power has led to the generation of large amounts of complex and heterogeneous data, increasing the need for the implementation of...
FAIR Research Data Management in interdisciplinary large-scale projects is very challenging. Data formats, acquisition processes, and infrastructure are highly heterogeneous. Furthermore, many tasks in FAIR RDM are tedious and complex for the researchers. In this keynote, we will discuss the potentials of generative AI to support FAIR RDM on examples from a large-scale project.
We will discuss the world of interoperable semantics at both domain-specific and application-wide levels, focusing on how DMPonline has pioneered enhancements and integrations that promote seamless data exchange and usage across diverse research contexts.
Join us in understanding how DMPonline's developments in interoperable semantics improve data management and use across various domains. We...
After a brief introduction to the FAIR principles and the significance of automated assessments, participants will engage in hands-on sessions where they will compare the outputs of these tools on a curated list of datasets. The list represents datasets from various repositories that are typical within the biomedical context. Both a generalized overview of FAIR screening results at the...
In environmental sciences, time-series data is crucial for monitoring environmental processes, validating earth system models and remote sensing products, training data-driven methods, and better understanding climate processes. However, even today, there is no uniform standard and interface for making such data consistently available according to the FAIR principles. Therefore, within...
In this interactive session, we will consolidate our talk on creating FAIR, rich, and shared experimental (meta)data with a knowledge graph in mind. We will present the individual tools of the software workflow live and interactively, starting from vocabulary terms via ontologies to entering research (meta)data and sending it to another Electronic Lab Notebook (ELN).
A prerequisite for FAIR...