In polar and marine environmental research, a diverse array of items, including instruments, platforms, models, custom-built facilities, and lab equipment, is routinely employed. This not only yields substantial volumes of data but also generates a wealth of accompanying metadata.
In this talk, we highlight the practical utility of registry.awi.de — an...
ABSTRACT
Introduction
Meaningful metadata are essential for the description, retrieval and reuse of data, in particular in multi-centric cooperative research projects. For the Integrative Human Circadian Daylight Platform (iHCDP, [https://ihcdp.org/][1]), a collaborative, transnational project between the University of Basel (Switzerland), the Technical University of Munich and the...
Light exposure significantly impacts various aspects of human psychology and physiology, including cognition, mood and circadian rhythm. Light in the real world has complex characteristics: it is spatially articulated and temporally varying, and even changes in head direction and eye movements alter it. How the visual and non-visual light-mediated brain pathways encode these spatial and...
Metadata is the foremost element of a data management strategy that takes account of the FAIR (findable, accessible, interoperable and reusable) principles, which are becoming increasingly important within the scientific community. Additionally, there is a strong need for better data integration and enrichment in the field of high-intensity laser-plasma physics in an international context: at...
Ion chromatography (IC) is an analytical method that separates ions in liquid samples according to their chemical and physical properties. This analytical method is widely used in different scientific fields such as environmental science (for example wastewater or soil analysis), food technology (food extract analysis), applied plasma science (characterization of plasma treated liquids) and...
Personalized light exposure data is progressively gaining importance in various sectors, including research, occupational affairs, and fitness tracking. Data are collected through a proliferating selection of wearable loggers and dosimeters, varying in size, shape, functionality, and output format. Despite or maybe because of numerous use cases, the field lacks a unified framework for...
In an ever-changing world, field surveys, inventories and monitoring data are essential for prediction of biodiversity responses to global drivers such as land use and climate change. This knowledge provides the basis for appropriate management. However, field biodiversity data collected across terrestrial, freshwater and marine realms are highly complex and heterogeneous. The successful...
Automating Metadata Handling in Research Software Engineering
Mustafa Soylu^ 1
Anton Pirogov^ 1
Volker Hofmann 1
Stefan Sandfeld 1
^ The authors contributed equally to this work
Institute for Advanced Simulation - Materials Data Science and Informatics (IAS9), Forschungszentrum Jülich, Jülich, Germany
Modern research is heavily dependent on software. The landscape of...
The Helmholtz Metadata Collaboration (HMC) and the Helmholtz Open Science Office launched a joint initiative at the end of 2022 to strengthen and connect research data repositories in the Helmholtz Association, and to increase their visibility in the international research landscape. Research data repositories form central hubs for metadata on the Road to FAIR: They generate, consolidate...
Editing [Linked Data][1] documents represents an enormous challenge to users with limited technical expertise. These users struggle with language rules, relationships between entities, and interconnected concepts. These issues can result in frustration and low data quality. In order to respond to this challenge, we introduce a new editor, designed to facilitate effortless editing of [JSON-LD...
FAIR research data and the adoption of semantic technologies hold great promise to improve the quality, openness, and efficiency of research in the physical sciences. However, the FAIR building we wish to construct rests on foundations that are still shaky: Metadata often lack the quantity and quality to harness the full potential of advanced search functionalities, knowledge graphs, and AI...
Open science promotes innovation, improves the transfer of knowledge to society and the economy, and ensures quality and transparency in research. The Helmholtz Association, Germany's largest research performing organization, therefore adopted an Open Science Policy in September 2022 [1].
This policy supports openness as a central endeavor of science and makes open science the standard for...
The Helmholtz Metadata Collaboration (HMC) has developed the HMC dashboard on Open and FAIR Data in Helmholtz. The dashboard allows users to monitor and interactively analyze statistics on open and FAIR data produced by researchers in the Helmholtz Association. It can be used to analyze in which repositories Helmholtz researchers make their data publicly available, to monitor...
In 2021 HMC conducted its first community survey to align its services with the needs of Helmholtz researchers. A question catalogue, with 49 (sub-)questions based on an expertise-adaptive approach, was designed and disseminated among researchers in all six Helmholtz research fields. 631 completed survey replies were obtained for analysis.
The HMC Community Survey 2021 provides insight into...
This research poster dives into the important impact of four simple but crucial elements in research data policies: clear titles, persistent identifiers, publication dates, and open availability. These elements, often underestimated in policy, play a pivotal role in enhancing data discoverability, transparency, and collaboration - ultimately strengthening the foundation of modern scientific...
Standardized metadata and its proper storage are essential for effective management of scientific research data. The challenge lies in manually compiling such metadata, a process which can be both tedious and prone to human error. To address this problem, we introduce the Mapping Service, developed within the framework of HMC.
The Mapping Service helps to streamline the process of metadata...
To be sustainable and useful, scientific data should be FAIR. These goals can only be achieved by definition and adoption of metadata standards and implementation of tools and services that support these standards. Unfortunately, the diversity of needs with respect to scientific (meta)data leads to a large gap between the scope and pace of large-scale standardization efforts and the day-to-day...
The substantial increase in the efficiency of perovskite-based solar cells (PSCs) over the last decade is the result of scientific work that has produced a huge quantity of literature and datasets (between 2014 and 2022 almost 30,000 reports were published). The aim of this work is to develop an ontology which can primarily be used to classify literature paragraphs according to the subject discussed...
The PATOF project builds on work at the MAMI particle physics experiment A4. For many years, A4 produced a stream of valuable data, which has already yielded scientific output of high quality and still provides a solid basis for future publications. The A4 data set consists of 100 TB and 300 million files of different types (hierarchical folder structure and file format with minimal metadata provided...
In this presentation, we will introduce the current HMC activities and outcomes in Hub Earth and Environment: Our process for developing a guideline is planned as a coordinated procedure. For every implementation guide, we go through the same questions, up to and including tests based on use cases and the definition of abstract test classes, in order to be able to validate the implementation. Our...
Research across the Helmholtz Association is based on inter- and multidisciplinary collaborations across its 18 Centres and beyond. However, the (meta)data generated through Helmholtz research and operations is typically siloed within institutional infrastructures and often within individual teams. The result is that the wealth of the association’s (meta)data is stored in a scattered manner,...
Persistent identifiers (PIDs) are an integral element of the FAIR principles (Wilkinson et al. 2016), which recommend them for referring to data sets and metadata. They are, however, also being considered for referring to other entities, such as people, organizations, projects, laboratories, repositories, publications, vocabularies, samples, instruments, licenses, methods and others....
[Research Object Crate][1] (RO-Crate) is an open, community-driven data package specification to describe all kinds of file-based data, as well as entities outside the package. To do so, it uses the widespread JSON format to represent Linked Data ([JSON-LD][2]), which allows linking to external information. This makes the format flexible and machine-readable. These packages are being...
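As an illustration of the package description mentioned above, the following minimal sketch builds a metadata document in the style of RO-Crate 1.1 and serializes it as JSON-LD. The function name `build_minimal_crate`, the example file path, and the example values are assumptions made for this sketch and do not come from the abstract; the license URL stands in for a link to external information.

```python
import json


def build_minimal_crate(name, description, date_published, files):
    """Return a minimal RO-Crate 1.1 style metadata document as a dict."""
    # Metadata descriptor: identifies this JSON-LD file and points to the root dataset.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root dataset: describes the package as a whole and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        # Link to an entity outside the package (example license, an assumption).
        "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
        "hasPart": [{"@id": path} for path in files],
    }
    # One File entity per packaged file (paths are hypothetical).
    file_entities = [{"@id": path, "@type": "File"} for path in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root] + file_entities,
    }


if __name__ == "__main__":
    crate = build_minimal_crate(
        name="Example measurement package",
        description="Illustrative crate with one data file.",
        date_published="2024-01-01",
        files=["data/measurement.csv"],
    )
    # By convention, this JSON would be stored as ro-crate-metadata.json in the package root.
    print(json.dumps(crate, indent=2))
```

Because every entity is a flat entry in `@graph` and refers to others by `@id`, further entities such as authors or instruments could be added in the same way without changing the overall structure.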
NeXus is a well-established standard for data exchange at neutron, X-ray and muon large-scale facilities. Having been around for over 20 years with dedicated governance structures, it serves as a successful example of a long-lived standard. NeXus as an ecosystem can be difficult to navigate as people refer to its parts using varying terminology and sometimes having different concepts in mind even...
Computer simulations are an essential pillar of knowledge generation in science. Understanding, reproducing, and exploring the results of simulations relies on tracking and organizing metadata describing numerical experiments. However, the models used to understand real-world systems, and the computational machinery required to simulate them, are typically complex, and produce large amounts of...
Controlled vocabularies are used to describe knowledge within a particular domain, encompassing a comprehensive collection of domain-specific terms. Using controlled vocabularies not only mitigates the challenge of data ambiguity, but also offers several advantages, including references to term definitions, particularly within metadata schemas. Additionally, they foster semantic...
Electronic lab notebooks (ELNs) are essential for gathering analog metadata, including challenging-to-digitize experimental parameters. However, interdisciplinary research institutions often employ various systems, creating barriers to metadata exchange. Addressing this interoperability gap, we're developing an API-based data exchange to enhance interoperability between the ELNs Herbie and...
FAIR WISH - FAIR Workflows to establish IGSN for Samples in the Helmholtz Association is an HMC-funded project from the first cohort (2020). IGSN, the International Generic Sample Number, is a globally unique, citable and persistent identifier (PID) for physical samples with discovery functionality on the internet. IGSNs enable direct links between data, publications and the originating samples...
Summary
Our presentation reflects the topic ‘Facilitating connectivity of research data’, in particular the subtopics: ‘Metadata annotation and management during and close to the research process’, and ‘Data interoperability through harmonised metadata and interoperable semantics’.
The presentation describes significant advances in incorporating metadata into the RSpace digital...
Time-series data are crucial sources of reference information in all environmental sciences. Publishing such data in a consistent and timely manner for monitoring and warning purposes is becoming increasingly important. In this context, the Helmholtz Centers from the research field Earth and Environment (E&E) operate some of the largest measurement infrastructures worldwide (e.g., TERENO, DANUBIUS or...
The environment plays an increasingly important role in human health, and efficient linkage with environmental and earth observation data is crucial to quantify human exposures. Currently, there are no harmonized metadata standards for automatic mapping. This project aims to facilitate the linkage of data of different research fields by generating and enriching interoperable and...
The Sample Environment Communication Protocol (SECoP) provides a generalized way for controlling measurement equipment – with a special focus on sample environment (SE) equipment [1]. In addition, SECoP holds the possibility to transport SE metadata in a well-defined way.
SECoP is designed to be
- simple to use,
- inclusive concerning different control systems and control philosophies,...
Data acquisition (DAQ) systems continue to advance in power, but manual data input will remain required as experiments necessarily check for the unforeseen. Researchers often use electronic lab books or fallback solutions like Excel or Google Docs to record actions and events, highlighting the need for an intuitive interface that enables live- and post-processing while remaining linkable to...
Biomolecules, such as DNA and RNA, provide a wealth of information about the distribution and function of marine organisms, and biomolecular research in the marine realm is pursued across several Helmholtz Centers. Biomolecular metadata, i.e. DNA and RNA sequences and all steps involved in their creation, exhibit great internal diversity and complexity. However, high-quality (meta)data...
The amount and diversity of high-quality atmospheric remote sensing observations from satellites are currently increasing rapidly, and their synergetic use offers unprecedented opportunities for gaining knowledge. FAIR data are important for this kind of data interoperability and reusability. This project will lead to FAIR satellite data products. It will develop metadata standards for describing...
Simulation of aerospace or automotive structures can be ultimately improved by reflecting the actual manufacturing status of the produced parts in detail. This is especially the case for composite structures in view of the complexity of the involved manufacturing processes and their influence on the product reliability. High-fidelity numerical models have to be developed to reflect the actual...
Using terminologies can empower scientists and infrastructure providers to realise a machine-processable expression of the information contained in their research data and other academic outputs. In the academic world, the ambiguity of terms and the lack of appropriate keywords are tedious and annoying to both scientists and machines. In addition, there is a lack of controlled vocabularies in...
Establishing semantic data and knowledge graphs in scientific working groups is no easy feat. In most cases there is neither a user-friendly toolchain nor experience with ontologies for the respective research field. But without a start, said experience can never be gained. The same is true for individuals who want to get started in the field.
We thus see knowledge graph development not as a...
The field of clinical research data is rich in information that is often underutilized due to the complexity resulting from heterogeneous representations of the data and the lack of suitable tooling for its harmonization. Ineffective data preprocessing hinders potential insights and prevents effective reuse and combination of data that could otherwise drive progress in the scientific field by...
We present a workflow to improve the management of Magnetic Resonance Imaging data and to increase its compliance with the FAIR principles. This involves using the JSON Metadata Mapping Tool we have developed to map metadata from a domain-specific file format to a JSON Schema-based format, and storing the data and the mapped metadata in repositories. Some steps in the workflow are automated,...
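The general idea of mapping metadata from a domain-specific header to a nested JSON document can be sketched as follows. This is not the JSON Metadata Mapping Tool described in the abstract; the mapping table, the header keys, and the target paths are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict

# Hypothetical mapping: source header key -> dotted path in the target JSON document.
MAPPING: Dict[str, str] = {
    "PatientID": "subject.id",
    "RepetitionTime": "acquisition.repetitionTimeMs",
    "EchoTime": "acquisition.echoTimeMs",
}


def map_metadata(source: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Copy mapped fields from a flat source header into a nested target dict."""
    target: Dict[str, Any] = {}
    for src_key, dst_path in mapping.items():
        if src_key not in source:
            continue  # skip fields the source does not provide
        *parents, leaf = dst_path.split(".")
        node = target
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
        node[leaf] = source[src_key]
    return target


if __name__ == "__main__":
    header = {"RepetitionTime": 2000, "EchoTime": 30, "Unmapped": "ignored"}
    print(map_metadata(header, MAPPING))
```

Keeping the mapping as data rather than code means it can be reviewed and versioned separately from the extraction logic; validating the result against a JSON Schema would be a natural next step, but is left out of this sketch.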