The Helmholtz Metadata Collaboration aims to make the research data [and software] produced by Helmholtz Centres FAIR for their own and the wider science community by means of metadata enrichment [1]. Why metadata enrichment and why FAIR? Because the whole scientific enterprise depends on a cycle of finding, exchanging, understanding, validating, reproducing, integrating and reusing research...
Publishing data in a FAIR [1] way is already part of good scientific practice. While institutional policy as well as funding and publishing guidelines support this, scientists, technicians, and data stewards struggle to realize it when handling their research data. The reason is that the FAIR principles are high-level principles and guidelines rather than concrete implementations. This is one...
Software as an important method and output of research should follow the RDA "FAIR for Research Software Principles". In practice, this means that research software, whether open, inner or closed source, should be published with rich metadata to enable FAIR4RS.
For research software practitioners, this currently often means following an arduous and mostly manual process of software...
The HELIPORT project aims to make the components or steps of the entire life cycle of a research project at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR) and the Helmholtz-Institute Jena (HIJ) discoverable, accessible, interoperable and reusable according to the FAIR principles. In particular, this data management solution deals with the entire lifecycle of research experiments, starting...
We present three use cases which showcase methods of providing a detailed metadata description with the goal of increasing the reusability of data.
First, Hub Energy presents a photovoltaic system which required ontology development and the implementation of data models based on standards like IEC 61850 [1] or SensorML [2] as well as on FAIR Digital Objects (FDO) [3]. The backend was realized...
Modern science is to a vast extent based on simulation research. With the advances in high-performance computing (HPC) technology, the underlying mathematical models and numerical workflows are steadily growing in complexity.
This complexity gain offers a huge potential for science and society, but simultaneously constitutes a threat for the reproducibility of scientific results. A main...
In an ever-changing world, field surveys, inventories and monitoring data are essential for predicting biodiversity responses to global drivers such as land use and climate change. This knowledge provides the basis for appropriate management. However, field biodiversity data collected across terrestrial, freshwater and marine realms are highly complex and heterogeneous. The successful...
Digital metadata solutions for epidemiological cohorts are lacking since most schemas and standards in the Health domain are clinically oriented and cannot be directly transferred. In addition, the environment plays an increasingly important role for human health and efficient linkage with the multitude of environmental and earth observation data is crucial to quantify human exposures. There...
Details of less than 10% of the 80 million individual items in the collection at the Natural History Museum can be obtained via our Data Portal; much of the collection remains undigitized, and other data associated with the collection are recorded but not delivered in a coherent system. In 2018, 77 staff at the Natural History Museum, London, took part in a successful collections assessment exercise. 17...
With new specialisations such as Data Science driven by digitisation, a digital transformation raises efficiency potentials in both empirical research and data governance processes. Here, one challenge is to establish open and interoperable datasets, recognising the FAIR criteria (cf. Wilkinson et al., 2016) as a standard of that process. Data – as well as metadata – should comply with...
Making research reproducible and FAIR (Findable, Accessible, Interoperable, and Reusable) often requires more information than what is commonly published within scientific articles. There is a growing number of repositories for publishing additional material like data or code. However, articles are still at the center of most scientific work and thus efforts on gathering information which is...
Annotation is one of the oldest cultural techniques of mankind. While in past centuries pen and paper were the means of choice to add annotations to a source, this activity has increasingly shifted to the digital world in recent years. With the W3C recommendation 'Web Annotation Data Model', a powerful tool has been available since 2017 to model annotations in a wide variety of disciplines and...
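To make this concrete, the sketch below builds a minimal annotation following the W3C Web Annotation Data Model; the identifier, body text, and target IRI are hypothetical placeholders.

```python
import json

# A minimal Web Annotation (W3C Web Annotation Data Model).
# The id, body value, and target are invented placeholders.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/anno/1",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "This figure shows the 2019 measurement campaign.",
        "format": "text/plain",
    },
    "target": "https://example.org/images/figure1.png",
}

print(json.dumps(annotation, indent=2))
```

The model deliberately separates the annotation body (the comment) from its target (the annotated source), which is what makes it applicable across disciplines.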
The Collaborative Research Centre AquaDiva is a large collaborative project spanning a variety of domains, such as biology, geology, chemistry and computer science with the common goal to better understand the Earth’s critical zone, in particular, how environmental conditions and surface properties shape the structure, properties, and functions of the subsurface. Within AquaDiva large volumes...
The year 2022 marks the 10th anniversary of the Registry of Research Data Repositories - re3data. The global index currently lists over 2,800 digital repositories across all scientific disciplines – critical infrastructures to enable the global exchange of research data. The openly accessible service is used by researchers and services worldwide. It provides extensive descriptions of...
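As an illustration of how such a registry can be used programmatically, the following sketch queries the public re3data API; the endpoint and XML element names reflect the v1 API and are stated here as assumptions that may evolve.

```python
import requests
import xml.etree.ElementTree as ET

# Sketch: list repositories from the public re3data API (returns XML).
# Endpoint and element names are assumptions based on the v1 API.
resp = requests.get("https://www.re3data.org/api/v1/repositories", timeout=30)
resp.raise_for_status()
root = ET.fromstring(resp.content)

for repo in list(root)[:5]:  # first five entries only
    name = repo.findtext("name")
    link = repo.find("link")
    href = link.get("href") if link is not None else ""
    print(name, href)
```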
Recording data with the help of photons and neutrons is limited to larger institutes. Besides the limited time slots, this process is also quite expensive. To save resources, DAPHNE4NFDI focuses on creating ontologies and infrastructure to make all data from its participants FAIR. This enables users not only to use existing data but also to automatically fetch data for analysis. This analysis...
For research data to be used efficiently, it must be easy to find and access. This is a requirement in all areas of science. The Data Collections Explorer, developed within NFDI4Ing for the engineering sciences, targets these needs. It is an information system that provides an overview of research data repositories, archives, databases as well as individual datasets published in the field. ...
Introduction: The National Research Data Infrastructure for Personal Health Data (NFDI4Health) aims to improve the FAIRness of health-related data from epidemiological, public health and clinical studies as well as registries and administrative health databases[1]. One key service of NFDI4Health is the German Central Health Study Hub[2] that supports a standardised publication and search...
How can a computer understand the relations of data or objects from the real world? Ontologies are semantic artifacts that capture knowledge about their domain of interest in a machine-understandable form. The main goal of developing ontologies is to formalize concepts and their relations through which humans express meaning and to use them as a communication interface to machines. Thus,...
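As a small illustration of such machine-understandable formalization, the sketch below encodes a hypothetical concept hierarchy as RDF triples with rdflib; the namespace and class names are invented for the example.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical mini-ontology: concepts and relations as machine-actionable triples.
EX = Namespace("https://example.org/onto#")
g = Graph()
g.bind("ex", EX)

# "A Sensor is a kind of Instrument", expressed formally.
g.add((EX.Instrument, RDF.type, RDFS.Class))
g.add((EX.Sensor, RDF.type, RDFS.Class))
g.add((EX.Sensor, RDFS.subClassOf, EX.Instrument))
g.add((EX.Sensor, RDFS.label, Literal("Sensor")))

print(g.serialize(format="turtle"))
```

Once concepts are expressed this way, a machine can answer questions such as "is every Sensor an Instrument?" by following the subClassOf relation rather than guessing from strings.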
Semantic interoperability is one of the major challenges in implementing the FAIR principles [1] for research data. This is especially relevant for interdisciplinary projects, where people from different but related disciplines may use technical terms with differing meaning. Established vocabularies and semantic standards can harmonize domain-specific language and facilitate common...
The Helmholtz Open Science Office has embraced this mission (Enabling open science practices in Helmholtz!) since it was founded by the Helmholtz Association in 2005. It supports the Helmholtz Association as a service provider in shaping the cultural change towards open science. Furthermore, it promotes dialogue on open science within and beyond Helmholtz and regularly offers events on open...
Within the current project, we plan to optimise the data and metadata curation workflow by automating the creation of community-standard StationXML metadata, including the generated PIDs and linking them to the parent dataset DOIs. Moreover, we plan to enrich metadata with terms from standard and community-specific vocabularies. Specific guidelines, describing the OBS data management...
In geodisciplines such as the cryosphere sciences, a large variety of data is available in data repositories provided on platforms such as Pangaea. In addition, many computational process models exist that capture various physical, geochemical, or biological processes at a wide range of spatial and temporal scales and provide corresponding simulation data. A natural thought is to...
The [FAIR Digital Object Lab][1] is an extendable and adjustable software stack for generic FAIR Digital Object (FAIR DO) tasks. It consists of a set of interacting components with services and tools for creation, validation, discovery, curation, and more.
Preprocessing data for research – finding, accessing, unifying or converting it – takes up to 80% of researchers' time. The FAIR...
The application case for implementing and using the FAIR Digital Object (FAIR DO) concept aims to simplify usage of label information for composing Machine Learning (ML) training data.
Image data sets curated by different domain experts usually have non-identical label terms. This prevents images with similar labels from being easily assigned to the same category. Therefore, using the images...
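A minimal sketch of the harmonization step follows, assuming a mapping from free-text label terms to canonical category identifiers; in a FAIR DO setting this mapping would be resolved from a registered vocabulary rather than hard-coded, and all terms below are invented.

```python
# Hypothetical mapping from curators' free-text labels to canonical categories.
LABEL_MAP = {
    "cat": "felis_catus",
    "domestic cat": "felis_catus",
    "kitty": "felis_catus",
    "dog": "canis_familiaris",
    "canine": "canis_familiaris",
}

def harmonize(labels):
    """Map free-text labels to canonical category identifiers."""
    return [LABEL_MAP.get(label.strip().lower(), "unknown") for label in labels]

print(harmonize(["Cat", "canine", "Kitty"]))
# ['felis_catus', 'canis_familiaris', 'felis_catus']
```

With labels reduced to shared identifiers, images from differently curated sets can be pooled into the same training categories.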
The International Generic Sample Number (IGSN) is a unique and persistent identifier for – originally – geological samples. Recently, interest has grown to make the IGSN available for more sample types from further scientific communities from the Earth and Environment (E & E). The IGSN Metadata Schema is modular: The mandatory registration schema is complemented by the IGSN Description Schema...
Biomolecules, such as DNA and RNA, provide a wealth of information about the distribution and function of marine organisms, and biomolecular research in the marine realm is pursued across several Helmholtz Centers. Biomolecular metadata, i.e. DNA and RNA sequences and all steps involved in their creation, exhibit great internal diversity and complexity. However, high-quality (meta)data...
HELIPORT is a data management solution that aims at making the components and steps of the entire research experiment’s life cycle discoverable, accessible, interoperable and reusable according to the FAIR principles.
Among other information, HELIPORT integrates documentation, scientific workflows, and the final publication of the research results - all via already established solutions for...
The Helmholtz digital ecosystem connects diverse scientific domains with differing (domain-specific) standards and best practices for handling metadata. Ensuring interoperability within such a system, e.g. of developed tools, offered services and circulated research data, requires a semantically harmonized, machine-actionable, and coherent understanding of the relevant concepts. Further, this...
Helmholtz Imaging's mission is to unlock the potential of imaging in the Helmholtz Association. Image data constitute a substantial part of the data generated in scientific research. Helmholtz Imaging is the overarching platform to better leverage, and make accessible to everyone, the innovative modalities, methodological richness, outstanding expertise and data treasures of the Helmholtz...
The desired interoperability of data, as outlined by the FAIR principles, requires a harmonization of data handling processes among data infrastructures. To support the adoption of agreements on such processes and thus further develop the “ROAD TO FAIR”, HMC is currently establishing a FAIR Implementation Network (F-IMP). With this communication network we encourage the data management...
In pursuit of deep and expressive semantic interoperability, the Earth and Environment Hub is adopting a three-pillared approach to develop strategically and technically aligned capacity within the Helmholtz Association and globally.
The first pillar is implementation of high-quality, future-oriented semantic solutions for Earth and environmental applications. HMC E&E personnel lead the...
PIDs (Persistent Identifiers) are a core concept at the center of FAIR data architectures such as FAIR Digital Objects. They point to a digital resource such as a publication, dataset or a set of information in a distinctive and lasting fashion and are assured to persist over longer, defined periods of time.
We looked into six established PID systems (ROR, ORCID, PIDINST, IGSN, DataCite...
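To illustrate what pointing "in a distinctive and lasting fashion" means in practice, the sketch below resolves a Handle-based PID (DOIs are handles) through the public Handle REST API; the example handle is the DOI Handbook's DOI, used purely for illustration.

```python
import requests

def resolve_pid(handle: str) -> dict:
    """Resolve a Handle-based PID (e.g. a DOI) via the public Handle REST API."""
    resp = requests.get(f"https://hdl.handle.net/api/handles/{handle}", timeout=10)
    resp.raise_for_status()
    return resp.json()

# The record lists typed values, e.g. the URL the PID currently points to;
# the PID itself stays stable even when that URL changes.
record = resolve_pid("10.1000/182")  # DOI of the DOI Handbook, as an example
for value in record.get("values", []):
    print(value["type"], "->", value["data"]["value"])
```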
Within the research project LOD-GEOSS (https://lod-geoss.github.io) and the Helmholtz Metadata Hub Energy we are developing a distributed data architecture for sharing and improved discovery of research data in the domain of energy systems analysis. A central element is the databus (https://databus.dbpedia.org) which acts as a central searchable metadata catalog. Data will be annotated on the...
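As a rough sketch of how such a searchable metadata catalog could be queried, the snippet below sends a SPARQL query to the databus endpoint; the endpoint URL, the Dublin Core property, and the keyword are assumptions for illustration and may not match the actual LOD-GEOSS annotation scheme.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Sketch: keyword search over a SPARQL-accessible metadata catalog.
# Endpoint and vocabulary are assumptions based on the DBpedia Databus.
sparql = SPARQLWrapper("https://databus.dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?dataset ?title WHERE {
        ?dataset dct:title ?title .
        FILTER(CONTAINS(LCASE(STR(?title)), "energy"))
    } LIMIT 10
""")

for row in sparql.queryAndConvert()["results"]["bindings"]:
    print(row["dataset"]["value"], "-", row["title"]["value"])
```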
A central mission of HMC is to support the data producers of the Helmholtz community in making their data FAIR. Developing a sustainable strategy for doing so requires a detailed understanding of community-specific practices, strengths, and limitations related to the application of each FAIR data guideline. We have applied the FAIR Data Maturity Model, developed by the respective RDA working...
The Helmholtz Metadata Collaboration (HMC) promotes the use of metadata in Research Data Management as a means to achieving data findability, accessibility, interoperability, reusability (FAIR). These in turn enable or optimize software functionalities essential to automated research processes, such as multi-, inter- and transdisciplinary indexing and retrieval, versioning, provenance...
Metadata plays a key role in the scientific publication process. It is only through metadata and identifiers that each contribution, from research data to article publication and beyond, becomes findable, accessible, interoperable and reusable. The digitization of scholarly communication allows the creation of metadata locally or in a distributed manner, and global exchange, enabled by...
Improving research data management practices is both an organizational and a technical challenge: even in the same research field, (meta)data is often created, stored and processed in an ad-hoc manner. This results in a lack of a clear structure and standardization and makes the metadata “unFAIR”. We present two tools that assist scientists in their research workflows to enrich, structure and...
Physical samples with informative metadata are more easily discoverable, shareable, and reusable. Metadata provides the framework for consistent, systematic, and standardized collection and documentation of sample information. This poster explores practical implementation of the FAIR Principles through creation of a framework centralized around biospecimens, linked datasets, sample...
Within NFDI-MatWerk (“National Research Data Infrastructure for Material Sciences”/ “Nationale Forschungsdateninfrastruktur für Materialwissenschaften und Werkstofftechnik“), the Task Area Materials Data Infrastructure (TA-MDI) will provide tools and services to easily store, share, search, and analyze data and metadata. Such a digital materials environment will ensure data integrity,...
Researchers in the social sciences use various software packages for the statistical analysis of rectangular, structured data. The various data formats, which are only partially compatible, impede data exchange and reuse. In particular, proprietary data formats endanger the demand for interoperability enshrined in the FAIR principles. The project [Open Data Format][1] aims to develop a non-proprietary...
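As a rough sketch of such a conversion, assuming a Stata input file, the snippet below exports the tabular data to CSV and keeps the variable-level metadata in a JSON sidecar; the file names are placeholders, and the actual Open Data Format specification will differ.

```python
import json
import pandas as pd

# Sketch: liberate data from a proprietary statistics format (.dta) into
# an open CSV + JSON pair that preserves variable and value labels.
reader = pd.read_stata("survey.dta", iterator=True)  # placeholder file name
df = reader.read()
metadata = {
    "variable_labels": reader.variable_labels(),
    "value_labels": reader.value_labels(),
}

df.to_csv("survey.csv", index=False)
with open("survey-metadata.json", "w") as f:
    json.dump(metadata, f, indent=2, default=str)
```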
The Open Researcher and Contributor ID [ORCID][1] strives to enable transparent and trustworthy connections between researchers, their contributions, and their affiliations by providing a unique, persistent identifier for individuals to use as they engage in research, scholarship, and innovation activities. ORCID is therefore an essential piece of the puzzle for increasing the discoverability...
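For illustration, a public ORCID record can be retrieved via the documented v3.0 public API, as in the sketch below; the field paths shown are indicative of the v3.0 JSON layout.

```python
import requests

def fetch_orcid_record(orcid_id: str) -> dict:
    """Fetch a public ORCID record via the public v3.0 REST API."""
    resp = requests.get(
        f"https://pub.orcid.org/v3.0/{orcid_id}/record",
        headers={"Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# 0000-0002-1825-0097 is ORCID's documented example iD (Josiah Carberry).
record = fetch_orcid_record("0000-0002-1825-0097")
name = record["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```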
Making research data reusable in an open and FAIR [1] way is part of good scientific practice and is increasingly becoming part of the scientific workflow. Where and how "FAIR" research data are published alongside a research paper is often not tracked by research institutes. In a pilot project of the Helmholtz Metadata Collaboration (HMC) Hub Matter we developed an approach to automatically...
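One possible ingredient of such automation, shown as a sketch below, is querying the DataCite REST API for a DOI's related identifiers, which can link articles and datasets; the DOI used is a placeholder, and coverage is limited to DataCite-registered records.

```python
import requests

def related_identifiers(doi: str) -> list:
    """Query the DataCite REST API for a DOI's related identifiers,
    e.g. datasets linked to an article (or vice versa)."""
    resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=10)
    resp.raise_for_status()
    attrs = resp.json()["data"]["attributes"]
    return attrs.get("relatedIdentifiers", [])

# Placeholder DOI; in a pilot, article DOIs harvested from institutional
# publication databases would be iterated over.
for rel in related_identifiers("10.5281/zenodo.1234567"):
    print(rel.get("relationType"), rel.get("relatedIdentifier"))
```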
Researchers in many fields rely on complex data from specialized instruments and large numbers of experiments. Metadata are key to efficiently documenting and describing data’s essential attributes, and help to generate overviews of large datasets. Manually collecting and curating the extensive amounts of metadata required – some of which might even be inaccessible – is a major challenge. To support...
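As a small sketch of automated metadata collection, assuming instrument output in HDF5, the snippet below harvests all attributes from a file into a JSON-serializable record; the file name is a placeholder.

```python
import json
import h5py

def harvest_metadata(path: str) -> dict:
    """Collect attributes from every group and dataset in an HDF5 file
    into a flat, JSON-serializable metadata record."""
    record = {"/": {}}

    def visit(name, obj):
        attrs = {k: str(v) for k, v in obj.attrs.items()}
        if attrs:
            record[name] = attrs

    with h5py.File(path, "r") as f:
        record["/"] = {k: str(v) for k, v in f.attrs.items()}
        f.visititems(visit)
    return record

# 'experiment.h5' is a placeholder for an instrument output file.
print(json.dumps(harvest_metadata("experiment.h5"), indent=2))
```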
For research data to be reusable by scientists or machines, the data and associated metadata should comply with the so-called "FAIR principles", i.e. they should be findable, accessible, interoperable, and reusable [1]. Realizing this is not a straightforward task, as researchers often do not know how FAIR or unFAIR their data actually are, nor how to improve their FAIRness. A quantitative measure,...
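To illustrate the idea of a quantitative measure, the sketch below aggregates weighted boolean indicator checks into a percentage score; the indicators and weights are invented for illustration and do not reproduce any published maturity model.

```python
# Illustrative FAIRness indicators with invented weights.
INDICATORS = {
    "has_persistent_identifier": 3,    # F: findable via a PID
    "has_rich_metadata": 2,            # F: described with rich metadata
    "metadata_openly_retrievable": 2,  # A: accessible via standard protocol
    "uses_standard_vocabulary": 2,     # I: interoperable terms
    "has_clear_license": 3,            # R: reuse conditions stated
}

def fair_score(checks: dict) -> float:
    """Aggregate passed indicator checks into a percentage score."""
    total = sum(INDICATORS.values())
    achieved = sum(w for name, w in INDICATORS.items() if checks.get(name))
    return 100.0 * achieved / total

example = {"has_persistent_identifier": True, "has_clear_license": True}
print(f"FAIR score: {fair_score(example):.0f}%")  # 50%
```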
[Research Object Crate][1] (RO-Crate) is an open, community-driven data package specification to describe all kinds of file-based data, as well as entities outside the package. To do so, it uses the widespread JSON format to represent Linked Data (JSON-LD), allowing links to external information. This makes the format flexible and machine-readable. These packages are being referred...
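A minimal sketch of what such a package's metadata file looks like, following the RO-Crate 1.1 conventions; the dataset name and file entries are placeholders.

```python
import json

# Minimal ro-crate-metadata.json per the RO-Crate 1.1 specification.
# Dataset and file details are invented placeholders.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Example research object",
            "hasPart": [{"@id": "data.csv"}],
        },
        {"@id": "data.csv", "@type": "File", "name": "Measurement table"},
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(crate, f, indent=2)
```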
riaf is a repository infrastructure to accommodate files. It enables data to be held in accordance with the FAIR principles (see also fair-principles).
riaf is designed to enable provenance and reproducibility of the research data in the early part of the data life cycle, i.e....
This poster presents the new HMC project Metamorphoses (“Metadata for the merging of diverse atmospheric data on common subspaces”). The project will develop enhanced standards for storage-efficient decomposed arrays and tools for the automated generation of standardised Lagrange trajectory data files, thus enabling an optimised and efficient synergetic merging of large remote sensing data sets....
Simulation is an essential pillar of knowledge generation in science. The numerical models used to describe, predict, and understand real-world systems are typically complex. Consequently, applying these models by means of simulation often poses high demands on computational resources, and requires high-performance computing (HPC) or other dedicated hardware architectures. Metadata describing...
Scientific image data sets can be continuously enriched by labels describing new features which are relevant for some specific task. This process can be automated by means of Machine Learning (ML) techniques. Although such an approach shows clear advantages, especially when it is applied to large datasets, it also poses an important challenge:
Relabeling image data sets curated by different...