With new specialisations such as Data Science driven by digitisation, the efficiency potential of digital transformation is becoming apparent in both empirical research and data governance processes. Here, one challenge is to establish open and interoperable datasets, recognising the FAIR criteria (cf. Wilkinson et al., 2016) as the standard for that process. Data – as well as metadata – should comply with...
In an ever-changing world, field surveys, inventories and monitoring data are essential for predicting biodiversity responses to global drivers such as land use and climate change. This knowledge provides the basis for appropriate management. However, field biodiversity data collected across terrestrial, freshwater and marine realms are highly complex and heterogeneous. The successful...
The Collaborative Research Centre AquaDiva is a large collaborative project spanning a variety of domains, such as biology, geology, chemistry and computer science, with the common goal of better understanding the Earth’s critical zone, in particular, how environmental conditions and surface properties shape the structure, properties, and functions of the subsurface. Within AquaDiva, large volumes...
As an important method and output of research, software should follow the RDA "FAIR for Research Software Principles" (FAIR4RS). In practice, this means that research software, whether open, inner or closed source, should be published with rich metadata to enable FAIR4RS. For research software practitioners, this currently often means following an arduous and mostly manual process of software...
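As a concrete illustration of such rich software metadata, the sketch below writes a minimal codemeta.json record. CodeMeta is an existing community schema for software metadata; the software name, version and field selection here are purely hypothetical.

```python
# Minimal sketch: a codemeta.json-style software metadata record.
# CodeMeta is a real community schema; all field values below are
# hypothetical placeholders, not an actual software publication.
import json

codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",          # hypothetical name
    "version": "0.1.0",
    "license": "https://spdx.org/licenses/MIT",
    "author": [{"@type": "Person", "givenName": "Jane", "familyName": "Doe"}],
    "codeRepository": "https://example.org/repo",  # placeholder URL
}

with open("codemeta.json", "w") as f:
    json.dump(codemeta, f, indent=2)
```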
In autumn 2021, the Helmholtz Metadata Collaboration (HMC) concluded its first HMC Community Survey to get in touch with Helmholtz's research communities. The survey aimed to characterize community-specific research data management and data publication practices, as well as the related gaps and needs expressed by these communities. For this purpose, we developed a question...
How can a computer understand relations between data or objects from the real world? Ontologies are semantic artifacts that capture knowledge about their domain of interest in a machine-understandable form. The main goal of developing ontologies is to formalize the concepts and relations through which humans express meaning, and to use them as a communication interface to machines. Thus,...
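To make this concrete, here is a minimal sketch of machine-understandable statements expressed as RDF triples with the Python library rdflib; the namespace and all terms are invented for illustration.

```python
# Minimal sketch: expressing domain knowledge as RDF triples with rdflib.
# The namespace and all terms are invented for illustration only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/aquifer#")  # hypothetical namespace

g = Graph()
g.add((EX.GroundwaterSample, RDF.type, RDFS.Class))        # a concept
g.add((EX.sample42, RDF.type, EX.GroundwaterSample))       # an instance
g.add((EX.sample42, EX.temperatureCelsius, Literal(9.5)))  # a relation

print(g.serialize(format="turtle"))  # machine- and human-readable output
```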
Semantic interoperability is one of the major challenges in implementing the FAIR principles [1] for research data. This is especially relevant for interdisciplinary projects, where people from different but related disciplines may use technical terms with differing meanings. Established vocabularies and semantic standards can harmonize domain-specific language and facilitate common...
In geodisciplines such as the cryosphere sciences, a large variety of data is available in repositories hosted on platforms such as PANGAEA. In addition, many computational process models exist that capture various physical, geochemical, or biological processes at a wide range of spatial and temporal scales and provide corresponding simulation data. A natural thought is to...
This application case for implementing and using the FAIR Digital Object (FAIR DO) concept aims to simplify the use of label information when composing Machine Learning (ML) training data.
Image data sets curated by different domain experts usually have non-identical label terms. This prevents images with similar labels from being easily assigned to the same category. Therefore, using the images...
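A minimal sketch of the underlying idea, with invented label terms: heterogeneous labels are mapped onto one canonical term from a shared vocabulary before images are grouped into training categories.

```python
# Illustrative sketch only: the synonym table below stands in for a
# shared vocabulary; it is not a real FAIR DO registry.
SYNONYMS = {
    "grain": "particle",
    "speck": "particle",
    "fibre": "fiber",
}

def canonical_label(term: str) -> str:
    """Map a curator-specific label term onto its canonical form."""
    t = term.strip().lower()
    return SYNONYMS.get(t, t)

raw_labels = ["Grain", "speck", "fiber"]
print({t: canonical_label(t) for t in raw_labels})
# {'Grain': 'particle', 'speck': 'particle', 'fiber': 'fiber'}
```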
The International Generic Sample Number (IGSN) is a unique and persistent identifier for – originally – geological samples. Recently, interest has grown in making the IGSN available for more sample types from further scientific communities in the Earth and Environment (E & E) domain. The IGSN Metadata Schema is modular: the mandatory registration schema is complemented by the IGSN Description Schema...
Get your hands dirty with semi-structured metadata in HMC’s remote training course “Fundamentals of scientific metadata: why context matters”!
Have you ever struggled to make sense of research data provided by a collaborator - or even to make sense of your own data 5 months after publication? Do you see difficulties in meeting data description requirements of your funding agency? Do you...
Biomolecules, such as DNA and RNA, provide a wealth of information about the distribution and function of marine organisms, and biomolecular research in the marine realm is pursued across several Helmholtz Centers. Biomolecular metadata, i.e. DNA and RNA sequences and all steps involved in their creation, exhibit great internal diversity and complexity. However, high-quality (meta)data...
The Helmholtz Metadata Collaboration (HMC) promotes the use of metadata in Research Data Management as a means of achieving data findability, accessibility, interoperability, and reusability (FAIR). These in turn enable or optimize software functionalities essential to automated research processes, such as multi-, inter- and transdisciplinary indexing and retrieval, versioning, provenance...
One of the prerequisites for FAIR data publication is the use of FAIR vocabularies. Currently, tools for the collaborative composition of such vocabularies are missing. For this reason, a universal manual and software for user-friendly vocabulary assembly are being composed in the HMC-funded MetaCook project. The project includes 4 separate test cases from 4 labs across KIT and Hereon, which...
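For illustration, a vocabulary entry of the kind such tools produce could look like the following SKOS sketch; rdflib and SKOS are real, while the namespace, concept and labels are invented.

```python
# Illustrative sketch: one SKOS vocabulary concept built with rdflib.
# SKOS is a real W3C standard; the namespace and concept are invented.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

VOC = Namespace("http://example.org/vocab/")  # hypothetical namespace

g = Graph()
g.bind("skos", SKOS)
g.add((VOC.viscosity, RDF.type, SKOS.Concept))
g.add((VOC.viscosity, SKOS.prefLabel, Literal("viscosity", lang="en")))
g.add((VOC.viscosity, SKOS.altLabel, Literal("dynamic viscosity", lang="en")))
g.add((VOC.viscosity, SKOS.definition,
       Literal("A fluid's resistance to flow.", lang="en")))

print(g.serialize(format="turtle"))
```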
Improving research data management practices is both an organizational and a technical challenge: even within the same research field, (meta)data is often created, stored and processed in an ad hoc manner. This results in a lack of clear structure and standardization, and makes the metadata “unFAIR”. We present two tools that assist scientists in their research workflows to enrich, structure and...
Making research data reusable in an open and FAIR [1] way is part of good scientific practice and is increasingly becoming part of the scientific workflow. Where and how "FAIR" research data is published alongside a research paper is often not tracked by research institutes. In a pilot project of the Helmholtz Metadata Collaboration (HMC) Hub Matter, we developed an approach to automatically...
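One plausible building block for such automated tracking is the public DataCite REST API, sketched below; the endpoint is real, but the query fields and affiliation string are assumptions for illustration, not the pilot project's actual approach.

```python
# Hedged sketch: querying the public DataCite REST API for datasets.
# The endpoint exists; the query syntax and affiliation value below
# are illustrative assumptions.
import requests

resp = requests.get(
    "https://api.datacite.org/dois",
    params={
        "query": 'creators.affiliation.name:"Helmholtz"',  # assumed field
        "resource-type-id": "dataset",
    },
    timeout=30,
)
resp.raise_for_status()

for record in resp.json()["data"][:5]:
    attrs = record["attributes"]
    titles = attrs.get("titles") or [{}]
    print(attrs["doi"], "-", titles[0].get("title", "<untitled>"))
```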
Researchers in many fields rely on complex data from specialized instruments and large numbers of experiments. Metadata is key to efficiently documenting and describing data’s essential attributes, and helps to generate overviews of large datasets. Manually collecting and curating the extensive amounts of metadata required – some of which might even be inaccessible – is a major challenge. To support...
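As a toy example of what automated collection can replace, the sketch below harvests basic file-level metadata (name, size, timestamp, checksum) without manual input; the chosen fields are an assumption, not a specific tool's schema.

```python
# Toy sketch: automatically harvesting file-level metadata.
# The field selection is an illustrative assumption.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def extract_metadata(path: Path) -> dict:
    """Collect basic descriptive metadata for a single file."""
    stat = path.stat()
    return {
        "filename": path.name,
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, timezone.utc
        ).isoformat(),
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }

print(json.dumps(extract_metadata(Path(__file__)), indent=2))
```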
riaf is a repository infrastructure to accommodate files. It enables data to be held in accordance with the FAIR principles (see also fair-principles).
riaf is designed to enable provenance and reproducibility of research data in the early part of the data life cycle, i.e....
This poster presents the new HMC project Metamorphoses (“Metadata for the merging of diverse atmospheric data on common subspaces”). The project will develop enhanced standards for storage-efficient decomposed arrays, and tools for the automated generation of standardised Lagrange trajectory data files, thus enabling optimised and efficient synergetic merging of large remote sensing data sets....
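To give a flavour of standardised trajectory output (not the project's actual format), the sketch below writes a small CF-style file with xarray; "featureType: trajectory" is a real CF convention attribute, while the variables and values are invented.

```python
# Illustrative sketch: a tiny CF-style trajectory file written with xarray.
# featureType="trajectory" is a real CF attribute; the variable layout and
# values are invented and not Metamorphoses' actual standard.
import xarray as xr

ds = xr.Dataset(
    data_vars={
        "lat": ("obs", [52.0, 52.1, 52.2]),           # degrees north
        "lon": ("obs", [13.0, 13.1, 13.2]),           # degrees east
        "pressure": ("obs", [1000.0, 950.0, 900.0]),  # hPa
    },
    coords={"obs": [0, 1, 2]},
    attrs={"Conventions": "CF-1.8", "featureType": "trajectory"},
)
ds.to_netcdf("trajectory_example.nc")  # requires a netCDF backend
```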
Simulation is an essential pillar of knowledge generation in science. The numerical models used to describe, predict, and understand real-world systems are typically complex. Consequently, applying these models by means of simulation often poses high demands on computational resources, and requires high-performance computing (HPC) or other dedicated hardware architectures. Metadata describing...
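A minimal sketch of such descriptive metadata for a single simulation run might look as follows; every field and value is an illustrative assumption rather than an established schema.

```python
# Minimal sketch: recording metadata for one HPC simulation run.
# All fields and values are illustrative assumptions.
import json
import platform
from datetime import datetime, timezone

run_metadata = {
    "model": "example-climate-model",   # hypothetical model name
    "model_version": "1.2.0",
    "parameters": {"resolution_km": 10, "timestep_s": 600},
    "hardware": {"architecture": platform.machine(),
                 "nodes": 64},          # assumed allocation
    "started_utc": datetime.now(timezone.utc).isoformat(),
}

print(json.dumps(run_metadata, indent=2))
```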
Scientific image data sets can be continuously enriched with labels describing new features that are relevant for a specific task. This process can be automated by means of Machine Learning (ML) techniques. Although such an approach shows clear advantages, especially when applied to large datasets, it also poses an important challenge:
Relabeling image data sets curated by different...