Description
In today's scientific landscape, especially in data-driven research, metadata plays a pivotal role. Often described as "data about data," metadata is essential for improving the FAIRness (Findability, Accessibility, Interoperability, and Reusability) of digital information. It provides crucial context and structure to raw data, capturing details such as origin, format, and provenance. It also strengthens research reproducibility by documenting methodology, parameters, and conditions, promoting transparency and validation. In data and/or software repositories, metadata acts as the guardian of data integrity and accessibility: a well-curated repository relies on robust metadata for efficient categorization, indexing, and retrieval.
Despite its significance, researchers often struggle to provide metadata, primarily because of time constraints, a lack of guidelines, and the perceived complexity involved. In many cases, compiling metadata remains a largely manual process. Researchers must gather metadata at critical stages of their research, either concurrently with their ongoing work (which disrupts their workflow) or by retrospectively revisiting the entire research process (which is cumbersome). Doing so also demands a comprehensive understanding of the potential metadata fields, which can vary from one target repository to another. Judging whether a given piece of information is relevant as metadata adds a further layer of complexity and hinders the production of comprehensive, varied metadata for publication. There is therefore a compelling need for a specialized tool that automates the collection of contextual information and thereby simplifies this intricate facet of research data management.
We introduce Harvester-Curator, a tool designed to improve metadata provision in data and/or software repositories. In its first phase, Harvester-Curator acts as a harvester: it scans user code and/or data repositories, identifies suitable parsers for the different file types, collects metadata from each file by applying the corresponding parser, and compiles the results into a structured JSON file, giving researchers a seamless, automated solution for metadata collection. In its second phase, Harvester-Curator acts as a curator, using the harvested metadata to populate the metadata fields of a target repository. By automating this process, the tool not only relieves researchers of a manual burden but also improves the accuracy and comprehensiveness of the metadata. Beyond streamlining the intricate task of metadata collection, it contributes to the broader objective of enhancing data accessibility and interoperability within repositories.
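To make the two-phase workflow concrete, the sketch below shows how such a harvest-then-curate pipeline could look in Python. It is an illustrative assumption rather than the actual Harvester-Curator implementation: the parser functions, the PARSERS registry, the CROSSWALK field mapping, and the target-repository field names are hypothetical stand-ins for the behaviour described above.

```python
"""Illustrative two-phase harvest-and-curate sketch (not the real tool)."""
import json
from pathlib import Path


# Phase 1 (harvester): map file suffixes to small parser functions.
# Real parsers would extract format-specific metadata; these are placeholders.
def parse_json_file(path: Path) -> dict:
    with path.open() as fh:
        content = json.load(fh)
    keys = sorted(content) if isinstance(content, dict) else []
    return {"format": "JSON", "top_level_keys": keys}


def parse_plain_text(path: Path) -> dict:
    return {"format": "text", "size_bytes": path.stat().st_size}


PARSERS = {
    ".json": parse_json_file,
    ".txt": parse_plain_text,
    ".md": parse_plain_text,
}


def harvest(repo_root: str) -> dict:
    """Walk a code/data repository and collect per-file metadata."""
    harvested = {}
    for path in Path(repo_root).rglob("*"):
        parser = PARSERS.get(path.suffix.lower())
        if path.is_file() and parser is not None:
            harvested[str(path.relative_to(repo_root))] = parser(path)
    return harvested


# Phase 2 (curator): translate harvested keys into a target repository's
# schema. The field names below are invented for illustration only.
CROSSWALK = {
    "format": "dataFileFormat",
    "size_bytes": "dataFileSize",
}


def curate(harvested: dict) -> list:
    """Map harvested metadata onto the target repository's field names."""
    records = []
    for filename, metadata in harvested.items():
        fields = {
            CROSSWALK[key]: value
            for key, value in metadata.items()
            if key in CROSSWALK
        }
        fields["fileName"] = filename
        records.append(fields)
    return records


if __name__ == "__main__":
    harvested = harvest(".")
    with open("harvested_metadata.json", "w") as fh:
        json.dump(harvested, fh, indent=2)
    print(json.dumps(curate(harvested), indent=2))
```

One advantage of keeping the two phases separate, as in this sketch, is that the same harvested JSON can be mapped onto different target repositories simply by swapping the crosswalk, which matches the observation above that metadata fields vary from one repository to another.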