Description
While research data management systems (RDMSs) provide many benefits for scientists, data integration remains one of the major bottlenecks for their adoption. The omnipresent dependency on file-based digital workflows and the strong heterogeneity of file and data layouts pose particular challenges. We have developed a crawler-based concept [1] that combines file-based digital workflows with RDMS software so that both can be used simultaneously. Furthermore, the concept includes a flexible, YAML-based configuration of data integration procedures, which facilitates its application to different use cases. We demonstrate how to apply these concepts in practice using the LinkAhead crawler framework (CaosDB was recently renamed to LinkAhead). The software is published as open-source software under the AGPLv3 license and is available online (https://gitlab.com/linkahead/linkahead-crawler).
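To illustrate the general idea of mapping a hierarchical file structure to structured records, the following is a minimal Python sketch. It does not use the LinkAhead crawler API or its actual YAML schema. The nested rule dictionary (standing in for a configuration that could be loaded from a YAML file), the function name `match_path`, the record type `Measurement`, and the example directory layout are all hypothetical and chosen only for illustration.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rule tree (not the LinkAhead format): each level matches one
# path component with a regular expression; named groups become properties.
RULES = {
    "pattern": r"(?P<project>[A-Za-z]+)_(?P<year>\d{4})",
    "children": {
        "pattern": r"(?P<date>\d{4}-\d{2}-\d{2})",
        "children": {
            "pattern": r"(?P<sample>\w+)\.csv",
            "record_type": "Measurement",
        },
    },
}


def match_path(path, rule):
    """Return a record dict if each path component matches its rule level.

    Returns None if any component fails to match, if the path is deeper than
    the rule tree, or if the last matched level defines no record type.
    """
    properties = {}
    last = None
    for part in PurePosixPath(path).parts:
        if rule is None:
            return None  # path has more levels than the rule tree
        match = re.fullmatch(rule["pattern"], part)
        if match is None:
            return None
        properties.update(match.groupdict())
        last = rule
        rule = rule.get("children")
    if last is None or "record_type" not in last:
        return None
    return {"record_type": last["record_type"], **properties}


def crawl(paths, rule):
    """Collect records for all paths that match the rule tree."""
    records = (match_path(p, rule) for p in paths)
    return [r for r in records if r is not None]
```

In this sketch, a path such as `Heart_2023/2023-08-01/probe7.csv` would yield one `Measurement` record carrying the project, year, date, and sample name, while files that do not fit the layout are skipped and stay untouched on disk. Keeping the rules in a declarative structure, separate from the matching code, is what would allow the same crawler logic to be reused for differently organized directories.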
[1] tom Wörden, H.; Spreckelsen, F.; Luther, S.; Parlitz, U.; Schlemmer, A. Mapping hierarchical file structures to semantic data models for efficient data integration into research data management systems. Preprints 2023, 2023081170. https://doi.org/10.20944/preprints202308.1170.v1