25 February 2025 to 1 March 2025
Building 30.95
Europe/Berlin timezone

Keeping it REAL

26 Feb 2025, 18:00
2h
Audimax Foyer (Building 30.95)

Str. am Forum 1, 76131 Karlsruhe
Poster | software sustainability | Poster and Demo Session together with Reception

Speakers

Lutz Krister Schubert (University of Cologne); Florian Thiery (Research Squirrel Engineers Network, CAA e.V.)

Description

The relevance of Open Science and Open Data is becoming increasingly obvious in modern-day publications. Scientists frequently write their own analysis code as the complexity of analysis increases and the combination of methods becomes more relevant – from code conversion to measuring and comparing. These functions and methods are not stable: they are subject to change and are constrained to the use case and the data used.

Opening and maintaining data already poses a substantial number of issues, including versioning, provenance, format, and generator-specific constraints such as precision and resolution. These problems intensify when we consider the code that generates such data. In this presentation we will talk about the problems associated with maintaining research code.

For example, code executability is even more difficult to maintain than data readability. This is mostly due to the strong dependencies of code on libraries, drivers, hardware and operating systems. All of these dependencies are subject to frequent changes, which may cause the code to no longer execute properly or – in the worst case – to still execute but deliver different results. Maintaining the code is effort-intensive and therefore basically impossible in the context of a research publication. If the code is adapted, we need to maintain all previous versions and refer to the exact versions used in the publication, so that the same results can in principle be reproduced should a divergence arise as a consequence of maintenance.
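
One way to make such dependencies visible is to store a snapshot of the execution environment next to each published result. The following is a minimal sketch, not a method proposed by the authors: the function name `environment_snapshot` and the chosen fields are assumptions made for illustration, and only Python standard-library calls are used.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Record interpreter, OS, and installed versions of the named packages.

    Hypothetical helper: the field names are illustrative, not a standard.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names below are examples only.
    snapshot = environment_snapshot(["numpy", "pip"])
    print(json.dumps(snapshot, indent=2, sort_keys=True))
```

A snapshot like this does not keep the code executable, but it documents which versions a published result actually relied on.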

Because the code is so specific, i.e. originally developed for a very specific use case and data format, it requires even more effort to adapt it to another context, e.g. if the data format or resolution changes, even if the type of analysis and the research question remain the same. Where possible, therefore, the algorithm behind the analysis code may be of more importance than the code itself, provided that it is numerically correct. As a consequence, implementations may diverge in their numerical results due to platform accuracy – ideally only minimally, however. This approach is not appropriate, though, for overly complex code where the algorithm would be too difficult to represent and explain, or for AI-based and related methods that depend on additional data, i.e. the learning context. Such methods would have to be treated differently.
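
The effect of accuracy on results can be illustrated with two implementations of the same algorithm, a plain sum. This is a small sketch under stated assumptions: `naive_sum` and `results_agree` are hypothetical names, and the tolerance of `1e-12` is an arbitrary choice for the example, not a recommended value.

```python
import math


def naive_sum(values):
    """Add the values left to right using ordinary floating-point addition."""
    total = 0.0
    for value in values:
        total += value
    return total


def results_agree(a, b, rel_tol=1e-12):
    """Compare two results up to a relative tolerance instead of bit for bit."""
    return math.isclose(a, b, rel_tol=rel_tol)


if __name__ == "__main__":
    values = [0.1] * 10
    naive = naive_sum(values)
    accurate = math.fsum(values)  # sum with a single final rounding
    print(naive == accurate)               # exact comparison
    print(results_agree(naive, accurate))  # comparison within tolerance
```

Both functions implement the same mathematical operation, yet they do not return identical floating-point values. Comparing reproduced results within a stated tolerance, rather than exactly, is one way to accept such minimal divergence.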

With respect to ensuring that data is not only FAIR but also reproducible under any circumstances, we follow the suggestion that code must be treated in the same fashion, by making sure that all published algorithmic processes are
- Reproducible, in the sense that the results can be achieved again with the same process and context
- Executable at any point in time (though not necessarily on any machine)
- Attributable to the data and author at the stage of publication, and
- Literal, insofar as the algorithm is a sound and correct representation of the mathematical methods to be applied.
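
The "Attributable" property in the list above could, for instance, be supported by a small record that ties a result to its author, publication date, and the exact bytes of the data and code involved. The sketch below is an assumption for illustration only: the record layout and the names `provenance_record` and `matches` are hypothetical and not part of the presented work.

```python
import hashlib
from datetime import date


def sha256_of(payload: bytes) -> str:
    """Return the SHA-256 checksum of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def provenance_record(data: bytes, code: bytes, author: str, published: date) -> dict:
    """Bind author and publication date to checksums of data and code."""
    return {
        "author": author,
        "published": published.isoformat(),
        "data_sha256": sha256_of(data),
        "code_sha256": sha256_of(code),
    }


def matches(record: dict, data: bytes, code: bytes) -> bool:
    """Check whether the given data and code are the ones the record refers to."""
    return (
        record["data_sha256"] == sha256_of(data)
        and record["code_sha256"] == sha256_of(code)
    )


if __name__ == "__main__":
    # Placeholder inputs; real use would read the published files as bytes.
    record = provenance_record(b"1,2,3\n", b"print(1)\n", "A. Author", date(2025, 2, 26))
    print(matches(record, b"1,2,3\n", b"print(1)\n"))
```

Any later change to the data or the code changes its checksum, so a mismatch signals that a different version than the published one is in use.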

I want to participate in the youngRSE prize: no

Primary authors

Lutz Krister Schubert (University of Cologne); Florian Thiery (Research Squirrel Engineers Network, CAA e.V.)
