deRSE25 and SE25 Timetables

Name: deRSE25 and SE25 Timetables
Start: 2025-02-25T09:00:00+01:00
End: 2025-03-01T15:00:00+01:00
Location: Building 30.95

25 February 2025 to 1 March 2025

Europe/Berlin timezone

Fine-grained exploration of the reproducibility of research-related Jupyter notebooks at scale

26 Feb 2025, 16:00

20m

Room 206 (Building 30.70)

Straße am Forum 6, 76131

Talk (15min + 5min)

computational reproducibility

Reproducibility and Discovery of Research Software

Dr Daniel Mietchen (FIZ Karlsruhe — Leibniz Institute for Information Infrastructure, Germany)

Jupyter notebooks have revolutionized the way researchers share code, results, and documentation, all within an interactive environment, promising to make science more transparent and reproducible. In research contexts, Jupyter notebooks often coexist with other software and various resources such as data, instruments, and mathematical models, all of which may affect scientific reproducibility. Here, we present a study that analyzed the computational reproducibility of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 biomedical publications (https://doi.org/10.1093/gigascience/giad113). The resulting reproducibility data were loaded into a knowledge graph, FAIR Jupyter, that allows for highly granular exploration and interrogation.

The FAIR Jupyter graph is accessible via https://w3id.org/fairjupyter and described in a preprint available at https://doi.org/10.48550/arXiv.2404.12935. It contains rich metadata about the publications, the associated GitHub repositories and Jupyter notebooks, and the notebooks' dependencies and reproducibility. Through a public SPARQL endpoint, it enables detailed data exploration and analysis by way of queries that can be tailored to specific use cases. Such queries may provide details about any of the variables from the original dataset, highlight relationships between them, or combine some of the graph's content with materials from corresponding external resources.
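As a rough illustration of how such a query could be sent to a SPARQL endpoint, the following Python sketch builds a standard SPARQL 1.1 Protocol GET request and flattens the JSON results. It is not taken from the talk: the endpoint address is a hypothetical placeholder (the actual one is documented at https://w3id.org/fairjupyter), the helper names build_request, rows and run_query are made up for this example, and the query deliberately uses no FAIR Jupyter vocabulary, since the graph's classes and properties are not described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholder: substitute the endpoint address documented at
# https://w3id.org/fairjupyter before running a real query.
ENDPOINT = "https://example.org/sparql"

# Illustrative query only: it lists a few arbitrary triples so that the
# graph's actual classes and properties can be inspected first.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict]:
    """Flatten W3C SPARQL JSON results into plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict]:
    """Send the query and return the flattened rows (needs network access)."""
    with urlopen(build_request(endpoint, query), timeout=30) as response:
        return rows(json.load(response))
```

With a real endpoint address in place of the placeholder, run_query(ENDPOINT, QUERY) would return one dictionary per result row. Starting from a generic triple listing like this is one way to discover which properties are available before writing a query tailored to a specific use case.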

We provide a collection of example queries addressing a range of use cases in research software engineering and education. We also outline how sets of such queries can be used to profile specific content types, either individually or by class. We conclude by discussing how such semantically enhanced sharing of complex datasets can both improve their FAIRness (i.e., their findability, accessibility, interoperability, and reusability) and help identify and communicate best practices, particularly with regard to the quality, standardization, and reproducibility of research-related software and scripts.

Primary author: Dr Daniel Mietchen (FIZ Karlsruhe — Leibniz Institute for Information Infrastructure, Germany)

Co-author: Sheeba Samuel

deRSE25 talk on Fine-grained exploration of the reproducibility of research-related Jupyter notebooks at scale.odp

deRSE25 talk on Fine-grained exploration of the reproducibility of research-related Jupyter notebooks at scale.pdf

deRSE25 talk on Fine-grained exploration of the reproducibility of research-related Jupyter notebooks at scale.pptx
