Is Your Data Ready for AI? A Practitioner's Perspective

30 Sept 2025, 10:10
15m
Zoom (online)

Speaker

Stefan Kesselheim (Forschungszentrum Jülich)

Description

The recent successes of Artificial Intelligence and Machine Learning would not have been possible without published datasets. Benchmark datasets such as ImageNet (Deng et al., 2009) were developed as tools to measure progress in the field and have become de facto standards. More recently, extremely large data collections such as The Pile (Gao et al., 2020) and FineWeb2 (Penedo et al., 2025) have become fertile ground for the development of open-source Large Language Models. Scientific datasets such as the Protein Data Bank are the basis for breakthroughs such as AlphaFold (Jumper et al., 2021). A highly visible, high-quality dataset in the right context can significantly advance a scientific field. In my talk, I'll discuss the success factors for datasets and different strategies to make them visible to AI method experts. Breakthroughs in AI are steered by the challenges that scientists are trying to solve. Help them find the most exciting ones.
