Munich Health Foundation Model Symposium

Name: Munich Health Foundation Model Symposium
Start: 2024-04-10T10:00:00+02:00
End: 2024-04-10T19:00:00+02:00
Location: Helmholtz Munich Campus

Europe/Berlin timezone

Contact

mathieu.seyfrid@helmholtz-muenchen.de

Examining the effectiveness of foundation models on human 3’UTR sequences

Not scheduled

Auditorium, Building 23 (Helmholtz Munich Campus)

Ingolstädter Landstraße 1 · D-85764 Neuherberg

Talk Poster Break + Posters session

Foundation models such as DNABERT and Nucleotide Transformer have recently gained considerable popularity in DNA research. Adapted from Natural Language Processing, these models are trained in a self-supervised manner on vast amounts of genomic data. Once trained, foundation models can be applied to various downstream tasks, including promoter and enhancer prediction, prediction of epigenetic markers and splice sites, and functional variant prioritization. However, genomic language models are typically trained and evaluated on entire genomes, ignoring the partitioning of the genome into distinct functional regions.
In our work, we develop a set of 3’UTR-specific tasks to study the performance of language models on human 3’UTR sequences. These tasks include identification of binding motifs of RNA binding proteins, detection of functional genetic variants, prediction of expression levels in massively parallel reporter assays, and estimation of mRNA half-life. In total, we test three established genome-wide foundation models as well as five transformer models that we specifically train on 3’UTR sequences from 241 mammalian species.
We demonstrate that the models specifically trained on 3’UTR sequences exhibit superior performance in three out of four downstream tasks compared to their genome-wide counterparts. These findings emphasize the significance of accounting for genome partitioning into distinct functional regions while training and evaluating foundation models. We also note that the proposed set of 3’UTR-specific tasks may serve as a benchmark for assessing the performance of future models.
The results of our work are currently available as a bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2024.02.09.579631v1
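As an illustration of the self-supervised training mentioned above, the following minimal Python sketch prepares masked-token training pairs from a nucleotide sequence. It assumes overlapping k-mer tokens and independent random masking. The function names, the `[MASK]` string, and the 15% masking rate are illustrative assumptions and are not taken from this work or from any specific model's code.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def kmer_tokenize(seq: str, k: int = 3) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with MASK.

    Returns (inputs, labels): labels hold the original token at masked
    positions and None elsewhere, so a model would be scored only on
    the positions it has to reconstruct.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


if __name__ == "__main__":
    tokens = kmer_tokenize("ACGTACGTTAGC", k=3)
    inputs, labels = mask_tokens(tokens, mask_prob=0.15, seed=1)
    print(inputs)
    print(labels)
```

With overlapping k-mers, the neighbours of a masked token reveal most of its content, so a practical setup would likely mask contiguous spans rather than independent positions; the sketch keeps independent masking for brevity.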

Dr Matthias Heinig (Helmholtz Zentrum München, Institute of Computational Biology)
Dr Sergey Vilov (Helmholtz Zentrum München, Institute of Computational Biology)
