20 February 2024
Forschungszentrum Jülich
Europe/Berlin timezone

TransMEP: Transfer learning on large protein language models to predict mutation effects of proteins from a small known dataset

20 Feb 2024, 10:45
1h
Lecture hall of the central library (Forschungszentrum Jülich)

Forschungszentrum Jülich GmbH, Wilhelm-Johnen-Straße, 52428 Jülich, Germany

Speaker

Birgit Strodel (Forschungszentrum Jülich)

Description

Machine learning-guided optimization has become a driving force behind recent improvements in protein engineering. At the same time, new protein language models are learning the grammar of evolutionarily occurring sequences at large scale. This work combines both approaches to predict mutational effects in support of protein engineering. To this end, an easy-to-use software tool called TransMEP was developed using transfer learning by feature extraction combined with Gaussian process regression. A large collection of datasets is used to evaluate its prediction quality, which scales with the size of the training set, and to show its improvements over previous fine-tuning approaches. Wet-lab studies are simulated to evaluate the use of mutation effect prediction models for protein engineering. These simulations show that TransMEP finds the best-performing mutants within a limited study budget by balancing the trade-off between exploration and exploitation.
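The general idea of feature extraction followed by Gaussian process regression, and of trading off exploration against exploitation when choosing the next mutant, can be illustrated with a small sketch. This is not TransMEP's implementation: the 2-D vectors are toy stand-ins for protein language model embeddings, and the RBF kernel, the noise level, the upper-confidence-bound selection rule, and all function and variable names are assumptions chosen only for illustration.

```python
import math

def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def solve_lower(lower, b):
    """Forward substitution: solve lower @ x = b."""
    x = []
    for i, row in enumerate(lower):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def solve_lower_transposed(lower, b):
    """Back substitution: solve lower.T @ x = b."""
    n = len(lower)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / lower[i][i]
    return x

def gp_predict(train_x, train_y, query_x, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at query_x."""
    n = len(train_x)
    gram = [[rbf_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    lower = cholesky(gram)
    alpha = solve_lower_transposed(lower, solve_lower(lower, train_y))
    k_star = [rbf_kernel(x, query_x) for x in train_x]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(lower, k_star)
    variance = max(rbf_kernel(query_x, query_x) - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(variance)

def pick_next(train_x, train_y, candidates, beta):
    """Pick the candidate with the highest upper confidence bound (assumed rule)."""
    scores = {}
    for name, features in candidates.items():
        mean, std = gp_predict(train_x, train_y, features)
        scores[name] = mean + beta * std  # beta weights exploration
    return max(scores, key=scores.get)

# Toy data: hypothetical embeddings of measured variants and their fitness values.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
train_y = [0.1, 0.8, 0.3]
# Hypothetical unmeasured mutants: one close to a good variant, one far from all data.
candidates = {"mutant_A": [1.0, 0.1], "mutant_B": [3.0, 3.0]}

exploit_choice = pick_next(train_x, train_y, candidates, beta=0.0)
explore_choice = pick_next(train_x, train_y, candidates, beta=2.0)
```

With `beta=0.0` the rule only exploits the predicted mean and favors the mutant closest to the best measured variant; with a larger `beta` the predictive uncertainty dominates and the mutant far from all training data is preferred. In practice the toy vectors would be replaced by embeddings extracted from a pretrained protein language model.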

Authors: Hoffbauer, Tilman [a,b]; Strodel, Birgit [a,c]
Affiliations: [a] Institute of Biological Information Processing: Structural Biochemistry (IBI-7), Forschungszentrum Jülich, 52428 Jülich, Germany; [b] RWTH Aachen University, 52062 Aachen, Germany; [c] Institute of Theoretical and Computational Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany

Primary author

Birgit Strodel (Forschungszentrum Jülich)
