Description
Modeling the joint distribution of data samples and their properties promises strong performance in both data generation and property prediction, with capabilities beyond those of separate generative or predictive models. However, training such joint models is challenging. Here, we propose combining a transformer decoder and a transformer encoder in a single model that can both generate new molecules and predict their target properties. A new training procedure allows us to blend the generative and predictive functionality. We show that our single model matches or outperforms state-of-the-art models for both molecule generation and property prediction. Additionally, we show the benefits of joint modeling in downstream tasks, such as predicting properties of newly sampled molecules and conditional sampling, as well as in generalization to unseen data.
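
As a reading aid, the joint-modeling idea can be written with the standard product rule. The notation here is an assumption for illustration and is not taken from the paper: x denotes a molecule, y its target property, and θ the parameters shared by the generative and predictive parts. Whether the model performs conditional sampling through the Bayes-rule form shown last is likewise an assumption; it is only one common way such sampling can be expressed.

```latex
% Assumed notation: x = molecule, y = property, \theta = shared parameters.
\begin{align}
  % Product rule: a joint model covers generation and prediction together.
  p_\theta(x, y) &= p_\theta(x)\, p_\theta(y \mid x), \\
  % Generation draws from the marginal over molecules.
  x &\sim p_\theta(x), \\
  % Property prediction uses the conditional given a molecule.
  \hat{y} &= \mathbb{E}_{p_\theta(y \mid x)}[\, y \,], \\
  % Conditional sampling for a target property follows from Bayes' rule.
  p_\theta(x \mid y) &\propto p_\theta(x)\, p_\theta(y \mid x).
\end{align}
```

Under this reading, predicting properties of newly sampled molecules amounts to drawing x from the marginal and then evaluating the conditional on that same x, which a single joint model can do without a separate predictor.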