Description
Machine learning-based phenotype classification of zebrafish embryos enables fast and reproducible assessments for toxicity evaluation. However, acquiring large amounts of labeled data can be expensive and time-consuming, and it requires biological expertise. Using the publicly available EmbryoNet dataset, we examine how reducing the amount of labeled data affects the performance of machine learning models, and how self-supervised pre-training on unlabeled data can mitigate the resulting loss of performance. Furthermore, we investigate different data augmentation strategies to improve the visual representations learned by self-supervised methods. We expect that the presented method will improve results for a wide range of tasks involving zebrafish embryos, especially when labeled data is scarce.
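One way to simulate the reduced-label setting described above is to keep labels for only a fixed fraction of each class and treat the remaining samples as unlabeled. The following is a minimal sketch under stated assumptions, not the procedure used in this work: the function name `stratified_label_subset`, the class names, and the 10% fraction are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_label_subset(labels, fraction, seed=0):
    """Return sorted indices of samples that keep their labels.

    labels   : list of class labels, one per sample (hypothetical input)
    fraction : share of each class to keep labeled, in (0, 1]
    seed     : seed for reproducible sampling
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group sample indices by class so each class is subsampled separately.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    keep = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Keep at least one labeled example per class.
        n_keep = max(1, round(fraction * len(indices)))
        keep.extend(rng.sample(indices, n_keep))
    return sorted(keep)


# Hypothetical toy labels; not taken from the EmbryoNet dataset.
labels = ["normal"] * 80 + ["affected"] * 20
labeled_idx = stratified_label_subset(labels, fraction=0.1)
labeled_set = set(labeled_idx)
unlabeled_idx = [i for i in range(len(labels)) if i not in labeled_set]
```

Under this sketch, `labeled_idx` would be used for supervised training or fine-tuning, while `unlabeled_idx` (or the full set without labels) could serve for self-supervised pre-training. Sampling per class keeps the class proportions roughly constant as the labeled fraction shrinks.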