Introduction
Segmenting cellular structures in electron microscopy (EM) images is crucial for studying the morphology of neurons and glial cells in both healthy and diseased brain tissues. Traditionally, this task has relied on manual annotation, requiring neuroanatomy experts to examine images slice by slice and delineate each structure individually. However, manual annotation is highly labor-intensive and time-consuming. To address this, deep learning methods such as convolutional neural networks (CNNs) have been adopted for EM image segmentation. Nevertheless, CNNs are often criticized for their reliance on local feature extraction, which limits their ability to capture long-range dependencies and global context within an image. Recently, transformer-based models with self-attention mechanisms have emerged as powerful alternatives, enabling more effective integration of both local and global information. Building on this progress, the Segment Anything Model (SAM) has shown strong performance in natural image segmentation. In this work, we investigate the application of SAM to microscopy images and assess its potential to enhance segmentation accuracy and efficiency in neuroanatomical studies.
Materials & Methods
This study builds on two pioneering approaches: SAM and Segment Anything for Microscopy (Micro-SAM). We fine-tuned and evaluated the Micro-SAM models on in-house serial block-face scanning electron microscopy datasets with a cutting interval of 40 nm. Dataset A, from the CA1 region of the hippocampus of a healthy rat, contained 1044 slices and was used as a control to characterize normal dendritic structures. Dataset B, with 698 slices, was collected from the same region of a rat after pilocarpine-induced status epilepticus. Dataset C, comprising 697 slices, was derived from a cortical layer II biopsy of a patient with idiopathic normal pressure hydrocephalus obtained during shunt surgery. The pixel size was 15 × 15 nm² for datasets A and B, and 10 × 10 nm² for dataset C. Model training and internal evaluation were performed on dataset A, while external evaluation was conducted on datasets B and C. After training, model performance was assessed using the object-level error metric [1].
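The object-level error metric itself is defined in [1]. Purely as an illustration of this style of instance-wise evaluation, the sketch below matches predicted and ground-truth objects by intersection-over-union and counts unmatched objects on both sides; the IoU threshold, function name, and output fields are assumptions for the example, not the published metric.

```python
import numpy as np

def object_level_errors(gt_labels, pred_labels, iou_threshold=0.5):
    """Illustrative object-level comparison: match predicted and ground-truth
    instances by IoU and count unmatched objects on both sides.

    gt_labels, pred_labels: 2D integer label images (0 = background).
    The 0.5 threshold is an assumption, not the metric from [1].
    """
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]

    matched_gt, matched_pred = set(), set()
    for g in gt_ids:
        g_mask = gt_labels == g
        for p in pred_ids:
            if p in matched_pred:
                continue
            p_mask = pred_labels == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            if union > 0 and inter / union >= iou_threshold:
                matched_gt.add(g)
                matched_pred.add(p)
                break

    return {
        "matched_objects": len(matched_gt),
        "missed_objects": len(gt_ids) - len(matched_gt),      # ground-truth objects with no match
        "spurious_objects": len(pred_ids) - len(matched_pred),  # predictions with no match
    }
```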
Results
By deriving prompts from ground-truth masks and using the mask quality of bounding-box prompts as a benchmark, our fine-tuned ViT-B and ViT-L models achieved approximately 19.1% and 20.8% improvements over the original SAM, and 150.1% and 181.9% improvements over the Micro-SAM models, respectively. The large margin over Micro-SAM is expected, as the two Micro-SAM models used in this study were trained for organelle segmentation. In addition, a user study indicated that our fine-tuned model enables human annotators to generate segmentations more consistent with ground truth while requiring less annotation time.
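As a concrete illustration of this benchmarking procedure, the sketch below derives a bounding-box prompt from a single ground-truth mask, queries a SAM predictor with it, and scores the returned mask by IoU. The checkpoint path, image loading, and variable names are assumptions; the segment_anything calls follow the public SAM API.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Assumed inputs: a fine-tuned ViT-B checkpoint (placeholder path), an RGB EM
# slice (H x W x 3, uint8), and one binary ground-truth mask (H x W, bool).
sam = sam_model_registry["vit_b"](checkpoint="finetuned_vit_b.pth")
predictor = SamPredictor(sam)

def box_prompt_iou(image_rgb, gt_mask):
    """Derive an XYXY box from the ground-truth mask, prompt SAM with it,
    and return the IoU between the predicted and ground-truth masks."""
    predictor.set_image(image_rgb)
    ys, xs = np.nonzero(gt_mask)
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
    pred = masks[0].astype(bool)
    inter = np.logical_and(pred, gt_mask).sum()
    union = np.logical_or(pred, gt_mask).sum()
    return inter / union if union > 0 else 0.0
```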
Discussion
Our previous study showed that automatic inference produced inferior results compared with interactive prompting, largely due to the limitations of grid-point-based automatic prompting. To overcome this, we integrated the object detection model You Only Look Once (YOLO), which generates bounding-box prompts as inputs to the segmentation model. This strategy compensates for the weaker results of automatic inference and also enables the generation of additional pseudo-masks to improve performance in subsequent training stages. In the next stage, we aim to generate more masks and construct three-dimensional dendrite connectome maps.
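A minimal sketch of this detector-prompted pipeline is given below, assuming an Ultralytics YOLO detector fine-tuned for dendritic structures and a fine-tuned SAM checkpoint; both weight paths are placeholders, and post-processing of overlapping masks is omitted.

```python
import numpy as np
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

# Placeholder weights: a YOLO detector fine-tuned on dendritic structures and
# a fine-tuned SAM checkpoint (both paths are assumptions for this sketch).
detector = YOLO("dendrite_yolo.pt")
sam = sam_model_registry["vit_b"](checkpoint="finetuned_vit_b.pth")
predictor = SamPredictor(sam)

def auto_segment(image_rgb):
    """Run YOLO to propose bounding boxes, then prompt SAM with each box
    to obtain one binary mask per detected object."""
    predictor.set_image(image_rgb)
    detections = detector(image_rgb, verbose=False)[0]
    boxes = detections.boxes.xyxy.cpu().numpy()  # (N, 4) XYXY pixel coordinates
    masks = []
    for box in boxes:
        m, _, _ = predictor.predict(box=box, multimask_output=False)
        masks.append(m[0].astype(bool))
    return masks
```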