Description
Histopathology plays a central role in clinical medicine for tissue-based diagnostics and in biomedical research as a basis for understanding cellular and molecular mechanisms of disease. The microscopic evaluation of morphological information is a key component of computational pathology. End-to-end representation learning on whole slide images poses a major challenge for computational pathology: their enormous size (ranging from 100M to 10G pixels) strains both computation and memory. In this work, we train an efficient Vision Transformer on gigapixel whole slide images in an end-to-end manner for the cancer subtyping classification task. The model directly takes a whole slide image as a sequence of millions of patches and generates representations that capture both short-range and long-range dependencies. Experimental results demonstrate that the proposed model effectively encodes gigapixel images and outperforms previous methods on the cancer subtyping task.
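To make the described pipeline concrete, below is a minimal sketch of how a slide could be split into non-overlapping patches and encoded as one long token sequence with a Transformer for slide-level classification. All names (e.g. WSIPatchEncoder, patch_size) are illustrative assumptions, the vanilla attention used here stands in for the paper's efficient attention, and the toy input is far smaller than a real gigapixel slide.

```python
# Illustrative sketch only: patchify a whole-slide image tensor into a long
# sequence of patch tokens and encode it with a small Transformer encoder.
# This is NOT the paper's implementation; real slides would need an efficient
# attention variant instead of the vanilla attention used below.
import torch
import torch.nn as nn


class WSIPatchEncoder(nn.Module):
    def __init__(self, patch_size=32, dim=128, depth=2, heads=4, num_classes=2):
        super().__init__()
        self.patch_size = patch_size
        # Linear projection of flattened RGB patches to token embeddings.
        self.to_token = nn.Linear(3 * patch_size * patch_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.cls_head = nn.Linear(dim, num_classes)

    def forward(self, slide):
        # slide: (B, 3, H, W) with H and W divisible by patch_size.
        b, c, h, w = slide.shape
        p = self.patch_size
        # Split into non-overlapping p x p patches -> (B, num_patches, 3*p*p).
        patches = (
            slide.unfold(2, p, p)
            .unfold(3, p, p)                  # (B, 3, H/p, W/p, p, p)
            .permute(0, 2, 3, 1, 4, 5)        # (B, H/p, W/p, 3, p, p)
            .reshape(b, -1, c * p * p)
        )
        tokens = self.encoder(self.to_token(patches))
        # Mean-pool patch tokens into a slide-level representation for subtyping.
        return self.cls_head(tokens.mean(dim=1))


if __name__ == "__main__":
    model = WSIPatchEncoder()
    toy_slide = torch.randn(1, 3, 256, 256)   # tiny stand-in for a gigapixel WSI
    print(model(toy_slide).shape)             # torch.Size([1, 2])
```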