Description
The growing demand in the Digital Humanities (DH) community for accessible Natural Language Processing (NLP) tools, together with ever-larger datasets and the processing time they require, has highlighted the need for platforms that lower the barrier to advanced text analysis across humanities disciplines. MONAPipe, short for “Modes of Narration and Attribution Pipeline”, meets this need with a modular, open-source NLP pipeline that integrates community-developed classifiers end to end. MONAPipe was originally developed in the MONA project group with a focus on Computational Literary Studies (CLS) and is now being extended within Text+, part of the German National Research Data Infrastructure (NFDI), to serve a broad user group in the humanities.
MONAPipe is distributed as a Python library built on the NLP framework spaCy. Using spaCy’s support for custom pipeline components, MONAPipe adds its own components and allows each of them to have several implementations; for example, the speech tagger component has both a neural and a rule-based implementation (see Brunner et al. (2020) and Dönicke et al. (2022)). Designed to make community-driven NLP components accessible, MONAPipe offers an intuitive, Python-based framework that fosters data literacy and helps DH researchers develop a deeper understanding of text analysis while requiring only basic knowledge of Python. We also invite developers to contribute by integrating their own components or implementations. Comprehensive documentation is provided both for using MONAPipe and for developing it.
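The idea of one component with several interchangeable implementations can be illustrated with a small registry. This is a hypothetical, dependency-free sketch, not MONAPipe’s actual API: the names `REGISTRY`, `register`, `select`, the `"rule"`/`"baseline"` implementations and the toy quotation-mark rule are all assumptions made for illustration. In spaCy itself, the analogous mechanism is registering a component factory with `Language.factory`, whose config can then select an implementation when the component is added with `nlp.add_pipe`.

```python
from typing import Callable, Dict, List

# Hypothetical sketch, not MONAPipe's API: every component (e.g. a speech
# tagger) is a named slot that can hold several interchangeable
# implementations sharing one signature: tokens in, one label per token out.
Implementation = Callable[[List[str]], List[str]]

REGISTRY: Dict[str, Dict[str, Implementation]] = {}


def register(component: str, implementation: str) -> Callable[[Implementation], Implementation]:
    """Decorator that files a function under (component, implementation)."""
    def decorator(func: Implementation) -> Implementation:
        REGISTRY.setdefault(component, {})[implementation] = func
        return func
    return decorator


def select(component: str, implementation: str) -> Implementation:
    """Look up one implementation, listing the alternatives if the name is unknown."""
    options = REGISTRY.get(component, {})
    if implementation not in options:
        raise KeyError(f"{component!r} has no implementation {implementation!r}; "
                       f"available: {sorted(options)}")
    return options[implementation]


@register("speech_tagger", "rule")
def rule_based_speech_tagger(tokens: List[str]) -> List[str]:
    # Toy rule (an assumption): tokens between quotation marks are direct speech.
    labels, inside = [], False
    for token in tokens:
        if token in ('"', "„", "“"):
            inside = not inside
            labels.append("O")
        else:
            labels.append("direct" if inside else "O")
    return labels


@register("speech_tagger", "baseline")
def baseline_speech_tagger(tokens: List[str]) -> List[str]:
    # Placeholder where a neural model would be called; labels nothing.
    return ["O"] * len(tokens)
```

Because all implementations share one signature, switching from one to another only changes the name passed to `select`, e.g. `select("speech_tagger", "rule")(["Er", "sagte", ":", "„", "Ich", "komme", "“", "."])` labels only `Ich` and `komme` as `"direct"`.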
MONAPipe loads larger resources, such as custom models, from an external repository. For this purpose it uses GRO.data, a long-term archive based on Dataverse that provides versioning and persistent identifiers. Developers are free to use other common data repositories such as the Hugging Face Hub.
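Fetching a resource by its persistent identifier and caching it locally might look roughly like the sketch below. It is an assumption-laden illustration, not MONAPipe’s loader: the base URL `https://data.goettingen-research-online.de` is assumed to be GRO.data’s address, the request path follows the generic Dataverse Data Access API (`/api/access/datafile/:persistentId?persistentId=...`), and `datafile_url`, `cached_resource` and the cache layout are hypothetical names. The download function is injectable so the caching logic can be exercised without network access.

```python
import hashlib
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable, Union

# Assumed base URL of GRO.data; the path follows the Dataverse Data Access API.
GRO_DATA = "https://data.goettingen-research-online.de"


def datafile_url(persistent_id: str, base_url: str = GRO_DATA) -> str:
    """Build a Dataverse download URL for a file identified by a persistent ID."""
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    return f"{base_url}/api/access/datafile/:persistentId?{query}"


def fetch_bytes(url: str) -> bytes:
    """Download a URL completely into memory."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_resource(persistent_id: str, cache_dir: Union[str, Path],
                    fetch: Callable[[str], bytes] = fetch_bytes) -> Path:
    """Return a local path for the resource, downloading it only on first use."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Persistent IDs contain ':' and '/', so hash them into a safe file name.
    target = cache_dir / hashlib.sha256(persistent_id.encode("utf-8")).hexdigest()
    if not target.exists():
        partial = target.with_suffix(".part")
        partial.write_bytes(fetch(datafile_url(persistent_id)))
        partial.replace(target)  # rename at the end so no half-written file is cached
    return target
```

Keying the cache on the persistent identifier means a repeated run reuses the downloaded file; a different repository such as the Hugging Face Hub would only need a different URL builder or fetch function.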
MONAPipe uses a containerisation strategy to manage the highly specific requirements of NLP components. NLP tools often have strict and mutually conflicting library requirements, and such dependency issues can disrupt workflows. Currently, we encapsulate specific implementations in Docker containers, isolating their dependencies in self-contained environments; this prevents compatibility conflicts and ensures stable, reproducible operation. On the user’s side, these containers run locally and expose their results via REST APIs, from which MONAPipe integrates them into the pipeline.
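The exchange between the pipeline and a containerised implementation can be sketched with the standard library alone. The JSON shape (`{"tokens": [...]}` in, `{"labels": [...]}` out), the `/tag` path and all function names are assumptions for illustration, not the interface MONAPipe’s containers actually expose; the stub server merely stands in for a Docker container listening on a local port.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List


def call_container(endpoint: str, tokens: List[str], timeout: float = 30.0) -> Dict[str, Any]:
    """POST tokens as JSON to a locally running container and return its JSON reply."""
    body = json.dumps({"tokens": tokens}).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class StubTaggerHandler(BaseHTTPRequestHandler):
    """Stand-in for a containerised implementation; labels every token 'O'."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        reply = json.dumps({"labels": ["O"] * len(payload["tokens"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # keep the demo quiet


def start_stub_server() -> HTTPServer:
    """Serve the stub on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), StubTaggerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Since the client only speaks HTTP and JSON, the implementation inside the container can use whatever library versions it needs without affecting the Python environment in which the pipeline runs.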
In addition to local container usage, MONAPipe will offer online APIs through KISSKI services. Running on the HPC cluster of the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), these APIs will provide scalable, high-performance access to MONAPipe’s components and implementations, so that users can draw on substantial computational resources without managing containers locally. Combining local and HPC-based access gives researchers flexibility and ease of use across diverse research needs.
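One way to keep local and HPC-based access interchangeable is to treat both as a “backend” that differs only in its base URL and credentials. The following is a hypothetical sketch: the `Backend` class, the `/{component}/{implementation}` URL scheme, the bearer-token header and both URLs (including the `example.org` placeholder, which is not the real KISSKI address) are assumptions, since the online APIs are still forthcoming.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Backend:
    """Where an implementation runs: a local container or a remote HPC-hosted API."""
    base_url: str
    api_key: Optional[str] = None  # assumption: remote services require a key

    def headers(self) -> Dict[str, str]:
        """HTTP headers for a JSON request, adding credentials only when present."""
        headers = {"Content-Type": "application/json"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"
        return headers

    def endpoint(self, component: str, implementation: str) -> str:
        """Hypothetical URL scheme: one route per component implementation."""
        return f"{self.base_url.rstrip('/')}/{component}/{implementation}"


# Placeholders only: a local container port and a made-up remote address.
LOCAL = Backend("http://127.0.0.1:8000")
REMOTE = Backend("https://api.example.org/monapipe/", api_key="YOUR-KEY")
```

With this split, the code that sends texts to a component stays the same whether it targets a container on the user’s machine or a service on the GWDG cluster; only the chosen `Backend` changes.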