Speaker
Description
Rapid and precise knowledge retrieval is essential to research in exact sciences such as materials science, because it saves time and improves research efficiency. Given a database of over 2,500 materials science research papers, an automated method for reliably and effectively accessing and querying this repository is necessary.
Here we present a Retrieval-Augmented Generation (RAG) application that queries this database and returns answers in natural language. The application features a top-performing retriever selected from the MTEB leaderboard on Hugging Face and further fine-tuned with the GPL (Generative Pseudo Labeling) algorithm on materials science literature to gain domain knowledge. The generation component supports GGUF models via llama.cpp and integrates Hugging Face-compatible models, including Meta’s Llama-2-7b-chat. The Haystack framework is used to build robust query-handling pipelines, and Unstructured.io is used for PDF parsing to ensure thorough data extraction. The application offers three main features: searching the database of publications, querying documents that users upload to the application, and performing web searches by extending queries to Google Scholar. Users can interact with the application via a web interface or command-line tools.
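The retrieve-then-generate flow described above can be illustrated with a minimal, self-contained Python sketch. It is not the application's implementation: a simple bag-of-words cosine similarity stands in for the fine-tuned dense retriever, and `generate` is a hypothetical callable standing in for a language model (for example, one served through llama.cpp). All function names here are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=2):
    """Return up to top_k passages ranked by similarity to the query.

    Assumption: bag-of-words similarity replaces the dense retriever.
    Passages with zero similarity are dropped.
    """
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]


def build_prompt(query, context):
    """Combine the retrieved passages and the question into one prompt."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\n"
        "Answer:"
    )


def answer(query, passages, generate, top_k=2):
    """Retrieve context for the query, then pass the prompt to `generate`.

    `generate` is a hypothetical callable (prompt -> text) standing in
    for the language model.
    """
    context = retrieve(query, passages, top_k=top_k)
    return generate(build_prompt(query, context))
```

In this sketch the generator receives only the retrieved passages as context, which is what keeps answers grounded in the source documents rather than in the model's general knowledge.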
By employing Retrieval-Augmented Generation, the application enables users to query the database in natural language and obtain factual, focused, and contextually grounded responses based on the original research papers. It quickly pinpoints the context the user is looking for, which accelerates knowledge retrieval as a whole. This approach increases research productivity, saves researchers time, and facilitates more efficient knowledge discovery in the materials science domain.
I want to participate in the youngRSE prize: no