Objective: Data science and machine learning methodologies are essential for addressing complex scientific challenges across many domains. These advances generate numerous research assets, such as datasets, software tools, and workflows, which are shared within the open science community. Concurrently, computational notebook environments like Jupyter Notebook, along with platforms like Google Colab and Kaggle Kernels, facilitate data science research and machine learning workflows, transforming data analysis, model development, and knowledge sharing. The proliferation of computational notebooks has further enriched the pool of valuable research assets. Researchers frequently need efficient access to these assets to advance their work, yet current tools often require navigating multiple websites and portals, leading to inefficiency and information overload. The challenge is compounded when relying on general web search engines, which might not adequately highlight niche scientific [...]. To address these issues, we propose the development of a Multiple Research Asset Search (MRAS) system designed to index diverse research assets from heterogeneous sources, offering a unified search interface for researchers. Our system aims to significantly improve the discovery of computational notebooks and datasets, facilitating data-driven [...]. We developed a pipeline for data extraction and indexing, reviewed and applied state-of-the-art ranking algorithms, enhanced indexing documents with content analysis, and created a Jupyter extension for asset discovery within the working [...]. This work is structured to detail our approach, literature review, system development, empirical validation, results, and conclusions, illustrating the potential impact of our MRAS system on scientific research efficiency.
ISBN: 9781450394161 (print)
Computational notebook environments have drawn broad attention in data-centric research applications, e.g., virtual research environments, for exploratory data analysis and algorithm prototyping. Vanilla computational notebook search solutions have been proposed, but they pay little attention to the information needs of scientific researchers. Previous studies either treat computational notebook search as a code search problem or focus on content-based computational notebook search. The queries they consider are neither research-oriented nor diverse, whereas researchers' information needs are highly specialized and complex. Moreover, relevance evaluation for computational notebooks is difficult and unreliable, since computational notebooks contain fragments of text and code and are usually poorly organized. To address these challenges, we propose a computational notebook search system for virtual research environments (VRE), i.e., CNSVRE, with scientific query reformulation and computational notebook summarization. We conduct a user study to demonstrate the system's effectiveness and efficiency, and users' satisfaction with it.