Federated learning-based Named Entity Recognition (FNER) has attracted widespread attention through decentralized training on local clients. However, most FNER models assume that entity types are pre-fixed, so in prac...
Query optimization is a critical task in database systems, focused on determining the most efficient way to execute a query from an enormous set of possible strategies. Traditional approaches rely on heuristic search ...
Index recommendation is essential for improving query performance in database management systems (DBMSs) through creating an optimal set of indexes under specific constraints. Traditional methods, such as heuristic an...
Text-to-SQL, the task of translating natural language questions into SQL queries, plays a crucial role in enabling non-experts to interact with databases. While recent advancements in large language models (LLMs) have...
Existing low-rank adaptation (LoRA) methods face challenges on sparse large language models (LLMs) due to the inability to maintain sparsity. Recent works introduced methods that maintain sparsity by augmenting LoRA t...
ISBN: 9798400712456 (print)
Applying large language models (LLMs) to academic API usage shows promise in reducing researchers' efforts to seek academic information. However, current LLM methods for using APIs struggle with the complex API coupling commonly encountered in academic queries. To address this, we introduce SoAy, a solution-based LLM methodology for academic information seeking. SoAy enables LLMs to generate code for invoking APIs, guided by a pre-constructed API calling sequence referred to as a solution. This solution simplifies the model's understanding of complex API relationships, while the generated code enhances reasoning efficiency. LLMs are aligned with this solution-oriented, code-based reasoning method by automatically enumerating valid API coupling sequences and transforming them into queries and executable code. To evaluate SoAy, we introduce SoAyBench, an evaluation benchmark accompanied by SoAyEval, built upon a cloned environment of APIs from AMiner. Experimental results demonstrate a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines. All datasets, code, tuned models, and deployed online services are publicly accessible at https://***/RUCKBReasoning/SoAy.
Authors: 王珊, 杜小勇, 孟小峰, 陈红
School of Information, Renmin University of China; MOE Key Lab of Data Engineering and Knowledge Engineering, Beijing 100872, P.R. China
Database systems are the infrastructure of modern information systems, and R&D in database systems and their technologies is an important research topic in the field. Database R&D in China started late but has advanced rapidly. This report presents the achievements of Renmin University of China (RUC) over the past 25 years and describes some of the research projects RUC is currently working on. The National Natural Science Foundation of China has initiated and supported most of these projects, which have produced fruitful results.
Dear editor, Frequent itemset mining (FIM) is important in many data mining applications [1], such as web log mining and trend analysis. However, if the data are sensitive (e.g., web browsing history), directly releasing frequent itemsets and their support may breach user privacy. The protection of user privacy while obtaining statistical information is im-
ISBN: 9798350368741 (digital)
ISBN: 9798350368758 (print)
Speaker diarization is typically treated as a discriminative task, using discriminative approaches to produce fixed diarization results. In this paper, we explore for the first time the use of neural network-based generative methods for speaker diarization. We implement a Flow-Matching (FM) based generative algorithm within the sequence-to-sequence target speaker voice activity detection (Seq2Seq-TSVAD) diarization system. Our experiments reveal that applying the generative method directly to the original binary label sequence space of the TS-VAD output is ineffective. To address this issue, we propose mapping the binary label sequence into a dense latent space before applying the generative algorithm; the resulting Flow-TSVAD method significantly outperforms the traditional Seq2Seq-TSVAD system. Additionally, we observe that the FM algorithm converges rapidly at inference, requiring only two inference steps to achieve promising results. Moreover, as a generative model, Flow-TSVAD can sample different diarization results by running the model multiple times, so an ensemble that combines the results of several sampling runs can further boost diarization performance.
Monitoring data streams is an efficient way to capture their characteristics. However, the resources available for each stream are limited, so processing an unbounded data stream with limited resources remains an open problem. In this paper, we combine wavelets and sliding windows to design a multi-resolution summarization data structure, the Multi-Resolution Summarization Tree (MRST), which can be updated incrementally as data arrive and supports point queries, range queries, and multi-point queries while preserving query precision. We evaluate the algorithm on both synthetic and real-world data. The experimental results indicate that MRST outperforms existing algorithms in query efficiency and adaptability, and that it is also simpler to implement.
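As a rough illustration of the general idea behind wavelet summaries over a sliding window, the sketch below builds a plain Haar decomposition of the most recent values and answers point and range queries from it. This is not the MRST from the abstract: the names `haar_levels`, `reconstruct_point`, `range_average`, and `SlidingHaarSummary` are hypothetical, the window length is assumed to be a power of two, and the summary is recomputed from scratch rather than updated incrementally.

```python
from collections import deque


def haar_levels(values):
    """Return [(averages, details), ...] per level, finest level first.

    Assumes len(values) is a power of two and at least 2.
    """
    current = list(values)
    n = len(current)
    if n < 2 or n & (n - 1):
        raise ValueError("window length must be a power of two >= 2")
    levels = []
    while len(current) > 1:
        # Pairwise average (coarser view) and half-difference (detail).
        avgs = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        dets = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        levels.append((avgs, dets))
        current = avgs
    return levels


def reconstruct_point(levels, index):
    """Point query: rebuild one original value from the top average and details."""
    value = levels[-1][0][0]  # overall average of the window
    for depth in range(len(levels) - 1, -1, -1):
        pos = index >> depth  # position of the target in this level's input
        det = levels[depth][1][pos >> 1]
        # Left child is average + detail, right child is average - detail.
        value = value + det if pos % 2 == 0 else value - det
    return value


def range_average(levels, start, stop):
    """Range query: average of the original values in [start, stop)."""
    points = [reconstruct_point(levels, i) for i in range(start, stop)]
    return sum(points) / len(points)


class SlidingHaarSummary:
    """Keeps the last `size` stream values and summarizes them on demand."""

    def __init__(self, size):
        if size < 2 or size & (size - 1):
            raise ValueError("size must be a power of two >= 2")
        self.window = deque(maxlen=size)

    def push(self, value):
        # The oldest value is dropped automatically once the window is full.
        self.window.append(value)

    def levels(self):
        if len(self.window) < self.window.maxlen:
            raise ValueError("window is not full yet")
        return haar_levels(self.window)
```

Pushing values into a `SlidingHaarSummary` and calling `levels()` gives progressively coarser averages of the current window, and `reconstruct_point` recovers any single value exactly as long as all detail coefficients are kept. A space-bounded summary would drop small details and accept approximate answers, which this sketch does not attempt.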