Search Results - Inner Mongolia University Library

2012 Cross Language Evaluation Forum Conference, CLEF 2012

Authors: Wang, Qiuyue; Kang, Jinglin. School of Information, Renmin University of China, Beijing 100872, China; Key Lab of Data Engineering and Knowledge Engineering, MOE, Beijing 100872, China

We report our experimental results on the INEX 2012 Linked Data Track. We participated in the ad hoc and jeopardy tasks. As the new data collection of the INEX 2012 Linked Data Track features a combination of unstructured and structured data, our first attempt is to investigate different strategies for combining retrieval over structured and unstructured data, and to compare the combined approaches with traditional unstructured ones. In this paper, we discuss three types of combination strategies and experiment with two of them on the track. The experiment results show that.

Keywords: Linked data

Weak Multi-Label Data Stream Classification Under Distribution Changes in Labels

IEEE Transactions on Big Data, 2024, Vol. 11, No. 3, pp. 1369-1380

Authors: Zou, Yizhang; Hu, Xuegang; Li, Peipei; Hu, Jun. Hefei University of Technology, Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, School of Computer Science and Information Engineering, Hefei, China; National University of Singapore, School of Computing, Singapore

Multi-label stream classification aims to address the challenge of dynamically assigning multiple labels to sequentially-arrived instances. In real situations, only partial labels of instances can be observed due to the expensive human annotations, and the problem of label distribution changes arises from multiple labels in a streaming mode, but few existing works jointly consider such challenges. Motivated by this, we propose the problem of weak multi-label stream classification (WMSC) and an online classification algorithm robust to weak labels. Specifically, we incrementally update the margin-based model using information from both the past model and the current incoming instance with partially observed labels. To increase the robustness to weak labels, we first adjust the classification margin of negative labels using the label causality matrix, which is constructed by the conditional probability of label pairs. Secondly, we introduce the label prototype matrix to regulate the margin by controlling the weighting parameter of the slack term. Additionally, to handle the potential distribution changes in labels, we utilize the instance-specific threshold via online thresholding to perform binary classification, which is formulated as a regression problem. Finally, theoretical analysis and empirical experimental results are presented to demonstrate the effectiveness of WMSC in classifying unobserved streaming instances. © 2024 IEEE.

Keywords: Labeled data

ALOR: Adaptive layout optimization of Raft groups for heterogeneous distributed key-value stores

15th IFIP International Conference on Network and Parallel Computing, NPC 2018

Authors: Wang, Yangyang; Chai, Yunpeng; Wang, Xin. Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Beijing, China; School of Information, Renmin University of China, Beijing, China; College of Intelligence and Computing, Tianjin University, Tianjin, China

ISBN (digital): 9783030056773

ISBN (print): 9783030056766

Many distributed key-value storage systems employ the simple and effective Raft protocol to ensure data consistency. They usually assume a homogeneous node hardware configuration for the underlying cluster and thus adopt even data distribution schemes. However, today’s distributed systems tend to be heterogeneous in nodes’ I/O devices due to the regular replacement of worn I/O devices and the emergence of expensive new storage media (e.g., non-volatile memory). In this paper, we propose a new data layout scheme called Adaptive Layout Optimization of Raft groups (ALOR), which takes the hardware heterogeneity of the cluster into account. ALOR aims to optimize the data layout of Raft groups to achieve a better practical load balance, which leads to higher performance. ALOR consists of two components: leader migration in Raft groups and skewed data layout based on cold data migration. We conducted experiments on a practical heterogeneous cluster, and the results indicate that, on average, ALOR improves throughput by 36.89% and reduces latency and 99th percentile tail latency by 24.54% and 21.32%, respectively. © IFIP International Federation for Information Processing 2018.

Keywords: Digital storage

Mining Repetitive Negative Sequential Patterns with Gap Constraints

ACM Transactions on Knowledge Discovery from Data, 2025, Vol. 19, No. 4

Authors: Li, Yan; Wang, Zhulin; Liu, Jing; Guo, Lei; Fournier-Viger, Philippe; Wu, Youxi; Wu, Xindong. School of Economics and Management, Hebei University of Technology, Tianjin, China; School of Artificial Intelligence, Hebei University of Technology, Tianjin, China; State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Hebei Key Laboratory of Big Data Computing, Tianjin, China; Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei University of Technology, Hefei, China

Sequential pattern mining (SPM) with gap constraints (or repetitive SPM or tandem repeat discovery in bioinformatics) can find frequent repetitive subsequences satisfying gap constraints, which are called positive sequential patterns with gap constraints (PSPGs). However, classical SPM with gap constraints cannot find the frequent missing items in the PSPGs. To tackle this issue, this article explores negative sequential patterns with gap constraints (NSPGs). We propose an efficient NSPG-Miner algorithm that can mine both frequent PSPGs and NSPGs simultaneously. To effectively reduce candidate patterns, we propose a pattern join strategy with negative patterns which can generate both positive and negative candidate patterns at the same time. To calculate the support (frequency of occurrence) of a pattern in each sequence, we explore a NegPair algorithm that employs a key-value pair array structure to deal with the gap constraints and the negative items simultaneously and can avoid redundant rescanning of the original sequence, thus improving the efficiency of the algorithm. To report the performance of NSPG-Miner, 11 competitive algorithms and 11 datasets are employed. The experimental results not only validate the effectiveness of the strategies adopted by NSPG-Miner but also verify that NSPG-Miner can discover more valuable information than the state-of-the-art algorithms. Copyright © 2025 held by the owner/author(s). Publication rights licensed to ACM.

Keywords: gap constraint; key-value pair; negative sequential pattern; sequential pattern mining

Extracting multi-records from web pages

4th International Conference on Semantics, Knowledge, and Grid, SKG 2008

Authors: Tian, Xia. Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, Beijing 100872, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China

ISBN (print): 9780769534015

Extracting multi-records from web pages is useful because it allows us to integrate information from multiple sources to provide value-added services. Existing techniques still have limitations in terms of their restrictions and accuracy. This paper proposes a new method to perform the multi-record extraction task automatically. Firstly, the HTML tag tree is built based on an embedded browser interface to solve the AJAX problem. Secondly, data regions are found by data chunk comparison, and a simple tree matching method is proposed to compute the chunk similarity. Finally, the main data region is determined and the multi-records are extracted. Experimental results show that our method dramatically outperforms other existing methods, and it can extract multi-records from pages very accurately. © 2008 IEEE.

Keywords: Websites

Efficient querying of correlated uncertain data with cached results

18th International Conference on Database Systems for Advanced Applications, DASFAA 2013

Authors: Chen, Jinchuan; Zhang, Min; Xie, Xike; Du, Xiaoyong. Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, China; School of Information, Renmin University of China, China; Department of Computer Science, Aalborg University, Denmark

ISBN (print): 9783642374869

Although there have been many efforts in the management of uncertain data, evaluating probabilistic inference queries, a known NP-hard problem, is still a big challenge, especially for querying highly correlated data. The state-of-the-art exact algorithms for accelerating the evaluation of inference queries are based on special indices. Besides, having observed that many queries are frequent, some researchers try to improve efficiency by reusing previously queried results. Indexing depends on static properties like data distributions, whereas caching favors dynamic features like query workload. In this paper we propose a new approach for speeding up the evaluation of inference queries by caching frequent results in a junction tree-based hierarchical index. To the best of our knowledge, this is the first effort to utilize both the static (data) and dynamic (query workload) properties to efficiently evaluate probabilistic inference queries. Moreover, according to our experience, different caching strategies may significantly affect query performance. Basically a good caching strategy needs to have a high cache hit ratio with limited space *** on these considerations, we propose a novel caching approach, called FVEC, and present corresponding algorithms for efficiently querying correlated uncertain data. We further conduct a series of extensive experiments on large uncertain datasets in order to illustrate the effectiveness and efficiency of our proposed approaches. As illustrated by the results, compared with previous solutions, our method could greatly improve query performance. © Springer-Verlag 2013.

Keywords: Efficiency

Implementation and performance of VoIP interception based on SIP session border controller

Authors: Yang, Menghui; Liu, Hua. Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, Beijing 100872, China; School of Information Resources Management, Renmin University of China, Beijing 100872, China

In an effort to provide lawful interception for session initiation protocol (SIP) voice over Internet protocol (VoIP), an interception architecture using a session border controller (SBC) is proposed. Moreover, a prototype based on the proposed architecture is implemented. A testbed is set up and tests are carried out in order to analyze the performance and capability of the function entities and interfaces in the proposed architecture. Test results show that the SBC interception capability for SIP signaling is superior to that for the real-time transport protocol (RTP) media stream. In order to eliminate the possible bottleneck of RTP packet interception in the SBC, an analytic model is proposed to investigate the mechanism by which RTP packet traffic is shared among different SBC media functions. Analysis results show that multiple SBC media functions can share the RTP packet arrivals and can significantly decrease the RTP packet service time in the SBC. Test results also show that the delivery function, the collect function and their interfaces in the proposed interception architecture have interception performance and capability corresponding to those of the SBC. © 2013 Springer Science+Business Media New York.

Keywords: Controllers

Location selection for utility maximization with capacity constraints

21st ACM International Conference on Information and Knowledge Management, CIKM 2012

Authors: Sun, Yu; Huang, Jin; Chen, Yueguo; Zhang, Rui; Du, Xiaoyong. School of Information, Renmin University of China, Beijing, China; University of Melbourne, Melbourne, VIC, Australia; Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, China

ISBN (print): 9781450311564

Given a set of client locations, a set of facility locations where each facility has a service capacity, and the assumptions that (i) a client seeks service from its nearest facility and (ii) a facility provides service to clients in the order of their proximity, we study the problem of selecting all possible locations such that setting up a new facility with a given capacity at these locations will maximize the number of served clients. This problem has wide applications in practice, such as setting up new distribution centers for online sales businesses and building additional base stations for mobile subscribers. We formulate the problem as a location selection query for utility maximization. After applying three pruning rules to a baseline solution, we obtain an efficient algorithm to answer the query. Extensive experiments confirm the efficiency of our proposed algorithm. © 2012 ACM.

Keywords: Location

FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning

arXiv, 2025

Authors: Ma, Jie; Gao, Zhitao; Chai, Qi; Liu, Jun; Wang, Pinghui; Tao, Jing; Su, Zhou. Ministry of Education of Key Laboratory for Intelligent Networks and Network Security, School of Cyber Science and Engineering, Xi’an Jiaotong University, Shaanxi, Xi’an 710049, China; Shannxi Provincial Key Laboratory of Big Data Knowledge Engineering, School of Computer Science and Technology, Xi’an Jiaotong University, Shaanxi, Xi’an 710049, China; Guangdong, Guangzhou 510000, China

Audio-Visual Question Answering (AVQA) is a challenging multimodal reasoning task requiring intelligent systems to answer natural language queries based on paired audio-video inputs accurately. However, existing AVQA approaches often suffer from overfitting to dataset biases, leading to poor robustness. Moreover, current datasets may not effectively diagnose these methods. To address these challenges, we first introduce a novel dataset, FortisAVQA, constructed in two stages: (1) rephrasing questions in the test split of the public MUSIC-AVQA dataset and (2) introducing distribution shifts across questions. The first stage expands the test space with greater diversity, while the second enables a refined robustness evaluation across rare, frequent, and overall question distributions. Second, we introduce a robust Multimodal Audio-Visual Epistemic Network (MAVEN) that leverages a multifaceted cycle collaborative debiasing strategy to mitigate bias learning. Experimental results demonstrate that our architecture achieves state-of-the-art performance on FortisAVQA, with a notable improvement of 7.81%. Extensive ablation studies on both datasets validate the effectiveness of our debiasing components. Additionally, our evaluation reveals the limited robustness of existing multimodal QA methods. We also verify the plug-and-play capability of our strategy by integrating it with various baseline models across both datasets. Our dataset and code are available at https://***/reml-group/fortisavqa. Copyright © 2025, The Authors. All rights reserved.

RDF partitioning for scalable SPARQL query processing

Frontiers of Computer Science, 2015, Vol. 9, No. 6, pp. 919-933

Authors: Xiaoyan WANG; Tao YANG; Jinchuan CHEN; Long HE; Xiaoyong DU. School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China; Information Center, Supreme People's Court, Beijing 100745, China; State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China

The volume of RDF data has increased dramatically in recent years, and cloud computing platforms like Hadoop are considered a good choice for processing queries over huge data sets because of their scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned can also affect system performance. Specifically, a good partitioning solution would greatly reduce or even totally avoid cross-node joins, and significantly cut down the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines like RDF-3X store the data and execute join operations. According to the analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution to physically place the partitions in order to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of these proposed approaches have been evaluated by extensive experiments over large RDF data sets.

Keywords: RDF data; data partitioning; SPARQL query
