Search Results - Inner Mongolia University Library

Mining Repetitive Negative Sequential Patterns with Gap Constraints

ACM Transactions on Knowledge Discovery from Data, 2025, Vol. 19, No. 4

Authors: Li, Yan; Wang, Zhulin; Liu, Jing; Guo, Lei; Fournier-Viger, Philippe; Wu, Youxi; Wu, Xindong

Affiliations: School of Economics and Management, Hebei University of Technology, Tianjin, China; School of Artificial Intelligence, Hebei University of Technology, Tianjin, China; State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Hebei Key Laboratory of Big Data Computing, Tianjin, China; Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei University of Technology, Hefei, China

Sequential pattern mining (SPM) with gap constraints (or repetitive SPM or tandem repeat discovery in bioinformatics) can find frequent repetitive subsequences satisfying gap constraints, which are called positive sequential patterns with gap constraints (PSPGs). However, classical SPM with gap constraints cannot find the frequent missing items in the PSPGs. To tackle this issue, this article explores negative sequential patterns with gap constraints (NSPGs). We propose an efficient NSPG-Miner algorithm that can mine both frequent PSPGs and NSPGs simultaneously. To effectively reduce candidate patterns, we propose a pattern join strategy with negative patterns which can generate both positive and negative candidate patterns at the same time. To calculate the support (frequency of occurrence) of a pattern in each sequence, we explore a NegPair algorithm that employs a key-value pair array structure to deal with the gap constraints and the negative items simultaneously and can avoid redundant rescanning of the original sequence, thus improving the efficiency of the algorithm. To report the performance of NSPG-Miner, 11 competitive algorithms and 11 datasets are employed. The experimental results not only validate the effectiveness of the strategies adopted by NSPG-Miner but also verify that NSPG-Miner can discover more valuable information than the state-of-the-art algorithms. Copyright © 2025 held by the owner/author(s). Publication rights licensed to ACM.
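As a rough illustration of what a gap constraint means in the abstract above, the following minimal sketch counts the occurrences of a positive pattern in one sequence. It is not the NSPG-Miner or NegPair algorithm from the paper: the function name `count_gap_occurrences`, the convention that a gap is the number of items strictly between two matched positions, and the choice to count every occurrence are assumptions made for this example, and negative items are not handled.

```python
def count_gap_occurrences(seq, pattern, min_gap, max_gap):
    """Count occurrences of `pattern` in `seq` such that the number of items
    between consecutive matched positions lies in [min_gap, max_gap]."""
    if not pattern:
        return 0
    n = len(seq)
    # ways[i]: number of partial occurrences of the current pattern prefix
    # whose last matched position is i.
    ways = [1 if seq[i] == pattern[0] else 0 for i in range(n)]
    for k in range(1, len(pattern)):
        nxt = [0] * n
        for i in range(n):
            if seq[i] != pattern[k]:
                continue
            # A previous position j is allowed when min_gap <= i - j - 1 <= max_gap.
            lo = max(0, i - max_gap - 1)
            hi = i - min_gap - 1
            for j in range(lo, hi + 1):
                nxt[i] += ways[j]
        ways = nxt
    return sum(ways)
```

For the sequence "abab" and the pattern "ab" with gap [0, 2], the matched position pairs are (0, 1), (0, 3) and (2, 3), so the count is 3; tightening the gap to [0, 1] drops the pair (0, 3).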

Keywords: gap constraint; key-value pair; negative sequential pattern; sequential pattern mining

Extracting multi-records from web pages

4th International Conference on Semantics, Knowledge, and Grid, SKG 2008

Authors: Tian, Xia

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, Beijing 100872, China; School of Information Resource Management, Renmin University of China, Beijing 100872, China

ISBN: 9780769534015 (print)

Extracting multi-records from web pages is useful because it allows us to integrate information from multiple sources to provide value-added services. Existing techniques are still limited by their restrictive assumptions and their accuracy. This paper proposes a new method to perform the multi-records extraction task automatically. First, the HTML tag tree is built on an embedded browser interface to solve the AJAX problem. Second, data regions are found by data chunk comparison, and a simple tree matching method is proposed to compute the chunk similarity. Finally, the main data region is determined and the multi-records are extracted. Experimental results show that our method dramatically outperforms other existing methods and can extract multi-records from pages very accurately. © 2008 IEEE.
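The chunk similarity step mentioned in the abstract can be illustrated with the classic Simple Tree Matching dynamic program. This is only a sketch under assumptions: the paper's own variant may differ, the `(tag, children)` tuple representation of a tag tree is chosen for the example, and the normalization in `chunk_similarity` is one common convention rather than the paper's formula.

```python
def simple_tree_matching(a, b):
    """Size of the largest order-preserving match between two tag trees.
    Each tree is a (tag, children) tuple; children is a list of trees."""
    tag_a, kids_a = a
    tag_b, kids_b = b
    if tag_a != tag_b:
        return 0  # different roots cannot be matched
    m, n = len(kids_a), len(kids_b)
    # best[i][j]: best match using the first i children of a and first j of b.
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best[i][j] = max(
                best[i][j - 1],
                best[i - 1][j],
                best[i - 1][j - 1]
                + simple_tree_matching(kids_a[i - 1], kids_b[j - 1]),
            )
    return best[m][n] + 1  # +1 for the matched roots


def tree_size(tree):
    """Number of nodes in a (tag, children) tree."""
    return 1 + sum(tree_size(child) for child in tree[1])


def chunk_similarity(a, b):
    """Assumed normalization: matched nodes over the average tree size."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))


# Two hypothetical data chunks that differ by one extra <img> child.
chunk_1 = ("div", [("span", []), ("a", [])])
chunk_2 = ("div", [("span", []), ("img", []), ("a", [])])
```

Here the two chunks share the `div`, `span` and `a` nodes, so the match size is 3 out of 3 and 4 nodes respectively.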

Keywords: Websites

Implementation and performance of VoIP interception based on SIP session border controller

Authors: Yang, Menghui; Liu, Hua

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, Beijing 100872, China; School of Information Resources Management, Renmin University of China, Beijing 100872, China

In an effort to provide lawful interception for session initiation protocol (SIP) voice over Internet protocol (VoIP), an interception architecture using a session border controller (SBC) is proposed. Moreover, a prototype based on the proposed architecture is implemented. A testbed is set up and tests are carried out to analyze the performance and capability of the function entities and interfaces in the proposed architecture. Test results show that the SBC's interception capability for SIP signaling is superior to that for the real-time transport protocol (RTP) media stream. To eliminate the possible bottleneck of RTP packet interception in the SBC, an analytic model is proposed to investigate the mechanism by which RTP packet traffic is shared among different SBC media functions. Analysis results show that multiple SBC media functions can share the arriving RTP packets and can significantly decrease the RTP packet service time in the SBC. Test results also show that the delivery function, the collect function and their interfaces in the proposed interception architecture have interception performance and capability matching those of the SBC. © 2013 Springer Science+Business Media New York.

Keywords: Controllers

Location selection for utility maximization with capacity constraints

21st ACM International Conference on Information and Knowledge Management, CIKM 2012

Authors: Sun, Yu; Huang, Jin; Chen, Yueguo; Zhang, Rui; Du, Xiaoyong

Affiliations: School of Information, Renmin University of China, Beijing, China; University of Melbourne, Melbourne, VIC, Australia; Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, China

ISBN: 9781450311564 (print)

Given a set of client locations, a set of facility locations where each facility has a service capacity, and the assumptions that (i) a client seeks service from its nearest facility and (ii) a facility provides service to clients in the order of their proximity, we study the problem of selecting all possible locations such that setting up a new facility with a given capacity at these locations will maximize the number of served clients. This problem has wide applications in practice, such as setting up new distribution centers for online sales businesses and building additional base stations for mobile subscribers. We formulate the problem as a location selection query for utility maximization. After applying three pruning rules to a baseline solution, we obtain an efficient algorithm to answer the query. Extensive experiments confirm the efficiency of our proposed algorithm. © 2012 ACM.

Keywords: Location

FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning

arXiv, 2025

Authors: Ma, Jie; Gao, Zhitao; Chai, Qi; Liu, Jun; Wang, Pinghui; Tao, Jing; Su, Zhou

Affiliations: Ministry of Education Key Laboratory for Intelligent Networks and Network Security, School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China; Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, Shaanxi, China; Guangzhou 510000, Guangdong, China

Audio-Visual Question Answering (AVQA) is a challenging multimodal reasoning task requiring intelligent systems to answer natural language queries based on paired audio-video inputs accurately. However, existing AVQA approaches often suffer from overfitting to dataset biases, leading to poor robustness. Moreover, current datasets may not effectively diagnose these methods. To address these challenges, we first introduce a novel dataset, FortisAVQA, constructed in two stages: (1) rephrasing questions in the test split of the public MUSIC-AVQA dataset and (2) introducing distribution shifts across questions. The first stage expands the test space with greater diversity, while the second enables a refined robustness evaluation across rare, frequent, and overall question distributions. Second, we introduce a robust Multimodal Audio-Visual Epistemic Network (MAVEN) that leverages a multifaceted cycle collaborative debiasing strategy to mitigate bias learning. Experimental results demonstrate that our architecture achieves state-of-the-art performance on FortisAVQA, with a notable improvement of 7.81%. Extensive ablation studies on both datasets validate the effectiveness of our debiasing components. Additionally, our evaluation reveals the limited robustness of existing multimodal QA methods. We also verify the plug-and-play capability of our strategy by integrating it with various baseline models across both datasets. Our dataset and code are available at https://***/reml-group/fortisavqa. Copyright © 2025, The Authors. All rights reserved.

RDF partitioning for scalable SPARQL query processing

Frontiers of Computer Science, 2015, Vol. 9, No. 6, pp. 919-933

Authors: Xiaoyan WANG; Tao YANG; Jinchuan CHEN; Long HE; Xiaoyong DU

Affiliations: School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China; Information Center, Supreme People's Court, Beijing 100745, China; State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China

The volume of RDF data has increased dramatically in recent years, while cloud computing platforms like Hadoop are supposed to be a good choice for processing queries over huge data sets owing to their scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned could also affect system performance. Specifically, a good partitioning solution would greatly reduce or even totally avoid cross-node joins, and significantly cut down the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines like RDF-3X store the data and execute join operations. Based on an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution for physically placing the partitions in order to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of these proposed approaches have been evaluated by extensive experiments over large RDF data sets.

Keywords: RDF data; data partitioning; SPARQL query

Hierarchical All-Pairs SimRank Calculation

28th International Conference on Database Systems for Advanced Applications, DASFAA 2023

Authors: Zhang, Liangfu; Li, Cuiping; Zhang, Xue; Chen, Hong

Affiliations: Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, School of Information, Renmin University of China, Beijing, China

ISBN: 9783031306747 (print)

All-pairs SimRank calculation is a classic SimRank problem. However, all-pairs algorithms suffer from efficiency and accuracy issues. In this paper, we convert the non-linear SimRank calculation into a new, simple closed-form linear system. We then propose a sequence of novel algorithms to solve the linear system efficiently with accuracy guarantees. To reduce memory consumption and improve computational efficiency, we build a hierarchical framework to calculate the all-pairs SimRank scores, which includes a local coarse calculation and a global refinement calculation. We first solve the local linear systems generated from the subgraphs, then we refine the SimRank scores on the full graph from the residuals of the local structures. We also show that our algorithms outperform the state-of-the-art all-pairs SimRank computation algorithms on real graphs. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
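For context, the non-linear recurrence that the abstract refers to is the standard SimRank definition: s(a, a) = 1, and for a ≠ b the score is the decay factor times the average similarity over all pairs of in-neighbors of a and b. The sketch below is the naive iterative baseline for that recurrence, not the hierarchical linear-system method of the paper; the function name, the dictionary-based graph representation and the default parameters are assumptions made for the example.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration.
    in_neighbors maps every node to the list of its in-neighbors."""
    nodes = list(in_neighbors)
    # Start from the identity: each node is only similar to itself.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u][v] = 1.0
                    continue
                in_u, in_v = in_neighbors[u], in_neighbors[v]
                if not in_u or not in_v:
                    new[u][v] = 0.0  # no in-neighbors means no evidence
                    continue
                total = sum(sim[a][b] for a in in_u for b in in_v)
                new[u][v] = decay * total / (len(in_u) * len(in_v))
        sim = new
    return sim


# Hypothetical toy graph: node "a" points to both "b" and "c".
toy_scores = simrank({"a": [], "b": ["a"], "c": ["a"]})
```

Each iteration touches every pair of nodes and every pair of their in-neighbors, which illustrates the efficiency problem that motivates faster all-pairs methods.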

Keywords: Computational efficiency

An integrated classification model for massive short texts with few words

2019 International Conference on Robotics Systems and Vehicle Technology, RSVT 2019

Authors: Tang, Xuetao; Zhu, Yi; Hu, Xuegang; Li, Peipei

Affiliations: Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Hefei University of Technology, China

ISBN: 9781450362429 (print)

Short text classification has achieved excellent performance in the past few years. However, massive short texts with few words, such as invoice data, differ from traditional short texts such as tweets in that they carry no contextual information and less semantic information, which hinders the application of conventional classification algorithms. To address these problems, we propose an integrated classification model for massive short texts with few words. More specifically, a word embedding model is introduced to train the word vectors of massive short texts with few words to form the feature space, and then the vector representation of each instance in the texts is trained based on sentence embedding. With this integrated model, higher-level representations are learned from massive short texts with few words. This can boost the performance of subsequent base classifiers such as K-Nearest Neighbor. Extensive experiments conducted on a dataset of 16 million real data items demonstrate the superior classification performance of our proposed model compared with all competing state-of-the-art models. © 2019 Association for Computing Machinery.

Keywords: Embeddings

Improving performance by creating a native join-index for OLAP

Frontiers of Computer Science, 2011, Vol. 5, No. 2, pp. 236-249

Authors: Yansong ZHANG; Shan WANG; Jiaheng LU

Affiliations: National Survey Research Center at Renmin University of China, Beijing 100872, China; Key Laboratory of the Ministry of Education for Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China

The performance of online analytical processing (OLAP) is critical for meeting the increasing requirements of massive volume analytical applications. Typical techniques, such as in-memory processing, column-storage, and join indexes, focus on high-performance storage media, efficient storage models, and reduced query processing. While they effectively perform OLAP applications, there is a vital limitation: main-memory database based OLAP (MMOLAP) cannot provide high performance for a large data set. In this paper, we propose a novel memory dimension table model, in which the primary keys of the dimension table can be directly mapped to dimensional tuple addresses. To achieve higher performance of dimensional tuple access, we optimize our storage model for dimension tables based on OLAP query workload features. We present directly dimensional tuple accessing (DDTA) based join (DDTA-JOIN), a technique to optimize query processing on the memory dimension table by direct dimensional tuple access. We also propose an optimization of the predicate tree that shortens predicate operation length by pruning useless predicate processing. Our experimental results show that the DDTA-JOIN algorithm is superior to both simulated row-store main memory query processing and the open-source column-store main memory database MonetDB, thanks to the reduced join cost and simple yet efficient query processing.
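The idea of mapping dimension primary keys directly to tuple addresses can be pictured with a small sketch. This is an illustration under assumptions, not the authors' DDTA-JOIN implementation: the tables, the names `dim_city`, `fact_sales` and `join_and_sum`, and the use of a list index as the "address" are all hypothetical.

```python
# Dimension table stored as a plain array: the surrogate primary key of each
# row is its index, so key k lives at dim_city[k] (hypothetical sample data).
dim_city = [
    ("Beijing", "CN"),
    ("Hefei", "CN"),
    ("Melbourne", "AU"),
]

# Fact rows: (city_key, amount); city_key is an offset into dim_city.
fact_sales = [(0, 120.0), (2, 75.5), (0, 30.0), (1, 10.0)]


def join_and_sum(facts, dim, predicate):
    """Scan the fact table once and total amounts per dimension name.
    The join is a direct array access, with no hash table or index probe."""
    totals = {}
    for city_key, amount in facts:
        city_row = dim[city_key]  # direct dimensional tuple access
        if predicate(city_row):
            name = city_row[0]
            totals[name] = totals.get(name, 0.0) + amount
    return totals


# Total sales for cities whose country code is "CN".
cn_totals = join_and_sum(fact_sales, dim_city, lambda row: row[1] == "CN")
```

Because the foreign key is itself the position of the dimension tuple, the per-row join cost reduces to one array lookup, which is the kind of saving the abstract attributes to reduced join cost.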

Keywords: directly dimensional tuple accessing (DDTA); DDTA-JOIN; native join index; predicate tree

Multi-view Feature Learning for the Over-penalty in Adversarial Domain Adaptation

Data Intelligence, 2024, Vol. 6, No. 1, pp. 183-200

Authors: Yuhong Zhang; Jianqing Wu; Qi Zhang; Xuegang Hu

Affiliations: School of Computer and Information Engineering, Hefei University of Technology, Hefei 230601, China; Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), The Ministry of Education of China, Hefei 230009, China

Domain adaptation aims to transfer knowledge from the labeled source domain to an unlabeled target domain that follows a similar but different ***, adversarial-based methods have achieved remarkable success due to the excellent performance of domain-invariant feature presentation ***, the adversarial methods learn the transferability at the expense of the discriminability in feature representation, leading to low generalization to the target *** this end, we propose a Multi-view Feature Learning method for the Over-penalty in Adversarial Domain ***, multi-view representation learning is proposed to enrich the discriminative information contained in domain-invariant feature representation, which will counter the over-penalty for discriminability in adversarial ***, the class distribution in the intra-domain is proposed to replace that in the inter-domain to capture more discriminative information in the learning of transferrable *** experiments show that our method can improve the discriminability while maintaining transferability and exceeds the most advanced methods in the domain adaptation benchmark datasets.

Keywords: domain adaptation; adversarial learning; multi-view learning
