An abundance of data is generated by various embedded devices, applications, and systems, requiring cost-efficient storage services. Data deduplication removes duplicate chunks and has become an important technique for improving the space efficiency of storage systems. However, the stored unique chunks are heavily fragmented, which decreases restore performance and incurs high overheads for garbage collection. Existing schemes fail to achieve an efficient trade-off among deduplication, restore, and garbage collection performance because they fail to explore and exploit the physical locality of different chunks. In this paper, we trace the storage patterns of fragmented chunks in backup systems and propose a high-performance deduplication system called HiDeStore. The main insight is to enhance the physical locality of new backup versions during the deduplication phase, which identifies hot chunks and stores them in active containers. Chunks that do not appear in new backups become cold and are gathered together in archival containers. Moreover, we remove expired data with an isolated container deletion scheme, avoiding the high overheads of expired-data detection. Compared with state-of-the-art schemes, HiDeStore improves deduplication and restore performance by up to 1.4x and 1.6x, respectively, without decreasing the deduplication ratio or incurring high garbage collection overheads.
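The hot/cold split described above can be sketched in a few lines of Python. This is an illustrative toy, not HiDeStore's actual implementation: the class name, the dictionary-based "containers", and the migration policy are assumptions made for the example, while a real system works with fixed-size containers and an on-disk fingerprint index.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its content hash, the standard dedup fingerprint."""
    return hashlib.sha1(chunk).hexdigest()

class ToyDedupStore:
    """Minimal sketch: chunks referenced by the newest backup stay in the
    active (hot) pool; chunks no longer referenced move to the archival
    (cold) pool, mimicking the active/archival container split."""

    def __init__(self):
        self.active = {}    # fingerprint -> chunk data (hot)
        self.archival = {}  # fingerprint -> chunk data (cold)

    def backup(self, chunks):
        """Deduplicate one backup version; return the number of new chunks stored."""
        hot, new = set(), 0
        for chunk in chunks:
            fp = fingerprint(chunk)
            hot.add(fp)
            if fp in self.active:
                continue                      # duplicate: keep a reference only
            if fp in self.archival:           # cold chunk referenced again
                self.active[fp] = self.archival.pop(fp)
                continue
            self.active[fp] = chunk           # genuinely new (hot) chunk
            new += 1
        # chunks absent from the newest backup turn cold
        for fp in list(self.active):
            if fp not in hot:
                self.archival[fp] = self.active.pop(fp)
        return new
```

For example, backing up chunks `a, b, c` and then `a, b, d` stores four unique chunks in total, and `c` (absent from the second backup) migrates to the archival pool, so a later deletion of the first backup version only needs to touch archival containers.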
Real-time systems involve tasks that may voluntarily suspend their execution as they await specific events or resources. Such self-suspension can introduce further delays and unpredictability in scheduling, making the...
Utilizing interpolation techniques (IT) within reversible data hiding (RDH) algorithms presents the advantage of a substantial embedding capacity. Nevertheless, prevalent algorithms often straightforwardly embed confi...
Genomic sequencing has become increasingly prevalent, generating massive amounts of data and facing a significant challenge in long-term storage and transmission. A solution that reduces the storage and transfer requi...
In today’s era, smartphones are used in daily life because they are ubiquitous and can be customized by installing third-party apps. As a result, the menaces posed by these apps, which are potentially risky for u...
In this fast-paced world, we need fast programs with maximum accuracy. This can be achieved when computer vision is combined with optimized deep learning models and neural networks. The goal of this p...
Lightweight cryptography algorithms have concentrated on the randomness, unpredictability, and complexity of key generation to improve the resistance of ciphers. Therefore, the key is an essential component of ever...
Author name disambiguation (AND) is a central task in academic search, which has received more attention recently with the increase in authors and academic papers. To tackle the AND problem, existing studies have proposed various approaches based on different types of information, such as raw document features (e.g., co-authors, titles, and keywords), the fusion feature (e.g., a hybrid publication embedding based on multiple raw document features), the local structural information (e.g., a publication's neighborhood information on a graph), and the global structural information (e.g., interactive information between a node and others on a graph). However, no work so far has taken all the above-mentioned information into account and taken full advantage of the contributions of each raw document feature for the AND problem. To fill the gap, we propose a novel framework named EAND (Towards Effective Author Name Disambiguation by Hybrid Attention). Specifically, we design a novel feature extraction model, which consists of three hybrid attention mechanism layers, to extract key information from the global structural information and the local structural information that are generated from six similarity graphs constructed based on different similarity coefficients, raw document features, and the fusion feature. Each hybrid attention mechanism layer contains three key modules: a local structural perception, a global structural perception, and a feature fusion module. Moreover, the mean absolute error function in the joint loss function is used to introduce the structural information loss of the vector space. Experimental results on two real-world datasets demonstrate that EAND achieves superior performance, outperforming state-of-the-art methods by at least +2.74% in terms of the micro-F1 score and +3.31% in terms of the macro-F1 score.
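The feature fusion idea above, weighting several per-feature views of a publication by attention before combining them, can be sketched in pure Python. This is a toy stand-in for EAND's fusion module, not its actual architecture: the function names and the fixed query vector are assumptions for illustration, whereas the real model learns its attention parameters end to end.

```python
import math

def attention_fuse(feature_embeddings, query):
    """Fuse per-feature publication embeddings (e.g., one each from
    co-authors, title, and keywords) into a single vector using softmax
    attention weights derived from a query vector."""
    # relevance score of each feature view: dot product with the query
    scores = [sum(q * v for q, v in zip(query, emb)) for emb in feature_embeddings]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # weighted sum of the feature views
    dim = len(feature_embeddings[0])
    return [sum(weights[i] * feature_embeddings[i][d]
                for i in range(len(feature_embeddings)))
            for d in range(dim)]
```

With a query strongly aligned to one feature view, the fused embedding is dominated by that view, which is the behavior a learned attention layer exploits to emphasize the raw document features that matter most for a given author.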
The Internet of Things (IoT) is a constantly expanding system connecting countless devices for seamless data collection and exchange. This has transformed decision-making with data-driven insights across different dom...
The discourse analysis task, which focuses on understanding the semantics of long text spans, has received increasing attention in recent years. As a critical component of discourse analysis, discourse relation recognition aims to identify the rhetorical relations between adjacent discourse units (e.g., clauses, sentences, and sentence groups), called arguments, in a discourse. Previous works focused on capturing the semantic interactions between arguments to recognize their discourse relations, ignoring important textual information in the surrounding context. However, in many cases, more than capturing semantic interactions from the texts of the two arguments is needed to identify their rhetorical relations, requiring the mining of more contextual information. In this paper, we propose a method that converts the RST-style discourse trees in the training set into dependency-based trees and trains a contextual evidence selector on these transformed trees. In this way, the selector learns to automatically pick critical textual information from the context (i.e., as evidence) for arguments to assist in discriminating their relations. Then we encode the arguments concatenated with the corresponding evidence to obtain the enhanced argument representations. Finally, we combine the original and enhanced argument representations to recognize their relations. In addition, we introduce auxiliary tasks to guide the training of the evidence selector to strengthen its selection ability. The experimental results on the Chinese CDTB dataset show that our method outperforms several state-of-the-art baselines in both micro and macro F1 scores.
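The select-then-concatenate pipeline above can be sketched as follows. This is a deliberately simplified stand-in: the lexical-overlap scorer below replaces the paper's trained evidence selector, and the `[SEP]`-joined string stands in for the enhanced encoder input, so all function names and the scoring rule are assumptions for illustration only.

```python
def select_evidence(arg1, arg2, context_sentences):
    """Toy contextual-evidence selector: pick the context sentence sharing
    the most words with the two arguments (a lexical proxy for the trained
    selector described above)."""
    arg_words = set(arg1.split()) | set(arg2.split())
    return max(context_sentences,
               key=lambda sent: len(arg_words & set(sent.split())))

def enhance(arg1, arg2, context_sentences):
    """Concatenate the argument pair with its selected evidence, mirroring
    the evidence-enhanced encoding step before relation classification."""
    evidence = select_evidence(arg1, arg2, context_sentences)
    return f"{arg1} [SEP] {arg2} [SEP] {evidence}"
```

A relation classifier would then encode both the plain pair and this enhanced string and combine the two representations, which is the "original plus enhanced" combination the method describes.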