Search Results - Inner Mongolia University Library

Proceedings of the 60th Annual ACM/IEEE Design Automation Conference

Authors: Zihan Jiang, Fubing Mao, Yapu Guo, Xu Liu, Haikun Liu, Xiaofei Liao, Hai Jin, Wei Zhang. Affiliations: National Engineering Research Center for Big Data Technology and System; Services Computing Technology and System Lab; Cluster and Grid Computing Lab; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China; Department of Electronic and Computer Engineering, HKUST, Hong Kong

ISBN: (print) 9798350323481

Streaming graph processing needs to evaluate continuous queries in a timely manner. Prior systems suffer from massive redundant computation because the vertices affected by updates are processed in an irregular order. To address this issue, we propose ACGraph, a novel streaming graph processing approach for monotonic graph algorithms. It maintains dependence trees at runtime and processes affected vertices in top-to-bottom order through the hierarchy of the dependence trees, thereby normalizing the state propagation order and coalescing multiple propagations to the same vertices. Experimental results show that ACGraph reduces the number of updates by 50% on average and achieves speedups of 1.75× to 7.43× over state-of-the-art systems.

Keywords: graph processing
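
To make the ordering argument concrete, here is a minimal Python sketch using incremental single-source shortest paths (SSSP), a typical monotonic algorithm. It is an illustration under stated assumptions, not ACGraph's implementation: the helper names (insert_edges, propagate_fifo, propagate_ordered) and the toy graph are hypothetical, and a distance-keyed priority queue stands in for the dependence-tree hierarchy. With positive edge weights, a vertex's ancestors in the shortest-path tree always have smaller distances, so popping vertices in distance order processes the tree top to bottom, and each vertex tends to be updated once rather than repeatedly.

```python
import heapq
from collections import deque


def insert_edges(adj, dist, batch):
    """Apply a batch of edge insertions; return the vertices whose distance dropped."""
    seeds = []
    for u, v, w in batch:
        adj.setdefault(u, []).append((v, w))
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            seeds.append(v)
    return seeds


def propagate_fifo(adj, dist, seeds):
    """Propagate distance decreases in arrival (FIFO) order; count vertex updates."""
    updates, queue = 0, deque(seeds)
    while queue:
        u = queue.popleft()
        for v, w in adj.get(u, []):
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updates += 1
                queue.append(v)
    return updates


def propagate_ordered(adj, dist, seeds):
    """Propagate in increasing-distance order, i.e. top to bottom in the SSSP tree."""
    updates = 0
    heap = [(dist[u], u) for u in seeds]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a smaller value for u was already propagated
        for v, w in adj.get(u, []):
            if d + w < dist[v]:
                dist[v] = d + w
                updates += 1
                heapq.heappush(heap, (dist[v], v))
    return updates


def run(propagate):
    # Converged SSSP state from source 0 on the chain 0 -10-> 1 -1-> 2 -1-> 3 -1-> 4.
    adj = {0: [(1, 10)], 1: [(2, 1)], 2: [(3, 1)], 3: [(4, 1)]}
    dist = {0: 0, 1: 10, 2: 11, 3: 12, 4: 13}
    seeds = insert_edges(adj, dist, [(0, 2, 5), (0, 1, 1)])
    return propagate(adj, dist, seeds), dist


fifo_updates, fifo_dist = run(propagate_fifo)
ordered_updates, ordered_dist = run(propagate_ordered)
print(fifo_updates, ordered_updates)  # 5 3
```

On this toy batch, FIFO propagation rewrites vertices 2, 3 and 4 more than once, while the ordered variant updates each of them once and simply skips the stale heap entry for vertex 2; both reach the same final distances.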

DarkSAM: Fooling Segment Anything Model to Segment Nothing

38th Conference on Neural Information Processing Systems, NeurIPS 2024

Authors: Zhou, Ziqi; Song, Yufei; Li, Minghui; Hu, Shengshan; Wang, Xianlong; Zhang, Leo Yu; Yao, Dezhong; Jin, Hai. Affiliations: National Engineering Research Center for Big Data Technology and System, China; Services Computing Technology and System Lab, China; Cluster and Grid Computing Lab, China; Hubei Engineering Research Center on Big Data Security, China; Hubei Key Laboratory of Distributed System Security, China; School of Computer Science and Technology, Huazhong University of Science and Technology, China; School of Cyber Science and Engineering, Huazhong University of Science and Technology, China; School of Software Engineering, Huazhong University of Science and Technology, China; School of Information and Communication Technology, Griffith University, Australia

The Segment Anything Model (SAM) has recently gained much attention for its outstanding generalization to unseen data and tasks. Despite its promise, the vulnerabilities of SAM, especially to universal adversarial perturbations (UAPs), have not yet been thoroughly investigated. In this paper, we propose DarkSAM, the first prompt-free universal attack framework against SAM, comprising a semantic decoupling-based spatial attack and a texture distortion-based frequency attack. We first divide the output of SAM into foreground and background. Then, we design a shadow target strategy to obtain the semantic blueprint of the image as the attack target. DarkSAM fools SAM by extracting and destroying crucial object features from images in both the spatial and frequency domains. In the spatial domain, we disrupt the semantics of both the foreground and background of the image to confuse SAM. In the frequency domain, we further enhance attack effectiveness by distorting the high-frequency components (i.e., texture information) of the image. Consequently, with a single UAP, DarkSAM renders SAM incapable of segmenting objects across diverse images with varying prompts. Experimental results on four datasets for SAM and two of its variant models demonstrate the strong attack capability and transferability of DarkSAM. Our code is available at: https://***/CGCL-codes/DarkSAM. © 2024 Neural information processing systems foundation. All rights reserved.

Efficient distributed algorithms for holistic aggregation functions on random regular graphs

Science China (Information Sciences), 2022, Vol. 65, Issue 5, pp. 32-50

Authors: Lin JIA, Qiang-Sheng HUA, Haoqiang FAN, Qiuping WANG, Hai JIN. Affiliations: National Engineering Research Center for Big Data Technology and System / Services Computing Technology and System Lab / Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology; Institute for Interdisciplinary Information Science, Tsinghua University

In this paper, we propose efficient distributed algorithms for three holistic aggregation functions on random regular graphs, which are good candidates for the network topology of next-generation data *** three holistic aggregation functions include SELECTION (select the k-th largest or smallest element), DISTINCT (query the number of distinct elements), and MODE (query the most frequent element). We design three basic techniques, Pre-order Network Partition, Pairwise-independent Random Walk, and Random Permutation Delivery, and build the algorithms on them. The round complexity of the distributed SELECTION algorithm is Θ(log N), which matches the lower bound, where N is the number of nodes and each node holds one numeric element. The round complexities of the distributed DISTINCT and MODE algorithms are O(log³ N / log log N) and O(log² N · log log N), respectively. All of our results break the lower bounds obtained on general graphs, and all of our distributed algorithms are based on the CONGEST model, which restricts each node to sending only O(log N) bits over each edge in one round under synchronous communication.

Keywords: distributed algorithms; holistic aggregation function; random regular graph; CONGEST model; communication complexity; round complexity
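
The three functions are easiest to pin down with a centralized reference. The Python sketch below (function names are hypothetical) computes them over a plain list, assuming one numeric element per node and a 1-indexed k; it says nothing about how the paper evaluates them under CONGEST, only what the distributed algorithms must output. All three are holistic because, unlike SUM or MAX, their answers cannot be assembled by merging constant-size partial results from sub-networks.

```python
from collections import Counter


def selection(values, k, largest=True):
    """SELECTION: the k-th largest (or smallest) element, with k counted from 1."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    return sorted(values, reverse=largest)[k - 1]


def distinct(values):
    """DISTINCT: the number of distinct elements."""
    return len(set(values))


def mode(values):
    """MODE: the most frequent element (ties broken toward the smallest value, an assumption)."""
    counts = Counter(values)
    top = max(counts.values())
    return min(x for x, c in counts.items() if c == top)


# One numeric element per node, as in the paper's setting (values are made up).
elements = [5, 3, 9, 3, 7, 5, 3]
print(selection(elements, 2), selection(elements, 1, largest=False))  # 7 3
print(distinct(elements), mode(elements))  # 4 3
```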

Duo: Improving Data Sharing of Stateful Serverless Applications by Efficiently Caching Multi-Read Data

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Zhuo Huang, Hao Fan, Chaoyi Cheng, Song Wu, Hai Jin. Affiliations: National Engineering Research Center for Big Data Technology and System; Services Computing Technology and System Lab; Cluster and Grid Computing Lab; School of Computer Science and Technology, Huazhong University of Science and Technology, China

A growing number of applications are moving to serverless architectures for high elasticity and fine-grained billing. For stateful applications, however, serverless architectures are likely to cause significant performance degradation, because frequent data sharing between execution stages involves time-consuming remote storage access. Current platforms use an in-memory cache to speed up remote access, but conventional caching strategies yield limited performance improvement. Our experiments show that this is because current strategies overlook the stage-dependent access patterns of stateful serverless applications: data read multiple times across stages (multi-read data) are wrongly evicted by data read only once (read-once data), causing a high cache miss ***, we propose a new caching strategy, Duo, whose design principle is to cache multi-read data for as long as possible. Specifically, Duo contains a large cache list and a small cache list, which act as the Leader list and the Wingman list, respectively. The Leader list ignores data that are read for the first time, preventing it from being polluted by the massive read-once data at each stage. The Wingman list inspects the data ignored or evicted by the Leader list and prefetches the data that will probably be read again, based on the observation that multi-read data usually appear periodically in groups. Compared with state-of-the-art works, Duo improves the hit ratio by 1.1× to 2.1× and reduces the data sharing overhead by 25% to 62%.
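
A simplified Python sketch of the two-list idea follows. It is a hypothetical reconstruction from the abstract alone, similar in spirit to the classic 2Q replacement policy: the class name, the capacities, and the promote-on-second-read rule are assumptions, and Duo's group-based prefetching in the Wingman list is omitted. Because the Leader list never admits a first-time read, a burst of read-once data only cycles through the small Wingman list and leaves multi-read data in the Leader list untouched.

```python
from collections import OrderedDict


class TwoListCache:
    """Hypothetical Leader/Wingman cache (prefetching omitted)."""

    def __init__(self, leader_cap, wingman_cap):
        self.leader = OrderedDict()   # large list: only data read at least twice
        self.wingman = OrderedDict()  # small list: first-time reads and Leader evictions
        self.leader_cap = leader_cap
        self.wingman_cap = wingman_cap
        self.hits = 0

    def _put_wingman(self, key, value):
        self.wingman[key] = value
        self.wingman.move_to_end(key)
        if len(self.wingman) > self.wingman_cap:
            self.wingman.popitem(last=False)  # drop the least recently used entry

    def get(self, key, load):
        if key in self.leader:
            self.leader.move_to_end(key)
            self.hits += 1
            return self.leader[key]
        if key in self.wingman:
            # Second read: treat as multi-read data and promote it to the Leader list.
            value = self.wingman.pop(key)
            self.hits += 1
            self.leader[key] = value
            if len(self.leader) > self.leader_cap:
                old_key, old_value = self.leader.popitem(last=False)
                self._put_wingman(old_key, old_value)  # Wingman inspects Leader evictions
            return value
        # First read: the Leader list ignores it; only the Wingman list keeps it.
        value = load(key)
        self._put_wingman(key, value)
        return value


def lru_hits(trace, capacity):
    """Hit count of a plain LRU cache, for comparison."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            cache.move_to_end(key)
            hits += 1
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)
    return hits


# Two multi-read keys, a scan of five read-once keys, then the multi-read keys again.
trace = ["a", "a", "b", "b", "x1", "x2", "x3", "x4", "x5", "a", "b"]
duo = TwoListCache(leader_cap=2, wingman_cap=1)
for key in trace:
    duo.get(key, load=str.upper)  # load() stands in for a remote-storage read
print(duo.hits, lru_hits(trace, capacity=3))  # 4 2
```

With the same total capacity of three entries, the plain LRU cache loses both multi-read keys during the scan, while the two-list cache keeps them.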

Scalable Optimal Margin Distribution Machine

arXiv, 2023

Authors: Wang, Yilin; Cao, Nan; Zhang, Teng; Shi, Xuanhua; Jin, Hai. Affiliations: National Engineering Research Center for Big Data Technology and System; Services Computing Technology and System Lab; Cluster and Grid Computing Lab; School of Computer Science and Technology, Huazhong University of Science and Technology, China

The Optimal margin Distribution Machine (ODM) is a recently proposed statistical learning framework rooted in the latest margin theory, and it demonstrates better generalization performance than traditional large-margin counterparts. However, like other kernel methods, it suffers from the ubiquitous scalability problem in both computation time and memory storage. This paper proposes a scalable ODM that achieves nearly a tenfold speedup over the original ODM training method. For nonlinear kernels, we put forward a novel distribution-aware partition method that keeps the local ODM trained on each partition close to the global one and makes it converge to the global one quickly. When a linear kernel is applied, we extend a communication-efficient SVRG method to further accelerate training. Extensive empirical studies validate that our proposed method is highly computationally efficient and almost never worsens generalization. Copyright © 2023, The Authors. All rights reserved.

Keywords: Machine learning
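
For the linear-kernel case, the following Python sketch shows plain SVRG (stochastic variance-reduced gradient) on a linear ODM-style objective. Assumptions: the per-sample loss is a two-sided squared hinge of the form used in ODM formulations, penalizing margins below 1 - θ and, with weight μ, margins above 1 + θ, scaled by λ/(1 - θ)²; the hyperparameters and toy data are made up; and this is single-machine SVRG, not the communication-efficient distributed variant or the distribution-aware partitioning described in the abstract.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def sample_grad(w, x, y, lam, theta, mu):
    """Gradient of f_i(w) = 0.5*||w||^2 + lam/(1-theta)^2 * (below^2 + mu*above^2)."""
    m = y * dot(w, x)                    # margin of sample i
    below = max(0.0, 1 - theta - m)      # shortfall under 1 - theta
    above = max(0.0, m - 1 - theta)      # excess over 1 + theta
    coef = lam / (1 - theta) ** 2 * (-2 * below + 2 * mu * above) * y
    return [wj + coef * xj for wj, xj in zip(w, x)]


def objective(w, data, lam, theta, mu):
    """Average of f_i(w) over the data set."""
    scale = lam / (1 - theta) ** 2
    loss = 0.0
    for x, y in data:
        m = y * dot(w, x)
        loss += scale * (max(0.0, 1 - theta - m) ** 2 + mu * max(0.0, m - 1 - theta) ** 2)
    return 0.5 * dot(w, w) + loss / len(data)


def full_grad(w, data, lam, theta, mu):
    grads = [sample_grad(w, x, y, lam, theta, mu) for x, y in data]
    return [sum(g[j] for g in grads) / len(data) for j in range(len(w))]


def svrg(data, lam=1.0, theta=0.1, mu=0.5, eta=0.01, epochs=50, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        snapshot = list(w)
        g_full = full_grad(snapshot, data, lam, theta, mu)  # anchor gradient for this epoch
        for _ in range(len(data)):
            x, y = data[rng.randrange(len(data))]
            g_w = sample_grad(w, x, y, lam, theta, mu)
            g_s = sample_grad(snapshot, x, y, lam, theta, mu)
            # Variance-reduced step: g_i(w) - g_i(snapshot) + full gradient at snapshot.
            w = [wj - eta * (a - b + c) for wj, a, b, c in zip(w, g_w, g_s, g_full)]
    return w


data = [((2, 1), 1), ((1, 2), 1), ((2, 2), 1), ((-2, -1), -1), ((-1, -2), -1), ((-2, -2), -1)]
w = svrg(data)
print(objective([0.0, 0.0], data, 1.0, 0.1, 0.5), objective(w, data, 1.0, 0.1, 0.5))
```

The correction term g_i(w) - g_i(snapshot) has zero mean, so each step stays unbiased while its variance shrinks as w approaches the snapshot, which is what lets SVRG use a constant step size.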

SumPA: Efficient Pattern-Centric Graph Mining with Pattern Abstraction

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques

Authors: Chuangyi Gui, Xiaofei Liao, Long Zheng, Pengcheng Yao, Qinggang Wang, Hai Jin. Affiliations: National Engineering Research Center for Big Data Technology and System / Service Computing Technology and System Lab / Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

ISBN: (print) 9781665442787

Graph mining aims to explore interesting structural information in a graph. Pattern-centric systems typically transform a general-purpose graph mining problem into a series of subgraph matching problems for high performance. Existing pattern-centric mining systems reduce the substantial search space for a single pattern by exploring a highly optimized matching order, but they still suffer severely from the computational redundancies inherent in the matching order itself, leading to significant performance degradation. The key innovation of this work lies in a general redundancy criterion that characterizes computational redundancies arising not only in handling a single pattern but also in matching multiple patterns simultaneously. In this paper, we present SumPA, a high-performance pattern-centric graph mining system that can sufficiently remove redundant computations for any complex graph mining problem. SumPA features three key designs: (1) a pattern abstraction technique that simplifies numerous complex patterns into a few simple abstract patterns based on pattern similarity, (2) abstraction-guided pattern matching that completely eliminates (totally and partially) redundant computations during subgraph enumeration, and (3) a suite of system optimizations to maximize storage and computation efficiency. Our evaluation on a wide variety of real-world graphs shows that SumPA outperforms the two state-of-the-art systems Peregrine and GraphPi by up to 61.89× and 8.94×, respectively. For many mining problems on large graphs, Peregrine takes hours or even days, while SumPA finishes in only a few minutes.

Keywords: data reuse
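
A small Python sketch can illustrate the general idea of sharing work across patterns. It is a hypothetical example, not SumPA's pattern abstraction: when both triangles and 4-cliques are requested, every 4-clique contains a triangle, so the enumeration can match a triangle (a, b, c) once and extend that partial match to 4-cliques instead of running two independent searches. The ordering constraint a < b < c < d is a standard symmetry-breaking rule that counts each subgraph exactly once.

```python
def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_and_4cliques(adj):
    """Count triangles and 4-cliques in one pass, reusing each triangle as a 4-clique prefix."""
    triangles = cliques4 = 0
    for a in adj:
        for b in adj[a]:
            if b <= a:
                continue
            common_ab = adj[a] & adj[b]          # computed once, shared by both patterns
            for c in common_ab:
                if c <= b:
                    continue
                triangles += 1                   # partial match (a, b, c) completes a triangle
                # Extend the same partial match: d must be adjacent to a, b and c.
                cliques4 += sum(1 for d in common_ab & adj[c] if d > c)
    return triangles, cliques4


k4 = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
print(triangles_and_4cliques(k4))  # (4, 1)
```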

Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection

arXiv, 2025

Authors: Li, Hengzhuang; Zhang, Teng. Affiliations: National Engineering Research Center for Big Data Technology and System; Service Computing Technology and Systems Laboratory; Cluster and Grid Computing Lab; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Out-of-distribution (OOD) detection is crucial for developing trustworthy and reliable machine learning systems. Recent advances in training with auxiliary OOD data demonstrate its efficacy in enhancing detection capabilities. Nonetheless, these methods rely heavily on acquiring a large pool of high-quality natural outliers. Some prior methods try to alleviate this problem by synthesizing virtual outliers, but they suffer from either poor quality or high cost due to monotonous sampling strategies and heavily parameterized generative models. In this paper, we overcome all these problems by proposing the Hamiltonian Monte Carlo Outlier Synthesis (HamOS) framework, which views the synthesis process as sampling from Markov chains. Based solely on in-distribution data, the Markov chains can extensively traverse the feature space and generate diverse and representative outliers, thereby exposing the model to miscellaneous potential OOD scenarios. Because Hamiltonian Monte Carlo attains a sampling acceptance rate close to 1, the framework is also highly efficient. Through empirical comparison with state-of-the-art (SOTA) baselines on both standard and large-scale benchmarks, we verify the efficacy and efficiency of HamOS. Our code is available at: https://***/Fir-lat/HamOS_OOD. © 2025, CC BY.

Keywords: Markov chains
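
The abstract does not state the target density HamOS samples from in feature space, so the Python sketch below only illustrates the Hamiltonian Monte Carlo step itself: leapfrog integration of Hamiltonian dynamics followed by a Metropolis correction, run on a stand-in standard 2-D Gaussian target. The function names, step size, and trajectory length are assumptions. Because leapfrog nearly conserves the Hamiltonian, the correction almost always accepts, which is the property behind the acceptance rate close to 1 mentioned above.

```python
import math
import random


def hmc(logp, grad_logp, x0, n_samples, step=0.1, n_leapfrog=10, seed=0):
    """Hamiltonian Monte Carlo with an identity mass matrix; returns samples and acceptance rate."""
    rng = random.Random(seed)
    x, samples, accepted = list(x0), [], 0
    for _ in range(n_samples):
        p = [rng.gauss(0.0, 1.0) for _ in x]                        # fresh momentum
        x_new = list(x)
        g = grad_logp(x_new)
        p_new = [pi + 0.5 * step * gi for pi, gi in zip(p, g)]      # half momentum step
        for i in range(n_leapfrog):
            x_new = [xi + step * pi for xi, pi in zip(x_new, p_new)]   # full position step
            g = grad_logp(x_new)
            if i < n_leapfrog - 1:
                p_new = [pi + step * gi for pi, gi in zip(p_new, g)]   # full momentum step
        p_new = [pi + 0.5 * step * gi for pi, gi in zip(p_new, g)]  # final half step
        # Hamiltonian = potential energy (-log p) + kinetic energy (|p|^2 / 2).
        h_old = -logp(x) + 0.5 * sum(pi * pi for pi in p)
        h_new = -logp(x_new) + 0.5 * sum(pi * pi for pi in p_new)
        if rng.random() < math.exp(min(0.0, h_old - h_new)):       # Metropolis correction
            x, accepted = x_new, accepted + 1
        samples.append(x)
    return samples, accepted / n_samples


def logp(x):
    return -0.5 * sum(v * v for v in x)   # standard Gaussian, up to a constant


def grad_logp(x):
    return [-v for v in x]


samples, rate = hmc(logp, grad_logp, [3.0, -3.0], 4000)
kept = samples[500:]                      # discard burn-in from the off-center start
mean0 = sum(s[0] for s in kept) / len(kept)
var0 = sum((s[0] - mean0) ** 2 for s in kept) / len(kept)
print(round(rate, 3), round(mean0, 2), round(var0, 2))
```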

MeG2: In-Memory Acceleration for Genome Graphs Analysis

Design Automation Conference

Authors: Yu Huang, Long Zheng, Haifeng Liu, Zhuoran Zhou, Dan Chen, Pengcheng Yao, Qinggang Wang, Xiaofei Liao, Hai Jin. Affiliations: National Engineering Research Center for Big Data Technology and System / Services Computing Technology and System Lab / Cluster and Grid Computing Laboratory, Huazhong University of Science and Technology, Wuhan, China; Zhejiang Lab, Hangzhou, China

Genome graph analysis has emerged as an effective means of mapping DNA fragments (known as reads) to the reference genome. It replaces the traditional linear reference with a graph-based representation that incorporates genetic variation and diversity information, significantly improving the quality of genotyping. An in-depth characterization of genome graph analysis reveals that it is bottlenecked by irregular seed index accesses and intensive alignment operations, stressing both the memory system and computing *** on these observations, we propose MeG2, a lightweight, commodity-DRAM-compliant processing-in-memory architecture to accelerate genome graph analysis. MeG2 integrates the capabilities of both near-memory processing and bitwise in-situ computation. Specifically, MeG2 leverages the low access latency of near-memory processing with an index-centric offload mechanism to alleviate the irregular memory accesses in the seeding procedure, and harnesses the row-parallel capacity of in-situ computation with a distance-aware technique to exploit the intensive computational parallelism of the alignment process. Results show that MeG2 outperforms CPU-, GPU-, and ASIC-based genome graph analysis solutions by 502× (30.2×), 272× (15.1×), and 5.5× (8.3×) for short (long) reads, while reducing energy consumption by 1628× (85.6×), 1443× (77.1×), and 7.8× (11.7×), respectively. We also demonstrate that MeG2 offers significant improvements over existing PIM-based genome sequence analysis accelerators.

It Takes Two to Tango: Serverless Workflow Serving via Bilaterally Engaged Resource Adaptation

arXiv, 2025

Authors: Wu, Jing; Wang, Lin; Deng, Quanfeng; Yu, Chen; Zhang, Dong; Yan, Bingheng; Liu, Fangming. Affiliations: National Engineering Research Center for Big Data Technology and System; Services Computing Technology and System Lab; Cluster and Grid Computing Lab, Huazhong University of Science and Technology, China; Paderborn University, Germany; Inspur Data Co. Ltd., China; Peng Cheng Laboratory, China

Serverless platforms typically adopt an early-binding approach to function sizing, requiring developers to specify an immutable size for each function in a workflow beforehand. To account for potential runtime variability, developers must size functions for worst-case scenarios to meet service-level objectives (SLOs), resulting in significant resource inefficiency. To address this issue, we propose Janus, a novel resource adaptation framework for serverless platforms. Janus employs a late-binding approach that allows function sizes to be adapted dynamically based on runtime conditions. The main challenge lies in the information barrier between the developer and the provider: developers lack access to runtime information, while providers lack domain knowledge about the workflow. To bridge this gap, Janus allows developers to provide hints containing rules and options for resource adaptation. Providers then follow these hints to adjust resource allocation dynamically at runtime based on real-time function execution information, ensuring SLO compliance. We implement Janus and conduct extensive experiments with real-world serverless workflows. Our results demonstrate that Janus improves resource efficiency by up to 34.7% compared with the state of the art. © 2025, CC BY-NC-ND.

Keywords: Resource allocation
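
The late-binding idea can be sketched as a provider-side decision made just before each function runs. This Python sketch is hypothetical: the Option hint format, the latency estimates, and the reserve-for-downstream rule are assumptions for illustration, not Janus's actual hint language or policy. The developer supplies the options (the domain knowledge); the provider supplies the elapsed time so far (the runtime information) and picks the cheapest option that keeps the workflow within its SLO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One developer-provided sizing option for a function (hypothetical hint format)."""
    memory_mb: int
    est_latency_ms: float


def choose_size(workflow, stage, elapsed_ms, slo_ms):
    """Late binding: pick the cheapest option for `stage` that still leaves room for the SLO."""
    remaining = slo_ms - elapsed_ms
    # Reserve time for every later stage at its fastest option.
    reserve = sum(min(o.est_latency_ms for o in opts) for opts in workflow[stage + 1:])
    feasible = [o for o in workflow[stage] if o.est_latency_ms + reserve <= remaining]
    if feasible:
        return min(feasible, key=lambda o: o.memory_mb)
    # No option fits: fall back to the fastest one to limit the SLO violation.
    return min(workflow[stage], key=lambda o: o.est_latency_ms)


workflow = [
    [Option(512, 400.0), Option(1024, 220.0), Option(2048, 120.0)],  # stage 0
    [Option(512, 300.0), Option(1024, 160.0), Option(2048, 90.0)],   # stage 1
]
slo = 800.0
print(choose_size(workflow, 0, 0.0, slo).memory_mb)    # 512
print(choose_size(workflow, 1, 450.0, slo).memory_mb)  # 512: stage 0 finished early enough
print(choose_size(workflow, 1, 650.0, slo).memory_mb)  # 2048: stage 0 ran slow
```

In this toy workflow, early binding would have to fix stage 1 at 2048 MB for every request to survive the slow case; late binding pays for that size only when stage 0 actually runs slow.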

Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing

Annual International Symposium on Computer Architecture, ISCA

Authors: Haifeng Liu, Long Zheng, Yu Huang, Jingyi Zhou, Chaoqiang Liu, Runze Wang, Xiaofei Liao, Hai Jin, Jingling Xue. Affiliations: National Engineering Research Center for Big Data Technology and System / Services Computing Technology and System Lab / Cluster and Grid Computing Lab, Huazhong University of Science and Technology, China; School of Computer Science and Engineering, University of New South Wales, Australia; Zhejiang Lab, Hangzhou, China

ISBN: (electronic) 9798350326581

ISBN: (print) 9798350326598

Personalized recommendation systems have become one of the most important Internet services today. A critical challenge in training and deploying recommendation models is their high memory capacity and bandwidth demands, with the embedding layers occupying hundreds of GBs to TBs of storage. The advent of memory disaggregation technology and Compute Express Link (CXL) offers a promising solution for memory capacity scaling. However, relocating memory-intensive embedding layers to CXL memory incurs noticeable performance degradation due to CXL's limited transmission bandwidth, which is significantly lower than host memory bandwidth. To address this, we introduce ReCXL, a CXL memory disaggregation system that uses near-memory processing (NMP) for scalable, efficient recommendation model training. ReCXL features a unified, hardware-efficient NMP architecture that processes the entire embedding training within CXL memory, minimizing data transfers over the bandwidth-limited CXL link and enhancing internal bandwidth. To further improve performance, ReCXL incorporates software-hardware co-optimizations, including sophisticated dependency-free prefetching and fine-grained update scheduling, to maximize hardware utilization. Evaluation results show that ReCXL outperforms the CPU-GPU baseline and naïve CXL memory by 7.1× to 10.6× (9.4× on average) and 12.7× to 31.3× (22.6× on average), respectively.

Keywords: Training; Degradation; Scalability; Prefetching; Memory management; Web and internet services; Bandwidth
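
A back-of-the-envelope Python sketch shows why moving embedding processing next to memory cuts traffic over the CXL link. Assumptions (not from the abstract): each sample looks up `lookups` rows of a `dim`-wide fp32 table and sum-pools them, as in a typical embedding-bag layer; in the naive setup every looked-up row crosses the link in the forward pass and every row gradient crosses it back; with near-memory processing only one pooled vector per sample crosses in each direction. The function name and the example sizes are hypothetical.

```python
def link_bytes(batch, lookups, dim, elem_bytes=4, near_memory=False):
    """Bytes crossing the host-memory link per training step for one embedding table."""
    rows_each_way = batch if near_memory else batch * lookups
    forward = rows_each_way * dim * elem_bytes   # embedding rows (or pooled vectors) to the host
    backward = rows_each_way * dim * elem_bytes  # row gradients (or pooled gradients) back
    return forward + backward


naive = link_bytes(batch=2048, lookups=80, dim=64)
nmp = link_bytes(batch=2048, lookups=80, dim=64, near_memory=True)
print(naive, nmp, naive // nmp)  # 83886080 1048576 80
```

Under these assumptions the reduction equals the pooling factor; real systems also move indices and may cache hot rows, which this estimate ignores.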
