Node Importance Estimation (NIE) is a task that quantifies the importance of nodes in a graph. Recent research has investigated exploiting various information from Knowledge Graphs (KGs) to estimate node importance sco...
Processing-in-memory (PIM) promises to solve the well-known data movement challenge by performing in-situ computations near the data. Leveraging PIM features can substantially boost the energy efficiency of applications. Early studies mainly focus on improving the programmability of computation offloading on PIM architectures. They lack a comprehensive analysis of computation locality and hence fail to accelerate a wide variety of applications. In this paper, we present IOTPIM, a general-purpose instruction-level offloading technique for near-DRAM PIM architectures that exploits PIM features comprehensively. IOTPIM makes two technical advances: 1) a new instruction offloading policy that fully considers the locality of the whole on-chip cache hierarchy, and 2) an offloading performance benefit prediction model that directly predicts the offloading performance benefit of an instruction based on the characteristics of the input dataset, while keeping analysis overhead low. The evaluation demonstrates that IOTPIM can accelerate a wide variety of applications, including graph processing, machine learning, and image processing. IOTPIM outperforms state-of-the-art PIM offloading techniques by 1.28×-1.51× while achieving an offloading accuracy of 91.89% on average.
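To make the idea of a locality-aware offloading decision concrete, the following is a minimal illustrative sketch, not IOTPIM's actual policy or prediction model. It assumes a three-level cache hierarchy, and the latency constants, the `InstrProfile` type, and the `should_offload` rule are all hypothetical: an instruction is offloaded to PIM only when its expected memory latency on the host, weighted by hit rates across the whole cache hierarchy, exceeds an assumed near-DRAM access latency.

```python
from dataclasses import dataclass

# Hypothetical latencies in cycles; real values depend on the platform.
HOST_LATENCY = {"l1": 4, "l2": 12, "llc": 40, "dram": 200}
PIM_LATENCY = 60  # assumed latency of a near-DRAM access from the PIM unit


@dataclass
class InstrProfile:
    """Per-instruction hit rates at each cache level (hypothetical profile).

    Each rate is conditional on the access having missed the level above.
    """
    l1_hit: float
    l2_hit: float
    llc_hit: float


def host_latency(p: InstrProfile) -> float:
    """Expected memory latency on the host, walking the full cache hierarchy."""
    miss_l1 = 1.0 - p.l1_hit
    miss_l2 = miss_l1 * (1.0 - p.l2_hit)
    miss_llc = miss_l2 * (1.0 - p.llc_hit)
    return (HOST_LATENCY["l1"]
            + miss_l1 * HOST_LATENCY["l2"]
            + miss_l2 * HOST_LATENCY["llc"]
            + miss_llc * HOST_LATENCY["dram"])


def should_offload(p: InstrProfile) -> bool:
    """Offload only when the host is expected to be slower than PIM."""
    return host_latency(p) > PIM_LATENCY


cache_friendly = InstrProfile(l1_hit=0.95, l2_hit=0.9, llc_hit=0.9)
cache_hostile = InstrProfile(l1_hit=0.1, l2_hit=0.1, llc_hit=0.1)
```

Under these assumed numbers, `should_offload(cache_friendly)` is false and `should_offload(cache_hostile)` is true: instructions with good on-chip locality stay on the host, while those that mostly miss in every cache level are candidates for offloading.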
Stream Learning (SL) requires models that can quickly adapt to continuously evolving data, posing significant challenges in both computational efficiency and learning accuracy. Effective data selection is critical in ...
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia, highlighting the urgent need for robust and generalizable face forgery detection (FFD) techniqu...
With the evolution of self-supervised learning, the pre-training paradigm has emerged as a predominant solution within the deep learning landscape. Model providers furnish pre-trained encoders designed to function as versatile feature extractors, enabling downstream users to harness the benefits of expansive models with minimal effort through fine-tuning. Nevertheless, recent works have exposed a vulnerability in pre-trained encoders, highlighting their susceptibility to downstream-agnostic adversarial examples (DAEs) meticulously crafted by attackers. The lingering question pertains to the feasibility of fortifying the robustness of downstream models against DAEs, particularly in scenarios where the pre-trained encoders are publicly accessible to the attackers. In this paper, we initially delve into existing defensive mechanisms against adversarial examples within the pre-training paradigm. Our findings reveal that the failure of current defenses stems from the domain shift between pre-training data and downstream tasks, as well as the sensitivity of encoder parameters. In response to these challenges, we propose Genetic Evolution-Nurtured Adversarial Fine-tuning (Gen-AF), a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models. Gen-AF employs a genetic-directed dual-track adversarial fine-tuning strategy in its first stage to effectively inherit the pre-trained encoder. This involves optimizing the pre-trained encoder and classifier separately while incorporating genetic regularization to preserve the model’s topology. In the second stage, Gen-AF assesses the robust sensitivity of each layer and creates a dictionary, based on which the top-k robust redundant layers are selected with the remaining layers held fixed. Upon this foundation, we conduct evolutionary adaptability fine-tuning to further enhance the model’s generalizability. Our extensive experiments, conducted across ten self-supervised training methods and six
Code summarization facilitates program comprehension and software maintenance by converting code snippets into natural-language descriptions. Over the years, numerous methods have been developed for this task, but a k...
The rapid development of large language models (LLMs) has transformed many industries, including healthcare. However, previous medical LLMs have largely focused on leveraging general medical knowledge to provide respo...
Gradient leakage attacks pose a significant threat to the privacy guarantees of federated learning. While distortion-based protection mechanisms are commonly employed to mitigate this issue, they often lead to notable...
Temporal graph processing handles the snapshots of a temporal graph, which capture changes in the graph over time. Although several software/hardware solutions have been designed for efficient temporal graph processing, they still suffer from seriously irregular data access due to uncoordinated graph traversal. To overcome these limitations, this paper proposes SaGraph, a domain-specific hardware accelerator that supports efficient processing of temporal graphs. Specifically, temporal graph processing shows strong data access similarity, i.e., most graph accesses made while processing different snapshots are the same and usually refer to a small fraction of vertices. SaGraph dynamically coordinates the graph traversals and adaptively caches the vertex states to fully exploit this data access similarity and reduce data access overhead. We implemented and evaluated SaGraph on a Xilinx Alveo U280 FPGA card. Compared with the cutting-edge software and hardware solutions, SaGraph achieves 8.5×-157.3× and 4.2×-16.1× speedups and 34.7×-423.6× and 5.3×-14.7× energy savings, respectively.
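The data access similarity described above can be illustrated with a small sketch. This is not SaGraph's hardware design; it is a simplified software model under the assumption that vertex data fetched once can serve every snapshot that accesses the same vertex. The function name `access_counts` and the example access sets are hypothetical.

```python
def access_counts(snapshot_accesses):
    """Compare memory fetches with and without coordinated traversal.

    snapshot_accesses: list of sets, one per snapshot, each holding the
    vertex IDs touched while processing that snapshot.

    Returns (uncoordinated, coordinated):
      uncoordinated: every snapshot fetches its vertices independently.
      coordinated:   each distinct vertex is fetched once and shared
                     (a simplifying assumption of this sketch).
    """
    uncoordinated = sum(len(s) for s in snapshot_accesses)
    coordinated = len(set().union(*snapshot_accesses))
    return uncoordinated, coordinated


# Three hypothetical snapshots whose traversals mostly touch the same vertices.
snapshots = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}]
uncoordinated, coordinated = access_counts(snapshots)
```

In this example the three snapshots issue 12 vertex fetches when processed independently but touch only 5 distinct vertices, which is the kind of redundancy that coordinated traversal and vertex-state caching aim to remove.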