Streaming graph processing must evaluate continuous queries in a timely manner. Prior systems suffer from massive redundant computation because updates cause vertices to be processed in an irregular order. To address this issue, we propose ACGraph, a novel streaming graph processing approach for monotonic graph algorithms. It maintains dependence trees at runtime and processes affected vertices in top-to-bottom order within the hierarchy of the dependence trees, thus normalizing the state propagation order and coalescing multiple propagations to the same vertex. Experimental results show that ACGraph reduces the number of updates by 50% on average and achieves speedups of 1.75×-7.43× over state-of-the-art systems.
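The top-to-bottom ordering described in the abstract above can be illustrated with a minimal sketch. It is not ACGraph's implementation: the `parent` map (a dependence tree stored as child-to-parent pointers) and the helper names `depth` and `top_down_order` are assumptions made for illustration only. The sketch shows just the ordering idea, namely that affected vertices are visited shallowest first, so a vertex is handled after any affected ancestor.

```python
from typing import Dict, Iterable, List, Optional


def depth(parent: Dict[int, Optional[int]], v: int) -> int:
    """Number of edges from v up to the root of its (hypothetical) dependence tree."""
    d = 0
    while parent[v] is not None:
        v = parent[v]
        d += 1
    return d


def top_down_order(parent: Dict[int, Optional[int]], affected: Iterable[int]) -> List[int]:
    """Return the affected vertices sorted from shallowest to deepest.

    Ties at the same depth are broken by vertex id so the order is deterministic.
    """
    return sorted(affected, key=lambda v: (depth(parent, v), v))


# Hypothetical dependence tree: 0 is the root, 1 and 2 depend on 0,
# 3 depends on 1, and 4 depends on 3.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 3}
order = top_down_order(parent, {4, 1, 3})
```

Processing vertices in this order means an update arriving at vertex 1 is settled before vertices 3 and 4 are visited, which is the property that lets repeated propagations to the same vertex be merged.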
Large language models (LLMs) have excelled in various natural language processing tasks, but challenges in interpretability and trustworthiness persist, limiting their use in high-stakes fields. Causal discovery offer...
Object detection tasks, crucial in safety-critical systems like autonomous driving, focus on pinpointing object locations. These detectors are known to be susceptible to backdoor attacks. However, existing backdoor te...
Mobile data offloading has already appeared to offer the means of addressing the challenges of limited computing capability and battery life of mobile devices. Most existing code offloading frameworks only consider mi...
Processing-in-memory (PIM) is a promising way to address the well-known data movement challenge by performing in-situ computations near the data. Leveraging PIM features can substantially improve the energy efficiency of applications. Early studies mainly focus on improving the programmability of computation offloading on PIM architectures. They lack a comprehensive analysis of computation locality and hence fail to accelerate a wide variety of applications. In this paper, we present a general-purpose instruction-level offloading technique for near-DRAM PIM architectures, namely IOTPIM, to exploit PIM features comprehensively. IOTPIM is novel with two technical advances: 1) a new instruction offloading policy that fully considers the locality of the whole on-chip cache hierarchy, and 2) an offloading performance benefit prediction model that directly predicts the offloading performance benefit of an instruction based on the characteristics of the input dataset, while keeping analysis overhead low. The evaluation demonstrates that IOTPIM can be applied to accelerate a wide variety of applications, including graph processing, machine learning, and image processing. IOTPIM outperforms state-of-the-art PIM offloading techniques by 1.28×-1.51× while ensuring offloading accuracy as high as 91.89% on average.
Embodied AI represents systems where AI is integrated into physical entities. Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI by facil...
Gradient leakage attacks pose a significant threat to the privacy guarantees of federated learning. While distortion-based protection mechanisms are commonly employed to mitigate this issue, they often lead to notable...
A hybrid pull-push computational model can provide compelling results over either single model for processing real-world *** and pipeline parallelism of FPGAs make it possible to process different stages of graph ***, considering the limited on-chip resources and streamline pipeline computation, the efficiency of the hybrid model on FPGAs often suffers due to the well-known random access feature of graph *** this paper, we present a hybrid graph processing system on FPGAs, which can achieve the best of both *** approach on FPGAs is unique and novel as ***, we propose to use edge blocks (each consisting of edges with the same destination vertex set), which allow edges to be accessed sequentially at block granularity for locality while still preserving the *** to the independence of blocks, in the sense that all edges in an inactive block are associated with inactive vertices, this also makes it possible to skip invalid blocks for reducing redundant ***, we consider a large number of vertices and their associated edge blocks to maintain a predictable execution *** also present to switch models in advance with few stalls using their state *** evaluation on a wide variety of graph algorithms for many real-world graphs shows that our approach achieves up to 3.69× speedup over state-of-the-art FPGA-based graph processing systems.
Temporal graph processing handles the snapshots of a temporal graph, which captures changes in the graph over time. Although several software/hardware solutions have been designed for efficient temporal graph processing, they still suffer from severely irregular data access due to uncoordinated graph traversal. To overcome these limitations, this paper proposes SaGraph, a domain-specific hardware accelerator that supports efficient temporal graph processing. Specifically, temporal graph processing shows strong data access similarity, i.e., most graph accesses made while processing different snapshots are the same and usually refer to a small fraction of vertices. SaGraph can dynamically coordinate the graph traversals and adaptively cache the vertex states to fully exploit this data access similarity and reduce data access overhead. We implemented and evaluated SaGraph on a Xilinx Alveo U280 FPGA card. Compared with cutting-edge software and hardware solutions, SaGraph achieves 8.5×-157.3× and 4.2×-16.1× speedups and 34.7×-423.6× and 5.3×-14.7× energy savings, respectively.
Although containers are lightweight, pulling and starting up a large container image is still resource-consuming, especially in a relatively resource-constrained edge cloud. Fortunately, Docker, the most widely used container platform, provides a unique layered architecture that allows the same layer to be shared between microservices so as to lower the deployment cost. Meanwhile, it is highly desirable to deploy the dependent microservices of an application together to lower the operation cost. Therefore, the balance between microservice deployment cost and operation cost should be considered comprehensively to achieve the minimal overall cost of an on-demand application. In this paper, we first formulate this problem as a Quadratic Integer Programming (QIP) problem and prove that it is NP-hard. We further propose a Randomized Rounding-based Microservice Deployment and Layer Pulling (RR-MDLP) algorithm with low computational complexity and a guaranteed approximation ratio. Through extensive experiments, we verify the high efficiency of our algorithm: it significantly outperforms existing state-of-the-art microservice deployment strategies.