ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Coordinating the design of sampling and sparse-dense matrix multiplication (SpMM) is important for accelerating Graph Neural Networks (GNNs). However, existing methods strike a poor balance between accuracy and speed in GNN inference because their sampling strategies are poorly chosen. To solve this problem, we propose an SpMM kernel with an adaptive edge sampling strategy. It considers the relationship between the number of non-zero elements in each matrix row and the shared memory width, and selects the edge sampling scheme adaptively for each row. Our method reduces the graph size by adaptive edge sampling so that it fits into the GPU’s shared memory, which decreases the computational cost and increases data locality, ultimately achieving a balance between accuracy and speed in GNN inference. We conducted experiments on an NVIDIA RTX 4060 Ti GPU using a representative GNN model and datasets. Experimental results show that our kernel outperforms the cuSPARSE SpMM kernel and GE-SpMM by up to 20.2× and 17.3×, respectively, with less than 1% accuracy loss. Compared to ES-SpMM, it reduces the accuracy loss by 4.7% on average and achieves an average 1.26× speedup.
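The row-wise idea in this abstract can be illustrated with a minimal sketch. It is not the authors' kernel: the abstract does not specify the sampling schemes, so the even-stride rule below, the function names `sample_row` and `sampled_spmm`, and the `width` parameter (standing in for the shared memory width) are all assumptions, and the code runs on the CPU in plain Python rather than in CUDA.

```python
def sample_row(cols, vals, width):
    """Return the row unchanged if it fits in `width` slots; otherwise keep
    `width` evenly strided entries (a hypothetical sampling rule)."""
    nnz = len(cols)
    if nnz <= width:
        return list(cols), list(vals)
    # Evenly spaced positions across the row; distinct because nnz > width.
    picks = [(i * nnz) // width for i in range(width)]
    return [cols[p] for p in picks], [vals[p] for p in picks]


def sampled_spmm(indptr, indices, data, dense, width):
    """Multiply a CSR matrix by a dense matrix, capping each row at `width`
    non-zeros via sample_row. No rescaling of kept values is applied."""
    n_out = len(dense[0])
    result = []
    for r in range(len(indptr) - 1):
        start, end = indptr[r], indptr[r + 1]
        cols, vals = sample_row(indices[start:end], data[start:end], width)
        acc = [0.0] * n_out
        for c, v in zip(cols, vals):
            for j in range(n_out):
                acc[j] += v * dense[c][j]
        result.append(acc)
    return result
```

Rows that already fit are computed exactly, so only long rows lose edges; whether that matches the paper's accuracy behaviour depends on its actual per-row schemes.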
ISBN (digital): 9798350369199
ISBN (print): 9798350369205
We have developed an affordable distributed Internet of Things (IoT) testbed, named XiveNet, to conduct in-vehicle security research. This testbed merges the adaptability of simulators with the real-time ECU characteristics of actual vehicles. The testbed consists of ECU chips found in vehicles and Raspberry Pis, combined with a bus master simulator. Our experiments with CAN (controller area network) traffic from actual vehicles (Oak Ridge National Laboratories Road Data Set) demonstrate that our testbed closely replicates the attributes of a real vehicle. We have further validated our testbed by deploying SecCAN, a secure CAN algorithm, and evaluating its security by injecting invalid frames. Furthermore, we examined ORNL’s timing-based intrusion detection on our testbed and successfully produced alerts. Additionally, we incorporated Named Data Networking (NDN) capable nodes, providing researchers with an additional resource to develop future in-vehicle security solutions. Finally, we have proposed a bitrate hopping technique aimed at preventing denial-of-service attacks and conducted a preliminary investigation using the testbed. Our evaluation and validation indicate that the testbed provides a real-world vehicle environment with the flexibility of a simulation environment that supports a wide range of hardware and software configurations.
ISBN (print): 9798400708954
In light of previous endeavors and trends in the realm of parallel programming, HPPython emerges as an essential superset that enhances the accessibility of parallel programming for developers, facilitating scalability across multiple nodes. Despite Python's popularity as a programming language in scientific and engineering applications and its native support for executing various processes, HPPython brings substantial simplification to the development of parallel programs and empowers program distribution across heterogeneous clusters consisting of multiple physical computers. HPPython leverages the MPI standard for its underlying communication, thereby harnessing the benefits of the SPMD model. Additionally, HPPython introduces novel syntax and constructs, such as parallel loops and distributed lists, while endeavoring to retain the natural essence of the original language. This paper delves into the distinct components of HPPython and elucidates their integration, establishing HPPython as a viable solution for parallel programming in today's data-driven world.
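The abstract does not show HPPython's syntax, so the sketch below only illustrates the SPMD pattern that a parallel loop over MPI ranks typically relies on: every rank runs the same code on its own block of the iteration space, and the partial results are then reduced. The ranks are simulated in a plain Python loop with no MPI involved, and the names `block_range` and `spmd_sum_of_squares` are hypothetical.

```python
def block_range(n_items, rank, size):
    """Return the half-open [start, stop) block of iterations owned by `rank`
    when n_items iterations are split as evenly as possible over `size` ranks."""
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def spmd_sum_of_squares(n_items, size):
    """Simulate an SPMD parallel loop: each rank sums squares over its block,
    then the partial sums are combined (a stand-in for an MPI reduce)."""
    partials = []
    for rank in range(size):  # in a real SPMD run, each rank is its own process
        start, stop = block_range(n_items, rank, size)
        partials.append(sum(i * i for i in range(start, stop)))
    return sum(partials)
```

For example, with 10 iterations over 3 ranks the blocks are `[0, 4)`, `[4, 7)` and `[7, 10)`, so the first rank absorbs the one leftover iteration.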
ISBN (print): 9781665432818
The proceedings contain 17 papers. The topics discussed include: automatic traffic light preemption for intelligent transportation systems; towards an elastic lock-free Hash Trie design; a novel server-side aggregation strategy for federated learning in non-IID situations; an asynchronous distributed-memory optimization solver for two-stage stochastic programming problems; translation based self-reconfiguration algorithm for 6-lattice modular robots; curator - a system for creating data sets for behavioral malware detection; parallel and distributed task-based Kirchhoff seismic pre-stack depth migration application; periodicity detection algorithm and applications on IoT data; parallel cloud movement forecasting based on a modified boids flocking algorithm; and efficient real-time earliest deadline first based scheduling for Apache Spark.
ISBN (print): 9781450396608
This paper reviews the massively micro-parallel compute system POETS (Partially Ordered Event Triggered System) and illustrates its potential for speeding up demanding applications. Application domains that benefit from POETS include simulations of physical systems that can be discretised as a mesh. The problem graph is distributed over a large compute mesh; each mesh vertex contains a processor - an FPGA-based RISC-V thread supporting custom instructions in our prototype - and a small amount of local problem state data. There is no central overseer of any sort, and processors cannot see memory besides their own. A problem graph vertex communicates a state change to a neighbour by sending an asynchronous packet. The packets are fixed size and small - currently 64 bytes - and the hardware communications infrastructure is very fast. Applications can use an asynchronous 'packet storm' approach, run synchronously using a hardware idle barrier, or run in a globally asynchronous, locally synchronous manner. Results show significant wall-clock speedup and power consumption improvement over conventional systems: for one application we show a 40-fold speedup over a conventional CPU-based system; versus a multi-GPU system, the POETS cluster is 26% faster, 60% more power efficient, and 34% more energy efficient.
A key aspect of Intelligent Manufacturing is the interface between the Edge and Fog layers on one side and the Cloud on the other side. Once that some data extraction and cleaning has been performed in the (logical an...
ISBN (print): 9781665435772
Emerging HPC platforms are becoming more difficult to program as a result of systems with different node architectures, some with a small number of "fat" heterogeneous nodes (consisting of multiple accelerators) and others with a large number of "thin" homogeneous nodes consisting of multi-core CPUs connected with high-speed interconnects. New programming models are emerging to address performance portability of applications, along with a set of scientific libraries that applications can use to exploit these architectures efficiently. To port applications to new architectures, developers need information about their source code characteristics, including static and dynamic (e.g., performance) information, to refactor the code, understand its data and code structure and library usage, direct their optimisation efforts, and make key decisions. In this paper, we describe a tool that combines compiler and profiler information to query program characteristics in a given programming environment. Static and dynamic data about applications are collected and stored together in an SQL database that can later be queried to study application characteristics and patterns. We demonstrate the capabilities of this tool with an application-driven case study that aims at understanding application code and its use of scientific libraries via a real-world example from the molecular simulation application CP2K.
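To make the "static and dynamic data in one SQL database" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, the table names `static_info` and `dynamic_info`, and every row are placeholders invented for illustration; they are not the tool's actual schema and not measurements from CP2K.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding toy static and dynamic facts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE static_info (
            function TEXT PRIMARY KEY,  -- from the compiler side
            library TEXT,
            call_sites INTEGER
        );
        CREATE TABLE dynamic_info (
            function TEXT PRIMARY KEY,  -- from the profiler side
            seconds REAL
        );
        """
    )
    # Placeholder rows, not real measurements.
    conn.executemany(
        "INSERT INTO static_info VALUES (?, ?, ?)",
        [("kernel_a", "libX", 3), ("kernel_b", "libX", 1), ("kernel_c", "libY", 2)],
    )
    conn.executemany(
        "INSERT INTO dynamic_info VALUES (?, ?)",
        [("kernel_a", 1.5), ("kernel_b", 0.5), ("kernel_c", 4.0)],
    )
    return conn


def time_per_library(conn):
    """Join static and dynamic data to rank libraries by total runtime."""
    return conn.execute(
        """
        SELECT s.library, SUM(d.seconds) AS total
        FROM static_info AS s
        JOIN dynamic_info AS d ON s.function = d.function
        GROUP BY s.library
        ORDER BY total DESC
        """
    ).fetchall()
```

Keeping both kinds of data keyed by function name is what lets a single join answer questions such as "which library accounts for most of the runtime", which neither the compiler nor the profiler output answers alone.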
ISBN (print): 9781450391559
The proceedings contain 11 papers. The topics discussed include: Config 2.0: towards reinforcement learning based configuration of stream processing systems; AutoML: towards automation of machine learning systems maintainability; multi-objective evolutionary based feature selection supported by distributed multi-label classification and deep learning on image/video data; efficient parallel execution of block transactions in blockchain; OneOS: a distributed operating system for the internet of things; non-relational multi-level caching for mitigation of staleness & stragglers in distributed deep learning; using blockchain to provide trusted interoperability to system-of-systems in smart cities context; autonomous resource management in distributed stream processing systems; and using blockchain technology for software identity maintenance.
ISBN (print): 9781538682098
The problem of detecting a change in the distribution of a statistically periodic process is investigated. The problem is posed in the framework of independent and periodically identically distributed (i.p.i.d.) processes, a recently introduced class of processes to model statistically periodic data. An algorithm is proposed that is shown to be robust against an uncertainty in the post-change law. The motivation for the problem comes from event detection problems in traffic data, social network data, electrocardiogram data, and neural data, where periodic statistical behavior has been observed.
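As a rough illustration of change detection in an i.p.i.d. setting, the sketch below runs a CUSUM-style recursion whose log-likelihood ratio depends on the position within the period. It is not the robust algorithm proposed in the paper: it assumes the post-change law is fully known, and it further assumes unit-variance Gaussian observations whose means repeat with period `T = len(mu0)`. The function name `periodic_cusum` and its parameters are hypothetical.

```python
def periodic_cusum(samples, mu0, mu1, threshold):
    """Return the first 0-based index at which the statistic reaches
    `threshold`, or None if no alarm is raised.

    mu0[k] and mu1[k] are the pre- and post-change means for time slots
    congruent to k modulo the period (unit-variance Gaussian assumption).
    """
    period = len(mu0)
    stat = 0.0
    for n, x in enumerate(samples):
        k = n % period
        # Gaussian log-likelihood ratio of post-change vs pre-change law.
        llr = (mu1[k] - mu0[k]) * (x - (mu0[k] + mu1[k]) / 2.0)
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return n
    return None
```

With `mu0 = [0.0, 1.0]`, `mu1 = [1.0, 2.0]` and noiseless samples that switch laws at index 4, each post-change sample adds 0.5 to the statistic, so a threshold of 1.5 is reached at index 6.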
ISBN (print): 9781538682098
Private information retrieval (PIR) protocols allow a user to retrieve entries of a database without revealing the index of the desired item. Information-theoretical privacy can be achieved by the use of several servers and specific retrieval algorithms. In this paper, we investigate the problem of PIR under erasure-coded distributed storage systems and construct PIR protocols with optimal computational complexity for the servers, reasonable communication complexity, and low storage overhead. The proposed constructions also enjoy the advantages of low encoding complexity and low memory requirement for storing all possible queries for the user. More specifically, we concentrate on the study of using circulant permutation matrices or the zero matrix to construct PIR protocols for non-communicating servers.
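For readers new to PIR, the basic mechanism can be sketched with the classical two-server XOR scheme over a fully replicated database. This is deliberately simpler than the paper's setting: it uses replication rather than erasure-coded storage and involves no circulant permutation matrices. The function names are hypothetical, and records are modelled as small non-negative integers.

```python
import secrets


def make_queries(n_records, index):
    """Build two selection vectors that differ only at `index`.
    Each vector on its own is uniformly random, so neither server
    learns which record is wanted."""
    q1 = [secrets.randbelow(2) for _ in range(n_records)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def server_answer(database, query):
    """XOR together the records selected by the query (run by each server)."""
    acc = 0
    for record, selected in zip(database, query):
        if selected:
            acc ^= record
    return acc


def retrieve(database, index):
    """Recover database[index] from the two servers' answers.
    Both servers hold the same database and must not communicate."""
    q1, q2 = make_queries(len(database), index)
    return server_answer(database, q1) ^ server_answer(database, q2)
```

Every record other than the target is selected either in both queries or in neither, so it cancels in the final XOR and only the desired record remains.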