ISBN:
(Print) 0769525091
The proceedings contain 82 papers. The topics discussed include: comparative exon prediction based on heuristic coding region alignment; topological properties of necklace networks; external double hashing with choice; on online partially fractional knapsack problem; process scheduling for the parallel desktop; cloning-based checkpoint for localized recovery; overlay networks with class; fault-tolerant routing algorithm for RDT structure; region abstraction for event tracking in wireless sensor networks; the impact of dynamic link slowdowns on network stability; secure continuity for sensor networks; the structure of super line graphs; computational complexity and bounds for the neighbor-scattering number of graphs; a hybrid algorithm for dynamic lightpath protection in survivable WDM optical networks; mutually independent Hamiltonian cycles in hypercubes; on a traffic control problem; and intervehicle communication protocol for emergency situations.
ISBN:
(Print) 9780769531250
The proceedings contain 43 papers. The topics discussed include: SIP-based cross-domain proxy handoff for mobile streaming services; task parallelism for object oriented programs; a taxonomy of data prefetching mechanisms; quantitative evaluation of common subexpression elimination on queue machines; enhancing route recovery for QAODV routing in mobile Ad hoc networks; handoff performance comparison of mobile IP, fast handoff and mSCTP in mobile wireless networks; sensor deployment and source localization; on the longest fault-free paths in hypercubes with more faulty nodes; node-to-set disjoint path routing in dual-cube; resource placement in cube-connected cycles; study of cluster-based data forwarding in sensor networks with limited high-power mobile nodes; barrier coverage with mobile sensors; an energy-efficient geographic routing with location errors in wireless sensor networks; and product line sigraphs.
AIAC algorithms (Asynchronous Iterations, Asynchronous Communications) are a particular class of parallel iterative algorithms. Their asynchronous nature makes them more efficient than their synchronous counterparts in numerous cases, as has already been shown in previous work. The first goal of this article is to compare several parallel programming environments in order to see whether one of them is best suited to efficiently implement AIAC algorithms. The main criterion for this comparison is the performance achieved in a global grid-computing context for two classical scientific problems. Nevertheless, we also take into account two secondary criteria: the ease of programming and the ease of deployment. The second goal of this study is to extract from this comparison the important features that a parallel programming environment must have in order to be suited to the implementation of AIAC algorithms.
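To make the notion of asynchronous iterations concrete, here is a minimal, hypothetical sketch (not one of the environments compared in the article): a Jacobi-style solver for a diagonally dominant system in which worker threads keep iterating on their own block of the unknowns and always read whatever values the other workers have most recently written, with no barrier between sweeps.

    # Minimal sketch of an AIAC-style (asynchronous) iteration, assuming a
    # diagonally dominant system A x = b solved by Jacobi-type updates.
    # Workers update their own rows without synchronizing with the others.
    import threading
    import numpy as np

    def async_jacobi(A, b, n_workers=4, sweeps=200):
        n = len(b)
        x = np.zeros(n)                      # shared iterate, updated in place
        blocks = np.array_split(np.arange(n), n_workers)

        def worker(rows):
            for _ in range(sweeps):          # no barrier between sweeps
                for i in rows:
                    # Jacobi update for component i using the latest shared x
                    x[i] = (b[i] - A[i, :] @ x + A[i, i] * x[i]) / A[i, i]

        threads = [threading.Thread(target=worker, args=(blk,)) for blk in blocks]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return x

    # Example: a small diagonally dominant system
    A = np.eye(8) * 4 + np.random.rand(8, 8) * 0.1
    b = np.ones(8)
    print(np.linalg.norm(A @ async_jacobi(A, b) - b))   # small residual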
ISBN:
(Print) 9780769548982; 9781467345668
In this paper, a parallel method for solving the generalized eigenvalue problem on a multi-core platform is presented, which can compute parts of the eigenpairs in parallel. Unlike traditional numerical methods, the parallel method in this paper is based on numerical integration. Numerical experiments are carried out on a quad-core computer under the Matlab parallel toolbox programming environment. The problems of computing the frequencies of a plane wing and an aircraft pylon are taken as examples, which show the efficiency and applicability of our scheme.
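As a rough illustration of "computing parts of the eigenpairs in parallel", the following Python sketch splits the requested eigenpairs of a generalized problem K x = lambda M x across worker processes. It does not reproduce the paper's numerical-integration scheme or the Matlab parallel toolbox setup; the matrices, slice boundaries, and process count are purely illustrative.

    # Each worker computes a contiguous slice of the spectrum of K x = lambda M x.
    import numpy as np
    from multiprocessing import Pool
    from scipy.linalg import eigh

    n = 200
    K = np.diag(np.arange(1.0, n + 1))     # stand-in stiffness matrix
    M = np.eye(n)                          # stand-in mass matrix

    def eig_block(index_range):
        lo, hi = index_range
        # eigenpairs lo..hi (0-based, inclusive) of the generalized problem
        vals, vecs = eigh(K, M, subset_by_index=[lo, hi])
        return lo, vals

    if __name__ == "__main__":
        blocks = [(0, 9), (10, 19), (20, 29), (30, 39)]   # one slice per worker
        with Pool(len(blocks)) as pool:
            for lo, vals in pool.map(eig_block, blocks):
                print(f"eigenvalues {lo}..{lo + len(vals) - 1}: {np.round(vals, 2)}")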
ISBN:
(Print) 9780769548982; 9781467345668
We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access (NUMA) shared memory machines. We show how to optimize thread placement and data placement in order to achieve performance gains of up to 50% compared to state-of-the-art libraries such as PLASMA or MKL.
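For readers unfamiliar with the tiled formulation, the sketch below shows the tile-level tasks (POTRF on the diagonal tile, TRSM on the panel, GEMM/SYRK on the trailing submatrix) whose thread and data placement the authors tune. The NUMA placement itself (pinning threads and allocating tiles on specific nodes) is not shown here; the tile size and matrix are illustrative.

    # Right-looking tiled Cholesky in NumPy; only the lower triangle is used.
    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def tiled_cholesky(A, ts):
        n = A.shape[0]
        L = np.tril(A.copy())
        for k in range(0, n, ts):
            kk = slice(k, k + ts)
            # POTRF: factor the diagonal tile
            L[kk, kk] = cholesky(L[kk, kk], lower=True)
            for i in range(k + ts, n, ts):
                ii = slice(i, i + ts)
                # TRSM: panel tiles below the diagonal (independent tasks)
                L[ii, kk] = solve_triangular(L[kk, kk], L[ii, kk].T, lower=True).T
            for i in range(k + ts, n, ts):
                ii = slice(i, i + ts)
                for j in range(k + ts, i + ts, ts):
                    jj = slice(j, j + ts)
                    # SYRK/GEMM: trailing-submatrix updates (independent tasks)
                    L[ii, jj] -= L[ii, kk] @ L[jj, kk].T
        return L

    A = np.random.rand(8, 8); A = A @ A.T + 8 * np.eye(8)   # small SPD example
    L = tiled_cholesky(A, 4)
    print(np.allclose(np.tril(L) @ np.tril(L).T, A))         # True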
ISBN:
(Print) 9780769548982; 9781467345668
In this study, we parallelize the D&C algorithm with CUDA. Instead of recursive programming in D&C, the recursive stack is implemented on the host side (CPU) and the merge operation is executed on the GPU in parallel. Since the recursive stack is a full binary tree in this algorithm, the merge operations on the nodes in each layer of the binary tree can be performed synchronously. In this data-parallel computation, with careful management of the data structure, the data of each node can be arranged in the same block without sharing data between threads, so the parallelism is not broken.
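The sketch below restates the idea in plain Python/NumPy rather than CUDA: the recursion is replaced by an explicit bottom-up walk over the full binary tree, so all merges in one layer operate on disjoint runs and could be issued together as a single parallel (GPU) step. Function names and sizes are illustrative.

    # Bottom-up (layer-by-layer) merge sort; each layer's merges are independent.
    import numpy as np

    def merge(a, b):
        # standard two-way merge of two sorted arrays
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        return np.array(out + list(a[i:]) + list(b[j:]))

    def layered_merge_sort(x):
        x = np.asarray(x)
        width = 1
        while width < len(x):
            # every merge in this layer touches a disjoint pair of runs,
            # so on a GPU they could all run in the same kernel launch
            for lo in range(0, len(x), 2 * width):
                mid = min(lo + width, len(x))
                hi = min(lo + 2 * width, len(x))
                x[lo:hi] = merge(x[lo:mid], x[mid:hi])
            width *= 2
        return x

    print(layered_merge_sort(np.random.randint(0, 100, 16)))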
ISBN:
(Print) 9781479938445
We present new parallel algorithms for testing pattern involvement for all length-4 permutations. Our algorithms have a complexity of O(log n) time with n/log n processors on the CREW PRAM model, O(log log log n) time with n/log log log n processors, or constant time with n log^3 n processors on the CRCW PRAM model. Parallel algorithms had not previously been designed for some of these patterns, and for the other patterns the previous best algorithms require O(log n) time and n processors on the CREW PRAM model.
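For reference, pattern involvement for a length-4 pattern asks whether the permutation contains four entries whose relative order matches the pattern. The brute-force O(n^4) check below only states the problem; it is not one of the parallel PRAM algorithms of the paper.

    # Does perm contain an occurrence of pattern (as a classical permutation pattern)?
    from itertools import combinations

    def involves(perm, pattern):
        k = len(pattern)
        for idx in combinations(range(len(perm)), k):
            window = [perm[i] for i in idx]
            # standardize the chosen entries to ranks 1..k and compare
            ranks = [sorted(window).index(v) + 1 for v in window]
            if ranks == list(pattern):
                return True
        return False

    print(involves([3, 1, 4, 5, 2], (2, 1, 3, 4)))     # True: 3,1,4,5
    print(involves([1, 2, 3, 4, 5], (2, 1, 4, 3)))     # False: no inversions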
ISBN:
(Digital) 9798331506476
ISBN:
(Print) 9798331506483
Dynamic Graph Neural Networks (DGNNs) have recently been used in numerous application domains to capture the intricate dynamics of time-evolving graph data. Despite their theoretical advancements, effectively implementing scalable DGNNs continues to be a formidable challenge due to the constantly evolving graph data and heterogeneous computation kernels. Recent efforts attempted to either exploit graph data reuse to reduce memory access or eliminate the redundant computations between consecutive graph snapshots to scale DGNN acceleration. These efforts still fall short. In prior work, each graph snapshot, regardless of its size and connectivity, passes through the entire DGNN computation pipeline from layer to layer. Consequently, substantial intermediate data is generated throughout the DGNN computation, which leads to excessive off-chip memory access. To address this crucial challenge, we argue that the computations between evolving graph snapshots should be decoupled from the DGNN execution pipeline. In this paper, we propose I-DGNN, a theoretical, architectural, and algorithmic framework aimed at designing scalable and efficient accelerators for DGNN execution with improved performance and energy efficiency. On the theory side, the key idea is to identify essential computations between consecutive graph snapshots and encapsulate them as a separate kernel independent of the DGNN model. Specifically, the proposed one-pass DGNN computing model extracts the process of graph update as a chained matrix multiplication between evolving graphs through rigorous mathematical derivations. Consequently, consecutive snapshots utilize a one-pass computation kernel instead of passing through the entire DGNN execution pipeline, thereby eliminating the costly data movement of intermediate results across DGNN layers. On the architecture side, we propose a unified accelerator architecture that can be dynamically configured to support the computation characteristics...
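As a hedged, simplified illustration of why decoupling the graph update can remove redundant work (an assumption-laden toy, not I-DGNN's actual one-pass kernel or accelerator mapping): for a single GCN-style layer H_t = A_t @ X @ W, a new snapshot A_{t+1} = A_t + dA only needs one extra multiplication, dA @ (X @ W), on top of the previous result, instead of re-running the whole layer on A_{t+1}.

    # Incremental snapshot update vs. full recomputation for one GCN-style layer.
    import numpy as np

    n, f, h = 6, 4, 3
    rng = np.random.default_rng(1)
    A_t = (rng.random((n, n)) < 0.3).astype(float)     # snapshot t adjacency
    A_t[0, 5] = A_t[5, 0] = 0.0                        # ensure the toy edge is new
    dA = np.zeros((n, n)); dA[0, 5] = dA[5, 0] = 1.0   # edges added in snapshot t+1
    X = rng.random((n, f))                             # node features
    W = rng.random((f, h))                             # layer weights

    XW = X @ W                       # shared factor, computed once
    H_t = A_t @ XW                   # result for snapshot t
    H_next_incremental = H_t + dA @ XW     # one-pass-style incremental update
    H_next_full = (A_t + dA) @ XW          # full recomputation, for checking

    print(np.allclose(H_next_incremental, H_next_full))   # True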