Latency-Critical (LC) cloud applications pose three important challenges: 1) meeting tail latency Service-Level Objective (SLO), 2) attaining predictable tail latency, and 3) achieving high energy efficiency. In this ...
ISBN (print): 9783662439845; 9783662439838
Large real-life networks such as social networks, web page links, and traffic networks form complex graph structures with millions of vertices and edges. Among the many operations for exploiting these graphs, shortest path discovery is a major and expensive one. Besides in-memory approaches, many efficient shortest path computation methods have been developed on top of distributed and parallel platforms. Pregel, a bulk synchronous parallel framework, is one such platform for processing large graphs. The known shortest path computation approach with Pregel is computation-intensive and cannot support real-time services. In this paper, we propose a Pregel-based k-distance index technique that allows efficient single-pair shortest path discovery. We reduce the network cost and unnecessary operations by transmitting more information in a single superstep. Extensive experiments on both real and synthetic datasets show the superiority of the proposed approach.
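To make the superstep model concrete, the following is a minimal single-machine sketch of the conventional message-passing shortest path computation that the abstract describes as the known Pregel approach. It is not the k-distance index technique proposed in the paper, whose details the abstract does not give. The function name `pregel_sssp` and the adjacency-list format are assumptions made for illustration.

```python
import math
from collections import defaultdict


def pregel_sssp(graph, source):
    """Simulate Pregel-style single-source shortest paths.

    graph: dict mapping every vertex to a list of (neighbor, weight) pairs
           (hypothetical format; every vertex must appear as a key).
    Returns (distances, number_of_supersteps).
    """
    dist = {v: math.inf for v in graph}
    inbox = {source: [0]}  # messages delivered at the start of a superstep
    supersteps = 0
    while inbox:  # halt when no vertex receives a message
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                # The vertex improved its distance, so it notifies neighbors.
                dist[vertex] = best
                for neighbor, weight in graph[vertex]:
                    outbox[neighbor].append(best + weight)
        inbox = outbox  # synchronization barrier between supersteps
        supersteps += 1
    return dist, supersteps


graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2)],
    "c": [("d", 1)],
    "d": [],
}
dist, steps = pregel_sssp(graph, "a")
print(dist, steps)
```

Each loop iteration corresponds to one superstep, so a query needs roughly as many rounds of communication as the longest chain of distance improvements. This illustrates why sending more information per superstep, as the abstract proposes, can reduce network cost.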
The monitoring of parallel and distributed applications is a common approach for gathering information concerning program execution, for behavioral analysis of the application or of the supporting platform. The collec...
ISBN (print): 9780769533520
Clusters built from single-core systems are a cost-effective way to improve performance and availability. However, hardware constraints limit the performance of single-core systems, which makes it difficult to meet the growing high-performance requirements of diversified general-purpose computing applications at different levels. A promising solution is novel multi-core systems, which extend parallelism to the CPU level by integrating multiple processing units on a single die. This paper uses the Finite-Difference Time-Domain (FDTD) algorithm as a case study, designing suitable parallel FDTD algorithms for three architectures: distributed-memory machines with single-core processors, shared-memory machines with dual-core processors, and the Cell Broadband Engine (Cell/B.E.) processor with nine heterogeneous cores. The experimental results show that the Cell/B.E. processor using 8 SPEs achieves speedups of 7.05 over an AMD single-core Opteron processor and 3.37 over an AMD dual-core Opteron processor at the processor level.
ISBN (print): 9780819481245
The rapid development of infrared detector arrays has created a need for a robust signal processing chain that can operate on infrared images in real time. Every infrared detector array suffers from so-called nonuniformity, which has to be digitally compensated by the internal circuits of the camera. The digital circuit also has to detect and replace the signal from damaged detectors. Finally, the image has to be prepared for display on an external display unit. For comfortable viewing, the delay between registering the infrared image and displaying it should be as short as possible, so the image processing has to be done with minimum latency. This requirement calls for special processing techniques such as pipelining and parallel processing. The designed infrared processing module performs standard operations on infrared images with very low latency. Additionally, the modular design and defined data bus allow easy expansion of the signal processing chain. The presented image processing module was used in two camera designs, one based on an uncooled microbolometric detector array from ULIS and one on a cooled photon detector from Sofradir. The image processing module was implemented in an FPGA and worked with an external ARM processor for control and coprocessing. The paper describes the design of the processing unit, the results of image processing, and module parameters such as power consumption and hardware utilization.
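The two per-pixel operations mentioned above, nonuniformity compensation and replacement of damaged detectors, can be sketched in software as follows. The abstract does not say which correction method the module uses. This sketch assumes the common two-point (per-pixel gain and offset) correction and a simple neighbor-average replacement. The function names and the list-of-rows image format are hypothetical, and a real FPGA implementation would be pipelined rather than written this way.

```python
def two_point_nuc(raw, gain, offset):
    """Apply per-pixel gain/offset correction (assumed two-point NUC)."""
    return [
        [g * r + o for r, g, o in zip(raw_row, gain_row, offset_row)]
        for raw_row, gain_row, offset_row in zip(raw, gain, offset)
    ]


def replace_bad_pixels(image, bad):
    """Replace each bad pixel with the mean of its good 4-neighbors.

    bad: set of (row, col) coordinates of damaged detectors (hypothetical map).
    Pixels with no good neighbor are left unchanged.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y, x in bad:
        neighbors = [
            image[ny][nx]
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if 0 <= ny < height and 0 <= nx < width and (ny, nx) not in bad
        ]
        if neighbors:
            out[y][x] = sum(neighbors) / len(neighbors)
    return out


raw = [[10, 20], [30, 40]]
gain = [[2, 2], [2, 2]]
offset = [[1, 1], [1, 1]]
corrected = two_point_nuc(raw, gain, offset)
fixed = replace_bad_pixels(corrected, {(0, 0)})
print(corrected, fixed)
```

Both steps depend only on a pixel and its immediate neighbors, which is what makes them suitable for the pipelined, low-latency processing the abstract describes.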
ISBN (digital): 9783319157894
ISBN (print): 9783319157887
This book constitutes the thoroughly refereed post-conference proceedings of the 18th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2014, held in Phoenix, AZ, USA, in May 2014. The 9 revised full papers presented were carefully reviewed and selected from 24 submissions. The papers cover the following topics: single-core parallelism; moving to distributed-memory, larger-scale systems, scheduling fairness; and parallel job scheduling.
ISBN (print): 9783031656262; 9783031656279
Satisfiability Modulo Theories (SMT) on arithmetic theories has significant applications in many important domains. Previous efforts have been mainly devoted to improving the techniques and heuristics in sequential SMT solvers. With the development of computing resources, a promising direction to boost performance is parallel and even distributed SMT solving. We explore this potential in a divide-and-conquer view and propose a novel dynamic parallel framework with variable-level partitioning. To the best of our knowledge, this is the first attempt to perform variable-level partitioning for arithmetic theories. Moreover, we enhance the interval constraint propagation algorithm, coordinate it with Boolean propagation, and integrate it into our variable-level partitioning strategy. Our partitioning algorithm effectively capitalizes on propagation information, enabling efficient formula simplification and search space pruning. We apply our method to three state-of-the-art SMT solvers, namely CVC5, OpenSMT2, and Z3, resulting in efficient parallel SMT solvers. Experiments are carried out on benchmarks of linear and nonlinear arithmetic over both real and integer variables. Our variable-level partitioning method shows substantial improvements over previous partitioning strategies and is particularly good at nonlinear theories.
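As a rough illustration of the two ingredients named above, the sketch below contracts variable intervals under a single linear constraint and then splits one variable's interval into two subproblems. It is a textbook-style toy, not the enhanced propagation or partitioning algorithm from the paper, and it does not use the CVC5, OpenSMT2, or Z3 APIs. The function names and the `(lo, hi)` tuple representation are assumptions.

```python
def propagate_sum_le(x, y, c):
    """Contract intervals x and y under the constraint x + y <= c.

    x, y: (lo, hi) tuples. Returns the contracted pair of intervals,
    or None if the constraint is infeasible within the given bounds.
    """
    x_lo, x_hi = x
    y_lo, y_hi = y
    # x can be at most c minus the smallest possible y, and vice versa.
    x_hi = min(x_hi, c - y_lo)
    y_hi = min(y_hi, c - x_lo)
    if x_lo > x_hi or y_lo > y_hi:
        return None  # empty interval: this part of the search space is pruned
    return (x_lo, x_hi), (y_lo, y_hi)


def split(interval):
    """Partition one variable's interval at its midpoint into two subproblems."""
    lo, hi = interval
    mid = (lo + hi) / 2
    return (lo, mid), (mid, hi)


contracted = propagate_sum_le((0, 10), (3, 10), 8)
print(contracted)
print(propagate_sum_le((5, 10), (5, 10), 8))
print(split(contracted[0]))
```

Running propagation before splitting means each subproblem starts from tighter bounds, and a `None` result lets a worker discard its partition without invoking a full solver.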
ISBN (print): 3540854509
The proceedings contain 88 papers. The topics discussed include: clock synchronization in Cell BE traces; supporting parameter sweep applications with synthesized grid services; a P2P approach to resource discovery in on-line monitoring of grid workflows; transparent mobile middleware integration for Java and .NET development environments; providing non-stop service for message-passing based parallel applications with RADIC; on-line performance modeling for MPI applications; MPC: a unified parallel runtime for clusters of NUMA machines; directory-based metadata optimizations for small files in PVFS; Caspian: a tunable performance model for multi-core systems; performance model for parallel mathematical libraries based on historical knowledgebase; a performance model of dense matrix operations on many-core architectures; empirical analysis of a large-scale hierarchical storage system; and to snoop or not to snoop: evaluation of fine-grain and coarse-grain snoop filtering techniques.
ISBN (digital): 9781728142487
ISBN (print): 9781728142487
Mesh quality is one of the most critical aspects of solving partial differential equations (PDEs) in Computational Fluid Dynamics applications. Many geometry criteria have been proposed and are widely used in commercial pre-processing software such as ICEM CFD, PointWise, and Gambit. However, these traditional geometry criteria fail to recognize some quality features that seriously affect the accuracy of numerical calculations, such as the density and distribution of mesh elements. These quality features are usually evaluated by hand, which greatly increases the pre-processing cost and requires extensive engineering experience. In this paper, we introduce a deep learning model that addresses these issues through offline learning. The proposed model is small and fast and can be embedded in pre-processing software. Experimental results show that the derived model is capable of performing the quality evaluation task and achieves an accuracy of 93.8%.
The proceedings contain 53 papers from High Performance Computing for Computational Science - VECPAR 2004 - 6th International Conference. The topics discussed include: large scale simulations; development and integration of parallel multidisciplinary computational software for modeling a modern manufacturing process; a survey of high-quality computational libraries and their impact in science and engineering applications; a performance prediction model for tomographic reconstruction in structural biology; a high performance system for processing queries on distributed geospatial data sets; parallel implementation of information retrieval clustering models; and scaling up the preventive replication of autonomous databases in cluster systems.