As one of the most popular accelerators, the Graphics Processing Unit (GPU) has demonstrated high computing power in several application fields. On the other hand, the GPU also has high power consumption and has been one...
This paper presents the first experimental results from our new adaptive synchronization tool based on ordered read-write locks (ORWL). These locks provide a new synchronization method for data-oriented parallel algorithms and are particularly suited to iterative pipelined algorithms with out-of-core data. We conducted experiments with the classic Livermore Kernel 23 benchmark to validate the theoretical model and measure the efficiency of the first available implementation of ORWL in the PARXXL library. The results show that this tool can efficiently control an I/O-bound application running on 64 parallel POSIX threads with tight data dependencies between them.
ISBN: 9783642141218 (print)
A high-level understanding of how an application executes and which performance characteristics it exhibits is essential in many areas of high performance computing, such as application optimization, hardware development, and system procurement. Tools are needed to help users uncover application characteristics, but current approaches are unsuitable for developing a structured understanding of program execution akin to flow charts. Profiling tools have low overheads, but their way of recording performance data discards temporal information. Tracing preserves all the temporal information, but distilling the essential high-level structures, such as initialization and iteration phases, can be challenging and cumbersome. We present a technique that extends an existing profiling tool to capture event flow graphs of MPI applications. Event flow graphs try to strike a balance between the abundance of data contained in full traces and the concise information profiling tools can deliver with low overheads. We describe our technique for efficiently gathering an event flow graph for each process of an MPI application and for combining these graphs into a single application-level flow graph. We explore ways to reduce the complexity of the graphs by collapsing nodes in a step-by-step fashion and present techniques to explore flow graphs interactively.
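The idea of per-process event flow graphs that are then combined can be illustrated with a minimal sketch. It is not the authors' tool: the function names, the `START`/`END` sentinel nodes, and the two example event sequences are assumptions made here for illustration. Each graph stores how often one event directly follows another, which keeps ordering information that a flat profile would discard.

```python
from collections import Counter

def build_event_flow_graph(events):
    """Return a Counter mapping (src, dst) edges to traversal counts.

    Hypothetical helper: nodes are event names, and an edge records that
    dst was observed immediately after src in one process.
    """
    edges = Counter()
    prev = "START"  # sentinel node, an assumption of this sketch
    for ev in events:
        edges[(prev, ev)] += 1
        prev = ev
    edges[(prev, "END")] += 1
    return edges

def merge_graphs(graphs):
    """Combine per-process graphs by summing edge counts."""
    total = Counter()
    for g in graphs:
        total.update(g)
    return total

# Made-up event sequences for two ranks of a ping-pong style program.
rank0 = ["MPI_Init", "MPI_Send", "MPI_Recv", "MPI_Send", "MPI_Recv", "MPI_Finalize"]
rank1 = ["MPI_Init", "MPI_Recv", "MPI_Send", "MPI_Recv", "MPI_Send", "MPI_Finalize"]

g0 = build_event_flow_graph(rank0)
g1 = build_event_flow_graph(rank1)
app = merge_graphs([g0, g1])

for (src, dst), count in sorted(app.items()):
    print(f"{src} -> {dst}: {count}")
```

The cycle between `MPI_Send` and `MPI_Recv` shows up as a pair of edges with counts greater than one, which is the kind of iteration structure the abstract says is hard to distill from a full trace.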
Location based services personalize their behaviors based on location data. When data kept by a service have evolved or the code has been modified, regression testing can be employed to assure the quality of services....
The commercial success of Cloud computing and recent developments in Grid computing have brought platform virtualization technology into the field of high performance computing. Virtualization offers both more flexibility and security through custom user images and user isolation. In this paper, we deal with the problem of distributing virtual machine (VM) images to a set of distributed compute nodes in a Cross-Cloud computing environment, i.e., the connection of two or more Cloud computing sites. Armbrust et al. identified data transfer bottlenecks as one of the obstacles Cloud computing has to overcome to be a commercial success. Several methods for distributing VM images are presented, and optimizations based on copy-on-write layers are discussed. The performance of the presented solutions and the security overhead are evaluated.
ISBN: 9781607505839 (print)
Recent developments in MRI contrast agents open new perspectives in radiological diagnosis and therapy planning, but require specific image analysis methods. Using an academic research grid, we are currently validating and optimizing a recently developed fully automatic method for liver segmentation in Gd-EOB enhanced MRI. The grid enables extensive parameter scans and evaluation against experts' reference segmentations. The implementation layout and the results obtained so far are presented. Furthermore, we report the experience gained in the production phase and its consequences for the use of publicly funded research grids for Healthgrid applications.
We present a real-time distributed system for tracking with non-overlapping camera views. Each camera performs multi-object tracking, and cameras communicate with each other in a peer-to-peer manner for consistent labeling. To match objects across non-overlapping views, we employ multiple features, namely color histogram, height, travel time and speed. First, the camera configuration and reference values of the different features are learned in the training phase. Then, we combine the evidence from these features by computing an overall similarity score, which is a weighted sum of the similarity scores of the individual features. Communication and frame processing run in parallel and share memory. Experimental results show the success of the presented system in real-time tracking with non-overlapping cameras and in handling merge cases.
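The weighted-sum combination described above can be sketched as follows. The four feature names come from the abstract; the weights, the per-feature scores in [0, 1], the candidate labels, and the function names are all made-up values and names for illustration, not those used by the authors.

```python
def overall_similarity(scores, weights):
    """Weighted sum of per-feature similarity scores.

    Both arguments are dicts keyed by feature name. The weights here are
    hypothetical; the abstract does not state the values used.
    """
    return sum(weights[name] * scores[name] for name in weights)

def best_match(candidates, weights):
    """Return the label of the candidate with the highest overall score."""
    return max(candidates, key=lambda label: overall_similarity(candidates[label], weights))

# Assumed weights that sum to 1 so the overall score stays in [0, 1].
weights = {"color_histogram": 0.4, "height": 0.2, "travel_time": 0.2, "speed": 0.2}

# Assumed similarity scores of a newly seen object against two known objects.
candidates = {
    "object_A": {"color_histogram": 0.9, "height": 0.8, "travel_time": 0.5, "speed": 0.7},
    "object_B": {"color_histogram": 0.3, "height": 0.9, "travel_time": 0.6, "speed": 0.4},
}

score_a = overall_similarity(candidates["object_A"], weights)
score_b = overall_similarity(candidates["object_B"], weights)
label = best_match(candidates, weights)
print(label, round(score_a, 2), round(score_b, 2))
```

In a real system a threshold on the best score would probably also be needed, so that an object seen for the first time is not forced onto an existing label; that step is omitted here.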
Increasing speeds and volumes push network packet applications to use parallel processing to boost performance. Examining the packet payload (message content) is a key aspect of packet processing. Applications search payloads to find strings that match a pattern described by regular expressions (regex). Searching for multiple strings that may start anywhere in the payload is a major obstacle to performance. Commercial systems often employ multiple network processors to provide parallel processing in general and use regex software engines or special regex processors to speed up searching via parallelism. Typically, regex rules are prepared separately from the application program and compiled into a binary image to be read by a regex processor or software engine. Our approach integrates the specification of search rules with network application code written in packetC, a C dialect that hides host-machine specifics, supports coarse-grain parallelism and supplies high-level data type and operator extensions for packet processing. packetC provides a search set data type, as well as match and find operations, to support payload searching. We show that our search set operator implementation, using associative memory and regex processors, lets users enjoy the performance benefits of parallel regex technology without learning hardware specifics or using a separate regex toolchain.
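The notion of a set of patterns searched anywhere in a payload can be approximated in software with a short sketch. This is plain Python using the standard `re` module, not packetC syntax, and it makes no claim about packetC's actual semantics: the helper names `compile_search_set`, `match`, and `find` merely echo the operations named in the abstract, and the patterns and payload are invented.

```python
import re

def compile_search_set(patterns):
    """Compile a list of bytes patterns once, ahead of packet processing."""
    return [re.compile(p) for p in patterns]

def match(search_set, payload):
    """Return True if any pattern matches anywhere in the payload."""
    return any(rx.search(payload) for rx in search_set)

def find(search_set, payload):
    """Return (pattern_index, offset) of the earliest match, or None.

    Assumed semantics for this sketch only; patterns are tried one after
    another here, whereas dedicated regex hardware would scan in parallel.
    """
    best = None
    for index, rx in enumerate(search_set):
        m = rx.search(payload)
        if m and (best is None or m.start() < best[1]):
            best = (index, m.start())
    return best

# Invented rules and payload.
rules = compile_search_set([rb"GET /[a-z]+", rb"User-Agent: \w+"])
payload = b"xxGET /index HTTP/1.1\r\nUser-Agent: curl\r\n"

hit = find(rules, payload)
print(match(rules, payload), hit)
```

The sequential loop over patterns is exactly the cost that grows with the number of rules, which is why the abstract turns to associative memory and regex processors.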
FAFNER is a code developed by Lister that uses Monte Carlo methods to simulate Neutral Beam Injection (NBI), one of the most widely used heating methods for fusion devices. To date, FAFNER has usually been run at CIEMAT, adapted to the TJ-II helical axis stellarator, on shared memory Cray architecture machines. Starting from this version, FAFNER has been ported to the Grid in the framework of the EGEE Project. This work is also the first step towards a more ambitious target, since the code can now be coupled to many others, such as ion transport tools. In this paper, all these preliminary advances are described, as well as the performance and portability gains obtained.
We investigate the scalability of the hypergraph-based sparse matrix partitioning methods with respect to the increasing sizes of matrices and number of nonzeros. We propose a method to rowwise partition the matrices that correspond to the discretization of two-dimensional domains with the five-point stencil. The proposed method obtains perfect load balance and achieves very good total communication volume. We investigate the behaviour of the hypergraph-based rowwise partitioning method with respect to the proposed method, in an attempt to understand how scalable the former method is. In another set of experiments, we work on general sparse matrices under different scenarios to understand the scalability of various hypergraph-based one- and two-dimensional matrix partitioning methods.
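The two quantities mentioned above, load balance and total communication volume, can be made concrete with a small sketch. It builds the sparsity pattern of the five-point stencil on an n by n grid and evaluates a plain strip partition of the rows. The strip partition is a stand-in chosen here for simplicity and is not the method proposed in the paper. The volume is counted in the usual way for rowwise partitioned sparse matrix-vector multiplication: each vector entry is sent once to every other part that holds a nonzero in its column. All function names are hypothetical.

```python
def five_point_rows(n):
    """Yield (row, column indices) for the n*n five-point stencil matrix."""
    for r in range(n):
        for c in range(n):
            cols = [r * n + c]                    # diagonal entry
            if r > 0:
                cols.append((r - 1) * n + c)      # neighbour above
            if r < n - 1:
                cols.append((r + 1) * n + c)      # neighbour below
            if c > 0:
                cols.append(r * n + c - 1)        # left neighbour
            if c < n - 1:
                cols.append(r * n + c + 1)        # right neighbour
            yield r * n + c, cols

def strip_partition(n, k):
    """Assign each matrix row to a part by contiguous strips of grid rows.

    Assumes k divides n; this is an illustrative baseline only.
    """
    return [(i // n) * k // n for i in range(n * n)]

def evaluate(n, part, k):
    """Return (nonzeros per part, total communication volume)."""
    loads = [0] * k
    needed_by = {}  # column j -> set of parts holding a nonzero in column j
    for i, cols in five_point_rows(n):
        loads[part[i]] += len(cols)
        for j in cols:
            needed_by.setdefault(j, set()).add(part[i])
    # x[j] lives with row j; it is sent to every other part that needs it.
    volume = sum(len(parts - {part[j]}) for j, parts in needed_by.items())
    return loads, volume

n, k = 4, 2
part = strip_partition(n, k)
loads, volume = evaluate(n, part, k)
print(loads, volume)
```

For the strip partition only the grid points along each cut are communicated, so the volume grows with n rather than with the number of nonzeros.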