ISBN: 9781510622043 (digital); 9781510622043 (print)
The paper presents experiences from the design and development of an industrial measurement system. The architecture of the system is parallel and highly scalable. As studies show, parallel systems are more error-prone than sequential ones. Errors may occur in synchronization or data sharing and can sometimes prevent processing from completing within the time limits acceptable for a measurement system. Thus, performance problems may also be dependability problems. In this paper, the problems encountered during the implementation of a measurement system, as well as their solutions, are presented. One of them was the unpredictable behavior of the garbage collector, which decreased system performance. Some deadlock situations have also been identified, which may occur if the measurement device (i.e. the hardware) experiences a specific failure mode. It is shown how a substantial performance increase and effective, scalable code were achieved.
ISBN: 9781728152585 (print)
This paper proposes a CUDA-based technology for analyzing large biomedical data sets. The technology was used to analyze a large set of fundus images used for the automatic diagnosis of diabetic retinopathy. A high-performance algorithm has been developed to calculate effective textural characteristics for medical image analysis. During automatic image diagnosis, the following classes were distinguished: thin vessels, thick vessels, exudates and healthy areas. The efficiency of the algorithm was studied on images of 500x500 to 1000x1000 pixels using a 12x12 window. The relationship between the algorithm's acceleration and the data size was demonstrated. The study showed that the algorithm's effectiveness can depend on certain characteristics of the image, such as its clarity, the shape of the exudate zone, the variability of blood vessels, and the location of the optic disc.
ISBN: 9781467324229 (print)
Remote Sensing (RS) data processing is characterized by massive remote sensing images and a growing number of increasingly complex algorithms. Parallel programming for data-intensive applications such as massive remote sensing image processing on parallel systems is bound to be especially non-trivial and challenging. We propose a generic parallel programming skeleton, enabled by the C++ template mechanism, for these remote sensing applications on high-performance clusters. It provides both programming templates for distributed RS data and generic parallel skeletons for RS algorithms. Through the one-sided communication primitives provided by MPI, the distributed RS data template provides a global view of the big RS data whose sliced data blocks are scattered among the distributed memory of the cluster nodes. Moreover, through data serialization and RMA (Remote Memory Access), the data templates also offer a simple and effective way to distribute and communicate massive remote sensing data with complex data structures. Furthermore, the generic parallel skeletons implement the recurring patterns of computation and performance optimization, and accept user-defined sequential functions as template parameters for type genericity. With the implemented skeletons, developers without extensive parallel computing expertise can implement efficient parallel remote sensing programs without concerning themselves with parallel computing details. Through experiments on remote sensing applications, we confirmed that our templates were productive and efficient.
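The skeleton idea in this abstract, a reusable parallel pattern that takes user-defined sequential functions as parameters, can be illustrated with a minimal sketch. It is written in Python with a thread pool inside a single process, not with the C++ templates and MPI one-sided communication the paper describes, and the names `split_blocks` and `map_reduce_skeleton` are hypothetical, not the paper's API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_blocks(data, n_blocks):
    """Slice a sequence into n_blocks contiguous blocks of near-equal size."""
    base, extra = divmod(len(data), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def map_reduce_skeleton(data, per_block, combine, n_workers=4):
    """Apply per_block to every block in parallel, then fold with combine.

    per_block and combine are plain sequential functions supplied by the
    user; the skeleton owns the slicing and the parallel execution.
    """
    blocks = [block for block in split_blocks(data, n_workers) if block]
    if not blocks:
        raise ValueError("data must not be empty")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(per_block, blocks))
    return reduce(combine, partials)


# Illustrative use: the largest value in a (made-up) strip of pixel values.
pixels = list(range(1000))
brightest = map_reduce_skeleton(pixels, max, max)
```

The user code never mentions threads or blocks, which is the separation of concerns the abstract attributes to skeletons.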
High-performance application development remains challenging, particularly for scientists making the transition to a heterogeneous grid environment. In general areas of computing, virtual environments such as Java and .NET have proved to be successful in fostering application development, allowing users to target and compile to a single environment rather than a range of platforms, instruction sets and libraries. However, existing runtime environments are focused on business and desktop computing, and they do not support the high-performance computing (HPC) abstractions required by e-Scientists. Our work is focused on developing an application runtime that can support these services natively. The result is a new approach to the development of an application runtime for HPC: the Motor system has been developed by integrating a high-performance communication library directly within a virtual machine. The Motor message passing library is integrated alongside and in cooperation with other runtime libraries and services while retaining strong message passing performance. As a result, the application developer is provided with a common environment for HPC application development. This environment supports both procedural languages, such as C, and modern object-oriented languages, such as C#. This paper describes the unique Motor architecture, presents its implementation and demonstrates its performance and use. Copyright (C) 2008 John Wiley & Sons, Ltd.
Nowadays, NVIDIA's CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many-core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster consisting of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA, run by the processor cores in the same computational node. (C) 2010 Elsevier B.V. All rights reserved.
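The first step described here, dividing loop iterations among MPI processes, can be sketched as a plain block partition. This is only an illustration in Python: the function name `block_range` is hypothetical, every process is assumed to receive an equal share, and no MPI or CUDA call is made. The abstract does not say how shares are weighted across nodes.

```python
def block_range(n_iterations, n_procs, rank):
    """Return the contiguous range of loop iterations owned by one process.

    The first (n_iterations % n_procs) ranks receive one extra iteration,
    so block sizes differ by at most one.
    """
    base, extra = divmod(n_iterations, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# Each process would run only its own slice of the original loop.
for rank in range(3):
    owned = block_range(10, 3, rank)
    print(rank, list(owned))
```

In a real hybrid program, each rank would pass its own range to the node-local layer (OpenMP threads or a CUDA kernel launch) rather than printing it.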
ISBN: 9781538655559 (print)
We employ probabilistic causality analysis to study the performance data of 301 students from the upper-level undergraduate parallel programming class at the University of Central Florida. To our surprise, we discover that good performance in our lower-level undergraduate programming classes, CS-I and CS-II, is not a significant causal factor contributing to good performance in our parallel programming class. On the other hand, good performance in systems classes like Operating Systems, Information Security, Computer Architecture, Object Oriented Software and Systems Software, coupled with good performance in theoretical classes like Introduction to Discrete Structures, Artificial Intelligence and Discrete Structures-II, is a strong indicator of good performance in our upper-level undergraduate parallel programming class. We believe that such causal analysis may be useful in identifying whether parallel and distributed computing concepts have effectively penetrated the lower-level computer science classes at an institution.
ISBN: 9781509015405 (print)
A hash function maps a (longer) message of arbitrary length to a shorter string of fixed length, called the message digest. Inevitably, many different messages will be hashed to the same or a similar digest. We call this a collision or a partial collision, respectively. Using multiple processors at the CUNY High Performance Computing Center, we locate partial collisions for MD5 and SHA-1 by brute-force parallel programming in C with the MPI library. The brute-force method of finding a second-preimage collision entails systematically computing all of the permutations, digests, and Hamming distances of the target preimage. We vary the size of the target string and the number of processors allocated, and examine the effect these variables have on finding partial collisions. The results show that, for the same message space, the search time for partial collisions is roughly halved for each doubling of the number of processors, and that longer messages produce better partial collisions.
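The search described above can be sketched in a few lines. This is a hedged illustration in Python, not the authors' C/MPI program. The function names, the enumeration of all strings over a small alphabet (instead of the permutations the abstract mentions), and the round-robin split of candidates among workers are all assumptions made for the example. Only `hashlib.md5` and `itertools.product` are real library calls.

```python
import hashlib
from itertools import product


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def closest_digest(target, alphabet, length, rank=0, n_procs=1):
    """Search one worker's share of the candidates for a partial collision.

    Returns (distance, candidate) for the candidate whose MD5 digest is
    nearest in Hamming distance to the target's digest, or None if this
    worker was assigned no candidates.
    """
    target_digest = hashlib.md5(target).digest()
    best = None
    for index, letters in enumerate(product(alphabet, repeat=length)):
        if index % n_procs != rank:      # assumed round-robin work split
            continue
        candidate = bytes(letters)
        if candidate == target:          # skip the target itself
            continue
        digest = hashlib.md5(candidate).digest()
        distance = hamming_distance(digest, target_digest)
        if best is None or distance < best[0]:
            best = (distance, candidate)
    return best


# Two simulated workers each scan half of the 3-letter candidates.
results = [closest_digest(b"abc", b"abcdef", 3, rank, 2) for rank in range(2)]
overall = min(r for r in results if r is not None)
```

Because the workers never need to communicate until the final minimum is taken, doubling the number of workers halves each worker's share of the candidates, which is consistent with the near-linear speedup the abstract reports.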
ISBN: 3540221190 (print)
We study how the concept of generic programming using C++ templates, realized in the Standard Template Library (STL), can be efficiently exploited in the specific domain of parallel programming. We present our approach, implemented in the DatTeL data-parallel library, which allows simple programming for various parallel architectures while staying within the paradigm of classical C++ template programming. The novelty of the DatTeL is the use of higher-order parallel constructs, skeletons, in the STL-context and the easy extensibility of the library with new, domain-specific skeletons. We describe the principles of our approach based on skeletons, and explain our design decisions and their implementation in the library. The presentation is illustrated with a case study - the parallelization of a generic algorithm for carry-lookahead addition. We compare the DatTeL to related work and report both absolute performance and speedups achieved for the case study on parallel machines with shared and distributed memory.
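The case study named in this abstract, carry-lookahead addition, is a standard example of a scan (prefix) skeleton. The sketch below, in Python rather than C++ and without any DatTeL API, shows the associative operator on (generate, propagate) pairs that makes the carries computable by a scan. Here `itertools.accumulate` runs the scan sequentially; since the operator is associative, a parallel scan could be substituted. The function names are hypothetical.

```python
from itertools import accumulate


def combine(low, high):
    """Associative operator on (generate, propagate) pairs.

    low describes the less significant bit span, high the more significant.
    """
    g_low, p_low = low
    g_high, p_high = high
    return (g_high | (p_high & g_low), p_high & p_low)


def carry_lookahead_add(a_bits, b_bits):
    """Add two equal-length bit lists given least significant bit first."""
    pairs = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    prefixes = list(accumulate(pairs, combine))  # inclusive scan
    carries = [0] + [g for g, _ in prefixes]     # carry into each position
    sums = [p ^ c for (_, p), c in zip(pairs, carries)]
    return sums + [carries[-1]]                  # carry-out is the top bit


# 3 + 1 = 4, with bits listed least significant first.
print(carry_lookahead_add([1, 1, 0], [1, 0, 0]))  # prints [0, 0, 1, 0]
```

Everything outside the scan is an element-wise map, so the whole adder is expressed with two generic patterns plus one user-defined operator.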
ISBN: 9781315684895; 9781138028142 (print)
With the current prevalence of multi-core processors in SMP cluster architectures, mixed-mode programming, using both MPI and OpenMP in the same application, is becoming increasingly important. In this paper we discuss three methods for the parallelization of such algorithms, namely pure MPI parallelization, fine-grain hybrid MPI/OpenMP parallelization, and coarse-grain MPI/OpenMP parallelization. We propose a new hybrid parallel programming method based on the architecture hierarchy of SMP clusters. We designed a hierarchical parallel algorithm for the N-body problem and compared its performance with that of the traditional hybrid parallel algorithm on the Dawning 5000A cluster. The results indicate that the hierarchical hybrid parallel algorithm has better scalability and speed.
ISBN: 0818681187 (print)
We exploited the recent advances in Internet connectivity and Web technologies to build Web-based parallel programming environments (WPPEs) that facilitate the development and execution of parallel programs on remote high-performance computers. A Web browser running on the user's machine provides a user-friendly interface to server-site user accounts and allows the use of parallel computing platforms and software in a convenient manner. The user may create, edit, and execute files through this Web browser interface. This new Web-based client-server architecture has the potential to be used as a future front-end to high-performance computer systems. We discuss the design and implementation of several prototype WPPEs that are currently in use at the Northeast Parallel Architectures Center and the Cornell Theory Center. These initial prototypes support high-level parallel programming with Fortran 90 and High Performance Fortran (HPF), as well as explicit low-level programming with Message Passing Interface (MPI). We detail the lessons learned during the development process and outline the tradeoffs of various design choices in the realization of the design. We especially concentrate on providing server-site user accounts, mechanisms to access those accounts through the Web, and the Web-related system security issues.