ISBN: 9781538643686 (print)
Floating point arithmetic, as specified in the IEEE standard, is used extensively in programs for science and engineering. This use is expanding rapidly into other domains, for example with the growing application of machine learning everywhere. While floating point arithmetic often appears to be arithmetic using real numbers, or at least numbers in scientific notation, it actually has a wide range of gotchas. Compiler and hardware implementations of floating point inject additional surprises. This complexity is only increasing as different levels of precision are becoming more common and there are even proposals to automatically reduce program precision (reducing power/energy and increasing performance) when results are deemed "good enough." Are software developers who depend on floating point aware of these issues? Do they understand how floating point can bite them? To find out, we conducted an anonymous study of different groups from academia, national labs, and industry. The participants in our sample did only slightly better than chance in correctly identifying key unusual behaviors of the floating point standard, and poorly understood which compiler and architectural optimizations were nonstandard. These surprising results and others strongly suggest caution in the face of the expanding complexity and use of floating point arithmetic.
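A few of the surprises the abstract alludes to can be reproduced in a handful of lines. The sketch below assumes Python floats are IEEE 754 binary64 values, which holds on mainstream platforms. The cases are chosen here for illustration and are not claimed to be the questions used in the study.

```python
import math

# 1. Decimal fractions such as 0.1 have no exact binary representation,
#    so "obvious" identities can fail under exact comparison.
sum_is_exact = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)

# 2. Addition is not associative: regrouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
associative = (left == right)

# 3. NaN compares unequal to everything, including itself.
nan = float("nan")
nan_equals_itself = (nan == nan)

# 4. A large value can absorb a small addend entirely.
absorbed = (1e16 + 1.0 == 1e16)

# 5. Positive and negative zero compare equal but carry different signs.
zeros_equal = (0.0 == -0.0)
negative_zero_sign = math.copysign(1.0, -0.0)

print(sum_is_exact, close_enough, associative,
      nan_equals_itself, absorbed, zeros_equal, negative_zero_sign)
```

The second case is one reason compiler optimizations that reorder floating point operations can change results.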
ISBN: 9781538655559 (print)
Today parallel computing is essential for the success of many real-world applications and software systems. Nonetheless, most computer science undergraduate courses teach students how to think and program sequentially. Further, software professionals have complained that the computer science curriculum lags behind industry by failing to cover modern programming technologies such as parallel programming. The emphasis on parallel programming has become even more important due to the increasing adoption of horizontal scaling approaches to cope with massive datasets. In order to help students coming from a serial curriculum comprehend parallel concepts, we used an innovative approach that utilized active learning, visualizations, examples, discussions, and practical exercises. Further, we conducted an experiment to examine the effect of active learning on students' understanding of parallel programming. Results indicate that students who were actively engaged with the material understood parallel programming concepts better than other students.
ISBN: 0818675829 (print)
DAISy (Distributed Array of Inexpensive Systems) is a 16-node PC cluster running a full UNIX-compatible operating system. The network media used include standard 10 Mb/s (10BASE-2) Ethernet (used for client node NFS mounts and any client node interactive work users find necessary) and switched 100 Mb/s (100BASE-TX) Fast Ethernet (used for user program message passing traffic). The DAISy cluster is used to investigate the viability of commodity PC technology for scientific and engineering computations traditionally performed on 'supercomputers' and, more recently, on high-performance RISC workstations and clusters of RISC workstations. Performance analyses of the various single-node subsystems were carried out, along with performance analysis of the cluster as a whole on a number of parallel applications. The results show that the performance of the current 90 MHz Pentium CPUs and motherboards is well within the range of many low-end workstations offered by traditional workstation vendors.
A class of specialised data structures designed for the distributed solution of non-conventional finite element formulations, which are equally effective when used in conjunction with conventional formulations, is presented. We begin by briefly discussing how the non-conventional finite element formulations being developed within the structural analysis group at IST [Freitas JAT, Almeida JPM, Pereira EMBR. Non-conventional formulations for the finite element method. Comput Mech 1999;23(5-6):488-501] lead to systems of equations that appear to be naturally suited for parallel processing. We also recognise that, to take full advantage of the characteristics of these systems - large dimension, non-overlapping block structure and sparsity - it is necessary to use appropriate data structures. The approach presented, which references the logical subdivisions of the system matrices, was designed to fulfil these objectives. Examples of parallel performance and efficiency on a homogeneous distributed platform are presented. (c) 2006 Published by Elsevier Ltd.
This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed systems. The long-term goal of our project is to quantify the degree to which such a use is possible and to develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in only a few applications. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, result in a 31% average improvement on our benchmarks.
ISBN: 9781424437511 (print)
Distributed storage systems have become popular for handling the enormous amounts of data in network-centric systems. A distributed storage system provides client processes with the abstraction of a shared variable that satisfies some consistency and reliability properties. Typically the properties are ensured through a replication-based implementation. This paper presents an algorithm for a replicated read-write register that can tolerate Byzantine failures of some of the replica servers. The targeted consistency condition is a version of regularity that supports multiple writers. Although regularity is weaker than the more frequently supported condition of atomicity, it is still strong enough to be useful in some important applications. By weakening the consistency condition, the algorithm can support multiple writers more efficiently than the known multi-writer algorithms for atomic consistency.
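The abstract does not give the algorithm itself, so the following is only a generic illustration of one ingredient common to Byzantine-tolerant replicated registers, not the paper's method. If at most f servers are faulty, a reader can trust a (timestamp, value) pair only when at least f + 1 servers report it identically, because at least one of those reporters must then be correct. The names `select_read_value`, `Timestamp`, and the (counter, writer_id) timestamp layout are hypothetical. Quorum sizes and the write protocol are deliberately left out.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple

# Hypothetical multi-writer timestamp: (counter, writer_id), compared lexicographically.
Timestamp = Tuple[int, int]
Reply = Tuple[Timestamp, Hashable]


def select_read_value(replies: Iterable[Reply], f: int) -> Optional[Reply]:
    """Pick the newest pair vouched for by at least f + 1 servers.

    `replies` holds one (timestamp, value) pair per responding server.
    Returns None when no pair has enough matching reports.
    """
    counts = Counter(replies)
    # A pair reported by f + 1 servers was reported by at least one correct server.
    vouched = [pair for pair, n in counts.items() if n >= f + 1]
    if not vouched:
        return None
    # Among the trustworthy pairs, return the one with the largest timestamp.
    return max(vouched, key=lambda pair: pair[0])


replies = [
    ((2, 1), "b"), ((2, 1), "b"),   # newest value, reported by two servers
    ((1, 0), "a"), ((1, 0), "a"),   # older value, reported by two servers
    ((9, 9), "forged"),             # a single faulty server inventing a value
]
chosen = select_read_value(replies, f=1)
```

With f = 1 the forged pair is ignored despite its high timestamp, because only one server reports it.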
We address the main issues when porting existing codes from serial to parallel computers and when developing portable parallel software on MIMD multiprocessors (shared memory, virtual shared memory, and distributed memory multiprocessors, and networks of computers). We discuss the use of numerical libraries as a way of developing portable and efficient parallel code. We illustrate this by using examples from our experience in porting industrial codes and in designing parallel numerical libraries. We report in some detail on the parallelization of scientific applications coming from Centre National d'Etudes Spatiales and from Aerospatiale, and we illustrate how it is possible to develop portable and efficient numerical software by considering the parallel solution of sparse linear systems of equations.
ISBN: 9780769561493 (print)
Machine learning, especially deep learning, is revolutionizing how many engineering problems are being solved. Three critical ingredients are needed to apply deep machine learning to significant real-world problems: (i) large data sets; (ii) software to implement deep learning; and (iii) significant computing cycles. This paper discusses the state of each ingredient with a specific focus on (a) how deep learning can apply to large-scale social network analysis and (b) the computing resources required to make such analyses feasible.
ISBN: 9780769548180 (print)
Modeling complex physical systems with Modelica usually leads to high-index differential algebraic equation (DAE) systems, and index reduction is an important part of solving them. The structural index reduction algorithm is one of the popular methods, but it fails in special cases. The combinatorial relaxation algorithm can detect and correct these breakdowns. Maximum weight matching of a bipartite graph is an important part of the combinatorial relaxation algorithm. To choose a suitable method for large-scale, dense bipartite graphs, this paper provides three implementations of the Hungarian algorithm. The experimental results and the theory show that the BFS single-augmented method is better than the others.
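The abstract does not describe the three implementations, so the sketch below shows only one plausible reading of "BFS single-augmented": for each free left vertex, a breadth-first search finds one augmenting path, and the matching is flipped along it. For brevity it computes a maximum cardinality matching on an unweighted bipartite graph. The maximum weight variant the paper addresses additionally maintains vertex labels, which are omitted here. The function name and the adjacency-list layout are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def bfs_single_augment_matching(adj: List[List[int]], n_right: int) -> List[Optional[int]]:
    """Return match_left, where match_left[u] is the right vertex matched to
    left vertex u (or None). adj[u] lists the right vertices adjacent to u."""
    match_left: List[Optional[int]] = [None] * len(adj)
    match_right: List[Optional[int]] = [None] * n_right

    for root in range(len(adj)):
        pred: Dict[int, int] = {}      # right vertex -> left vertex it was reached from
        queue = deque([root])
        found: Optional[int] = None    # free right vertex ending an augmenting path

        # BFS over alternating paths starting at the free left vertex `root`.
        while queue and found is None:
            u = queue.popleft()
            for v in adj[u]:
                if v in pred:
                    continue
                pred[v] = u
                if match_right[v] is None:
                    found = v
                    break
                queue.append(match_right[v])

        # Flip matched and unmatched edges along the single path found, if any.
        v = found
        while v is not None:
            u = pred[v]
            v_next = match_left[u]
            match_left[u] = v
            match_right[v] = u
            v = v_next

    return match_left


example = bfs_single_augment_matching([[0, 1], [0], [1, 2]], n_right=3)
```

In the example, left vertex 1 can only use right vertex 0, so the search reroutes left vertex 0 to right vertex 1 before matching it.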
Compressible viscous flows past a space plane have been elucidated by parallel computation on the NWT. The NWT is a vector-parallel architecture computer system which achieves remarkably high performance in processing speed and memory storage. We have examined the advantages of the NWT in order to simulate realistic flow problems in engineering, such as the investigation of global and local aerodynamic characteristics of a space plane. The accuracy of the computational results has been verified by comparison with experimental data. The simplified domain-decomposition technique introduced here is easy to apply in a parallel implementation and significantly improves the acceleration rate of computations. The larger available memory storage enables us to conduct a grid refinement study through which several points concerning CFD simulation of a space plane are obtained.