Scientific parallel programming has become mainstream in recent years with the introduction of high-performance graphics processing units (GPUs) that are specifically designed for numerical processing. In addition, freely available programming tools have made it possible for anyone who wants to leverage the processing power of GPUs to do so relatively easily. This article provides an introduction to parallel programming using GPUs, with numerical examples demonstrating the speedup that can be obtained in a microwave engineering problem. All programming tools used in the article can be obtained free of charge from online resources. This accessibility is a tremendous benefit to engineers, students, and enthusiasts.
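As a rough illustration of the kind of element-wise numerical workload that benefits from a GPU, the sketch below compares a CPU computation in NumPy with the same computation in CuPy, a freely available GPU array library. This is not the article's microwave-engineering example; the array size and the use of CuPy are illustrative assumptions.

```python
# Illustrative sketch only: the same element-wise computation on CPU (NumPy)
# and GPU (CuPy). Requires a CUDA-capable GPU with the free CuPy package installed.
import time
import numpy as np

try:
    import cupy as cp  # freely available GPU array library (assumed installed)
except ImportError:
    cp = None

N = 10_000_000
x_cpu = np.random.rand(N).astype(np.float32)

t0 = time.perf_counter()
y_cpu = np.sin(x_cpu) * np.exp(-x_cpu)   # stand-in for a numerical kernel
cpu_time = time.perf_counter() - t0

if cp is not None:
    x_gpu = cp.asarray(x_cpu)             # copy the data to the GPU
    t0 = time.perf_counter()
    y_gpu = cp.sin(x_gpu) * cp.exp(-x_gpu)
    cp.cuda.Stream.null.synchronize()     # wait for the GPU to finish before timing
    gpu_time = time.perf_counter() - t0
    print(f"CPU: {cpu_time:.3f}s  GPU: {gpu_time:.3f}s  speedup: {cpu_time / gpu_time:.1f}x")
```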
Because multi-core architectures are now predominant in most personal computers and servers, knowledge of parallel programming has become fundamental for computing students, who must develop software capable of obtaining the best performance from these architectures. Given the importance of this context, this paper presents the results of a systematic mapping of the literature on the teaching and learning of parallel programming in computing programmes, drawn from three important databases: ACM, IEEE and Science Direct. The results show that, to address the challenges and differences found in teaching and learning parallel programming, undergraduate programmes need to be reorganized. A standard for teaching parallel programming is important; it can be established by defining where and how to introduce parallelism into the curriculum, adopting a methodology for teaching parallelism across several different courses, and beginning the study in the first year. The main languages, libraries, difficulties encountered, and methods of classroom and distance teaching for parallel programming are presented in this paper. Distance learning is still little explored in this area, but it can support the teaching and study of these topics.
With Cloud Computing emerging as a promising new approach for ad-hoc parallel data processing, major companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and deploy their programs. We have entered the era of Big Data. The explosion and profusion of available data in a wide range of application domains raise new challenges and opportunities in a plethora of disciplines, ranging from science and engineering to biology and business. One major challenge is how to take advantage of the unprecedented scale of data, typically of heterogeneous nature, in order to acquire further insights and knowledge for improving the quality of the offered services. To exploit this new resource, we need to scale up and scale out both our infrastructures and our standard techniques. Our society is already data-rich, but the question remains whether or not we have the conceptual tools to handle it. In this paper we discuss and analyze opportunities and challenges for efficient parallel data processing. Big Data is the next frontier for innovation, competition, and productivity, and many solutions continue to appear, partly supported by the considerable enthusiasm around the MapReduce paradigm for large-scale data analysis. We review various parallel and distributed programming paradigms, analyzing how they fit into the Big Data era, and present modern emerging paradigms and frameworks. To better support practitioners interested in this domain, we end with an analysis of ongoing research challenges towards a truly fourth-generation data-intensive science.
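The abstract refers to the MapReduce paradigm for large-scale data analysis. As a minimal sketch of the idea, not of any particular framework reviewed in the paper, the example below expresses word counting as a map phase, a shuffle (grouping by key), and a reduce phase in plain Python.

```python
# Minimal MapReduce-style word count in plain Python (illustrative, single-process).
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) pairs for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data needs parallel processing", "parallel data processing at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(shuffle(pairs)))
# {'big': 1, 'data': 2, 'needs': 1, 'parallel': 2, 'processing': 2, 'at': 1, 'scale': 1}
```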
As we reach the technological limits of hardware improvement, we must rely on multiple processors to improve computing speed. Parallel programming tools are limited, making effective parallel programming difficult and cumbersome. Compilers that translate conventional sequential programs into parallel form would liberate programmers from the complexities of explicit, machine-oriented parallel programming. Polaris, an experimental translator of conventional Fortran programs that targets machines such as the Cray T3D, is the first step toward this goal. The most important techniques implemented in Polaris resulted from a study of the effectiveness of commercial Fortran parallelizers. The authors compiled the Perfect Benchmarks, a collection of conventional Fortran programs representing the typical workload of high-performance computers, for the Alliant FX/80, an eight-processor multiprocessor popular in the late 1980s. For each program, they measured the quality of the parallelization by computing the speedup. With few exceptions, the Alliant Fortran compiler failed to deliver any significant speedup for the majority of the programs, because it could not parallelize some of the most important loops in the Perfect Benchmarks. The study showed that extending the four most important analysis and transformation techniques traditionally used for vectorization leads to significant increases in speedup. Polaris detected much of the parallelism available in the set of benchmark codes. A careful analysis of the remaining loops that Polaris could not parallelize highlights four areas for improvement.
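The study measures parallelization quality as speedup: serial run time divided by parallel run time. As a hedged illustration (not Polaris itself, which transforms Fortran automatically), the sketch below hand-parallelizes a loop with independent iterations using Python's multiprocessing and computes the speedup in the same way.

```python
# Illustrative speedup measurement for a loop whose iterations are independent,
# the kind of loop a parallelizing compiler such as Polaris tries to detect.
import math
import time
from multiprocessing import Pool

def work(i):
    # A deliberately CPU-bound loop body.
    return sum(math.sqrt(i + k) for k in range(20_000))

if __name__ == "__main__":
    n = 400

    t0 = time.perf_counter()
    serial = [work(i) for i in range(n)]        # sequential version of the loop
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool() as pool:                        # one worker per core by default
        parallel = pool.map(work, range(n))     # parallel version of the same loop
    t_parallel = time.perf_counter() - t0

    assert serial == parallel
    print(f"speedup = T_serial / T_parallel = {t_serial / t_parallel:.2f}")
```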
Bioinformatics allows and encourages the application of many different parallel programming approaches. This special issue brings together high-quality, state-of-the-art contributions on parallel programming in bioinformatics, from several interesting perspectives. The special issue collects considerably extended and improved versions of the best papers accepted and presented at PBio 2017 (the 5th International Workshop on Parallelism in Bioinformatics, part of ICA3PP 2017). The domains and topics covered in these two papers are timely and important, and the authors have done an excellent job of presenting the material.
A design pattern is a description of a high-quality solution to a frequently occurring problem in some domain. A pattern language is a collection of design patterns that are carefully organized to embody a design meth...
This paper presents a parallel implementation in APL of an algorithm to set up a database for the KRK endgame in chess. It clearly shows the techniques necessary to achieve the parallelism and thereby proves that APL can be a valuable productivity-increasing aid in this kind of Artificial Intelligence (AI) research. The complete APL functions are given in the ***. Reversed pigeon-hole and bit-map techniques are used. Move generation is table-driven, with a new technique to cater for the blockage of sliding pieces such as a rook. In order to maintain parallelism, 'if' statements are avoided and extensive use is made of compression and identity
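The paper's technique of avoiding 'if' statements in favour of compression (selecting array elements with a boolean vector) is the array-programming analogue of masking. The APL functions themselves are in the paper; the sketch below only illustrates the idea in NumPy-style Python, with the chess-specific details omitted and the occupied squares chosen arbitrarily.

```python
# Illustrative: replace a per-element 'if' with boolean-mask compression,
# the array-programming style the paper uses in APL.
import numpy as np

squares = np.arange(64)                 # 0..63, one entry per board square
occupied = np.zeros(64, dtype=bool)
occupied[[0, 9, 27]] = True             # hypothetical occupied squares

# Branching style: a scalar 'if' per element.
free_loop = [s for s in squares if not occupied[s]]

# Compression style: a boolean vector selects elements, no 'if' in the data path.
free_mask = squares[~occupied]

assert list(free_mask) == free_loop
```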
A visit to the neighborhood PC retail store provides ample proof that we are in the multi-core era. The key differentiator among manufacturers today is the number of cores that they pack onto a single chip. The clock frequency of commodity processors has reached its limit, however, and is likely to stay below 4 GHz for years to come. As a result, adding cores is not synonymous with increasing computational power. To take full advantage of the performance enhancements offered by the new multi-core hardware, a corresponding shift must take place in the software infrastructure - a shift to parallel computing.
The MultiFlex system is an application-to-platform mapping tool that integrates heterogeneous parallel components - H/W or S/W - into a homogeneous platform programming environment. This leads to higher-quality designs through encapsulation and abstraction. Two high-level parallel programming models are supported by the MultiFlex platform mapping tools: a distributed system object component (DSOC) object-oriented message-passing model and a symmetrical multiprocessing (SMP) model using shared memory. We demonstrate the combined use of the MultiFlex multiprocessor mapping tools, supported by high-speed hardware-assisted messaging, context switching, and dynamic scheduling, using the StepNP demonstrator multiprocessor system-on-chip platform, for two representative applications: 1) an Internet traffic management application running at 2.5 Gb/s and 2) an MPEG4 video encoder (VGA resolution, at 30 frames/s). For these applications, a combination of the DSOC and SMP programming models was used in an interoperable fashion. After optimization and mapping, processor utilization rates of 85%-91% were demonstrated for the traffic manager. For the MPEG4 encoder, the average processor utilization was 88%.
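The two MultiFlex programming models, message passing between components and shared-memory SMP, can be illustrated in miniature. The sketch below is only an analogy in Python's multiprocessing; it is not the MultiFlex API, whose DSOC objects and hardware-assisted messaging are described in the paper.

```python
# Miniature analogy of the two models: a message-passing queue between processes
# (loosely like DSOC) and a shared-memory counter (loosely like the SMP model).
from multiprocessing import Process, Queue, Value, Lock

def producer(queue):
    for packet in range(5):
        queue.put(packet)        # message passing: send work to another component
    queue.put(None)              # sentinel: no more messages

def consumer(queue, counter, lock):
    while (packet := queue.get()) is not None:
        with lock:               # shared memory: update a value visible to all processes
            counter.value += 1

if __name__ == "__main__":
    q = Queue()
    counter = Value("i", 0)      # shared integer
    lock = Lock()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q, counter, lock))
    p1.start(); p2.start()
    p1.join(); p2.join()
    print("packets processed:", counter.value)   # expect 5
```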