This paper addresses the development of efficient algorithms for accelerating SIFT (Scale Invariant Feature Transform) feature extraction in a distributed environment. The proposed distributed dynamic parallel...
The paper describes, from a software engineering perspective, a framework for the formal development of parallel algorithms on arbitrary architectures. The algorithms are synthesised in a transformational way, i.e. by a...
This paper describes some recent attempts to construct transportable numerical software for high-performance computers. Restructuring algorithms in terms of simple linear algebra modules is reviewed. This technique has proved very successful in obtaining a high level of transportability without severe loss of performance on a wide variety of both vector and parallel computers. The use of modules to encapsulate parallelism and reduce the ratio of data movement to floating-point operations has been demonstrably effective for regular problems such as those found in dense linear algebra. In other situations it may be necessary to express parallel algorithms explicitly. We also present a programming methodology that is useful for constructing new parallel algorithms which require sophisticated synchronization at a large grain level. We describe the SCHEDULE package, which provides an environment for developing and analyzing portable, explicitly parallel programs in FORTRAN. This package now includes a preprocessor to achieve complete portability of user-level code and a graphics postprocessor for performance analysis and debugging. We discuss details of porting both the SCHEDULE package and user code. Examples from linear algebra and partial differential equations are used to illustrate the utility of this approach.
ISBN: 9781538655559 (print)
Despite the fact that multicore CPUs are present in virtually every personal computer or cell phone and distributed systems in the form of cloud services are steadily penetrating various domains of our lives, only a minority of programmers and computer science graduates are able to effectively design and develop parallel and distributed applications. Serial thinking is natural to all humans and it is also encouraged by many computer science curricula. Even though leading educational institutions are attempting to rectify this trend by introducing parallel programming courses into their study programs, these courses are often reserved for more experienced students in their fourth or fifth year, since mastering modern parallel technologies like OpenMP or CUDA requires a certain level of programming skill. It can be argued that parallel thinking should be taught much earlier, perhaps even before tertiary education. To this end, we have created an educational platform, Parapple, that aims to introduce parallelism and related problems like load balancing or synchronization to inexperienced programmers in an entertaining form. Our platform is web-based, so it can run in any modern browser on all operating systems without installation, and users are required to have only a very basic understanding of structured imperative programming.
Based on the characteristics of multi-core architectures and the binary storage property of integer sequences, this paper proposes an efficient thread-level parallel algorithm for sorting integer sequences on multi-core...
ISBN: 9781665497473 (print)
The hardware complexity of modern machines makes the design of adequate programming models crucial for jointly ensuring performance, portability, and productivity in high-performance computing (HPC). Sequential task-based programming models paired with advanced runtime systems allow the programmer to write a sequential algorithm independently of the hardware architecture in a productive and portable manner, and let a third-party software layer, the runtime system, deal with the burden of scheduling a correct, parallel execution of that algorithm to ensure performance. Many HPC algorithms have been successfully implemented following this paradigm, a testimony to its effectiveness. Developing algorithms that specifically require fine-grained tasks along this model is still considered prohibitive, however, due to per-task management overhead [1], forcing the programmer to resort to a less abstract, and hence more complex, "task+X" model. We thus investigate the possibility of offering a tailored execution model, trading dynamic mapping for efficiency by using a decentralized, conservative in-order execution of the task flow, while preserving the benefits of relying on the sequential task-based programming model. We propose a formal specification of the execution model as well as a prototype implementation, which we assess on a shared-memory multicore architecture with several synthetic workloads. The results show that, provided the programmer supplies a proper task mapping, the pressure on the runtime system is significantly reduced and the execution of fine-grained task flows is much more efficient.
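To make the idea of in-order execution with a programmer-supplied mapping more concrete, the following is a minimal Python sketch, not the prototype described in the abstract. The function name `run_task_flow`, the task tuple layout, and the use of one `threading.Event` per task are all assumptions made for illustration. Each worker walks the same sequential task list in program order, runs only the tasks mapped to it, and waits for the earlier tasks it depends on; dependencies are inferred from the declared read and write sets.

```python
import threading

def run_task_flow(tasks, n_workers):
    """Run tasks given in sequential program order.

    tasks: list of (fn, reads, writes, worker_id) tuples (hypothetical layout).
    Every worker_id must be in range(n_workers).
    """
    done = [threading.Event() for _ in tasks]
    deps = []                      # deps[i] = indices of earlier tasks that i must wait for
    last_writer, readers = {}, {}  # per-datum access history
    for i, (_, reads, writes, _) in enumerate(tasks):
        d = set()
        for x in reads:            # read-after-write
            if x in last_writer:
                d.add(last_writer[x])
        for x in writes:           # write-after-write and write-after-read
            if x in last_writer:
                d.add(last_writer[x])
            d.update(readers.get(x, ()))
        for x in reads:
            readers.setdefault(x, []).append(i)
        for x in writes:
            last_writer[x] = i
            readers[x] = []
        deps.append(d)

    def worker(w):
        # Each worker traverses the whole flow in order and skips tasks it does not own.
        for i, (fn, _, _, owner) in enumerate(tasks):
            if owner != w:
                continue
            for j in deps[i]:
                done[j].wait()
            fn()
            done[i].set()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because every dependency points to an earlier task and each worker processes its own tasks in program order, the earliest unfinished task can always run, so the sketch cannot deadlock. For simplicity it computes the dependencies once up front rather than in a decentralized way.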
The speed of calculating, tracking and filling the isolines has a direct impact on the performance of user interaction. In this paper, we begin with the serial algorithm of visualization and implement its parallel alg...
Dynamic storage allocation is a vital component of programming systems intended for multiprocessor architectures that support globally shared memory. Highly parallel algorithms for access to system data structures lie at the core of effective memory allocation strategies as well as solutions to other parallel systems problems. In this paper, we investigate four algorithms, all based on the first-fit approach, that provide different granularities of parallel access to the allocator's data structures. These solutions employ a variety of design techniques including specialized locking protocols, the use of atomic fetch-and-Φ operations, and structural modifications. We describe experiments designed to compare the performance of these schemes. The results show that simple algorithms are appropriate when the expected number of concurrent requests per memory is low and the request pattern is not bursty. Algorithms that support finer-granularity access while avoiding locking protocols are successful in a range of larger processor/memory ratios.
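As a point of reference for the coarsest granularity mentioned above, here is a minimal Python sketch of a first-fit allocator in which a single lock guards the whole free list. It is an illustration only, not one of the four algorithms studied in the paper; the class name `FirstFitAllocator` and its methods are hypothetical.

```python
import threading

class FirstFitAllocator:
    """First-fit allocator over a range of `size` units, guarded by one global lock."""

    def __init__(self, size):
        self._free = [(0, size)]        # free blocks as (offset, length), sorted by offset
        self._lock = threading.Lock()

    def alloc(self, n):
        """Return the offset of the first free block of at least n units, or None."""
        with self._lock:
            for i, (off, length) in enumerate(self._free):
                if length >= n:
                    if length == n:
                        del self._free[i]                      # exact fit: drop the block
                    else:
                        self._free[i] = (off + n, length - n)  # split: keep the tail free
                    return off
        return None

    def free(self, off, n):
        """Return n units starting at off to the free list and coalesce neighbours."""
        with self._lock:
            self._free.append((off, n))
            self._free.sort()
            merged = []
            for o, l in self._free:
                if merged and merged[-1][0] + merged[-1][1] == o:
                    merged[-1] = (merged[-1][0], merged[-1][1] + l)
                else:
                    merged.append((o, l))
            self._free = merged
```

With one lock, every request serializes on the free list, which is acceptable when concurrent requests are rare. Finer-grained schemes of the kind the abstract describes would replace this lock with per-block locking or atomic operations.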
ISBN: 9781595936028 (print)
Parallel programming has been the subject of deep research for decades, and is renowned in the software community as a difficult challenge, to the degree that many companies have teams of parallelism and concurrency experts. Further, many ISVs explicitly design their software architectures so as to ensure that the majority of the development effort, including of course debug and test, can be done without consideration of parallelism. What makes parallelism so difficult are the knotty and coupled problems of correctness, performance (particularly data locality), and software modularity. In Terascale (manycore) chip-level multiprocessors, we are facing a pervasive and critical parallel programming challenge. Core counts on a single chip are expected to increase rapidly, progressing with Moore's law, and quad-core systems are already available today in mainstream volume client and server platforms. To continue the rapid performance scaling to which we have become accustomed, applications will need to exhibit ample parallelism (and increasing amounts of it) for successive generations of hardware. Further, because the move to multiple-core parallelism as the primary basis for performance improvement is pervasive, this requirement falls on a wide range of applications including traditional large-scale commercial and HPC server, desktop, laptop, and even those running on small mobile devices. That breadth has numerous implications for the types of solutions that are required. We will discuss some of the requirements for Terascale parallel programming solutions, and point out several potentially fruitful directions. A number of these solutions will build on mainstream programming approaches (objects, modularity, imperative), particularly introducing parallelism with modest disruption to both large-scale and local-scale program structure. However, there is an opportunity for radically different approaches to take hold in the mainstream (e.g. functional). On the hardware front
The Voronoi mesh, a basic primitive in the field of computational geometry, has been widely applied to solving fluid-related problems by the finite volume method. By analyzing various limited conditions for reasonable point...