The size and resolution of images have always been key topics in computer graphics, and they have become even more relevant in the big data era. Large amounts of data are often exchanged over medium- or low-bandwidth networks, or must be stored on devices with limited memory. In this context, the present paper shows the use of the Fractal method for image compression. It is a lossy method known for providing higher file-reduction ratios at the cost of a highly time-consuming phase. We therefore developed a parallel application model that exploits the power of multiprocessor architectures to obtain the advantages of the Fractal method in a feasible time. The evaluation used images of different sizes and two types of machines, one with two cores and another with four. The results demonstrated that both speedup and efficiency are highly dependent on the number of cores. They also emphasized that a larger number of threads does not always yield better performance.
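The two metrics named in this abstract can be illustrated with a minimal sketch, assuming the conventional definitions (speedup as serial time divided by parallel time, efficiency as speedup divided by the number of cores). The function names and the timings below are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Conventional speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Conventional efficiency: speedup divided by the number of cores used."""
    return speedup(t_serial, t_parallel) / cores


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only (not paper results).
    t_serial = 100.0
    hypothetical_runs = {2: 55.0, 4: 35.0}  # cores -> parallel run time
    for cores, t_par in hypothetical_runs.items():
        s = speedup(t_serial, t_par)
        e = efficiency(t_serial, t_par, cores)
        print(f"{cores} cores: speedup={s:.2f}, efficiency={e:.2f}")
```

With these made-up numbers, the four-core run has the higher speedup but the lower efficiency. This is one way in which adding cores or threads can give diminishing returns.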
Process Networks (PNs) are a suitable parallel model of computation (MoC) for specifying embedded streaming applications in a parallel form, facilitating efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome the associated difficulties, we have developed the pn compiler, which derives specific Polyhedral Process Networks (PPN) parallel specifications from sequential static affine nested loop programs (SANLPs). However, many applications, for example multimedia applications (MPEG coders/decoders, smart cameras, etc.), have adaptive and dynamic behavior that cannot be expressed as SANLPs. Therefore, in order to handle dynamic multimedia applications, this article addresses the important question of whether we can relax some of the restrictions of SANLPs while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way. The main contribution of this article is a first approach for the automated translation of affine nested loop programs with dynamic loop bounds into input-output equivalent Polyhedral Process Networks. In addition, we present a method for analyzing the execution overhead introduced in the PPNs derived from programs with dynamic loop bounds. The presented automated translation approach has been evaluated by deriving a PPN parallel specification from a real-life application called Low Speed Obstacle Detection (LSOD), used in the smart cameras domain. By executing the derived PPN, we have obtained results which indicate that the approach presented in this article facilitates efficient parallel implementations of sequential nested loop programs with dynamic loop bounds. That is, our approach reveals the possible parallelism available in such applications, which allows for the utili
ISBN: (print) 9781467322423; 9789786627113
Supercomputers built with processors based on the CBEA architecture are relatively new, and few applications are optimized for this type of system. In this article, we propose to exploit the computing resources of a CBEA-based cluster by parallelizing and optimizing an algorithm for the classification of a large dataset. In order to analyze the proposed parallelization methods, the algorithm is executed on a CBEA-based cluster that contains 96 PowerXCell 8i processors (with a theoretical peak performance of 9.83 TFlops). We analyze the execution time on a single processor and on all 96 processors, with and without the use of the SPE cores.
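As a quick sanity check on the figures quoted above, the cluster-wide peak can be divided by the processor count to get an approximate per-processor peak. This is a minimal sketch that assumes the 9.83 TFlops figure is the aggregate over all 96 processors; the function name is hypothetical.

```python
def per_processor_gflops(cluster_peak_tflops: float, processors: int) -> float:
    """Approximate per-processor peak in GFlops, assuming the cluster figure is an aggregate."""
    return cluster_peak_tflops * 1000.0 / processors


if __name__ == "__main__":
    # 9.83 TFlops and 96 processors are the values quoted in the abstract.
    peak = per_processor_gflops(9.83, 96)
    print(f"Approximate per-processor peak: {peak:.1f} GFlops")
```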
The use of cloud computing to offer High Performance Computing (HPC) services has been widely discussed in academia and industry. This dissertation is set in the context of designing a cloud computing platform for the development of component-based parallel computing applications, referred to as a cloud of components. Many important challenges in using the cloud of components relate to parallel programming, an error-prone task due to synchronization issues, which may cause applications to abort or produce incorrect data during execution, as well as to the inefficient use of computational resources. These problems may be very relevant for long-running applications with tight deadlines for obtaining critical results, which are quite common in HPC. One possible solution is the formal analysis of the behavior of an application's components through the cloud services, before their execution. The users of the components can then know whether a component can be safely used in their application. In this scenario, formal methods become useful. This dissertation proposes a process for the specification and derivation of parallel component implementations for the cloud of components. The process involves the formal specification of component behavior through contracts described using the Circus formal specification language. Then, through a refinement and translation process that takes the contract as its starting point, one may produce an implementation of a component that can execute on a parallel computing platform. Through this process, it becomes possible to offer guarantees to developers about the behavior of the components in their applications. To validate the proposed idea, the process is applied to contracts based on two benchmarks belonging to the NAS Parallel Benchmarks, which are widely adopted in HPC for evaluating the performance of parallel programming and computing platforms.
ISBN: (print) 9781424451661
Creating a unified catalogue of patterns is challenging in any domain. The difficulty lies in representing relationships between patterns, compounded by natural growth as new patterns are discovered. Existing pattern languages successfully describe relationships in small collections of patterns, but this approach lacks a systematic process that will scale to a growing catalogue of patterns. RIPPL (Relationship Initiated Pervasive Pattern Language) structures patterns and the tensions in their tradeoffs, and facilitates comparison and composition in terms of domain-specific constraints. A case study applying the proposed methodology to two existing pervasive pattern languages reveals the ability to represent pattern relationships in a structured, systematic form that can scale across individual pattern languages.
Techniques like zero-copy and operating system bypass can decrease communication latency and increase bandwidth. Lower latencies and higher bandwidths contribute to better performance in parallel applications and make them more scalable as well. Communication protocols using these techniques are known as user-level communication protocols. Based on the experience of other research groups implementing communication libraries and parallel programming libraries over VIA, and on the experience of GPPD implementing DECK, the text presents the implementation of the DECK primitives over the VIA standard, which is classified as a user-level protocol. The goal of this master's thesis is to implement DECK over VIA while avoiding any intermediate copy between the data source and destination, thus achieving zero-copy. DECK/VIA is the only library among all the libraries over VIA studied here that is totally free of intermediate copies, although synchronous behavior had to be enforced to maintain this property. VI-GM, an implementation of VIA for Myrinet networks, was used to implement the DECK/VIA library. The implementation of DECK/VIA showed a one-way latency of 86.85 μs and a maximum bandwidth of 205 Mbytes/s, 82% of the nominal bandwidth of the Myrinet network. To validate the library, the FT application from NPB was executed. Its results were compared with the results obtained with DECK/GM, for Myrinet networks, and DECK/TCP, for Ethernet networks. Even with one additional software layer and with all communication performed using a handshake, DECK/VIA reaches speedup values very close to those of DECK/GM and DECK/TCP on Gigabit Ethernet, and was better than DECK/TCP on Fast Ethernet. When implementing parallel programming libraries, we concluded that the ideal solution is one that strikes a good balance between the quest for performance and the preservation of the original library's semantics. This work contributes a survey of communication library development, its problems and their solutions, which can guide other researchers performing the
This thesis deals with parallelization in the Rust language. Its goal is to evaluate the performance and usability of Rust for building parallel applications in comparison with an already established alternative, OpenMP. The comparison was carried out on the computation of n-dimensional convolution. The conclusion presents an evaluation of the results and suggestions for their further use.