This article presents approaches for improving the performance of two large and well-known open source projects, Git and the GNU Compiler Collection, using parallel programming. We share the difficulties faced and the strategies used, and conclude with a set of lessons learned that are useful for similar parallelization efforts.
Auto-parallelizing compilers for embedded applications have been unsuccessful due to the widespread use of pointer arithmetic and the complex memory model of multiple-address-space digital signal processors (DSPs). This paper develops, for the first time, a complete auto-parallelization approach that overcomes these issues. It first combines a pointer conversion technique with a new modulo elimination transformation for program recovery, enabling later parallelization stages. Next, it integrates a novel data transformation technique that exposes the processor location of partitioned data. When this is combined with a new address resolution mechanism, it generates efficient programs that run on multiple address spaces without using message passing. Furthermore, as DSPs do not possess any data cache structure, an optimization is presented that transforms the program to exploit both remote data locality and local memory bandwidth. This parallelization approach is applied to the DSPstone and UTDSP benchmark suites, giving an average speedup of 3.78 on four Analog Devices TigerSHARC TS-101 processors.
In this paper, we investigate parallel implementation techniques for network coding. Network coding is known to be useful for both wired and wireless networks, and it also mitigates peer/piece selection problems in P2P file sharing systems. However, the decoding complexity of network coding has raised concerns about its adoption in practical network systems, and the exploitation of parallelism has previously been proposed to improve decoding performance. We argue that naive parallelization strategies for network coding may result in unbalanced workload distribution, which limits performance improvements. We further argue that greater performance gains can be achieved through balanced partitioning methods, and we propose new parallelization techniques for network coding. Our experiments show that on a quad-core processor system, the proposed algorithms exhibit a speedup of up to 5.69, which exceeds linear speedup owing to the influence of additional cache. Moreover, on an octal-core system, our algorithms achieve a speedup of 8.46 over sequential network coding and are 43.3 percent faster than an existing parallelized technique, using 1 Mbyte of data with a 1,024 x 1,024 coefficient matrix.
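The abstract does not describe its partitioning methods, so the following is only an illustrative sketch of why a naive split can be unbalanced. It assumes a hypothetical triangular workload in which row `i` costs `i` units of work (as can happen in elimination-style decoding), and compares a contiguous block split with a cyclic split. The function names and cost model are assumptions for illustration, not the paper's algorithms.

```python
def block_partition(costs, workers):
    """Per-worker load when rows are split into contiguous blocks."""
    n = len(costs)
    size = -(-n // workers)  # ceiling division: rows per block
    return [sum(costs[i:i + size]) for i in range(0, n, size)]


def cyclic_partition(costs, workers):
    """Per-worker load when row i goes to worker i % workers."""
    return [sum(costs[w::workers]) for w in range(workers)]


# Hypothetical cost model: row i costs i units of work (assumption).
costs = list(range(1, 9))
block_load = block_partition(costs, 4)
cyclic_load = cyclic_partition(costs, 4)

# The slowest worker bounds the parallel running time.
print("block :", block_load, "max =", max(block_load))
print("cyclic:", cyclic_load, "max =", max(cyclic_load))
```

Under this assumed cost model the cyclic split lowers the load of the busiest worker, which is the quantity that limits speedup.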
Due to their heavy reliance on server infrastructure, current computation-intensive SaaS applications suffer from scalability issues compared to existing data-intensive commercial SaaS. Offloading certain computations to the client browser can resolve these scalability issues, but the current browser APIs are complex to use and to integrate into a single piece of software. In this paper, we propose four high-level APIs that harness existing browser-based paradigms and proven software architectures to reduce the complexity of parallel computing and device-agnostic interactive rendering in the web browser. To obtain experimental results, we developed a proof-of-concept browser-based interactive diffusion MRI platform, in which we deployed a parallel Diffusion Tensor Estimation. Our platform provides easy-to-use APIs that achieve up to a 4 times speedup in parallel computation, as well as real-time interactive rendering performance across different browsers and devices, in comparison to other existing solutions in the diffusion MRI community.
ISBN: 9781479914159 (print)
This paper investigates the use of machine learning to automatically parallelize a heuristic for an NP-complete problem. The objective is to automatically design a new concurrent algorithm that finds solutions of comparable quality to the original heuristic. Our approach, called Savant, is inspired by savant syndrome. Its concurrency model is based on map-reduce. The approach is evaluated with the well-known Min-Min heuristic. Simulation results on two problem sizes are promising: the produced algorithm is able to find solutions of comparable quality.
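For readers unfamiliar with the Min-Min heuristic named above, here is a minimal sequential sketch of its standard formulation for mapping independent tasks to machines. It is not the Savant approach or its generated concurrent algorithm; the function name `min_min` and the input layout `etc[t][m]` (expected time to compute task `t` on machine `m`) are assumptions made for illustration.

```python
def min_min(etc):
    """Schedule tasks with Min-Min; etc[t][m] is the run time of task t on machine m.

    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine becomes free
    assignment = {}
    unscheduled = set(range(len(etc)))

    while unscheduled:
        best = None                     # (completion_time, task, machine)
        for t in sorted(unscheduled):
            for m in range(n_machines):
                completion = ready[m] + etc[t][m]
                if best is None or completion < best[0]:
                    best = (completion, t, m)
        # Pick the task whose minimum completion time is smallest overall.
        completion, t, m = best
        assignment[t] = m
        ready[m] = completion
        unscheduled.remove(t)

    return assignment, max(ready)


# Hypothetical 3-task, 2-machine instance.
example_etc = [[3, 5], [2, 4], [6, 1]]
example_assignment, example_makespan = min_min(example_etc)
print(example_assignment, example_makespan)
```

Each iteration scans all remaining task and machine pairs, so the sequential heuristic is quadratic in the number of tasks, which is one reason a concurrent variant is of interest.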
This paper investigates the automatic generation, with machine learning, of a Map-Reduce program that implements a heuristic for an NP-complete problem. The objective is to automatically design a new concurrent algorithm that finds solutions of comparable quality to the original heuristic. Our approach, called Savant, is inspired by savant syndrome. Its concurrency model is based on Map-Reduce. The approach is evaluated with the well-known Min-Min heuristic. Experimental results on two problem sizes are promising: the produced algorithm is able to find solutions of comparable quality.