ISBN (digital): 9781728148946
ISBN (print): 9781728148953
In this paper, we present teaching experiences from offering a course on large-scale parallel computing using the Message Passing Interface (MPI). The course was offered to undergraduates and graduates in the Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, for the first time in a very long time. We present the topics covered, how the course content was decided, the class demographics, and the resources made available for students to run their MPI jobs, and we discuss the outcomes of the course. We also discuss the stumbling blocks encountered while offering a parallel computing systems course without much support from teaching assistants, and the lessons we carried forward to the next offering of the course.
ISBN (print): 9781538659892
We present an implementation of the Jaccard Index for graphs on the Migratory Memory-Side Processing Emu architecture. This index was designed to find similarities between different vertices in a graph and is often used to identify communities. The Emu architecture is a parallel system based on a partitioned global address space, with threads that automatically migrate within the memory system. We introduce the parallel programming model used to exploit it, detail our implementation of the algorithm, and analyze simulated performance results as well as early hardware tests. We discuss its application to large-scale problems.
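For reference, the quantity being computed is the neighborhood Jaccard similarity |N(u) ∩ N(v)| / |N(u) ∪ N(v)| for a vertex pair (u, v). The sketch below illustrates that definition on sorted adjacency lists in plain C++; it is only an illustration of the index itself, not the migratory-thread Emu implementation described in the paper.

```cpp
// Illustrative computation of the graph Jaccard index for one vertex pair,
// assuming sorted adjacency lists. This is not the Emu implementation from
// the paper, just the definition the paper parallelizes.
#include <cstddef>
#include <vector>

double jaccard(const std::vector<int>& nu, const std::vector<int>& nv) {
    std::size_t i = 0, j = 0, common = 0;
    while (i < nu.size() && j < nv.size()) {            // merge-style intersection
        if (nu[i] == nv[j])     { ++common; ++i; ++j; }
        else if (nu[i] < nv[j]) { ++i; }
        else                    { ++j; }
    }
    std::size_t uni = nu.size() + nv.size() - common;   // |N(u) ∪ N(v)|
    return uni == 0 ? 0.0 : static_cast<double>(common) / uni;
}
```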
ISBN (print): 9781538633656
Low latency is a fundamental requirement for Virtual Reality (VR) systems to reduce the potential risks of cybersickness and to increase effectiveness, efficiency, and user experience. In contrast to the effects of uniform latency degradation, the influence of latency jitter on user experience in VR is not well researched, although today's consumer VR systems are vulnerable in this respect. In this work we report on the impact of latency jitter on cybersickness in HMD-based VR environments. Test subjects were given a search task in Virtual Reality that provoked both head rotation and translation. One group experienced artificially added latency jitter in the tracking data of their head-mounted display. The introduced jitter pattern was a replication of real-world latency behavior extracted and analyzed from an existing example VR system. The effects of the introduced latency jitter were measured using the self-report Simulator Sickness Questionnaire (SSQ) and physiological measurements. We found a significant increase in self-reported simulator sickness. We therefore argue that measuring and controlling latency based on average values taken at a few time intervals is not enough to assure the required timeliness behavior, and that latency jitter needs to be considered when designing experiences for Virtual Reality.
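As a rough illustration of the kind of jitter injection described (the class, member names, and timing scheme below are invented for this sketch and are not taken from the paper), a recorded trace of per-sample delays can be replayed onto the head-tracking stream before it reaches the renderer:

```cpp
// Hypothetical sketch: replay a recorded latency-jitter trace onto incoming
// head-tracking samples. All names and the timing scheme are assumptions for
// illustration; the paper's actual injection mechanism is not reproduced here.
#include <chrono>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

struct Pose { double px, py, pz, qw, qx, qy, qz; };

class JitterInjector {
public:
    JitterInjector(std::vector<std::chrono::microseconds> jitter_trace,
                   std::chrono::microseconds base_latency)
        : trace_(std::move(jitter_trace)), base_(base_latency) {}

    // Called whenever the tracker delivers a new pose.
    void push(const Pose& p) {
        auto jitter = trace_.empty() ? std::chrono::microseconds{0}
                                     : trace_[index_++ % trace_.size()];
        queue_.push_back({p, std::chrono::steady_clock::now() + base_ + jitter});
    }

    // Called by the renderer each frame; returns false until a sample is due.
    bool pop(Pose& out) {
        if (queue_.empty() ||
            std::chrono::steady_clock::now() < queue_.front().second)
            return false;
        out = queue_.front().first;
        queue_.pop_front();
        return true;
    }

private:
    std::vector<std::chrono::microseconds> trace_;  // replayed real-world jitter
    std::chrono::microseconds base_;                // constant baseline latency
    std::deque<std::pair<Pose, std::chrono::steady_clock::time_point>> queue_;
    std::size_t index_ = 0;
};
```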
ISBN (digital): 9783030038014
ISBN (print): 9783030038014; 9783030038007
Accurate background subtraction is an essential tool for high-level computer vision applications. However, as research continues to increase the accuracy of background subtraction algorithms, computational efficiency has often suffered as a result of increased complexity. Consequently, many sophisticated algorithms are unable to maintain real-time speeds with increasingly high-resolution video inputs. To combat this unfortunate reality, we propose to exploit the inherently parallelizable nature of background subtraction algorithms by making use of NVIDIA's parallel computing platform, CUDA. By using the CUDA interface to execute parallel tasks on the Graphics Processing Unit (GPU), we are able to achieve up to a two-orders-of-magnitude speedup over traditional techniques. Moreover, the proposed GPU algorithm achieves over an 8x speedup over the CPU-based background subtraction implementation proposed in our previous work [1].
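To make the "inherently parallelizable" point concrete, the sketch below shows a simple running-average background model in plain C++ (a generic example, not the algorithm evaluated in the paper): every pixel is processed independently, which is exactly what lets a GPU version assign one CUDA thread per pixel.

```cpp
// Illustrative per-pixel background subtraction with a running-average model.
// This is a generic example, not the paper's algorithm; on the GPU the loop
// body would become the per-thread work of a CUDA kernel (one thread / pixel).
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

void subtract_background(const std::vector<float>& frame,   // grayscale, 0..255
                         std::vector<float>& background,    // running model
                         std::vector<std::uint8_t>& mask,   // 255 = foreground
                         float alpha, float threshold) {
    for (std::size_t i = 0; i < frame.size(); ++i) {        // independent per pixel
        float diff = std::fabs(frame[i] - background[i]);
        mask[i] = diff > threshold ? 255 : 0;
        // Update the model only where the pixel currently looks like background.
        if (mask[i] == 0)
            background[i] = (1.0f - alpha) * background[i] + alpha * frame[i];
    }
}
```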
ISBN (print): 9781538623176
DEPSO-Scout is a hybrid optimization algorithm combining Differential Evolution (DE), Particle Swarm Optimization (PSO), and Artificial Bee Colony (ABC). Solution convergence is balanced between the exploration of PSO and the exploitation of DE, and premature convergence to suboptimal solutions is reduced by the scout-bee behavior of ABC. DEPSO-Scout outperforms traditional DE, PSO, and ABC. However, in higher-dimensional search spaces, the accuracy of DEPSO-Scout is maintained while its search speed decreases significantly; in our experiments, the computational time varies with the complexity of the problem. To improve the time performance of DEPSO-Scout, parallelization techniques become of interest. By modifying the DEPSO-Scout algorithm with a parallel approach, the speed of the algorithm is significantly improved while the correctness of solutions is maintained. The experiments and an analysis of speedup and algorithm efficiency are presented, and opportunities for further improving parallel DEPSO-Scout are discussed in the last section.
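The sketch below gives a rough idea of how such a hybrid per-particle update can combine a PSO velocity term, a DE-style mutation, and an ABC scout-bee reset; the function, parameters, and blending scheme are assumptions for illustration and are not the DEPSO-Scout update rules from the paper.

```cpp
// Hypothetical sketch of a hybrid DE/PSO/ABC-style update step; the exact
// DEPSO-Scout rules and parameters are not reproduced from the paper.
#include <cstddef>
#include <random>
#include <vector>

struct Particle {
    std::vector<double> x, v, best_x;   // position, velocity, personal best
    int stagnation = 0;                 // iterations without improvement
};

// One update of a single particle.
void hybrid_step(Particle& p,
                 const std::vector<double>& gbest,   // swarm-best position
                 const std::vector<double>& a,       // three distinct particles
                 const std::vector<double>& b,       // used for DE mutation
                 const std::vector<double>& c,
                 double w, double c1, double c2, double F,
                 int scout_limit, double lo, double hi, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (std::size_t d = 0; d < p.x.size(); ++d) {
        // PSO velocity update (exploration).
        p.v[d] = w * p.v[d] + c1 * u(rng) * (p.best_x[d] - p.x[d])
                            + c2 * u(rng) * (gbest[d] - p.x[d]);
        // DE-style mutant blended into the new position (exploitation).
        double de = a[d] + F * (b[d] - c[d]);
        p.x[d] = 0.5 * (p.x[d] + p.v[d]) + 0.5 * de;
    }
    // ABC scout-bee behaviour: reinitialize a particle that stopped improving.
    if (p.stagnation > scout_limit) {
        std::uniform_real_distribution<double> r(lo, hi);
        for (auto& xi : p.x) xi = r(rng);
        p.stagnation = 0;
    }
}
```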
Author: Zafari, Afshin
Uppsala Univ, Div Comp Sci, Dept Informat Technol, Lagerhyddsvagen 2, S-75237 Uppsala, Sweden
ISBN (print): 9783319780245; 9783319780238
Task-based parallel programming has shown competitive outcomes in many aspects of parallel programming, such as efficiency, performance, productivity, and scalability. Different software development frameworks use different approaches to provide these outcomes to the programmer while making the underlying hardware architecture transparent. However, since programs are not portable between these frameworks, choosing one framework over another remains a critical decision for a programmer concerned with the expandability, adaptivity, maintainability, and interoperability of the programs. In this work, we propose a unified programming interface that a programmer can use to work with different task-based parallel frameworks transparently. In this approach we abstract the common concepts of task-based parallel programming and provide them to the programmer uniformly for all frameworks through a single programming interface. We have tested the interface by running programs that implement matrix operations within frameworks optimized for shared- and distributed-memory architectures and accelerators, with the cooperation between frameworks configured externally and no need to modify the programs. Further possible extensions of the interface and potential future research are also described.
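The sketch below illustrates the general idea of programming against one task interface with pluggable backends; all class and function names are invented for this illustration and are not the interface proposed in the paper.

```cpp
// Hypothetical illustration of a unified task-based interface: the application
// submits tasks against one abstract interface, and a backend adapter forwards
// them to a concrete framework (shared-memory, distributed, or accelerator).
#include <functional>
#include <vector>

class TaskBackend {                          // one adapter per underlying framework
public:
    virtual ~TaskBackend() = default;
    virtual void submit(std::function<void()> task) = 0;
    virtual void wait_all() = 0;
};

class SequentialBackend : public TaskBackend {   // trivial reference backend
public:
    void submit(std::function<void()> task) override {
        pending_.push_back(std::move(task));
    }
    void wait_all() override {
        for (auto& t : pending_) t();
        pending_.clear();
    }
private:
    std::vector<std::function<void()>> pending_;
};

// Application code is written once against the abstract interface; swapping
// the backend object changes the framework without touching this function.
void scale_blocks(TaskBackend& rt, std::vector<std::vector<double>>& blocks, double s) {
    for (auto& blk : blocks)
        rt.submit([&blk, s] { for (auto& x : blk) x *= s; });
    rt.wait_all();
}
```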
ISBN (print): 9781479970612
In this paper, we present a new distributed algorithm for minimizing a sum of not necessarily differentiable convex functions composed with arbitrary linear operators. The overall cost function is assumed to be strongly convex. Each involved function is associated with a node of a hypergraph, and each node can communicate with neighboring nodes sharing the same hyperedge. Our algorithm relies on a primal-dual splitting strategy with established convergence guarantees. We show how it can be efficiently implemented to take full advantage of a multicore architecture. The good numerical performance of the proposed approach is illustrated on a video sequence denoising problem, where a significant speedup is achieved.
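For reference, the problem class described can be written in standard notation as follows (a generic formulation assumed here, not necessarily the paper's exact one):

```latex
% Generic statement of the problem class (standard notation, assumed here):
\[
  \min_{x \in \mathbb{R}^n} \; \sum_{i=1}^{m} f_i(L_i x),
\]
% where each f_i is convex but not necessarily differentiable, each L_i is a
% linear operator, each term is associated with a node of the hypergraph, and
% the overall sum is strongly convex.
```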
ISBN (print): 9783319672205; 9783319672199
In the paper we present parallel implementations, as well as execution times and speedups, of three different algorithms run in various environments, such as a workstation with multi-core CPUs and a cluster. The parallel codes, implementing the master-slave model in C+MPI, differ in their computation-to-communication ratios. The considered problems include a genetic algorithm with various ratios of master processing time to communication and fitness evaluation times, matrix multiplication, and numerical integration. We present how the codes scale in the aforementioned systems. For the numerical integration code, which scales very well, we also show performance in a hybrid CPU + Xeon Phi environment.
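As a concrete, minimal illustration of the master-slave pattern with MPI (not one of the codes measured in the paper), the sketch below integrates 4/(1+x^2) over [0, 1] with the master handing out subintervals on demand and accumulating the workers' partial sums; it assumes at least two MPI ranks.

```cpp
// Minimal master-slave numerical-integration sketch with MPI (illustrative
// only). Run with at least two ranks: rank 0 is the master, the rest compute
// partial trapezoid sums over subintervals handed out on demand.
#include <mpi.h>
#include <cstdio>

static double f(double x) { return 4.0 / (1.0 + x * x); }    // integral over [0,1] is pi

static double trapezoid(double a, double b, int n) {
    double h = (b - a) / n, s = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) s += f(a + i * h);
    return s * h;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int tasks = 256;                        // subintervals of [0,1]
    const double width = 1.0 / tasks;

    if (rank == 0) {                              // master: distribute tasks, collect sums
        double total = 0.0;
        int next = 0, active = size - 1;
        MPI_Status st;
        while (active > 0) {
            double partial;
            MPI_Recv(&partial, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            total += partial;
            int task = (next < tasks) ? next++ : -1;   // -1 tells the worker to stop
            if (task < 0) --active;
            MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
        }
        std::printf("integral ~ %.12f\n", total);
    } else {                                      // worker: request work until told to stop
        double partial = 0.0;                     // first send carries no real result
        int task;
        while (true) {
            MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (task < 0) break;
            partial = trapezoid(task * width, (task + 1) * width, 1000);
        }
    }
    MPI_Finalize();
    return 0;
}
```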
ParaSail is a language specifically designed to simplify the construction of programs that make full, safe use of parallel hardware even while manipulating potentially irregular data structures. As parallel hardware h...
In the framework of Network Function Virtualization (NFV), we address in this work the performance analysis of virtualized network functions (VNFs), with the virtualization of the radio access network (namely, Cloud-RAN) as the driving use case. The overarching principle of network virtualization consists of replacing network functions, which so far ran on dedicated and proprietary hardware, with open software applications running on shared general-purpose servers. The complexity of virtualization lies in the softwarization of low-layer network functions (namely, PHY functions), because their execution must meet strict latency requirements. Throughout this work, we evaluate the performance of VNFs in terms of latency, considering the total amount of time required to process VNFs in cloud computing systems. We notably investigate the relevance of resource pooling and statistical multiplexing when the available cores in a data center are shared by all active VNFs. We model VNFs by means of stochastic service systems. The proposed queuing models reveal the behavior of high-performance computing architectures based on parallel processing and enable dimensioning of the required computing capacity in data centers.
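As a small illustration of the resource-pooling argument (a generic M/M/c calculation, not one of the paper's models), the Erlang-C formula gives the probability that an arriving job must wait when c pooled cores are shared; at the same per-core utilization, a larger shared pool waits less often, which is the statistical-multiplexing gain.

```cpp
// Generic M/M/c (Erlang-C) waiting probability for c pooled cores serving a
// Poisson arrival stream; illustrative only, not the paper's queuing models.
// offered_load = lambda / mu (Erlangs); lambda / (c * mu) must be < 1.
#include <cstdio>

double erlang_c(int c, double offered_load) {
    double rho = offered_load / c;
    if (rho >= 1.0) return 1.0;                 // unstable: every job waits
    double term = 1.0, sum = 0.0;               // term tracks A^k / k!
    for (int k = 0; k < c; ++k) {
        sum += term;
        term *= offered_load / (k + 1);
    }
    double tail = term / (1.0 - rho);           // A^c / (c! (1 - rho))
    return tail / (sum + tail);
}

int main() {
    // Pooling effect: same per-core load, larger shared pool -> jobs wait less
    // often (statistical multiplexing).
    std::printf("P(wait) with  4 cores at 70%% load: %.4f\n", erlang_c(4, 2.8));
    std::printf("P(wait) with 16 cores at 70%% load: %.4f\n", erlang_c(16, 11.2));
    return 0;
}
```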