One of the barriers to the adoption of parallel computing is the inherent complexity of its programming. The Open Multi-Processing (OpenMP) Application Programming Interface (API) facilitates such implementations, pro...
Modern parallel platforms, such as clouds or servers, are often shared among many different jobs. However, existing parallel programming runtime systems are designed and optimized for running a single parallel job, so it is generally hard to use them directly to schedule multiple parallel jobs without incurring high overhead and inefficiency. In this work, we develop AMCilk (Adaptive Multiprogrammed Cilk), a novel runtime system framework designed to support multiprogrammed parallel workloads. AMCilk has a client-server architecture in which users can dynamically submit parallel jobs to the system. AMCilk has a single runtime system that runs these jobs while dynamically reallocating cores, last-level cache, and memory bandwidth among them according to the scheduling policy. AMCilk exposes an interface to the system designer, which allows the designer to easily build different scheduling policies that meet the requirements of various application scenarios and performance metrics, while AMCilk enforces the scheduling policy transparently to designers. The primary feature of AMCilk is its low-overhead and responsive preemption mechanism, which allows fast reallocation of cores between jobs. Our empirical evaluation indicates that AMCilk incurs small overheads and provides significant benefits on application-specific criteria for a set of four practical applications, owing to its fast and low-overhead core reallocation mechanism.
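As a rough illustration of what a designer-supplied scheduling policy might compute, the sketch below splits a fixed number of cores among jobs in proportion to per-job weights. It is not AMCilk's interface: the function name `reallocate_cores`, the weight dictionary, and the largest-remainder rounding are all assumptions made for this example, and cache and memory-bandwidth allocation are left out.

```python
from typing import Dict


def reallocate_cores(total_cores: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split total_cores among jobs in proportion to their weights.

    Hypothetical policy function (not part of AMCilk). Uses largest-remainder
    rounding so that the integer allocations always sum to total_cores.
    """
    if not weights:
        return {}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Ideal (fractional) share of cores for each job.
    exact = {job: total_cores * w / total_weight for job, w in weights.items()}
    # Start from the rounded-down share.
    alloc = {job: int(share) for job, share in exact.items()}
    leftover = total_cores - sum(alloc.values())

    # Hand the remaining cores to the jobs with the largest fractional remainder.
    by_remainder = sorted(exact, key=lambda job: exact[job] - alloc[job], reverse=True)
    for job in by_remainder[:leftover]:
        alloc[job] += 1
    return alloc
```

A runtime in this style would call such a function again whenever a job arrives or finishes, then preempt workers to match the new allocation; for example, `reallocate_cores(8, {"a": 3, "b": 1})` gives job `a` six cores and job `b` two.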
NASA Technical Reports Server (NTRS) 19890012171: Parallel Solution of Sparse One-Dimensional Dynamic Programming Problems by NASA Technical Reports Server (NTRS); published by
ISBN (digital): 9781728173566
ISBN (print): 9781728173573
Biometrics with facial recognition is now widely used. A face identification system should not only identify a person's face but also detect spoofing attempts that use printed faces or digital presentations. A straightforward spoofing prevention approach is to examine face liveness, such as eye blinking and lip movement. Nevertheless, this approach is ineffective against video-based replay attacks. For this reason, this paper proposes a method that combines face liveness detection with a CNN (Convolutional Neural Network) classifier. The anti-spoofing method is designed with two modules: the blinking eye module, which evaluates eye openness and lip movement, and the CNN classifier module. The dataset for training the CNN classifier can come from a variety of publicly available sources. We combined these two modules sequentially and implemented them in a simple facial recognition application on the Android platform. The test results show that the resulting module can recognize various kinds of facial spoof attacks, such as those using posters, masks, or smartphones.
The work focuses on the application of the fragmented programming approach to the automated generation of parallel programs for solving applied numerical problems. A new parallel programming system LuNA-ICLU applying this a...
ISBN (digital): 9781728189611
ISBN (print): 9781728189628
This full research paper identifies how the teaching of parallel computing has developed over the years. Learning parallel and distributed computing is fundamental for computing professionals because of the popularization of parallel architectures. Teaching parallel computing involves theoretical concepts and the development of practical skills. Its content is dense and spans different disciplines in computing courses. Although there is growing concern about this type of teaching, the organization and depth of parallel computing teaching at universities vary widely. The available literature reports some experiences of how to teach parallel computing; however, it is not easy to determine the state of the art, with its challenges and gaps. Our objective is to identify essential aspects related to the teaching of parallel computing, such as methodologies, supporting resources, subjects taught, student satisfaction with learning, and curricula. We carried out a systematic mapping to extract information from the literature, composed of three phases: planning, conduction, and reporting. We initially selected 819 papers from the Scopus, IEEE, ACM, and Google Scholar databases. After a preliminary analysis, we performed a full read of 94 papers. Different teaching methodologies appear in the publications; however, the traditional teaching methodology is still the most used. The small number of students in parallel computing courses is a concern of several authors. Educational software and hardware resources are reported, with software proposals making up most of them. The teaching of parallel computing at the beginning of undergraduate courses appears in several papers. This paper contributes to research on teaching parallel computing by pointing out the state of the art of this area and highlighting challenges that should be the focus of future investigations.
Real-time data processing is one of the central processes of particle physics experiments which require large computing resources. The LHCb (Large Hadron Collider beauty) experiment will be upgraded to cope with a par...
ISBN (print): 9781467324229
Remote Sensing (RS) data processing is characterized by massive remote sensing images and a growing number of algorithms of higher complexity. Parallel programming for data-intensive applications, such as massive remote sensing image processing on parallel systems, is bound to be especially non-trivial and challenging. We propose a generic parallel programming skeleton, enabled by the C++ template mechanism, for these remote sensing applications on high-performance clusters. It provides both programming templates for distributed RS data and generic parallel skeletons for RS algorithms. Through the one-sided communication primitives provided by MPI, the distributed RS data template can provide a global view of big RS data whose sliced data blocks are scattered among the distributed memory of cluster nodes. Moreover, through data serialization and RMA (Remote Memory Access), the data templates also offer a simple and effective way to distribute and communicate massive remote sensing data with complex data structures. Furthermore, the generic parallel skeletons implement the recurring patterns of computation and performance optimization, and take the user-defined sequential functions as template parameters for type genericity. With the implemented skeletons, developers without extensive parallel computing expertise can implement efficient parallel remote sensing programs without concern for parallel computing details. Through experiments on remote sensing applications, we confirmed that our templates were productive and efficient.
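The skeleton idea in this abstract, a reusable parallel pattern that accepts a user-defined sequential function as a parameter, can be sketched in a few lines. The sketch below is only an analogy under stated assumptions: it is written in Python rather than C++ templates, it uses a local thread pool in place of MPI processes and RMA, and the names `split_blocks` and `block_map_skeleton` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")
A = TypeVar("A")


def split_blocks(data: Sequence[T], n_blocks: int) -> List[Sequence[T]]:
    """Slice data into at most n_blocks contiguous blocks of near-equal size."""
    n_blocks = max(1, min(n_blocks, len(data)))
    size, extra = divmod(len(data), n_blocks)
    blocks = []
    start = 0
    for i in range(n_blocks):
        # The first `extra` blocks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def block_map_skeleton(
    data: Sequence[T],
    kernel: Callable[[Sequence[T]], R],
    combine: Callable[[List[R]], A],
    n_workers: int = 4,
) -> A:
    """Apply a sequential kernel to each block in parallel, then combine.

    The caller supplies only sequential code (kernel and combine); the
    partitioning and the parallel execution are owned by the skeleton.
    Threads stand in for cluster nodes in this sketch.
    """
    blocks = split_blocks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(kernel, blocks))
    return combine(partials)
```

For instance, `block_map_skeleton(pixels, sum, sum, n_workers=3)` totals a sequence of pixel values block by block; swapping in a different kernel changes the algorithm without touching the parallel plumbing.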
ISBN (digital): 9781728166957
ISBN (print): 9781728166964
CPPS, a cloud parallel programming system under development at the Institute of Informatics Systems, is intended to be an interactive visual environment for functional and parallel programming that supports computer science teaching and learning. The system will support the development, verification, and debugging of architecture-independent parallel Cloud Sisal programs and their correct conversion into efficient code for parallel computing systems, for execution in clouds. This paper describes the methods and tools of CPPS intended for formal verification of Cloud Sisal programs.
ISBN (print): 9781931971652
Deadlock is an increasingly pressing concern as the multicore revolution forces parallel programming upon the average programmer. Existing approaches to deadlock impose onerous burdens on developers, entail high runtime performance overheads, or offer no help for unmodified legacy code. Gadara automates dynamic deadlock avoidance for conventional multithreaded programs. It employs whole-program static analysis to model programs, and Discrete Control Theory to synthesize lightweight, decentralized, highly concurrent logic that controls them at runtime. Gadara is safe, and can be applied to legacy code with modest programmer effort. Gadara is efficient because it performs expensive deadlock-avoidance computations offline rather than online. We have implemented Gadara for C/Pthreads programs. In benchmark tests, Gadara successfully avoids injected deadlock faults, imposes negligible to modest performance overheads (at most 18%), and outperforms a software transactional memory system. Tests on a real application show that Gadara identifies and avoids both previously known and unknown deadlocks while adding performance overheads ranging from negligible to 10%.
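For readers unfamiliar with the kind of deadlock at issue, the sketch below shows the classic two-lock case and one simple manual remedy, acquiring locks in a fixed global order. This is a different and much simpler technique than the control logic Gadara synthesizes, and it is exactly the kind of programmer-imposed discipline that automated approaches aim to make unnecessary. The `transfer` scenario and all names are assumptions made for the example.

```python
import threading
from typing import Dict, List


def acquire_in_order(*locks: threading.Lock) -> List[threading.Lock]:
    """Acquire the given locks in one global order (by object id).

    Because every thread uses the same order, no circular wait can form.
    Duplicates are dropped so the same lock is never acquired twice.
    """
    unique = {id(lock): lock for lock in locks}
    ordered = [unique[key] for key in sorted(unique)]
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(held: List[threading.Lock]) -> None:
    """Release locks in the reverse of their acquisition order."""
    for lock in reversed(held):
        lock.release()


def transfer(balances: Dict[str, int], locks: Dict[str, threading.Lock],
             src: str, dst: str, amount: int) -> None:
    """Move amount from src to dst while holding both account locks.

    Locking src first and then dst would deadlock when two threads transfer
    in opposite directions; ordered acquisition avoids that.
    """
    held = acquire_in_order(locks[src], locks[dst])
    try:
        balances[src] -= amount
        balances[dst] += amount
    finally:
        release_all(held)
```

Two threads calling `transfer(..., "a", "b", 1)` and `transfer(..., "b", "a", 1)` concurrently both lock the accounts in the same order, so neither can hold one lock while waiting for the other.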