ISBN (digital): 9781728170503
ISBN (print): 9781728170510
In this paper, a rail recognition scheme is presented to pilot a UAV flying autonomously along a railway track. First, a pulse coupled neural network iteratively processes the single-channel brightness image to obtain a binary image of the track contour, with image entropy adopted as the criterion for stopping the iteration. Then, a third-order Bézier curve is fitted to the contour, and the vanishing point is obtained. The yaw and pitch are calculated from the vanishing point, and the inverse perspective transformation is then used to calculate the local flight target point. Finally, the angle between the flight direction and the target point is calculated to pilot the UAV along the track. To speed up processing, CUDA-based parallel programming on the NVIDIA TX2 is adopted. Various scene images, including forward flight, side flight, changes in UAV height, and the influence of intruding objects, are used to test the effectiveness of the proposed scheme. Experiments show that the rail recognition rate is 96.08% and the false alarm rate is 1.80%.
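The entropy-based stopping rule mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state the exact criterion, so this example assumes the iteration stops once the Shannon entropy of the binary image no longer increases. The names `binary_entropy` and `stop_index` are hypothetical, and the frames stand in for the successive outputs of the neural network, which is not modeled here.

```python
import math


def binary_entropy(image):
    """Shannon entropy (in bits) of a binary image given as rows of 0/1 pixels."""
    pixels = [p for row in image for p in row]
    p1 = sum(1 for p in pixels if p) / len(pixels)  # fraction of foreground pixels
    p0 = 1.0 - p1                                   # fraction of background pixels
    # Skip zero probabilities, since 0 * log2(0) is taken as 0.
    return -sum(p * math.log2(p) for p in (p0, p1) if p > 0)


def stop_index(frames):
    """Index of the frame kept when iterating until entropy stops increasing.

    Assumption (not from the source): each element of `frames` is the binary
    output of one iteration, and iteration halts at the first non-increase.
    """
    best = 0
    for i in range(1, len(frames)):
        if binary_entropy(frames[i]) <= binary_entropy(frames[best]):
            break
        best = i
    return best


# Hypothetical outputs of four successive iterations on a 1x4 image.
frames = [
    [[0, 0, 0, 0]],
    [[1, 0, 0, 0]],
    [[1, 1, 0, 0]],
    [[1, 1, 1, 0]],
]
chosen = stop_index(frames)
```

In this toy sequence the third frame has equal numbers of foreground and background pixels, so its entropy is maximal and the loop stops there.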
ISBN (digital): 9781728192192
ISBN (print): 9781728192208
HPC systems with attached accelerators are the new normal. However, programming these accelerators to get good performance is complex and tedious. Hence, directive-based programming models such as OpenMP and OpenACC are gaining wide popularity for parallel programming. They simplify the programming experience by abstracting the low-level complexities from the user. In this paper, we present an extensive comparison of OpenMP 4.5 and OpenACC for GPU programming, including a performance comparison of the two APIs on NVIDIA Tesla P100 and V100 GPUs. Data transfer times, kernel execution times, total execution times, and performance portability are the criteria for comparison. The challenges faced while parallelizing the applications using directives, which led to incorrect outputs, are also noted.
ISBN (print): 9783642310577; 9783642310560
The popularization of parallelism is arguably the most fundamental computing challenge for years to come. We present an approach where parallel programming takes place in a restricted (sub-Turing-complete), logic-based declarative language, embedded in Java. Our logic-based language, PQL, can express the parallel elements of a computing task, while regular Java code captures sequential elements. This approach offers a key property: the purely declarative nature of our language allows for aggressive optimization, in much the same way that relational queries are optimized by a database engine. At the same time, declarative queries can operate on plain Java data, extending patterns such as map-reduce to arbitrary levels of nesting and composition complexity. We have implemented PQL as an extension to a Java compiler and showcase its expressiveness as well as its scalability compared to competitive techniques for similar tasks (Java + relational queries, in-memory Hadoop, etc.).
ISBN (print): 9781467313513
Parallel programming is an important issue for current multi-core processors and necessary for new generations of many-core architectures. This includes processors, computers, and clusters. However, the introduction of parallel programming in undergraduate courses demands new efforts to prepare students for this new reality. This paper describes an experiment on a traditional Computer Science course during a two-year period. The main focus is the question of when to introduce parallel programming models in order to improve the quality of learning. The goal is to propose a method of introducing parallel programming based on OpenMP (a shared-variable model) and MPI (a message-passing model). Results show that the best outcomes are achieved when the OpenMP model is introduced before the MPI model. The main contribution of this paper is the proposed method, which correlates several concepts such as concurrency, parallelism, speedup, and scalability to improve student motivation and learning.
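Two of the concepts named in the abstract above, speedup and scalability, can be made concrete with the standard definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of workers. The following is a minimal sketch; the function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Parallel efficiency E = S / p, where p is the number of workers."""
    return speedup(t_serial, t_parallel) / workers


# Hypothetical timings in seconds for the same task on 1, 2, 4, and 8 workers.
timings = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}

# Map each worker count to its (speedup, efficiency) pair.
report = {
    p: (speedup(timings[1], t), efficiency(timings[1], t, p))
    for p, t in timings.items()
}
```

In this made-up data the efficiency falls as workers are added, which is the pattern a scalability discussion would examine.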
ISBN (print): 9781467308243; 9781467308267
We propose a novel synchronization mechanism called versioning. It dynamically establishes a deterministic order of memory accesses in parallel programs that have serial semantics, in a way that is transparent to the programmer. This order is created in a distributed manner and is enforced by monitoring memory accesses and stalling threads if necessary. Versioning gives rise to parallel programming models in which programmers need not explicitly synchronize threads and only need to specify shared data, which greatly simplifies parallel programming. However, versioning introduces overheads and thus demands architectural support. We describe versioning and the architectural support it needs. We also propose one parallel programming model that utilizes versioning and use it to parallelize 13 benchmark applications. We build an FPGA prototype of a multiprocessor system with versioning support and show that good parallel speedups are obtained. Our analysis shows minimal impact of versioning, both in terms of timing overheads and in terms of additional hardware.
ISBN (print): 9789881925169
We propose a new theoretical model for parallelism. The model is explicitly based on data and work distributions, a feature missing from other theoretical models. The major theoretical result is that data movement can then be derived by formal reasoning. While the model has an immediate interpretation in distributed memory parallelism, we show that it can also accommodate shared memory and hybrid architectures such as clusters with accelerators. The model gives rise in a natural way to objects appearing in widely different parallel programming systems such as the PETSc library or the Quark task scheduler. Thus we argue that the model offers the prospect of a high productivity programming system that can be compiled down to proven high-performance environments.
ISBN (print): 9780769546766
The first Spanish Parallel Programming Contest was organized in September 2011 within the Spanish Jornadas de Paralelismo. The aim of the contest is to disseminate parallelism among Computer Science students. The website and the material generated can be used for educational purposes. This paper comments on the organization of the contest and summarizes some training activities in which the material of the contest is being or can be used.
Viewshed analysis is a common GIS capability used in various domains with various requirements. In avionics, viewshed analysis is part of accuracy-critical applications, and the real-time operating systems in embedded devices use preemptive scheduling algorithms to satisfy performance requirements. Therefore, to benefit effectively from viewshed analysis, a method should be both fast and accurate. Although the R3 algorithm is accepted as an accuracy benchmark, the R2 algorithm, with lower accuracy, is preferred in many cases due to its better execution time. This thesis prioritizes accuracy and presents an alternative approach to improve the execution time of the R3 algorithm. Considering different execution environments, improved versions of R3 are implemented for CPU and GPU. The experimental results show that the CPU implementations of the improved algorithms achieve 1.23x to 13.51x speedup depending on the observer altitude, range, and topology of the terrain. In the GPU implementation experiments, up to 2.27x speedup is recorded. In addition to the execution time improvements, the analysis results show that the proposed algorithms are capable of providing accuracy as high as that of R3.
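As background for the abstract above, R3 is commonly described as casting a separate line of sight from the observer to every cell of the elevation grid, which is what makes it accurate but slow. The sketch below illustrates that brute-force idea only and is not the thesis's improved algorithm. It assumes nearest-cell sampling along each sight line instead of elevation interpolation, and the names `line_of_sight`, `viewshed`, and `eye_height` are hypothetical.

```python
def line_of_sight(grid, observer, eye_height, target):
    """Return True if the target cell is visible from the observer cell.

    Simplification (an assumption of this sketch): terrain along the ray is
    sampled at the nearest grid cell rather than interpolated.
    """
    (r0, c0), (r1, c1) = observer, target
    z0 = grid[r0][c0] + eye_height  # elevation of the observer's eye
    z1 = grid[r1][c1]               # ground elevation at the target
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(1, steps):       # intermediate samples only
        t = i / steps
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        sight_z = z0 + t * (z1 - z0)  # height of the sight line at this sample
        if grid[r][c] > sight_z:      # terrain rises above the sight line
            return False
    return True


def viewshed(grid, observer, eye_height):
    """Visibility of every cell, using one sight line per cell."""
    return [
        [line_of_sight(grid, observer, eye_height, (r, c))
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]


# Hypothetical 3x3 terrain with a single tall cell next to the observer.
terrain = [
    [0, 5, 0],
    [0, 0, 0],
    [0, 0, 0],
]
visible = viewshed(terrain, observer=(0, 0), eye_height=1)
```

Because every cell gets its own sight line, the cost grows with both the number of cells and the length of each ray, which is the execution-time burden the abstract refers to.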
ISBN (print): 9781450362597
Undergraduate or novice programmers are often challenged by higher-level and abstract concepts in programming courses. Compared to constructing a sequential program, parallel and concurrent programming requires a different and more complex mental model of control flow. Now that multi-core processors have become the norm for computers and mobile devices, the responsibility of developing software to take advantage of this extra computing power rests with the modern software developer. In recognition of this new era, curricula guidelines have been proposed specifically targeting the complex world of parallel and distributed computing. CS2013 also recognizes this with a dedicated Parallel and Distributed Computing knowledge area with core hours, as well as by dispersing parallelism concepts across other fundamental knowledge areas. Parallel programming was once considered an advanced area of computing, taught to students only by experts in graduate-level elective courses. However, it is now expected that all undergraduate computing students will become familiar with the fundamentals of parallelism. Concurrency and parallelism concepts are undoubtedly difficult for students to learn. This can be daunting for teachers who are inexperienced with all elements of the underlying parallelism concepts, but even more daunting is devising pedagogically sound materials that will allow undergraduate students to grasp the concepts. This is especially challenging for early undergraduate courses where students are often novice programmers, barely confident in sequential programming let alone parallel programming. This session will provide an opportunity for instructors to discuss and share ideas and experiences in this area, as well as explore potential collaboration opportunities.
In this paper we present a parallel solution for finding the one-ring neighboring nodes and elements of each vertex in generic meshes. Finding nodal neighbors is computationally straightforward but expensive for large meshes. To improve efficiency, parallelism is adopted by utilizing the modern Graphics Processing Unit (GPU). The presented parallel solution depends heavily on parallel sorting, scan, and reduction. Our parallel solution is efficient and easy to implement, but requires the allocation of large device memory. It achieves speedups of approximately 55 and 90 over the serial solution when finding the neighboring nodes and elements, respectively. It is easy to implement because it does not need to perform mesh coloring before finding neighbors. There are no complex data structures; only integer arrays are needed, which makes our parallel solution very effective. (C) 2020 The Author(s). Published by Elsevier B.V.
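The sort-then-group idea behind the abstract above can be sketched serially. This is an illustration under assumptions, not the authors' GPU implementation: it assumes a triangle mesh given as vertex-index triples, uses Python's built-in sort in place of a parallel sort, and uses a simple pass over the sorted pairs in place of the parallel scan and reduction. The function names are hypothetical.

```python
def one_ring_nodes(triangles, num_vertices):
    """Neighboring vertices of each vertex, found by sorting (vertex, neighbor) pairs."""
    pairs = []
    for a, b, c in triangles:
        # Every triangle contributes each of its edges in both directions.
        pairs += [(a, b), (a, c), (b, a), (b, c), (c, a), (c, b)]
    pairs.sort()  # groups pairs by vertex and makes duplicates adjacent
    neighbors = [[] for _ in range(num_vertices)]
    for v, n in pairs:
        # Keep a neighbor only if it differs from the previous one (deduplication).
        if not neighbors[v] or neighbors[v][-1] != n:
            neighbors[v].append(n)
    return neighbors


def one_ring_elements(triangles, num_vertices):
    """Indices of the triangles incident to each vertex."""
    pairs = sorted((v, e) for e, tri in enumerate(triangles) for v in tri)
    elements = [[] for _ in range(num_vertices)]
    for v, e in pairs:
        elements[v].append(e)
    return elements


# Hypothetical mesh: two triangles sharing the edge between vertices 1 and 2.
mesh = [(0, 1, 2), (1, 3, 2)]
nodes = one_ring_nodes(mesh, 4)
elems = one_ring_elements(mesh, 4)
```

Only flat lists of integer pairs are used, which mirrors the abstract's point that integer arrays suffice and no mesh coloring step is needed.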