ISBN: 9781509052523 (print)
This paper presents an efficient strategy to implement parallel and distributed computing for image processing on a neuromorphic platform. We use SpiNNaker, a many-core neuromorphic platform inspired by neural connectivity in the brain, to achieve fast response and low power consumption. Our proposed method is based on fault-tolerant fine-grained parallelism that makes optimal use of SpiNNaker resources for process pipelining and decoupling. We demonstrate that our method can achieve a performance of up to 49.7 MP/J (megapixels per joule) for the Sobel edge detector and can process 1600 × 1200 pixel images at 697 fps. Using a simulated Canny edge detector, our method can achieve a performance of up to 21.4 MP/J. Moreover, the framework can be extended further by using larger SpiNNaker machines. This will be very useful for applications such as energy-aware and time-critical-mission robotics, as well as very high resolution computer vision systems.
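The row-strip decomposition that underlies this kind of fine-grained image parallelism can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the SpiNNaker implementation (which runs on the chip's ARM cores): the functions `sobel_magnitude` and `sobel_by_strips` are hypothetical, each strip stands in for the work of one core, and a one-row halo on each side of a strip lets every strip be filtered independently while producing the same result as filtering the whole image.

```python
import numpy as np


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude of a 2-D image; border pixels are set to 0."""
    p = img.astype(np.float64)
    out = np.zeros_like(p)
    # Gx kernel [[-1,0,1],[-2,0,2],[-1,0,1]]: right column minus left column
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Gy kernel [[-1,-2,-1],[0,0,0],[1,2,1]]: bottom row minus top row
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    out[1:-1, 1:-1] = np.hypot(gx, gy)
    return out


def sobel_by_strips(img: np.ndarray, n_workers: int) -> np.ndarray:
    """Split rows into n_workers strips, filter each with a 1-row halo, stitch results."""
    h = img.shape[0]
    bounds = np.linspace(0, h, n_workers + 1).astype(int)
    out = np.zeros(img.shape, dtype=np.float64)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        lo, hi = max(start - 1, 0), min(stop + 1, h)   # add halo rows where they exist
        local = sobel_magnitude(img[lo:hi])            # independent per-strip work
        out[start:stop] = local[start - lo:stop - lo]  # keep only the strip's own rows
    return out
```

Because each strip needs only its own rows plus one halo row above and below, strips can be handed to different cores with no communication beyond that initial halo.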
Hi-PASS, a CAD system for DSP architecture synthesis, has been developed to automatically produce maximally parallel VLSI designs for real-time applications. The target DSP applications are the class for which desired...
ISBN: 0818608285 (print)
The Camelot Project has constructed a distributed transaction facility intended to support widespread use of transaction processing techniques. Camelot executes on a variety of uni- and multiprocessors on top of the Unix-compatible Mach operating system. The authors describe the design decisions that make Camelot a flexible, easy-to-use system and briefly describe Camelot's programming interfaces and algorithms. They discuss two applications of Camelot: an implementation of distributed ET-1 and a graphical room reservation system that uses the X Window Manager.
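As a generic illustration of the kind of algorithm a distributed transaction facility such as Camelot relies on, here is a minimal in-memory sketch of textbook two-phase commit. The `Participant` class and `two_phase_commit` function are hypothetical: they do not reflect Camelot's actual programming interfaces, and they omit logging, timeouts, and crash recovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """A hypothetical resource manager taking part in one distributed transaction."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: vote yes (entering the prepared state) only if the local work
        # can commit. A real system would first force its log to stable storage.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    """Coordinator: commit only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:                    # phase 2: broadcast commit
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: broadcast abort
        p.abort()
    return "aborted"
```

A single "no" vote is enough to abort the whole transaction, which is what makes the outcome atomic across sites.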
ISBN: 9781450317436 (print)
Trajectory analytics can benefit many real-world applications, e.g., frequent-trajectory-based navigation systems, road planning, car pooling, and transportation optimization. Existing algorithms focus on optimizing this problem on a single machine. However, the volume of trajectories exceeds the storage and processing capability of a single machine, which calls for large-scale trajectory analytics in distributed environments. Distributed trajectory analytics faces the challenges of data-locality-aware partitioning, load balancing, an easy-to-use interface, and the versatility to support various trajectory similarity functions. To address these challenges, we propose DITA, a distributed in-memory trajectory analytics system. We propose an effective partitioning method, together with a global index and local indexes, to address the data locality problem. We devise cost-based techniques to balance the workload. We develop a filter-verification framework to improve performance. Moreover, DITA supports most existing similarity functions for quantifying the similarity between trajectories. We integrate our framework seamlessly into Spark SQL and make it support the SQL and DataFrame API interfaces. We have conducted extensive experiments on real-world datasets, and the results show that DITA significantly outperforms existing distributed approaches to trajectory similarity search and join.
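The filter-verification idea can be illustrated on a single machine. The sketch below is not DITA's index or filter, and DITA's partitioning and cost model are not reproduced; it assumes dynamic time warping (DTW) as the similarity function and uses a simple endpoint lower bound. Every DTW warping path matches the first points of both trajectories and also their last points, so the larger of those two distances can never exceed the DTW distance; any trajectory whose bound exceeds the threshold `tau` is pruned without running the quadratic DTW computation. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dtw(t1: Sequence[Point], t2: Sequence[Point]) -> float:
    """Dynamic time warping distance: minimal sum of matched point distances."""
    n, m = len(t1), len(t2)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(t1[i - 1], t2[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dtw_lower_bound(t1: Sequence[Point], t2: Sequence[Point]) -> float:
    """O(1) bound: every warping path matches first-to-first and last-to-last."""
    return max(dist(t1[0], t2[0]), dist(t1[-1], t2[-1]))


def similarity_search(query: Sequence[Point], dataset: List[Sequence[Point]],
                      tau: float) -> List[int]:
    """Indices of trajectories whose DTW distance to query is at most tau."""
    candidates = [i for i, t in enumerate(dataset)
                  if dtw_lower_bound(query, t) <= tau]             # filter step
    return [i for i in candidates if dtw(query, dataset[i]) <= tau]  # verify step
```

Because the bound never overestimates DTW, the filter step cannot discard a true match; it only saves work on trajectories that are clearly too far away.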
ISBN: 9781450340809 (print)
On large supercomputers, the job scheduling system may assign a non-contiguous node allocation to user applications, depending on available resources. For parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This causes poor locality with respect to the physical network topology and degrades the application's communication performance. To mitigate such performance penalties, this work describes techniques to identify a suitable task mapping that takes into account both the layout of the allocated nodes and the application's communication behavior. In the first phase of this research, we instrumented applications and collected performance data to characterize the communication behavior of critical US DOE (United States Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor-join tree, etc.) that combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership-class supercomputer at Oak Ridge National Laboratory.
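The effect of a task mapping on communication cost is often summarized by a hop-bytes metric: the traffic between each pair of ranks multiplied by the number of network hops between the nodes they run on. The sketch below computes hop-bytes on a 3-D torus with wraparound (Titan's interconnect is a 3-D torus) and improves a mapping with a simple pairwise-swap local search. This local search is a swapped-in illustrative technique, not the spectral bisection or neighbor-join methods of this work, and all function names are hypothetical; in practice the communication matrix would come from measurements such as those collected with mpiP.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[int, int, int]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count between two nodes on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(comm: Sequence[Sequence[int]], mapping: Sequence[int],
              coords: Sequence[Coord], dims: Coord) -> int:
    """Sum of comm[i][j] * hops, where rank i runs on node mapping[i]."""
    n = len(comm)
    return sum(comm[i][j] * torus_hops(coords[mapping[i]], coords[mapping[j]], dims)
               for i in range(n) for j in range(n) if comm[i][j])


def greedy_swap(comm: Sequence[Sequence[int]], coords: Sequence[Coord], dims: Coord,
                mapping: Optional[List[int]] = None) -> Tuple[List[int], int]:
    """Pairwise-swap local search: keep any rank swap that lowers hop-bytes."""
    n = len(comm)
    mapping = list(range(n)) if mapping is None else list(mapping)  # default order
    best = hop_bytes(comm, mapping, coords, dims)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(n), 2):
            mapping[i], mapping[j] = mapping[j], mapping[i]
            cost = hop_bytes(comm, mapping, coords, dims)
            if cost < best:
                best, improved = cost, True
            else:
                mapping[i], mapping[j] = mapping[j], mapping[i]  # undo the swap
    return mapping, best
```

Each pass re-evaluates the full cost for every pair, so this is only practical for small rank counts; it finds a local optimum, which is one reason more global methods such as spectral bisection are attractive at scale.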
The proceedings contain 274 papers in three volumes. The following topics are covered: adaptive and learning systems; cognitive aspects of decision making; knowledge-based systems; telerobotics for space applications; machine vision; decision support systems; applications of control in large-scale systems; signal and image processing; biological cybernetics; expert systems and fuzzy models; manual control; computer vision; robotic manipulators; distributed data fusion and control; human-computer interfaces; biological and medical systems; supervisory control; and tools and techniques for systems engineering.
ISBN: 9783642369490 (print)
In this paper we review the effect of two high-performance techniques, advanced optimization via look-ahead and iterative refinement, on the solution of matrix equations arising in control theory applications on CPU-GPU platforms. Our experimental evaluation on the latest GPU generation from NVIDIA, "Kepler", shows a slight advantage of matrix inversion via Gauss-Jordan elimination combined with look-ahead over the traditional LU-based procedure, as well as clear benefits of using mixed precision and iterative refinement for the solution of Lyapunov equations.
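The pairing of a low-precision inversion with double-precision iterative refinement can be sketched in NumPy on a plain linear system A x = b; the Lyapunov solvers themselves and the look-ahead GPU scheduling are not reproduced here. The hypothetical `gauss_jordan_inverse` inverts a matrix by Gauss-Jordan elimination with partial pivoting in whatever precision it is given, and the hypothetical `solve_mixed` does that O(n^3) work in float32, then repeatedly corrects the solution using residuals computed in float64. The sketch assumes A is well enough conditioned that the float32 inverse drives the refinement to convergence.

```python
import numpy as np


def gauss_jordan_inverse(A: np.ndarray) -> np.ndarray:
    """Invert A by Gauss-Jordan elimination with partial pivoting, in A's dtype."""
    n = A.shape[0]
    M = np.hstack([A.copy(), np.eye(n, dtype=A.dtype)])  # augmented [A | I]
    for k in range(n):
        p = k + int(np.argmax(np.abs(M[k:, k])))          # pivot row
        if M[p, k] == 0:
            raise np.linalg.LinAlgError("singular matrix")
        if p != k:
            M[[k, p]] = M[[p, k]]                          # swap rows k and p
        M[k] /= M[k, k]                                    # normalize pivot row
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]                     # eliminate column k
    return M[:, n:]                                        # right half is A^-1


def solve_mixed(A: np.ndarray, b: np.ndarray, iters: int = 5) -> np.ndarray:
    """Solve A x = b with a float32 inverse plus float64 iterative refinement."""
    Ainv32 = gauss_jordan_inverse(A.astype(np.float32))    # cheap, low-precision work
    x = (Ainv32 @ b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                      # residual in double precision
        x += (Ainv32 @ r.astype(np.float32)).astype(np.float64)
    return x
```

The expensive factorization runs at single-precision speed, while each refinement step costs only a matrix-vector product, which is why mixed precision pays off on GPUs with much higher float32 throughput.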
The difficulty in tracking a maneuvering target is deciding whether a maneuver has occurred and when the target began to maneuver. Since target tracking is a real-time problem, the crucial factor is to detect the target maneu...
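One common way to decide whether a maneuver has occurred, shown here only as a sketch and not as this paper's detector, is a sliding-window chi-square test on the tracking filter's innovations. The sketch assumes a one-dimensional measurement and a Kalman filter tuned for non-maneuvering motion that supplies each innovation and its variance. Under the no-maneuver hypothesis, the sum of normalized innovation squared (NIS) over a window of length `window` follows a chi-square distribution with `window` degrees of freedom, so exceeding a chi-square quantile (about 11.07 at the 95% level for 5 degrees of freedom) signals a maneuver. The function name and parameters are hypothetical.

```python
from typing import Optional, Sequence


def detect_maneuver(innovations: Sequence[float], variances: Sequence[float],
                    window: int, threshold: float) -> Optional[int]:
    """Return the first step k at which the NIS summed over the last `window`
    steps exceeds `threshold`, or None if it never does."""
    # Normalized innovation squared for a scalar measurement: nu^2 / S
    nis = [v * v / s for v, s in zip(innovations, variances)]
    for k in range(window - 1, len(nis)):
        if sum(nis[k - window + 1:k + 1]) > threshold:
            return k
    return None
```

The window length trades detection delay against false alarms: a short window reacts faster, while a longer one is less sensitive to a single noisy measurement.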
ISBN: 9780769550398 (print)
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that affect program correctness. Fault tolerance is therefore increasingly important in computing systems. Fault tolerance techniques have two major concerns: a) improving system reliability by detecting transient errors and b) reducing performance overhead. In this study, we propose a configurable fault tolerance technique targeting both high reliability and low performance overhead for multimedia applications. The basic principle is configurability: different degrees of fault tolerance are applied to different parts of the source code of a multimedia application. First, a preliminary analysis is performed at the source-code level to classify the critical statements. Second, a fault injection process combined with statistical analysis is used to validate this partition with a given degree of confidence. Finally, checksum-based fault tolerance and instruction duplication are applied to critical statements, while no fault tolerance mechanism is applied to non-critical parts. Performance experiments demonstrate that our configurable fault tolerance technique yields significant performance gains compared with duplicating all instructions. The fault coverage of this scheme is also evaluated: fault injection results show that about 90% of outputs are correct at the application level, with only 20% runtime overhead.
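The two protection mechanisms applied to critical statements can be sketched as follows. This is a hypothetical Python illustration, not the paper's source-level transformation: `duplicated` re-executes a critical computation and compares the results (a software analogue of instruction duplication), and `adjust_pixels` protects an integer brightness/contrast loop, typical of multimedia code, with a checksum invariant. Non-critical code would simply run unprotected. Integer pixels are assumed so that the checksum comparison is exact; floating-point data would need a tolerance.

```python
from typing import Callable, List, Optional


def duplicated(fn: Callable[..., int], *args) -> int:
    """Instruction duplication for a critical statement: compute twice, compare."""
    first, second = fn(*args), fn(*args)
    if first != second:
        raise RuntimeError("transient fault detected: duplicated results differ")
    return first


def adjust_pixels(pixels: List[int], gain: int, offset: int,
                  fault_at: Optional[int] = None) -> List[int]:
    """Checksum-protected loop: sum(out) must equal gain*sum(in) + offset*len(in).

    `fault_at` is a hypothetical fault-injection hook that flips the lowest bit
    of one output value to emulate a transient fault.
    """
    out = [gain * p + offset for p in pixels]
    if fault_at is not None:
        out[fault_at] ^= 1                     # emulated single-bit upset
    if sum(out) != gain * sum(pixels) + offset * len(pixels):
        raise RuntimeError("transient fault detected: checksum mismatch")
    return out
```

The checksum costs one extra pass over the data instead of doubling every instruction, which illustrates why reserving full duplication for a small set of critical statements keeps the overhead low.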
A key problem in executing performance-critical applications in distributed computing environments (e.g., the Grid) is the selection of resources. Research on "automatic resource selection" aims to al...