ISBN (print): 9781728140421
Real-time object detection and tracking are important and challenging tasks in many computer vision applications, such as robot navigation, video surveillance, vehicle navigation, security applications, military applications, patient monitoring systems, and traffic monitoring systems. Object detection involves detecting objects in a sequence of video frames. In this paper, we review different methods and algorithms for real-time object detection and tracking in high-resolution video. Nowadays, high-resolution imaging sensors and cameras are used in many application areas, such as security systems and military applications. In high-resolution video each frame contains more pixels and takes longer to process, so detecting and tracking targets in real time at high frame rates is an extreme challenge for researchers. This creates a demand for fast algorithms for the real-time processing of high-resolution video. Moving object detection and tracking has been one of the most active areas of research over the last decade. This paper presents a brief survey of the real-time object detection and tracking algorithms for high-resolution video available in the literature.
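The real-time constraint can be made concrete with a little arithmetic. At a fixed frame rate, each frame gets a fixed time budget. Higher resolution packs more pixels into that same budget. The sketch below is illustrative only: the helper names are hypothetical, and the resolutions and the 30 fps rate are example inputs, not figures from the paper.

```python
# Hypothetical helpers; resolutions and frame rate are illustrative inputs,
# not figures taken from the surveyed paper.

def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available for one frame if processing must keep pace."""
    return 1000.0 / fps


def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels per second the detector/tracker must consume."""
    return width * height * fps


def ns_per_pixel(width: int, height: int, fps: float) -> float:
    """Average processing budget per pixel, in nanoseconds."""
    return 1e9 / pixel_rate(width, height, fps)


if __name__ == "__main__":
    for name, (w, h) in {"1080p": (1920, 1080), "4K UHD": (3840, 2160)}.items():
        print(f"{name} @ 30 fps: {frame_budget_ms(30):.1f} ms/frame, "
              f"{pixel_rate(w, h, 30) / 1e6:.1f} Mpixel/s, "
              f"{ns_per_pixel(w, h, 30):.2f} ns/pixel")
```

Moving from 1080p to 4K UHD at the same frame rate quadruples the pixel rate. That cuts the per-pixel budget to a quarter, which is why fast algorithms are needed.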
ISBN (print): 3540219463
In this paper, an O(n) parallel algorithm is presented for fast unranking of t-ary trees with n internal nodes in Zaks' representation. A sequential O(nt) algorithm is derived on the basis of the dynamic programming paradigm. In the parallel version of the algorithm, processing is performed in a dedicated parallel architecture with certain systolic and associative features. First, a coefficient table is created by systolic computations. Then, the n subsequent elements of a tree codeword are computed, each in O(1) time, through associative search operations.
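The dynamic-programming idea behind sequential unranking can be sketched with a counting table. The table is indexed by (internal nodes still to place, open child slots). Please note what this sketch is and is not:
- It is not the paper's O(nt) coefficient table.
- It works on the preorder 0/1 codeword (1 for an internal node, 0 for a leaf), listed in lexicographic order with 0 < 1.
- That ordering and all function names are assumptions chosen for illustration, so its ranks need not match those in Zaks' ordering.

```python
from functools import lru_cache
from math import comb


def make_counter(t: int):
    """Return completions(r, s): the number of ways to finish a preorder
    codeword (1 = internal node, 0 = leaf) of a t-ary tree when r internal
    nodes remain to be placed and s child slots are still open."""

    @lru_cache(maxsize=None)
    def completions(r: int, s: int) -> int:
        if s == 0:
            return 1 if r == 0 else 0   # tree closed; valid only if no nodes left
        if r == 0:
            return 1                    # only leaves can fill the open slots
        # Next symbol is a leaf (fills one slot) or an internal node
        # (fills one slot and opens t new ones).
        return completions(r, s - 1) + completions(r - 1, s + t - 1)

    return completions


def fuss_catalan(n: int, t: int) -> int:
    """Number of t-ary trees with n internal nodes."""
    return comb(t * n, n) // ((t - 1) * n + 1)


def unrank(rank: int, n: int, t: int) -> list[int]:
    """Return the rank-th codeword (0-based) in lexicographic order with 0 < 1."""
    count = make_counter(t)
    if not 0 <= rank < count(n, 1):
        raise ValueError("rank out of range")
    bits, r, s = [], n, 1
    while s > 0:
        with_leaf = count(r, s - 1)     # codewords that place a leaf here
        if rank < with_leaf:
            bits.append(0)
            s -= 1
        else:
            rank -= with_leaf           # skip every codeword with a leaf here
            bits.append(1)
            r, s = r - 1, s + t - 1
    return bits
```

Each step consults the table once and emits one symbol, so a codeword of length tn + 1 is produced in that many steps.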
ISBN (print): 3540664432
An embedded pipeline/parallel architecture is proposed to support an extended quad-tree algorithm suitable for real-time estimation of the dense disparity map (DDM) in stereoscopic image processing. The system's performance has been analyzed through several simulations, with the results assessed both objectively (mean square error) and subjectively (output images). The proposed extended quad-tree is based on the block-matching algorithm. A fine-grained analysis of DDM estimation then leads to a systolic array design for the basic Processor Element. This basic design has been reused to design the Processor Elements at the subsequent quad-tree levels.
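The extended quad-tree builds on block matching, which can be illustrated in software. The sketch below makes these assumptions:
- The images are rectified grayscale images stored as lists of rows.
- The matching cost is the sum of absolute differences (SAD), a common choice; the paper's cost function is not stated here.
- It computes one disparity per fixed-size block.

The quad-tree refinement and the systolic hardware are not modeled. An MSE helper of the kind used for the objective evaluation is included. All names are hypothetical.

```python
def block_disparity(left, right, y, x, size, max_d):
    """Horizontal shift d in [0, max_d] minimizing the sum of absolute
    differences (SAD) between the size x size block of `left` at (y, x)
    and the block of `right` at (y, x - d)."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_d + 1):
        if x - d < 0:
            break  # candidate block would leave the right image
        cost = sum(
            abs(left[y + i][x + j] - right[y + i][x - d + j])
            for i in range(size)
            for j in range(size)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def disparity_map(left, right, size, max_d):
    """One disparity value per non-overlapping size x size block."""
    h, w = len(left), len(left[0])
    return [
        [block_disparity(left, right, by, bx, size, max_d)
         for bx in range(0, w - size + 1, size)]
        for by in range(0, h - size + 1, size)
    ]


def mse(a, b):
    """Mean square error between two equally sized 2-D maps."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)
```

Each block's search is independent of every other block's search. This independence is what makes block matching amenable to pipelined or systolic hardware.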
A versatile family of interconnection networks alternative to hypercubes, called Metacubes, has been proposed for building extremely large-scale multiprocessor systems with a small number of links per node. A Metacube $MC(k, m)$ connects $2^{2^k m + k}$ nodes with only $k + m$ links per node. In this paper, we propose a new presentation of Metacube for algorithmic design. Based on this presentation, we give efficient algorithms for parallel prefix computation and parallel sorting on Metacubes. The prefix computation algorithm runs in $2^k m(k + 1) + k$ communication steps and $2^{k+1} m + 2k$ computation steps on $MC(k, m)$. The sorting algorithm runs in $O((2^k m + k)^2)$ computation steps and $O((2^k m(2k + 1) + k)^2)$ communication steps on $MC(k, m)$.
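The garbled superscripts in the abstract can be read as $2^{2^k m + k}$ nodes and $k + m$ links per node. Under that reading, the size formulas can be checked directly. The classic hypercube prefix algorithm then gives a baseline for the step counts: on a $D$-dimensional hypercube it takes $D$ exchange steps and $2D$ additions. The sketch below simulates that hypercube algorithm, not the Metacube algorithm from the paper. The function names are hypothetical, and the step-count helper assumes the superscript reading above.

```python
def metacube_size(k: int, m: int) -> tuple[int, int]:
    """(number of nodes, links per node) of MC(k, m)."""
    return 2 ** (2 ** k * m + k), k + m


def metacube_prefix_steps(k: int, m: int) -> tuple[int, int]:
    """(communication steps, computation steps) for prefix on MC(k, m),
    reading the abstract's counts as 2^k m (k+1) + k and 2^(k+1) m + 2k."""
    return 2 ** k * m * (k + 1) + k, 2 ** (k + 1) * m + 2 * k


def hypercube_prefix(values):
    """Inclusive prefix sums on a simulated d-dimensional hypercube:
    one exchange per dimension, two additions per node per exchange."""
    n = len(values)
    d = n.bit_length() - 1
    if n != 1 << d:
        raise ValueError("number of values must be a power of two")
    prefix, total = list(values), list(values)
    for j in range(d):
        old = total[:]  # totals exchanged in this round
        for i in range(n):
            partner = i ^ (1 << j)
            total[i] = old[i] + old[partner]
            if partner < i:  # partner's block precedes ours
                prefix[i] += old[partner]
    return prefix
```

Take $D = 2^k m + k$, the dimension implied by the node count. Then $2D = 2^{k+1} m + 2k$, which equals the computation-step count under this reading. The extra communication steps presumably arise because a Metacube node has only $k + m$ links rather than $D$.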
ISBN (print): 0769522297
Irregular and dynamic memory reference patterns can cause performance variations for low-level algorithms in general and for parallel algorithms in particular. We present an adaptive algorithm selection framework that collects and interprets the inputs of a particular instance of a parallel algorithm and selects the best-performing algorithm from an existing library. In this paper, we present the dynamic selection of parallel reduction algorithms. First, we introduce a set of high-level parameters that can characterize different parallel reduction algorithms. Then, we describe an off-line, systematic process to generate predictive models that can be used for run-time algorithm selection. Our experiments show that our framework:
(a) selects the most appropriate algorithm in 85% of the cases studied;
(b) delivers 98% of the optimal performance overall;
(c) adaptively selects the best algorithms for dynamic phases of a running program, resulting in performance improvements not otherwise possible;
(d) adapts to the underlying machine architecture (tested on IBM Regatta and HP V-Class systems).
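The selection loop can be sketched in three steps: characterize the input, consult a model built off-line, and dispatch to a library algorithm. Everything below is hypothetical:
- The two feature definitions.
- The training table.
- The use of a 1-nearest-neighbour classifier in place of the paper's predictive models.
- Both reductions, which are sequential stand-ins. A real shared-memory "direct" version would need atomic updates or locks.

```python
from math import dist


def reduce_direct(size, indices, values):
    """Baseline: accumulate every contribution straight into one shared array."""
    out = [0] * size
    for i, v in zip(indices, values):
        out[i] += v
    return out


def reduce_replicated(size, indices, values, workers=4):
    """Replicated-buffer style: each simulated worker accumulates into a
    private copy, and the copies are summed element-wise at the end."""
    private = [[0] * size for _ in range(workers)]
    for k, (i, v) in enumerate(zip(indices, values)):
        private[k % workers][i] += v  # iteration k goes to worker k mod P
    return [sum(column) for column in zip(*private)]


def characterize(size, indices):
    """Hypothetical input features: fraction of array elements touched,
    and average number of references per element."""
    return (len(set(indices)) / size, len(indices) / size)


ALGORITHMS = {"direct": reduce_direct, "replicated": reduce_replicated}


def select(training, features):
    """1-nearest-neighbour stand-in for an off-line predictive model:
    `training` is a list of (features, best_algorithm_name) pairs."""
    return min(training, key=lambda row: dist(row[0], features))[1]
```

At run time one would call `ALGORITHMS[select(training, characterize(size, indices))](size, indices, values)`. Re-evaluating the features at each program phase is what lets the choice adapt.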
ISBN (print): 9783642131356
A soft-core system allows designers to conveniently modify the components of the architecture they have designed. In some systems, a single-core processor cannot provide enough computing power for the large amount of computation required by specific applications. To improve the performance of a multi-core system, parallel programming is an important issue in addition to hardware architecture design. Current parallelizing compilers have difficulty parallelizing programs effectively, so the programmer must decide from the outset how to allocate tasks to each processor. In this paper, we present a software framework for designing parallel programs. The proposed framework provides a convenient parallel programming environment in which programmers can design the software of a multi-core system. Experiments show that the proposed framework can parallelize programs effectively using the functions it provides.
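The abstract does not describe the framework's API. The following is therefore only a generic illustration of the decision the abstract says programmers must make up front: how to allot work to each processor. It assumes a simple static block distribution, and `block_ranges` and `parallel_sum` are hypothetical names. Python threads stand in for cores. Because of the global interpreter lock, they show the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n_items, n_cores):
    """Split range(n_items) into n_cores contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_cores)
    ranges, start = [], 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def parallel_sum(fn, data, n_cores=4):
    """Apply fn to every item, one block of items per worker, then combine partial sums."""
    blocks = block_ranges(len(data), n_cores)
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        partials = pool.map(lambda r: sum(fn(data[i]) for i in r), blocks)
        return sum(partials)
```

A static block split suits loops whose iterations cost about the same. Irregular workloads usually call for a dynamic assignment instead.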
Internet of Things (IoT) devices have rapidly produced large amounts of data in recent years. Though parallel computing architectures like Map-Reduce have been successful in processing massive data, they are not enough for probl...
ISBN (print): 9781538634370
Recursive programs, which typically implement divide-and-conquer algorithms, are well suited to multicore systems because they offer a high degree of parallelization potential. So far, existing parallelizing compilers have mainly focused on extracting other parallel patterns, such as data-level or pipeline-level parallelism. In this paper, we propose a toolflow for extracting recursion-level parallelism for embedded multicore systems. To achieve this, the toolflow not only verifies the mutual independence of recursive call sites but also selects an appropriate task granularity to ensure a good trade-off between load balancing and parallelization overhead. Profitable parallelization opportunities are implemented using compiler directives from the OpenMP tasking model. The results show that the toolflow is effective: it speeds up sequential recursive programs by 2.5x to 3.8x on a quad-core platform.
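The combination of independent recursive call sites and a granularity cutoff can be sketched as follows. This is a Python illustration, not the toolflow's output. The paper emits OpenMP task directives; here a thread plays the role of a spawned task, and `join` plays the role of a task wait. The depth-based cutoff is one simple way to limit task granularity. `recursive_sum` and `cutoff_depth` are hypothetical names. Because of CPython's global interpreter lock, the sketch shows the structure, not the speedup.

```python
import threading


def recursive_sum(data, lo=0, hi=None, depth=0, cutoff_depth=3):
    """Divide-and-conquer sum that spawns a task for one recursive call
    while depth < cutoff_depth and runs sequentially below the cutoff."""
    if hi is None:
        hi = len(data)
    if depth >= cutoff_depth or hi - lo < 2:
        return sum(data[lo:hi])  # task too fine-grained: stop spawning
    mid = (lo + hi) // 2
    result = {}

    def left_half():
        # The two calls touch disjoint ranges [lo, mid) and [mid, hi),
        # so they are independent and may run concurrently.
        result["left"] = recursive_sum(data, lo, mid, depth + 1, cutoff_depth)

    worker = threading.Thread(target=left_half)  # analogous to an OpenMP task
    worker.start()
    right = recursive_sum(data, mid, hi, depth + 1, cutoff_depth)
    worker.join()  # analogous to an OpenMP taskwait
    return result["left"] + right
```

With `cutoff_depth=3`, at most seven tasks are spawned. Raising the cutoff improves load balancing but adds spawning overhead, which is the trade-off the toolflow tunes.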
ISBN (print): 9781467387767
Electronic system-level (ESL) design plays an important role in the design of multiprocessor embedded systems-on-chip. Two important steps in this process are the evaluation of a single design configuration and design space exploration. In the first part of the design process, simple high-level analytical models for application mapping and evaluation are used and modified to accelerate the evaluation of a single design configuration. Using the analytical model, the design space is pruned and explored quickly but with low accuracy. In the second part of the design process, two multi-objective optimization algorithms, based on Particle Swarm Optimization and Simulated Annealing, are proposed to explore the pruned design space with higher accuracy by taking advantage of low-level architectural simulation engines. The proposed algorithms provide the designer with more accurate solutions within an acceptable time. With the MJPEG application as a case study, each of these methods produces a set of near-optimal points. Simulation results show that the proposed methods can lead to near-optimal design configurations with acceptable accuracy in reasonable time.
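The second phase, multi-objective exploration with simulated annealing, can be illustrated on a toy problem. Everything here is hypothetical: the two-parameter design space, the latency and power formulas that stand in for the analytical model or simulator, and the annealing schedule. Scalarizing with periodically re-drawn random weights and keeping a Pareto archive is one common way to make simulated annealing multi-objective. The abstract does not describe the paper's exact variant or its Particle Swarm Optimization counterpart.

```python
import math
import random

# Toy design space (hypothetical): number of cores 1..8, frequency level 1..4.
CORES, FREQS = range(1, 9), range(1, 5)


def evaluate(point):
    """Hypothetical analytical model returning (latency, power) to minimize."""
    cores, freq = point
    latency = 100.0 / (cores * freq) + 2.0 * cores  # compute time + comm overhead
    power = float(cores * freq ** 2)
    return latency, power


def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, point, cost):
    """Keep only mutually non-dominated (point, cost) pairs."""
    if any(c == cost or dominates(c, cost) for _, c in archive):
        return archive
    return [(p, c) for p, c in archive if not dominates(cost, c)] + [(point, cost)]


def neighbour(point, rng):
    """Move one design parameter by one step, staying inside the space."""
    cores, freq = point
    if rng.random() < 0.5:
        cores = min(max(cores + rng.choice((-1, 1)), CORES[0]), CORES[-1])
    else:
        freq = min(max(freq + rng.choice((-1, 1)), FREQS[0]), FREQS[-1])
    return cores, freq


def mosa(steps=2000, seed=0):
    """Simulated annealing on a randomly re-weighted sum of the two objectives,
    archiving every accepted non-dominated design point."""
    rng = random.Random(seed)
    point = (CORES[0], FREQS[0])
    cost = evaluate(point)
    archive, temp, w = [(point, cost)], 10.0, 0.5
    for step in range(steps):
        if step % 200 == 0:
            w = rng.random()  # new weighting steers the walk to another part of the front
        cand = neighbour(point, rng)
        cand_cost = evaluate(cand)
        delta = (w * cand_cost[0] + (1 - w) * cand_cost[1]) - (w * cost[0] + (1 - w) * cost[1])
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            point, cost = cand, cand_cost
            archive = update_archive(archive, point, cost)
        temp = max(temp * 0.995, 1e-3)  # geometric cooling
    return sorted(archive)
```

In a two-phase flow like the one described, `evaluate` would be cheap during pruning and an expensive architectural simulation during the refined search. That is why restricting the second phase to the pruned space matters.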
ISBN (print): 0769512607
In this paper, we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster, the scheduler can take advantage of this network's unique capabilities, including a processor and memory on the network interface card and efficient user-level communication libraries. We developed a micro-benchmark to test the scheduler's performance under various aspects of parallel job workloads: memory usage, bandwidth- and latency-bound communication, number of processes, timeslice quantum, and multiprogramming level. Our experiments show that the gang scheduler performs relatively well under most workload conditions, is largely insensitive to the number of concurrent jobs in the system, and scales almost linearly with the number of nodes. On the other hand, the scheduler is very sensitive to the timeslice quantum: values under 30 seconds can incur large overheads and fairness problems.
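The classic Ousterhout-matrix formulation of gang scheduling makes the multiprogramming level and the timeslice quantum concrete. In this formulation, each row is a time slot and each column a node. The sketch below is a generic first-fit version with hypothetical function names, not the scheduler evaluated in the paper. `useful_fraction` is a simple model, assumed for illustration, of how a fixed per-switch cost eats into a shorter quantum.

```python
def build_matrix(jobs, n_nodes):
    """Place each (name, n_procs) job into the first time slot (row) with
    enough free nodes; all processes of a job share one row so they run together."""
    rows = []
    for name, procs in jobs:
        if procs > n_nodes:
            raise ValueError(f"job {name} needs more than {n_nodes} nodes")
        for row in rows:
            free = [node for node, slot in enumerate(row) if slot is None]
            if len(free) >= procs:
                for node in free[:procs]:
                    row[node] = name
                break
        else:  # no existing row fits: open a new time slot
            rows.append([name] * procs + [None] * (n_nodes - procs))
    return rows


def run_slices(rows, n_slices):
    """Round-robin: in slice t, every process in row t mod len(rows) runs at once."""
    return [rows[t % len(rows)] for t in range(n_slices)]


def useful_fraction(quantum, switch_cost):
    """Share of each slice spent on useful work if every switch costs switch_cost."""
    return quantum / (quantum + switch_cost)
```

For example, `build_matrix([("A", 4), ("B", 2), ("C", 3), ("D", 2)], 4)` packs job D beside job B. This yields three rows, that is, a multiprogramming level of three. The `useful_fraction` model shows why shrinking the quantum while the switch cost stays fixed raises the overhead.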