A method was developed in response to the need for a robust optimizer for two different computer vision algorithms. Our new approach is a synthesis of two modalities: a simulated annealing technique paired with a parallel search. Using an initial value and the maximum parametric variations, our method searches for a cost function minimum by using parameter subspaces combined with a local simulated annealing algorithm step. The description of our two computer vision applications, which concern object tracking in video sequences, makes it clear why this kind of optimizer was needed. In the cases described, our method provides encouraging results.
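The abstract gives no algorithmic detail, so the following is only a minimal sketch of how a subspace search might be combined with a local simulated annealing step. The round-robin split into subspaces, the cooling schedule, and all names (`anneal_in_subspace`, `subspace_annealing`, `max_delta`) are assumptions made for illustration, not the authors' method.

```python
import math
import random


def anneal_in_subspace(cost, start, x0, max_delta, indices, rng, steps, t0, cooling):
    """Simulated annealing that perturbs only the parameters listed in `indices`."""
    current, current_cost = list(start), cost(start)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        candidate = list(current)
        i = rng.choice(indices)
        # Propose a move that shrinks with temperature, clamped to the
        # allowed variation around the initial value x0[i].
        step = rng.uniform(-max_delta[i], max_delta[i]) * temp / t0
        lo, hi = x0[i] - max_delta[i], x0[i] + max_delta[i]
        candidate[i] = min(hi, max(lo, candidate[i] + step))
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling
    return best, best_cost


def subspace_annealing(cost, x0, max_delta, n_subspaces=2, rounds=5,
                       steps=200, t0=1.0, cooling=0.98, seed=0):
    """Search each parameter subspace independently, then keep the best result."""
    rng = random.Random(seed)
    # Assumed partition: parameter indices dealt round-robin into subspaces.
    subspaces = [list(range(len(x0)))[k::n_subspaces] for k in range(n_subspaces)]
    subspaces = [s for s in subspaces if s]
    best, best_cost = list(x0), cost(x0)
    for _ in range(rounds):
        # These searches are independent and could run in parallel.
        results = [anneal_in_subspace(cost, best, x0, max_delta, s, rng,
                                      steps, t0, cooling) for s in subspaces]
        cand, cand_cost = min(results, key=lambda r: r[1])
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost
```

For example, `subspace_annealing(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0], [3.0, 3.0])` returns a parameter vector within the allowed variations whose cost is no worse than that of the initial value.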
VLIW machines possibly provide the most direct way to exploit instruction-level parallelism; however, they cannot be used to emulate current general-purpose instruction set architectures. Programs scheduled for a particular implementation of a VLIW model cannot be guaranteed to be binary compatible with other implementations of the same machine model that have a different number of functional units. This paper describes an architecture, named dynamically trace scheduled VLIW (DTSVLIW), which can be used to implement machines that execute code of current RISC or CISC instruction set architectures in a VLIW fashion, with backward code compatibility. Some preliminary performance measurements of the DTSVLIW, obtained with an execution-driven simulator running the SPECint95 benchmark suite, are also presented.
The purpose of this paper is to present a very efficient parallel algorithm for computing the convex hull in the plane. We propose a parallel version of Jarvis's march, realized using the BSP model, which takes O(nh/p) time (where p is the number of processors, n is the problem size, and h is the number of hull vertices), compared with the O(nh) complexity of the sequential algorithm. Furthermore, we give the experimental results obtained by testing the algorithm implementation on a 16-node IBM SP2 (Scalable POWER parallel 2), and we compare them with the theoretical performance prediction obtained using the BSP cost calculus model.
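The abstract does not describe how the work is divided, so the sketch below shows only one plausible way to arrive at an O(nh/p) computation term: the points are split into p chunks, each chunk proposes its own candidate for the next hull vertex, and the candidates are then reduced to a single winner. The chunks are processed sequentially here to stand in for processors, and all function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means b is clockwise of a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def pick(current, best, cand):
    """Return whichever of best/cand is the better next hull vertex after current."""
    if cand == current:
        return best
    if best is None:
        return cand
    turn = cross(current, best, cand)
    # Prefer the most clockwise candidate; on collinear ties prefer the farthest.
    if turn < 0 or (turn == 0 and dist2(current, cand) > dist2(current, best)):
        return cand
    return best


def local_winner(current, chunk):
    """Work done by one (simulated) processor: scan its own n/p points."""
    best = None
    for pt in chunk:
        best = pick(current, best, pt)
    return best


def jarvis_march_partitioned(points, p=4):
    """Counter-clockwise convex hull; points are (x, y) tuples."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    chunks = [pts[k::p] for k in range(p)]  # assumed round-robin distribution
    start = pts[0]                          # leftmost point is always on the hull
    hull, current = [start], start
    while True:
        # One step of the march: local scans, then a global reduction.
        winners = [local_winner(current, c) for c in chunks]
        nxt = None
        for w in winners:
            if w is not None:
                nxt = pick(current, nxt, w)
        if nxt == start or len(hull) >= len(pts):
            break
        hull.append(nxt)
        current = nxt
    return hull
```

Each of the h iterations scans n/p points per chunk, which is where the O(nh/p) term comes from; the cost of the reduction and of synchronization, which a BSP cost analysis would also account for, is not modeled in this sketch.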
This paper describes a parallel algorithm for the Euclidean distance transform on a special-purpose architecture based on a reconfigurable mesh interconnection network. The proposed architecture, which supports the Euclidean distance transform algorithm as well as other low-level image processing algorithms, is particularly interesting because it can be effectively implemented in hardware and it can be programmed at a high level. The Euclidean distance transform algorithm described in this paper exploits the specific features of the reconfigurable interconnection network of the proposed dedicated architecture. It takes advantage of the natural matching both between the data structure of the problem (a mesh of pixels) and that of the dedicated architecture (a mesh of processing elements), and between the nature of the computation (distance computation) and the capability of the interconnection network to let information flow from one node to a set of nodes by means of reconfigurable buses. The proposed algorithm has been implemented and validated through simulation; its computational complexity is O(N) (worst case) for pictures of N × N pixels on an architecture with N × N processing elements.
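For readers unfamiliar with the transform, the following brute-force reference shows what a Euclidean distance transform computes. It is not the mesh algorithm from the paper, and it assumes one common convention: each pixel receives the distance to the nearest nonzero (feature) pixel. The function name is hypothetical.

```python
import math


def euclidean_distance_transform(image):
    """image: list of rows of 0/1 values. Returns a same-shaped grid of distances
    from each pixel to the nearest nonzero pixel (assumed convention)."""
    features = [(r, c)
                for r, row in enumerate(image)
                for c, value in enumerate(row) if value]
    if not features:
        raise ValueError("image contains no feature pixels")
    # Brute force: compare every pixel against every feature pixel.
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in features)
             for c in range(len(row))]
            for r, row in enumerate(image)]
```

This sequential version does far more work than the O(N) parallel time quoted above; it is useful only as a correctness reference on small images.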
Clustering and scheduling of tasks for parallel implementation is a well-researched problem. Several techniques have been presented in the literature to improve performance and reduce problem execution times. In this paper we present a novel approach in which clustering and scheduling of tasks can be tuned to achieve maximal speedup or efficiency. The proposed scheme is based on the relation between the costs of computation and communication of task clusters. We show how clustering can be adapted to suit different architectures and numbers of available processors. The proposed clustering and scheduling algorithm is flexible: it can be tuned to suit a bounded or unbounded number of processors and/or the parallel computing environment. Comparative studies indicate superior efficiency compared to most other schemes proposed in recent years.
Most parallel programming models for distributed-memory architectures are based on individual threads interacting via send and receive operations. We show that a more structured model, BSP, gains substantial performance improvements by exploiting the extra information implicit in its structure. In particular, each thread learns something about global state whenever it receives a message. This information can be used to modify its own behavior to improve collective use of the communication system. The programming model's semantics also provides implicit knowledge that can be exploited to increase performance. We show that these effects are useful at the application level by comparing the performance of BSP and MPI implementations of the NAS parallel benchmarks.
ISBN: (print) 3540664483
The proceedings contain 103 papers. The special focus in this conference is on Object-Orientation and Fundamentals for Applications. The topics include: on tractable queries and constraints; dynamic relationships in object-oriented databases; an intelligent object-oriented database architecture; implementation of a generic query processor; hybrid simultaneous scheduling and mapping in SQL multi-query parallelization; cluster-based database selection techniques for routing bibliographic queries; developing patterns as a mechanism for assisting the management of knowledge in the context of conducting organisational change; knowledge acquisition for mobile robot environment mapping; knowledge discovery with the associative memory modell neunet; tracking mobile users utilizing their frequently visited locations; a parallel signature file technique using vertical partitioning and extendable hashing; flexible workflow management systems; object-based ordered delivery of messages in object-based systems; storage and retrieval of XML documents using object-relational databases; a knowledge based approach for modeling and querying multidimensional databases; an incremental hypercube approach for finding best matches for vague queries; formalising ontologies and their relations; addressing efficiency issues during the process of integrity maintenance; quality and recommendation of multi-source data for assisting technological intelligence applications; using self-organizing maps to organize document archives and to characterize subject matters; from object-oriented conceptual modeling to component-based development; query processing in Relationlog; a query subsumption technique; a conference key multicasting scheme using knapsack and secret sharing; verify updating trigger correctness; a flexible weighting scheme for multimedia documents; and supporting teams in virtual organizations.
This paper presents the hardware architectures of two texture features: mean and contrast. These features are based on the co-occurrence matrix method. However, the features can also be calculated without the co-occurrence matrix. The formalism behind the features without the co-occurrence matrix is shown, and the corresponding hardware architectures are depicted with data flow graphs (DFG). The architecture was developed with the Very High Speed Integrated Circuit Hardware Description Language (VHDL) and a commercially available logic synthesis tool by Synopsys. The VHDL code was synthesized to the Xilinx XC4000-series FPGA library.
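The abstract does not reproduce the formalism, but the underlying idea can be illustrated with the standard definitions. Assuming a normalized, non-symmetric co-occurrence matrix p(i, j) for a pixel displacement (dr, dc), with mean defined as the sum of i·p(i, j) and contrast as the sum of (i − j)²·p(i, j), both features reduce to plain averages over pixel pairs, so the matrix never has to be built. The paper's exact definitions may differ, and the function names below are hypothetical.

```python
from collections import Counter


def pixel_pairs(image, dr, dc):
    """Yield (a, b) gray-level pairs for pixels separated by (dr, dc)."""
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                yield image[r][c], image[r2][c2]


def features_direct(image, dr=0, dc=1):
    """Mean and contrast accumulated directly from the pixel stream."""
    n = total = squared_diff = 0
    for a, b in pixel_pairs(image, dr, dc):
        n += 1
        total += a
        squared_diff += (a - b) ** 2
    if n == 0:
        raise ValueError("displacement leaves no valid pixel pairs")
    return total / n, squared_diff / n


def features_via_matrix(image, dr=0, dc=1):
    """Same features computed from an explicit co-occurrence histogram."""
    counts = Counter(pixel_pairs(image, dr, dc))
    n = sum(counts.values())
    if n == 0:
        raise ValueError("displacement leaves no valid pixel pairs")
    mean = sum(i * k for (i, _), k in counts.items()) / n
    contrast = sum((i - j) ** 2 * k for (i, j), k in counts.items()) / n
    return mean, contrast
```

The direct form needs only two running sums and a pair counter, which suggests why a formulation without the matrix suits a data-flow hardware design; the matrix-based function is included only to check that the two agree.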
DSP processor growth is phenomenal and continues to grow rapidly, but general-purpose microprocessors have entered the multimedia and signal processing oriented stream by adding DSP functionality to the instruction se...