ISBN: 9798350344196 (print)
Edge detection is a fundamental operation in image processing, serving as a crucial step in applications such as object recognition, image segmentation, and scene understanding. The Sobel edge detection algorithm is a widely used method for detecting vertical and horizontal edges in digital images. However, performing edge detection on high-resolution images with large dimensions can be computationally intensive and time-consuming. Specialized hardware solutions such as Field Programmable Gate Arrays (FPGAs) and Coarse-Grained Reconfigurable Arrays (CGRAs) offer significant advantages over general-purpose processors for implementing edge detection algorithms. This paper proposes algorithms for implementing Sobel edge detection on two CGRA fabrics: a dynamically reconfigurable resource array and a distributed memory architecture. Furthermore, we discuss the implementation of Sobel edge detection on the target architecture for an input matrix of arbitrary size. Finally, the proposed approaches are compared with other CGRA-based implementations in terms of latency. The experimental results show that the proposed approaches exhibit significantly lower latency than other CGRA-based implementations.
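For readers unfamiliar with the operator, the following is a minimal software sketch of Sobel edge detection on an input matrix of arbitrary size. It is not the CGRA mapping proposed in the paper; the function name `sobel`, the gradient-magnitude output, and the choice to leave border pixels at zero are assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]

# Standard 3x3 Sobel kernels: SOBEL_X responds to vertical edges,
# SOBEL_Y responds to horizontal edges.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image: Image) -> Image:
    """Return the gradient magnitude of `image` (hypothetical helper).

    Border pixels are left at 0.0 because the 3x3 window does not fit
    there; this boundary policy is an assumption, not from the paper.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            # Accumulate both directional responses over the 3x3 window.
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out
```

Every output pixel depends only on its own 3x3 neighbourhood, which is why the computation lends itself to spatially parallel fabrics of the kind the abstract describes.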
ISBN: 0819431168 (print)
The "à trous" algorithm(1) represents a discrete approach to the classical continuous wavelet transform.(2) Similar to the fast wavelet transform,(3) the input signal is analyzed using the coefficients of a properly chosen low-pass filter, but in contrast to the latter there is no concluding decimation step. Examples of practical applications can be found in the field of cosmology for studying the formation of Large Scale Structures in the Universe.(4) In this paper we develop parallel algorithms on different MIMD architectures for the two-dimensional "à trous" decomposition. We implement the algorithm on several distributed memory architectures using the PVM (Parallel Virtual Machine) paradigm and on an SGI POWERChallenge using a parallel version of the C programming language (PowerC). Finally, we investigate experimental results obtained on both of them.
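The absence of a decimation step can be illustrated with a short sequential sketch. It is one-dimensional for brevity (the paper treats the two-dimensional case in parallel), and the B3-spline filter, the clamped boundary handling, and the function name `a_trous` are assumptions chosen for illustration, not details taken from the paper.

```python
from typing import List, Tuple

# B3-spline low-pass filter, a common choice for this transform
# (assumed here; the abstract only says "properly chosen").
LOW_PASS = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def a_trous(signal: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Return (final approximation, detail planes), all of len(signal)."""
    n = len(signal)
    approx = [float(x) for x in signal]
    details = []
    for j in range(levels):
        step = 2 ** j  # the "holes": filter taps are spread 2**j samples apart
        smoothed = []
        for k in range(n):
            acc = 0.0
            for m, coeff in enumerate(LOW_PASS):
                idx = k + step * (m - 2)
                idx = min(max(idx, 0), n - 1)  # clamp at the borders (assumption)
                acc += coeff * approx[idx]
            smoothed.append(acc)
        # Detail plane = difference between successive approximations.
        details.append([a - s for a, s in zip(approx, smoothed)])
        approx = smoothed  # no decimation: the length stays n
    return approx, details
```

Because each detail plane is the difference of two successive approximations, summing the final approximation and all detail planes recovers the input signal.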
This paper describes the architecture and operating system of NEC's new parallel computer Cenju-4 and gives an evaluation of the system. Major features of Cenju-4 are: a) A parallel memory architecture that encompasses distributed shared memory and user-level inter-processor communication. b) A scalable system from 8 nodes to 1,024 nodes. Using the powerful RISC processor VR10000 (200 MHz) from MIPS Technologies, Inc., the Cenju-4 system can be configured from 8 nodes to 1,024 nodes, flexibly extending the system as demand arises. c) Utilization of a flexible micro-kernel operating system. Since the system adopts a micro-kernel based operating system (MACH), it can be configured into several software environments such as UNIX server systems and single system image systems. The key components of the system are two 1 M gate arrays, which implement memory control, inter-processor communication control, and network communication control. The programming environment provides de facto standard libraries and high-level programming languages, such as MPI (Message Passing Interface), PVM (Parallel Virtual Machine), and HPF (High Performance Fortran). The operating system and the inter-processor communication libraries fully exploit the functionality of the hardware to realize an inter-processor communication latency of 4.5 μs and a throughput of 169 MB/s at the user program level.
ISBN: 9781509002474 (print)
MPSoCs using a distributed memory architecture generate a large volume of messages that may be classified into application messages, as defined by the application developer, and management messages, used to ensure the correct operation of the platform. Both message classes normally use the same communication infrastructure. Thus, the application traffic can be adversely impacted by the management traffic. Several works observe that different message classes can be distributed across multiple NoCs, improving the performance and power consumption of the platform. However, these works mainly target shared memory systems. This work suggests the use of multiple NoCs in an MPSoC with a distributed memory architecture, specializing each network for a different message class. An improvement of up to 40% in application message jitter and an average improvement of 5% in application execution time can be achieved using this strategy.
ISBN: 9781467308052 (print)
DBSCAN is a well-known density-based clustering algorithm capable of discovering arbitrarily shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherently sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy, which can easily cause an unbalanced workload and hence result in low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory. Using data sets containing up to several hundred million high-dimensional points, we show that PDSDBSCAN significantly outperforms the master-slave approach, achieving speedups up to 25.97 using 40 cores on a shared memory architecture, and speedups up to 5,765 using 8,192 cores on a distributed memory architecture.
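The idea of building clusters with a disjoint-set structure, rather than by expanding one cluster at a time, can be sketched sequentially as follows. This is an illustrative simplification, not the authors' PDSDBSCAN implementation: the brute-force neighbour search and the names `DisjointSet` and `dbscan_disjoint_set` are assumptions, and no parallelism is shown.

```python
from math import dist
from typing import List, Sequence


class DisjointSet:
    """Minimal union-find with path halving (hypothetical helper)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def dbscan_disjoint_set(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return a cluster label per point; -1 marks noise."""
    n = len(points)
    # Brute-force eps-neighbourhoods (each point counts itself).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    sets = DisjointSet(n)
    assigned = list(is_core)  # core points always belong to a cluster
    for i in range(n):
        if not is_core[i]:
            continue
        for j in neighbors[i]:
            if is_core[j]:
                sets.union(i, j)      # core-core: merge the two trees
            elif not assigned[j]:
                assigned[j] = True    # border point joins one cluster only
                sets.union(i, j)
    return [sets.find(i) if assigned[i] else -1 for i in range(n)]
```

Since the union operations can be applied in any order, the points can be processed independently of the sequential expansion order of classical DBSCAN, which is the property the abstract refers to.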
ISBN: 9781467363563 (print)
In this paper, we consider the problem of evaluating the generic rigidity of an interconnected system in the plane, without a priori knowledge of the network's topological properties. We propose the decentralization of the pebble game algorithm of Jacobs et al., an O(n^2) method that determines the generic rigidity of a planar network. Our decentralization is based on asynchronous inter-agent message-passing and a distributed memory architecture, coupled with consensus-based auctions for electing leaders in the system. We provide analysis of the asynchronous messaging structure and its interaction with leader election, and Monte Carlo simulations demonstrating complexity and correctness. Finally, a novel rigidity evaluation and control scenario in the accompanying media illustrates the applicability of our proposed algorithm.