ISBN (print): 0769526373
Embedded computing architectures can be designed to meet a variety of application-specific requirements. However, optimized hardware can require compiler support to realize its potential. This is especially true for embedded image processing systems, where significant architectural variation is possible and the targeted software can change drastically with that variation. This paper presents methods to compile a single high-level source for data-parallel target architectures that vary fundamentally in processor granularity, ranging from a single processor to a massively parallel processor array. The approach uses single-PPE virtualization, which supports pixel-level data-parallel expressions that operate on a virtual one-pixel-per-processing-element (PPE) network, and applies pixel-locating transformations to retarget the code to a given target PPE. Unlike mainstream parallel computing techniques, this technique can be applied to lightweight SIMD targets that provide neither global communication hardware nor shared memory.
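The retargeting idea can be illustrated with a minimal sketch. It assumes a square-pixel image, a rectangular array of physical PEs, and a block distribution in which each PE owns a contiguous tile of pixels (so the target PPE is the tile size). None of these choices, nor the hypothetical names `virtual_op`, `run_virtual` and `run_on_pe_array`, come from the paper, and the paper's pixel-locating transformations are compile-time rewrites rather than the run-time address translation shown here.

```python
from typing import Callable, List

Image = List[List[float]]


def virtual_op(get: Callable[[int, int], float], r: int, c: int) -> float:
    """Hypothetical pixel-level expression: mean of the 4-neighbourhood.
    Written as if every pixel had its own PE; `get` reads a neighbour pixel."""
    return (get(r - 1, c) + get(r + 1, c) + get(r, c - 1) + get(r, c + 1)) / 4.0


def run_virtual(img: Image) -> Image:
    """Reference semantics: one virtual PE per pixel, zeros outside the image."""
    h, w = len(img), len(img[0])

    def get(r: int, c: int) -> float:
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return [[virtual_op(get, r, c) for c in range(w)] for r in range(h)]


def run_on_pe_array(img: Image, pe_rows: int, pe_cols: int) -> Image:
    """Retarget the same expression to a pe_rows x pe_cols PE array.
    Assumption: image dimensions divide evenly by the PE grid."""
    h, w = len(img), len(img[0])
    bh, bw = h // pe_rows, w // pe_cols
    # local[pr][pc] models the private memory of PE (pr, pc): one bh x bw tile.
    local = [[[[img[pr * bh + lr][pc * bw + lc] for lc in range(bw)]
               for lr in range(bh)]
              for pc in range(pe_cols)]
             for pr in range(pe_rows)]

    def read(r: int, c: int) -> float:
        # Translate a global pixel address into (owning PE, local offset).
        # Offsets in virtual_op are +/-1, so a read only reaches the PE itself
        # or an adjacent PE; no global communication is required.
        if not (0 <= r < h and 0 <= c < w):
            return 0.0
        pr, lr = divmod(r, bh)
        pc, lc = divmod(c, bw)
        return local[pr][pc][lr][lc]

    out = [[0.0] * w for _ in range(h)]
    # Each physical PE serially visits the virtual PEs (pixels) it hosts.
    for pr in range(pe_rows):
        for pc in range(pe_cols):
            for lr in range(bh):
                for lc in range(bw):
                    r, c = pr * bh + lr, pc * bw + lc
                    out[r][c] = virtual_op(read, r, c)
    return out
```

The same `virtual_op` runs unchanged whether the array has one PE per pixel or a single PE for the whole image; only the address translation in `read` depends on the target granularity.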
ISBN (print): 9783642341083; 9783642341090
We present the design of an algorithm for constructing the suffix array of a string using manycore GPUs. Despite the wide use of suffix arrays in text processing and extensive research over two decades, there has been a lack of efficient algorithms able to exploit shared-memory parallelism (on both multicore CPUs and manycore GPUs) in practice. To the best of our knowledge, we developed the first approach exposing shared-memory parallelism that significantly outperforms existing state-of-the-art implementations for sufficiently large inputs. We reduced the suffix array construction problem to a number of parallel primitives such as prefix sum, radix sort, and random gather from and scatter to memory. Thus, the performance of the algorithm is determined by the performance of these primitives on the particular shared-memory architecture. We demonstrate its performance on manycore GPUs, but the method can also be applied to other parallel architectures, such as multicores, CELL, or Intel MIC.
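To show how suffix array construction can be expressed with such primitives, here is a sketch of the textbook prefix-doubling method (after Manber and Myers), not the paper's own reduction. Python's `sorted` stands in for a parallel radix sort, `itertools.accumulate` for an inclusive prefix sum, and the list comprehensions for gather and scatter; everything runs sequentially, and the function name `suffix_array` is hypothetical.

```python
from itertools import accumulate
from typing import List


def suffix_array(s: str) -> List[int]:
    """Prefix-doubling suffix array built from bulk data-parallel steps."""
    n = len(s)
    if n == 0:
        return []
    rank = [ord(ch) for ch in s]  # initial ranks: character codes (all >= 0)
    k = 1
    while True:
        # Gather: key of suffix i is (rank[i], rank[i + k]); -1 means
        # "past the end", so a shorter suffix sorts before its extensions.
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        # Sort: stands in for a parallel radix sort on the key pairs.
        sa = sorted(range(n), key=lambda i: keys[i])
        # Flag each sorted position whose key differs from its predecessor.
        flags = [0] + [int(keys[sa[j]] != keys[sa[j - 1]]) for j in range(1, n)]
        # Inclusive prefix sum over the flags gives dense new ranks.
        new_ranks = list(accumulate(flags))
        # Scatter: write each suffix's new rank back to its original position.
        rank = [0] * n
        for j, i in enumerate(sa):
            rank[i] = new_ranks[j]
        # Once all ranks are distinct, the sorted order is final.
        if new_ranks[-1] == n - 1:
            return sa
        k *= 2  # ranks now reflect the first 2k characters
```

For example, `suffix_array("banana")` yields `[5, 3, 1, 0, 4, 2]`. Each round doubles the compared prefix length, so about log2(n) rounds suffice, and every step inside a round is a bulk operation over all suffixes, which is what makes this style of algorithm attractive for data-parallel hardware.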
ISBN (print): 9781479941209
Information processing is a very broad area in which many problems are computationally intensive and thus require parallelization and acceleration based on new technologies. The Xilinx Zynq-7000 All Programmable System-on-Chip is a well-suited platform that allows application-specific software and problem-targeted hardware to be coupled on a single configurable microchip. The tutorial is dedicated to multi-level software/hardware co-design techniques and system architectures that combine general-purpose computers, multi-core application-specific processing, and accelerators in reconfigurable hardware, with emphasis on broad parallelism. Four projects from the areas of data processing, application informatics, parallel algorithms (mapped to hardware), and combinatorial search are briefly characterized and will be demonstrated as fully implemented, ready-to-test projects that include software and reconfigurable hardware linked through on-chip high-performance interfaces. Particular design examples, potential practical applications, experiments, and comparisons will be demonstrated.
ISBN (print): 0769511538
Techniques for scheduling parallel I/O are presented both for uniprogrammed systems, which run single jobs in isolation, and for multiprogrammed environments, which execute multiple parallel jobs simultaneously. The performance of the scheduling algorithms is evaluated on a network of workstations. A new scheduling algorithm proposed in this paper is observed to perform very well for systems running single jobs in isolation. The algorithms that use knowledge of job characteristics are observed to deliver superior performance in multiprogrammed parallel environments.
ISBN (print): 0769515126
This paper presents the design and analysis of finite difference domain decomposition algorithms for the two-dimensional heat equation, and numerical results demonstrate the stability and accuracy of the algorithms. The algorithms further extend those developed by Dawson et al. [6].
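The basic mechanics of domain decomposition for this equation can be sketched with the standard explicit (FTCS) scheme for u_t = u_xx + u_yy on a uniform grid of spacing h, u_new[j][i] = u[j][i] + r(u[j][i+1] + u[j][i-1] + u[j+1][i] + u[j-1][i] - 4u[j][i]) with r = Δt/h², which is stable for r ≤ 1/4. This scheme is an assumption made for illustration; it is neither the paper's algorithms nor those of Dawson et al. Here the subdomain interface is handled simply by exchanging one ghost column, so the decomposed step reproduces the global step exactly. The names `step` and `step_decomposed` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def step(u: Grid, r: float) -> Grid:
    """One explicit (FTCS) step of u_t = u_xx + u_yy.
    Boundary rows and columns are held fixed (Dirichlet); r = dt / h**2."""
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]  # copy, so boundary values stay unchanged
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            lap = u[j][i + 1] + u[j][i - 1] + u[j + 1][i] + u[j - 1][i] - 4 * u[j][i]
            new[j][i] = u[j][i] + r * lap
    return new


def step_decomposed(u: Grid, r: float, split: int) -> Grid:
    """The same step on two subdomains split at column `split`
    (assumes 2 <= split <= nx - 2). Each subdomain receives one ghost
    column from its neighbour, updates its own interior, and the owned
    columns are stitched back together."""
    left = [row[: split + 1] for row in u]   # owns cols 0..split-1, ghost col split
    right = [row[split - 1:] for row in u]   # ghost col split-1, owns cols split..
    left_new = step(left, r)
    right_new = step(right, r)
    # Drop the ghost columns before joining.
    return [lrow[:split] + rrow[1:] for lrow, rrow in zip(left_new, right_new)]
```

In a real parallel code the two subdomains would live on different processors and only the ghost columns would be communicated each step.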
ISBN (print): 0780370570
Architectural synthesis is an efficient design process that narrows the gap between algorithms and architectures by raising the abstraction level. However, this process currently does not take VLSI interconnection cost into account, even though this cost becomes predominant in submicron technologies. In this paper, an interconnection cost analysis is performed at the behavioural level in order to provide rapid prototyping results and to guide the synthesis process with additional path constraints. Results are presented that show the value of this approach.
ISBN (print): 9781424403127
This paper describes an architecture dedicated to real-time census correlation for passive stereovision sensors. Although DSP circuits have dramatically increased their performance in terms of clock frequency (about 600 MHz today), DSP cores (several multiplier-accumulators), and pipelines (Super Harvard architectures, for example), FPGAs remain the best way to design massively parallel architectures when ultra-fast algorithm computation is needed, as is the case in real-time vision systems for collision avoidance.
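Census correlation itself can be sketched in a few lines. The sketch assumes integer grayscale images stored as lists of rows, a 3×3 census window, and per-pixel winner-take-all matching on the Hamming distance between census codes; practical systems usually also sum the Hamming costs over a correlation window, which is omitted here. The function names are hypothetical, and the paper's FPGA architecture is not modeled: the inner loops are simply the kind of independent per-pixel work that an FPGA design can evaluate in parallel.

```python
from typing import List

Image = List[List[int]]


def census(img: Image, radius: int = 1) -> List[List[int]]:
    """Census transform: each pixel becomes a bit string recording, for every
    neighbour in a (2*radius+1)^2 window, whether it is darker than the centre.
    Pixels whose window would leave the image are set to 0 in this sketch."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            bits = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre is not compared with itself
                    bits = (bits << 1) | int(img[y + dy][x + dx] < img[y][x])
            out[y][x] = bits
    return out


def disparity(left: Image, right: Image, max_disp: int, radius: int = 1) -> List[List[int]]:
    """Winner-take-all matching: for each left pixel, choose the disparity d
    that minimises the Hamming distance between census codes at x and x - d."""
    cl, cr = census(left, radius), census(right, radius)
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cost, best_d = None, 0
            for d in range(min(max_disp, x) + 1):
                cost = bin(cl[y][x] ^ cr[y][x - d]).count("1")  # Hamming distance
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

Because the census code depends only on intensity order within the window, the matching cost is insensitive to gain and offset differences between the two cameras, and the XOR-and-popcount cost maps naturally onto logic rather than multipliers.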
ISBN (print): 9781424446421
The concepts of Artifact-as-Organism and Creator-in-a-Box, together with their autonomy, adaptation, and evolution, are proposed as purely engineering motivations for incorporating the cognitive attributes of consciousness and self-awareness into robots, automata, machines, and artifacts. These ideas are then used to create computational models of cognitive robots and machine consciousness that can be executed on modern parallel, distributed, many-core, and massively multi-core computer architectures.
ISBN (print): 9783642144028
The availability of real parallelism in multi-core architectures has revived interest in concurrent computing in general and parallel computing in particular. New languages and libraries have recently been proposed to increase productivity in the context of these architectures. In this paper, we present a novel approach that uses the service abstraction to annotate parallelism.
ISBN (print): 0769515126
Improving computational efficiency is a key issue in image processing, especially in edge detection, which is very computationally intensive. With the development of real-time image processing applications, fast processing response is becoming more critical. In this paper, a technique for distributed image processing on Spiral Architecture is proposed, which provides a platform for speeding up image processing on clusters.
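The data-partitioning idea behind cluster-based edge detection can be sketched on an ordinary square-pixel grid. This is a stand-in: Spiral Architecture uses hexagonal pixels and its own addressing scheme, neither of which is reproduced here. The sketch splits an image into horizontal strips, gives each strip one halo row from each neighbour, computes a squared Sobel gradient magnitude per strip, and stitches the owned rows back together. A `ThreadPoolExecutor` stands in for cluster nodes; under CPython's GIL the threads illustrate the partitioning rather than deliver a speed-up. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Image = List[List[float]]
Strip = Tuple[int, int, Image]  # (first owned local row, end owned local row, rows)


def sobel(img: Image) -> Image:
    """Squared Sobel gradient magnitude gx^2 + gy^2; border pixels are 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = gx * gx + gy * gy
    return out


def split_strips(img: Image, n: int) -> List[Strip]:
    """Cut the rows into n strips; each carries one halo row above and below
    where such a row exists, since the 3x3 stencil needs one row of context."""
    h = len(img)
    bounds = [h * k // n for k in range(n + 1)]
    strips = []
    for k in range(n):
        lo, hi = bounds[k], bounds[k + 1]          # global rows this worker owns
        top, bot = max(lo - 1, 0), min(hi + 1, h)  # extend by the halo rows
        strips.append((lo - top, hi - top, img[top:bot]))
    return strips


def process(strip: Strip) -> Image:
    """Work done by one node: filter the strip, return only the owned rows."""
    first, end, rows = strip
    return sobel(rows)[first:end]


def distributed_sobel(img: Image, workers: int) -> Image:
    strips = split_strips(img, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(process, strips))  # map preserves strip order
    return [row for part in parts for row in part]
```

Because each strip carries the halo row the 3×3 stencil needs, the stitched result is identical to running `sobel` on the whole image; only the halo rows have to be sent between nodes.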