The distributed data service library allows developers to control the distribution of data in a parallel program simply by setting attributes on a distributed data object. This interface provides the power of a data p...
ISBN: 3540643591 (print)
The proceedings contain 118 papers. The special focus in this conference is on Parallel and Distributed Processing. The topics include: dynamic reconfiguration of a PMMLA for high-throughput applications; a parallel algorithm for minimum cost path computation on polymorphic processor array; a performance modeling and analysis environment for reconfigurable computers; an integrated partitioning and synthesis system for dynamically reconfigurable multi-FPGA architectures; temporal partitioning for partially-reconfigurable-field-programmable gate; a Java development and runtime environment for reconfigurable computing; synthesizing reconfigurable sequential machines using tabular models; evaluation of a low-power reconfigurable DSP architecture; a reconfigurable hardware-monitor for communication analysis in distributed real-time systems; a mathematical benefit analysis of context switching reconfigurable computing; a configurable computing approach towards real-time target tracking; hardware reconfigurable neural networks; a simulator for the reconfigurable mesh architecture; processor architectures for circuit emulation; an empirical comparison of runtime systems for conservative parallel simulation; synchronizing operations on multiple objects; migration and rollback transparency for arbitrary distributed applications in workstation clusters; a topology based approach to coordinated multicast operations; a parallel evolutionary algorithm for the vehicle routing problem with heterogeneous fleet; artificial neural networks on reconfigurable meshes; a molecular quasi-random model of computations applied to evaluate collective intelligence; replicated shared object model for edge detection with spiral architecture; and scheduling tasks of a parallel program in two-processor systems with use of cellular automata.
ISBN: 3540643591 (print)
This paper presents an integrated design system called SPARCS (Synthesis and Partitioning for Adaptive Reconfigurable Computing Systems) for automatically partitioning and synthesizing designs for reconfigurable boards with multiple field-programmable devices (FPGAs). The SPARCS system accepts design specifications at the behavior level, in the form of task graphs. The system contains a temporal partitioning tool to temporally divide and schedule the tasks on the reconfigurable architecture, a spatial partitioning tool to map the tasks to individual FPGAs, and a high-level synthesis tool to synthesize efficient register-transfer level designs for each set of tasks destined to be downloaded on each FPGA. Commercial logic and layout synthesis tools are used to complete logic synthesis, placement, and routing for each FPGA design segment. A distinguishing feature of the SPARCS system is the tight integration of the partitioning and synthesis tools to accurately predict and control design performance and resource utilization. This paper presents an overview of SPARCS and the various algorithms used in the system, along with a brief description of how a JPEG-like image compression algorithm is mapped to a multi-FPGA board using SPARCS.
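The temporal partitioning step described above can be illustrated with a minimal sketch. This is not the SPARCS algorithm, which the abstract does not detail; it is a simple greedy heuristic, and the function name `temporal_partition`, the per-task area estimates, and the single `capacity` figure are all assumptions made for illustration.

```python
from collections import deque


def temporal_partition(tasks, edges, capacity):
    """Greedily split a task graph into sequential configurations.

    tasks:    dict mapping task name -> area estimate (hypothetical units)
    edges:    list of (producer, consumer) dependencies
    capacity: area available in one configuration (an assumption; a real
              tool would model each FPGA and the interconnect separately)
    Returns a list of segments; each segment is a list of task names and
    segments are meant to be loaded one after another.
    """
    indegree = {t: 0 for t in tasks}
    successors = {t: [] for t in tasks}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm: visit tasks so that producers precede consumers.
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for succ in successors[task]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if len(order) != len(tasks):
        raise ValueError("task graph contains a cycle")

    # Fill one segment until the next task would exceed the capacity.
    segments, current, used = [], [], 0
    for task in order:
        if tasks[task] > capacity:
            raise ValueError(f"task {task!r} does not fit in one configuration")
        if used + tasks[task] > capacity:
            segments.append(current)
            current, used = [], 0
        current.append(task)
        used += tasks[task]
    if current:
        segments.append(current)
    return segments
```

Because tasks are placed in topological order and segments are only ever appended, no task lands in an earlier segment than one of its producers. A production tool would also weigh reconfiguration time and inter-segment memory traffic, which this sketch ignores.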
ISBN: 0818685727 (print)
A concept for a future integer arithmetic unit, as well as a first implementation of the arithmetic unit's core as a smart pixel detector chip, is presented. This architecture is well suited for realization with 3-D optoelectronic very large scale integrated (VLSI) circuits. Because the optical interconnections run vertically to the circuit's surface, there is no pin limitation. This allows massive parallelism and higher throughput than all-electronic solutions. To exploit the potential of optical interconnections in VLSI systems efficiently, well-adapted low-level algorithms and architectures have to be developed. This is demonstrated for a pipelined arithmetic unit using a redundant number representation. A gate layout for the optoelectronic circuits is given, as well as a specification for the necessary optical interconnection scheme linking the circuits with free-space optics. It is shown that the throughput can be increased by a factor of 10 to 50 compared to current all-electronic processors by considering state-of-the-art optical and optoelectronic technology.
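To make the idea of a redundant number representation concrete, here is a small sketch using carry-save form, where a value is held as a (sum, carry) pair and each addition step needs no carry propagation. The abstract does not say which redundant representation the unit uses, so carry-save is an assumption chosen for illustration, and the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """Reduce three non-negative integers to a (sum, carry) pair.

    Every bit position is computed independently (a full adder per bit),
    so the delay does not grow with the word length.
    """
    partial_sum = a ^ b ^ c                       # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority, shifted up
    return partial_sum, carry


def accumulate(values):
    """Add many operands while staying in redundant (sum, carry) form."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    # One conventional carry-propagating addition converts the result back.
    return s + c
```

The invariant is that `partial_sum + carry` always equals `a + b + c`, so intermediate results can stay in redundant form through a pipeline and be converted only once at the end.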
This paper presents an approach using a mixture of connectionist experts for the identification of complex Arabic phonetic features such as the emphasis, the gemination and the relevant duration of vowels. These exper...
ISBN: 3540649522 (print)
High Performance Fortran (HPF) is the de facto standard language for writing data parallel programs. For applications that use indirect addressing on distributed arrays, HPF compilers have limited capabilities for optimizing such codes on distributed memory architectures, especially for optimizing communication and reusing communication schedules across subroutine boundaries. This paper describes a dynamic approach for optimizing unstructured communication in codes with indirect addressing. The basic idea is that runtime data reflecting the communication patterns are reused whenever possible. The user only has to specify which data in the program must be traced for modifications. The experiments and results show the effectiveness of the chosen approach.
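The reuse idea can be sketched as follows: a communication schedule derived from an indirection array is cached and rebuilt only when the traced array has been modified. This is an illustrative model, not the paper's runtime system; the class names `TracedArray` and `ScheduleCache`, the version counter, and the block distribution of the target array are all assumptions.

```python
class TracedArray:
    """Indirection array that counts its modifications (the traced data)."""

    def __init__(self, values):
        self._values = list(values)
        self.version = 0          # bumped on every write

    def __getitem__(self, i):
        return self._values[i]

    def __setitem__(self, i, value):
        self._values[i] = value
        self.version += 1

    def __len__(self):
        return len(self._values)


class ScheduleCache:
    """Builds and caches gather schedules for indirect accesses x[idx[i]]."""

    def __init__(self, block_size):
        self.block_size = block_size   # assumed block distribution of x
        self.builds = 0                # how many schedules were computed
        self._cached = {}              # name -> (version, schedule)

    def schedule_for(self, name, index_array):
        entry = self._cached.get(name)
        if entry is not None and entry[0] == index_array.version:
            return entry[1]            # indices unchanged: reuse the schedule
        # Inspector phase: group the requested global indices by owner.
        schedule = {}
        for i in range(len(index_array)):
            g = index_array[i]
            schedule.setdefault(g // self.block_size, []).append(g)
        self.builds += 1
        self._cached[name] = (index_array.version, schedule)
        return schedule
```

Because the cache is keyed by a name rather than by call site, a schedule built in one subroutine can be reused in another as long as the traced array's version is unchanged.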
An overview of VLSI architectures for multimedia processing is given. Dedicated as well as programmable approaches are discussed. Dedicated implementations are derived utilizing specific properties of the target algor...
ISBN: 3540649522 (print)
This paper discusses the main achievements of the EPIC project, whose aim was to design a high level programming environment with an associated implementation for portable parallel image processing. The project was funded as part of the EPSRC Portable Software Tools for Parallel Architectures (PSTPA) programme. The paper summarises new portable programming abstractions for image processing, and outlines the automatically optimising implementation which achieves portability of application code and efficiency of implementation on a closely coupled distributed memory parallel system. The paper includes timings for optimised and unoptimised versions of typical image processing algorithms; it draws the main conclusion that it is possible to achieve portability with efficiency, for a specific application, by adopting a high level algebraic programming model, together with a transformation-based optimiser which reclaims the loss of efficiency which an algebraic approach traditionally entails.
ISBN: 0818684046 (print)
Instruction scheduling methods based on the construction of state diagrams (or automata) have been used for architectures involving deeply pipelined function units. However, the size of the state diagram is prohibitively large, resulting in high execution time and space requirements. In this paper, we present a simple method for reducing the size of the state diagram by recognizing unique paths of a state diagram. Our experiments show that the number of paths in the reduced state diagram is significantly lower - by 1 to 3 orders of magnitude - compared to the number of paths in the original state diagram. Using the reduced MS-state diagrams, we develop an efficient software pipelining method. The proposed software pipelining algorithm produced efficient schedules and performed better than Huff's Slack Scheduling method, and the original Co-scheduling method, in terms of both the initiation interval (II) and the time taken to construct the schedule.
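For readers unfamiliar with state diagrams for pipelined function units, the sketch below builds the classic, unreduced diagram for a single function unit from its reservation table, using forbidden latencies and a collision vector. It does not implement the paper's reduction or its MS-state diagrams; the function names and the reservation-table encoding (one list of busy cycles per stage) are assumptions for illustration.

```python
def forbidden_latencies(reservation_table):
    """Latencies at which a second operation would collide with the first.

    reservation_table: one list of busy cycles per pipeline stage.
    """
    forbidden = set()
    for busy_cycles in reservation_table:
        for i in busy_cycles:
            for j in busy_cycles:
                if j > i:
                    forbidden.add(j - i)
    return forbidden


def collision_vector(forbidden):
    """Encode forbidden latencies as a bitmask: bit k-1 set means latency k collides."""
    cv = 0
    for latency in forbidden:
        cv |= 1 << (latency - 1)
    return cv


def state_diagram(cv, width):
    """Map every reachable state to {issue latency: next state}.

    width is the largest forbidden latency; waiting longer than that always
    returns the unit to the initial state cv.
    """
    diagram, todo = {}, [cv]
    while todo:
        state = todo.pop()
        if state in diagram:
            continue
        arcs = {}
        for latency in range(1, width + 1):
            if not state & (1 << (latency - 1)):       # latency is permissible
                arcs[latency] = (state >> latency) | cv
        arcs[width + 1] = cv                           # stands for "width + 1 or more"
        diagram[state] = arcs
        todo.extend(arcs.values())
    return diagram
```

Each path through such a diagram is a legal issue sequence for the unit; the number of paths grows quickly with pipeline depth, which is the blow-up the abstract refers to.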
This paper presents algorithms and architectures for implementing one-dimensional (1-D) to multidimensional (M-D) digital nonrecursive filters. These architectures are very regular and support single chip implementation in VLSI, as ...