ISBN: 0769515126 (print)
Our new architecture, known as the Scheduled DataFlow (SDF) system, deviates from the current trend of building complex hardware to exploit instruction-level parallelism (ILP) by exploring a simpler yet powerful execution paradigm based on dataflow, multithreading, and the decoupling of memory accesses from execution. A program is partitioned into non-blocking threads. In addition, all memory accesses are decoupled from the thread's execution: data is pre-loaded into the thread's context (registers), and all results are post-stored after the thread completes execution. Even though multithreading and decoupling are possible with control-flow architectures, the non-blocking and functional nature of the SDF system makes it easier to coordinate the memory accesses and execution of a thread. In this paper we show some recent improvements to the SDF implementation, whereby threads exchange data directly in register contexts, thus eliminating the need to create thread frames. It is therefore now possible to explore how the performance of our architecture scales as more register contexts are included on the chip.
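The three-phase thread model described in this abstract (pre-load, execute, post-store) can be illustrated with a minimal software sketch. This is not the SDF hardware or its instruction set; the names `SdfThread` and `run_thread`, and the use of a Python list as memory, are hypothetical and chosen only to show that the execute phase touches registers alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SdfThread:
    """Hypothetical model of one non-blocking thread (illustrative only)."""
    loads: Dict[str, int]    # register name -> memory address to pre-load
    body: Callable[[Dict[str, int]], Dict[str, int]]  # register-only compute
    stores: Dict[str, int]   # register name -> memory address to post-store


def run_thread(thread: SdfThread, memory: List[int]) -> Dict[str, int]:
    # Phase 1, pre-load: all memory reads happen before execution starts.
    regs = {reg: memory[addr] for reg, addr in thread.loads.items()}

    # Phase 2, execute: the body sees only the register context, so it
    # never waits on memory (this is the "non-blocking" property).
    regs.update(thread.body(dict(regs)))

    # Phase 3, post-store: all memory writes happen after execution ends.
    for reg, addr in thread.stores.items():
        memory[addr] = regs[reg]
    return regs


# Usage: add two values held at addresses 0 and 1, store the sum at address 2.
memory = [3, 4, 0]
adder = SdfThread(
    loads={"a": 0, "b": 1},
    body=lambda r: {"c": r["a"] + r["b"]},
    stores={"c": 2},
)
run_thread(adder, memory)
```

Because the body never touches memory, a scheduler is free to overlap one thread's pre-load or post-store phase with another thread's execute phase, which is the coordination benefit the abstract attributes to decoupling.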
ISBN: 3540440496 (print)
In recent years multimedia has emerged as a key technology, mainly because of its ability to represent information in disparate forms as a bit-stream. This enables everything from text to video and sound to be stored, processed, and delivered in digital form. Much of the current research effort has emphasized data delivery as an important issue in multimedia technology. However, the creation, processing, and management of multimedia forms are the issues most likely to dominate scientific interest in the long run. The aim of dealing with information coming from video, text, and sound will result in a data explosion. This requirement to store, process, and manage large data sets naturally leads to the consideration of programmable parallel processing systems as strong candidates for supporting and enabling multimedia technology. This fact, taken together with the inherent data parallelism in these data types, makes multimedia computing a natural application area for parallel and distributed processing. In addition, the concepts developed for parallel and distributed algorithms are quite useful for the implementation of distributed multimedia systems and applications. Thus, the adaptation of these methods for distributed multimedia systems is an interesting topic of study.
ISBN: 0769517129 (print)
This paper introduces PAPA: Packed Arithmetic on a Prefix Adder, a new approach to parallel prefix adder design that supports a wide variety of packed arithmetic computations, including packed add and subtract with saturation, packed rounded average, and packed absolute difference. The approach consists of altering the prefix adder cell logic equations to take advantage of a previously unused "don't care" state. Logical Effort is employed to assess the delay of the new adder architecture by establishing the extra effort needed to select and drive the appropriate carry signal to the requisite sum sub-word. This adder will find applications in video processors and other multimedia-oriented processor chips that implement packed arithmetic operations.
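The packed operations named in this abstract can be pinned down with a small reference model of their results. The sketch below describes only what each operation computes, not the PAPA prefix-adder cell logic; the lane layout (four unsigned 8-bit sub-words in a 32-bit word), the round-half-up averaging rule, and all function names are assumptions made for illustration.

```python
LANES = 4                          # assumed: four sub-words per word
LANE_BITS = 8                      # assumed: 8-bit unsigned sub-words
LANE_MAX = (1 << LANE_BITS) - 1    # 255


def unpack(word: int) -> list:
    """Split a packed word into its lanes, lowest lane first."""
    return [(word >> (LANE_BITS * i)) & LANE_MAX for i in range(LANES)]


def pack(lanes: list) -> int:
    """Combine lane values (lowest lane first) into one packed word."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MAX) << (LANE_BITS * i)
    return word


def packed_add_sat(a: int, b: int) -> int:
    """Lane-wise add; results above LANE_MAX clamp to LANE_MAX."""
    return pack([min(x + y, LANE_MAX) for x, y in zip(unpack(a), unpack(b))])


def packed_sub_sat(a: int, b: int) -> int:
    """Lane-wise subtract; results below zero clamp to zero."""
    return pack([max(x - y, 0) for x, y in zip(unpack(a), unpack(b))])


def packed_avg_round(a: int, b: int) -> int:
    """Lane-wise average, rounding halves up (assumed rounding rule)."""
    return pack([(x + y + 1) >> 1 for x, y in zip(unpack(a), unpack(b))])


def packed_abs_diff(a: int, b: int) -> int:
    """Lane-wise absolute difference."""
    return pack([abs(x - y) for x, y in zip(unpack(a), unpack(b))])
```

The key property in every function is that no carry or borrow crosses a lane boundary, which is the behavior a packed adder has to enforce in hardware when it selects the carry signal for each sum sub-word.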
This paper presents tensor product formulas for modeling fault-tolerant architectures and their corresponding reconfiguration algorithms. In our approaches, a network topology is first described with simple tensor pro...
In most distributed memory computations, node programs are executed on processors according to the owner computes rule. However, the owner computes rule is not best suited for irregular application codes. In irregular...
A consensus on a parallel architecture for very large database management has emerged. This architecture is based on a shared-nothing hardware organization. The computation model is very sensitive to skew in tuple dis...
Recently there has been a lot of interest in improving the infrastructure used in medical applications. In particular, there is renewed interest in non-invasive, high-resolution diagnostic methods. One such method is digital 3D ultrasound medical imaging. Current state-of-the-art ultrasound systems use specialized hardware to perform advanced processing of input data and improve the quality of the generated images. Such systems are limited in their capabilities by the underlying computing architecture, and they tend to be expensive because of the specialized nature of the solutions they employ. Our goal in this work is twofold: (i) to understand the behavior of this class of emerging medical applications in order to provide an efficient parallel implementation, and (ii) to introduce a new benchmark for parallel computer architectures from a novel and important class of applications. We address the limitations faced by modern ultrasound systems by investigating how all processing required by advanced beamforming algorithms can be performed on modern clusters of high-end PCs connected with low-latency, high-bandwidth system area networks. We investigate the computational characteristics of a state-of-the-art algorithm and demonstrate that today's commodity architectures are capable of providing almost-real-time performance without compromising image quality significantly.
ISBN: 9783540437925 (print)
The proceedings contain 101 papers. The special focus in this conference is on Grid Architectures, Load Balancing, Performance Analysis, Prediction, Parallel Non-numerical Algorithms and Parallel Programming. The topics include: interrupt and cancellation as synchronization methods; applications of virtual data in the LIGO experiment; a parallel system architecture based on dynamically configurable shared memory clusters; simultaneous allocation and scheduling with exclusion and precedence relations algorithm; a greedy approach for a time-dependent scheduling problem; dedicated scheduling of biprocessor tasks to minimize mean flow time; heterogeneous dynamic load balancing with a scheme based on the Laplacian polynomial; task scheduling for dynamically configurable multiple SMP clusters based on extended DSC approach; processing time and memory requirements for multi-instalment divisible job processing; estimating execution time of distributed applications; evaluation of parallel programs by measurement of its granularity; the performance of different communication mechanisms and algorithms used for parallelization of molecular dynamics code; benchmarking tertiary storage systems with file fragmentation; FEM computations on clusters using different models of parallel programming; parallel skeletons for tabu search method based on search strategies and neighborhood partition; a new parallel approach for multi-dimensional packing problems; three parallel algorithms for simulated annealing; solving the flow shop problem by parallel simulated annealing; automated verification of infinite state concurrent systems; criteria of satisfiability for homogeneous systems of linear Diophantine constraints; and irregular and out-of-core parallel computing on clusters.
We demonstrate the stepwise fabrication of parallel double quantum dots in GaAs/AlGaAs heterostructures. The atomic force microscope serves as a direct lithographic tool for processing our samples; the devices are characterized by transport measurements. Coulomb-blockade oscillations and diamonds with different periods for the two quantum dots are observed. © 2002 Elsevier Science B.V. All rights reserved.