ISBN (print): 0769526438
In this paper, we propose a multi-paradigm and multi-grain parallel execution model based on SMP-Cluster, which integrates coarse-grain, mid-grain and fine-grain parallelism. The paradigms supported by our model are task parallel, data parallel, sequential execution, data pipeline and task farming. The model is realized by extending the OpenMP specification; the extensions include directives for computing resource partition, data distribution and alignment, sequential execution and data pipeline, and functions for the Master/Slave model in a Macro-Task group. We also compare the performance of different implementations of three benchmark applications, using the same numerical algorithm but employing different programming approaches.
ISBN (print): 0769523129
An architecture for a reconfigurable superscalar processor is described in which some of its execution units are implemented in reconfigurable hardware. The overall configuration of the processor is defined according to how its reconfigurable execution units are configured. An efficient micro-architectural solution to configuration management is presented that effectively steers the current processor configuration toward a configuration that is well matched with the execution unit requirements of instructions being scheduled for execution. The approach first selects the best matched among four steering configurations based on the number and type of execution units required by the instructions. One of the steering configurations is dynamically defined as the current configuration; the other three are statically predefined. Once a steering configuration is selected, portions of it begin loading on corresponding reconfigurable execution units that are not busy. The active configuration of the processor is generally the overlap of two or more steering configurations.
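The selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' micro-architecture: the scoring rule (count how many required units a configuration can supply), the tie-break in favor of the current configuration, and all names (`match_score`, `select_steering`, the unit labels `alu`, `fpu`, `lsu`, `mul`) are assumptions made for illustration.

```python
from collections import Counter

def match_score(config, demand):
    """Count how many required units this configuration can supply (assumed metric)."""
    return sum(min(config.get(unit, 0), needed) for unit, needed in demand.items())

def select_steering(current, predefined, queued_units):
    """Pick the best matched of the current and the predefined steering configurations.

    current      -- dict mapping unit type to count for the current configuration
    predefined   -- dict mapping a name to such a dict (three entries in the abstract)
    queued_units -- unit type required by each instruction awaiting scheduling
    """
    demand = Counter(queued_units)
    candidates = [("current", current)] + list(predefined.items())
    # max() returns the first maximal item, so a tie keeps the current
    # configuration and avoids a reload (an assumption, not from the abstract).
    return max(candidates, key=lambda item: match_score(item[1], demand))[0]

# Hypothetical configurations, used only to exercise the sketch.
PREDEFINED = {
    "int_heavy": {"alu": 4, "mul": 1},
    "fp_heavy": {"fpu": 3, "alu": 1},
    "mem_heavy": {"lsu": 3, "alu": 1},
}
CURRENT = {"alu": 2, "fpu": 1, "lsu": 1}
```

With these hypothetical tables, a queue dominated by floating-point instructions selects `fp_heavy`, while a queue that every configuration serves equally well leaves the current configuration in place.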
This paper addresses the problem of creating software tools for visualizing the dynamic behavior of parallel applications and systems. PARADISE (Parallel Animated Debugging and Simulation Environment) approaches this ...
ISBN (print): 9517212402
A novel architecture to implement distributed arithmetic in VLSI is presented. This architecture comprises a serial-in random-out multiport memory and a multi-input adder. The design of a 1.25-μm CMOS convolution processor chip based on the architecture is reported. Issues in the development of chip architecture and design tools are discussed.
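For readers unfamiliar with distributed arithmetic, the following Python sketch shows the textbook bit-serial formulation: an inner product with fixed coefficients is computed from table lookups and shifted additions, with no multiplier. It is a sketch under stated assumptions (unsigned inputs of a known bit width, one lookup table over all coefficients) and does not model the serial-in random-out memory or the multi-input adder of the reported chip. The names `build_lut` and `da_inner_product` are hypothetical.

```python
def build_lut(coeffs):
    """Precompute the sum of every subset of coefficients (2**K entries).

    Bit i of the table address says whether coeffs[i] is included.
    """
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]

def da_inner_product(coeffs, samples, bits=8):
    """Return sum(c * x) using lookups and shifts only.

    Assumes every sample is an unsigned integer below 2**bits.
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one table address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> b) & 1) << i
        # Weight the looked-up partial sum by the bit's significance.
        acc += lut[addr] << b
    return acc
```

A convolution output sample is one such inner product between the filter taps and a window of the input, which is why the technique suits convolution hardware. The table grows as 2**K, so practical designs split long filters across several smaller tables.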
ISBN (print): 0769516718
This paper presents a novel active architecture for building and deploying network services: HABA, Hyper Active Components Architecture. At the architectural level, HABA defines an active node whose functionalities are divided into the Node Operating System, the Execution Environment, and the Active Applications. At the implementation level, HABA is a component-based platform where new components can be added and deployed in order to dynamically modify the behavior of network nodes. Applications can communicate across multi-tiered heterogeneous environments, and across Internet and Intranet structures. Interoperability with ANTS is achieved by "composition". At the deployment level, HABA uses an active node approach and offers a parallel controlled deployment mode and a sequential by-request mode. In terms of security, HABA offers different security levels according to service profiles. Authentication of deployed code and protection of nodes are achieved by the deployment of certificates on the nodes.
In this paper we propose a formal framework based on Markov chains to prove the performance of P2P protocols. Despite the proposal of several protocols for P2P networks, sometimes there is a lack of a formal demon...
Artificial neural networks can solve complex problems such as time series prediction, handwritten pattern recognition or speech processing. Though software simulations are essential when one sets about to study a new ...
The efficient realization of self-organizing systems based on 2D stencil code applications, such as our Marching Pixel algorithms, is a great challenge. They are data-intensive and also computationally intensive,...
In this paper we propose a price-based user-optimal job allocation scheme for grid systems whose nodes are connected by a communication network. The job allocation problem is formulated as a noncooperative game among ...
ISBN (print): 9781467349529; 9781467349512
This paper presents a programmable many-core platform containing 64 cores routed in a hierarchical network for biomedical signal processing applications. Individual core processors are based on a RISC architecture with DSP enhancement blocks. Given the number of conditional program loops in DSP applications such as FFT, additional hardware blocks are added that operate in parallel to each core processor. The two blocks calculate the FFT input addresses and determine if a conditional loop is necessary. Performing these operations in parallel to the main processor greatly reduces the time to completion for a DSP application. Each processor is implemented in 65 nm CMOS using standard cell libraries. The 64-core platform occupies 19.51 mm² and runs at 1.18 GHz at 1 V. For demonstration, electroencephalogram (EEG) seizure detection and analysis and ultrasound spectral Doppler are mapped onto the cores. The seizure detection and analysis algorithm utilizes 60 processors and takes 890 ns to execute. Spectral Doppler utilizes 29 processors and takes 715 ns to run.
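To make concrete what "calculating the FFT input addresses" involves, here is a small Python sketch of the index arithmetic for an in-place radix-2 FFT. The abstract does not state the radix or addressing scheme used on the chip, so the radix-2 decimation-in-time layout is an assumption, and the function names `butterfly_addresses` and `bit_reverse` are hypothetical.

```python
def butterfly_addresses(n, stage):
    """Yield the (upper, lower) input index pair of every butterfly in one stage.

    n is the transform size (a power of two); stage 0 pairs neighbors,
    and each later stage doubles the distance between paired inputs.
    """
    half = 1 << stage
    for start in range(0, n, 2 * half):
        for j in range(half):
            yield start + j, start + j + half

def bit_reverse(index, width):
    """Reverse the lowest `width` bits of index, as used to reorder FFT inputs."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

In software these index computations and the loop-termination checks share the processor with the butterfly arithmetic itself, which is the overhead the abstract describes moving into hardware blocks that run alongside each core.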