ISBN (print): 9780889868205
Embedded systems require more and more computational power. Moreover, embedded applications are becoming data-dependent, so their execution time depends on their input data. Only dynamic global scheduling can balance the workload across the computation resources and achieve good performance. One solution to this problem is to use many-core architectures with dynamic, centralized control. In this article, we propose new on-line scheduling algorithms adapted to hierarchical many-core embedded systems. The proposed algorithms reduce communication between clusters in order to increase global performance. This paper highlights the good results of a scheduling algorithm named Static Clustering Dynamic Mapping, which consists of dividing the application graph offline and dynamically allocating the parts to the clusters.
ISBN (print): 9780889866379
Given a multicomputer system of parallel processors connected in a torus network, one-to-all personalized communication sends unique data from the root processor to each of the other processors in the network. Under the assumptions of same-size data for each processor, store-and-forward routing, and all-port processors, we formulate the one-to-all personalized communication problem as an optimization problem with the goal of minimizing the total elapsed time (measured in the number of time steps) for all data to reach their respective destinations. We design an optimal algorithm based on partitioning the torus network into disjoint subnetworks. We also present a heuristic algorithm based on a greedy strategy. We implement the algorithms on two Linux clusters with Gigabit Ethernet torus connection, currently in use at the Jefferson National Lab and configured as a 2-dimensional 8 x 8 torus and a 3-dimensional 4 x 8 x 8 torus, respectively. We analyze the performance of the algorithms using data collected in experiments.
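To make the cost measure in this abstract concrete, the sketch below computes a simple lower bound on the number of time steps for the two torus configurations it mentions. This is not the authors' optimal or greedy algorithm. It only assumes that each link carries one message per time step and that every side length is at least 3, so the root has two distinct neighbours per dimension. The function name `one_to_all_lower_bound` is hypothetical.

```python
from math import ceil


def one_to_all_lower_bound(dims):
    """Lower bound on time steps for one-to-all personalized communication
    on a torus with side lengths `dims` (all-port, store-and-forward).

    Assumption (not from the abstract): one message per link per time step,
    and every side length >= 3 so each node has 2 * len(dims) distinct ports.
    """
    nodes = 1
    for side in dims:
        nodes *= side
    ports = 2 * len(dims)
    # The root must inject nodes - 1 distinct messages, at most `ports` per step.
    injection_bound = ceil((nodes - 1) / ports)
    # The farthest node is floor(side / 2) hops away along each dimension.
    diameter = sum(side // 2 for side in dims)
    return max(injection_bound, diameter)


bound_2d = one_to_all_lower_bound((8, 8))     # 2-dimensional 8 x 8 torus
bound_3d = one_to_all_lower_bound((4, 8, 8))  # 3-dimensional 4 x 8 x 8 torus
```

Under these assumptions the injection term dominates in both configurations, since the root's ports are the bottleneck rather than the network diameter.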
ISBN (print): 9780889868205
In two-sided channel routing on a VLSI chip, it is often convenient to represent signal nets by trapezoids. In this representation, the four corners of a trapezoid are the rightmost and leftmost terminals on the upper side and lower side of the channel, respectively. The maximum set of nonintersecting trapezoids is of particular interest, since the corresponding signal nets can be safely assigned to the same layer in the channel routing. Although a sequential algorithm to compute the maximum independent set of trapezoids is known, the sweep-line approach employed by the sequential algorithm is incremental in nature and does not lend itself to a parallel solution. In this paper we use three new ideas to find the maximum independent set in parallel. First, for every comparable pair of trapezoids we introduce a new unique 'in-between' trapezoid. Next, the trapezoids are mapped to their canonical box representation. Finally, a new parallel operation called 'corner stitching' is applied to the boxes to construct chains of boxes which define the independent set. The algorithm presented here is deterministic and is designed to run on a Concurrent Read Concurrent Write parallel random access machine (CRCW-PRAM). The algorithm runs in O(log n) time with O(n^2) processors.
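To illustrate the problem this abstract addresses, the sketch below computes the size of a maximum independent set of trapezoids with a plain quadratic dynamic program. It is neither the sweep-line algorithm nor the parallel CRCW-PRAM algorithm from the paper. The `Trapezoid` type, its field names, and the assumption that terminals are given as integer positions with `left <= right` on each side are all hypothetical.

```python
from typing import List, NamedTuple


class Trapezoid(NamedTuple):
    # Terminal positions along the channel (assumed: left <= right on each side).
    top_left: int
    top_right: int
    bottom_left: int
    bottom_right: int


def strictly_left_of(a: Trapezoid, b: Trapezoid) -> bool:
    """True if `a` lies entirely to the left of `b` on both channel sides,
    i.e. the two trapezoids do not intersect and `a` comes first."""
    return a.top_right < b.top_left and a.bottom_right < b.bottom_left


def max_independent_set_size(traps: List[Trapezoid]) -> int:
    """Size of the largest set of pairwise non-intersecting trapezoids.

    Non-intersecting trapezoids are totally ordered left to right, so the
    answer is the longest chain under `strictly_left_of`.
    """
    # Sorting by top_left guarantees every predecessor appears earlier.
    ordered = sorted(traps, key=lambda t: t.top_left)
    best = []  # best[j] = longest chain ending at ordered[j]
    for j, current in enumerate(ordered):
        longest = 1
        for i in range(j):
            if strictly_left_of(ordered[i], current):
                longest = max(longest, best[i] + 1)
        best.append(longest)
    return max(best, default=0)


example = [
    Trapezoid(0, 2, 0, 2),
    Trapezoid(3, 5, 1, 4),  # crosses the first one on the lower side
    Trapezoid(6, 8, 5, 7),
]
example_size = max_independent_set_size(example)
```

In this example the first two trapezoids intersect, so at most one of them can share a layer with the third.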
ISBN (print): 9780889868205
We have developed a network (called TPNET) which is adaptable to any parallel processing system. It consists of several core processors and a router. A processing element in a parallel processing system is a processor called TPCORE2, which has been developed by the authors' group. Since this core processor can execute the full transputer instruction set, we can describe a software system using the parallel processing language occam. Occam is theoretically based on a model called Communicating Sequential Processes (CSP). If a parallel system can be described in occam and works correctly, it can be regarded as free from the deadlocks and livelocks that can be intrinsically hidden in a parallel system. In this way, we can easily construct a secure parallel processing system. Each processor can be connected to a router, and we can achieve dynamic configuration of the network topology by controlling the router. The basic communication protocol in TPNET is IEEE 1355. An assured and efficient network can be constructed despite the structural simplicity of the protocol. With the characteristics discussed above and an efficient interrupt processing system in TPCORE2, we propose TPNET as a basic framework for high-performance embedded systems used widely in various industrial fields.
ISBN (print): 9780889866379
Most modern parallel computers are clusters using Myrinet or Ethernet communication networks. Several studies have compared the performance of these two networks for parallel computing; however, they focus on average performance and do not address the distributions of communication times, which can have long tails due to contention effects. In the case of Ethernet with TCP, retransmit timeouts (RTOs) can also occur. Slow communication events may have a significant impact, particularly for applications requiring frequent synchronization, where the performance is determined by the slowest process. We have analysed the distributions of communication times for standard MPI routines on Ethernet with TCP and Myrinet with GM communication networks on the same cluster, and studied the scalability of the distributions as the number of communicating processes is increased, as well as the effect of RTOs for Ethernet with TCP.
ISBN (print): 9780889866379
In this paper we propose a new load balancing algorithm for the grid computing service. The proposed load balancing is based on the CPU speed of the workers in the grid system. We developed a simulation model using NS2 to evaluate the performance of our load balancing algorithm. Our simulation results show asymptotically optimal behaviour of our load balancing algorithm.
ISBN (print): 9780889866379
Active and passive replication are powerful techniques to improve the quality of multimedia streaming. Most systems follow either the active or the passive approach. A well-known example of active replication is Content Distribution Networks [8], which replicate data to predefined static locations. In contrast, P2P file sharing networks [2, 1] use passive replication, where identical content is usually provided by different peers. We suggest a system that combines both techniques using Proxy Affinity, Request Affinity and Replication Affinity, considering user preferences, user behaviour, hardware resources and network capabilities.
ISBN (print): 088986568X
In Spiking Neural Networks (SNNs), spike emissions are sparsely and irregularly distributed both in time and in the network architecture. Since a common feature of SNNs is low average activity, efficient implementations of SNNs are usually based on Event-Driven Simulation (EDS). On the other hand, simulations of large-scale neural networks can take advantage of distributing the neurons over a set of processors (either a workstation cluster or a parallel computer). This article presents a large-scale SNN simulation framework able to combine the benefits of EDS and parallel computing. Two levels of parallelism are combined: distributed mapping of the neural topology, at the network level, and local multithreaded allocation of resources for simultaneous processing of events, at the neuron level. Based on the causality of events, a distributed solution is proposed for solving the complex problem of scheduling without a synchronization barrier.
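As an illustration of the event-driven idea described in this abstract, the sketch below processes spike events in timestamp order from a priority queue, so neurons that receive no input cost nothing. It is a minimal single-threaded sketch, not the distributed multithreaded framework of the paper. The non-leaky integrate-and-fire neuron model, the function name `run_event_driven`, and the `synapses` layout are all assumptions made for the example.

```python
import heapq


def run_event_driven(synapses, input_events, threshold=1.0, t_end=100.0):
    """Process spike events in time order and return emitted spikes.

    synapses: {pre_neuron: [(post_neuron, weight, delay), ...]} (hypothetical layout)
    input_events: iterable of (arrival_time, target_neuron, weight)
    Assumed neuron model: non-leaky integrate-and-fire with reset to 0.
    Delays must be positive so that causality (time order) is preserved.
    """
    potential = {}
    queue = list(input_events)
    heapq.heapify(queue)  # earliest event first
    spikes = []
    while queue:
        time, neuron, weight = heapq.heappop(queue)
        if time > t_end:
            break
        potential[neuron] = potential.get(neuron, 0.0) + weight
        if potential[neuron] >= threshold:
            potential[neuron] = 0.0
            spikes.append((time, neuron))
            # Schedule the arrival of this spike at every downstream neuron.
            for target, w, delay in synapses.get(neuron, []):
                heapq.heappush(queue, (time + delay, target, w))
    return spikes


synapses = {"a": [("b", 0.6, 1.0)]}
inputs = [(0.0, "a", 1.0), (2.0, "a", 1.0)]
spike_log = run_event_driven(synapses, inputs)
```

Here neuron "b" needs two incoming spikes from "a" to cross the threshold, so it fires only after the second one arrives.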
ISBN (print): 9780889866379
In this paper, we propose the design and implementation of a recently developed clustering algorithm, the Nearest Neighbour Clustering Algorithm (NNCA) [1], in conjunction with a Fast K Nearest Neighbour (FKNN) strategy for further reduction in processing time. The parallel algorithm (PNNCA) can cluster the pixels of retinal images into those belonging to blood vessels and those not belonging to blood vessels in a reasonable time.
ISBN (print): 9780889866379
The paper presents an approach to QoS management in distributed service-oriented systems. We study the approach using the example of the Cassandra Framework developed at Philips Research. Cassandra is a distributed video/audio streaming and analysis platform consisting of a collection of loosely coupled services, which can be easily combined to build distributed applications. To manage the QoS of the system, two QoS attributes are selected: availability and performance. We tackle these issues in a service-oriented fashion, making the system tolerant to service failures and adaptable to the varying requirements of the applications.