ISBN: (print) 9781479906567
An increasing number of real-time embedded applications have high computation requirements that must be met within strict time constraints. At the same time, architectures are becoming increasingly heterogeneous, programming models have difficulty scaling or stepping outside a particular domain, and programming such solutions requires detailed knowledge of the system and the skills of an experienced programmer. In this context, this paper advocates the transparent integration of a parallel and distributed execution framework that is capable of meeting real-time constraints, is based on the OpenMP programming model, and uses MPI as the distribution mechanism. The paper also introduces our modified implementation of the GCC compiler, extended to support such parallel and distributed computations, which is evaluated through a real implementation. This evaluation gives important hints toward the development of a parallel/distributed fork-join framework for supporting real-time embedded applications.
We investigate the parallelism of simulated annealing for graph partitioning. We propose a random sampling technique in which disjoint partitions of the graph are distributed across the processors, so that moves can be proposed and evaluated asynchronously on distinct processors. Synchronization among processors is not necessary until equilibrium is reached at each processor. Such an approach may thus achieve a speedup that is approximately linear in the number of processors. Furthermore, since no interacting moves can occur in our strategy, our scheme is free of errors in cost evaluation.
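For orientation, here is a minimal sketch of plain sequential simulated annealing for graph bipartitioning, the baseline that the abstract sets out to parallelize. It does not reproduce the paper's random sampling or its distribution across processors. The cost form (cut size plus a quadratic imbalance penalty), the geometric cooling schedule, and all names and parameters (`anneal`, `t0`, `alpha`, `sweeps`, `penalty`) are assumptions made for illustration.

```python
import math
import random


def cut_size(edges, side):
    """Count edges whose endpoints lie on different sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def cost(edges, side, penalty=1.0):
    """Cut size plus a quadratic penalty on size imbalance (assumed cost form)."""
    n1 = sum(side.values())
    n0 = len(side) - n1
    return cut_size(edges, side) + penalty * (n0 - n1) ** 2


def anneal(nodes, edges, t0=2.0, alpha=0.95, sweeps=200, seed=0):
    """Sequential simulated annealing; returns (best partition, best cost)."""
    rng = random.Random(seed)
    # Start from an alternating assignment of nodes to sides 0 and 1.
    side = {v: i % 2 for i, v in enumerate(nodes)}
    current = cost(edges, side)
    best, best_side = current, dict(side)
    t = t0
    for _ in range(sweeps):
        for _ in range(len(nodes)):
            v = rng.choice(nodes)
            side[v] ^= 1  # propose: move one node to the other side
            candidate = cost(edges, side)
            delta = candidate - current
            # Metropolis rule: always accept improvements, sometimes accept
            # uphill moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = candidate
                if current < best:
                    best, best_side = current, dict(side)
            else:
                side[v] ^= 1  # reject: undo the move
        t *= alpha  # geometric cooling (assumed schedule)
    return best_side, best
```

Calling `anneal(list(range(6)), edges)` on a small edge list returns a dictionary mapping each node to side 0 or 1, together with the cost of that partition. Recomputing the full cost for every move keeps the sketch short; an incremental cost update would be the usual optimization.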
Presents the design considerations, machine architecture, and parallel programming environment for the parallel machine Cenju-3, the first massively parallel processor product of NEC. Cenju-3 uses fast commodity RISC microprocessors (VR4400) in a distributed shared-memory arrangement and can be configured with up to 256 processing elements, connected through a multistage interconnection network. The result is a compact and scalable parallel machine, with operating speeds ranging from 256 MFLOPS to a maximum of 12.8 GFLOPS. Hardware support for interprocessor communication provides efficient message and data transmission among processors. In addition, a programming environment (PCASE) has been developed in order to fully utilize the Cenju-3. PCASE translates a sequential user program into an efficient parallel form with a minimum of user intervention. Preliminary results and future directions are also described.
Task scheduling plays an important role in task-parallel computing platforms. In this paper, we present DETS, a dynamic and elastic task scheduler that supports multiple parallel schemes. DETS works in a master-worker manner and schedules tasks dynamically. To execute various types of applications elastically, it uses a task pool from which workers pull tasks to execute. Workers are supervised to form a logical group that can scale up or down at runtime according to the available nodes and processors. DETS supports several types of parallel computing schemes, including embarrassingly parallel, MapReduce, tree-based searching, and DAG-based processing. Experiments on exemplar applications show that DETS is efficient and practical.
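The pull-based task pool described above can be illustrated with a small single-machine sketch. It is not the DETS implementation: it uses Python threads in place of distributed workers, it omits supervision and elastic scaling of the worker group, and the function name `run_pool` and the `(task_id, func, arg)` task format are hypothetical.

```python
import queue
import threading


def run_pool(tasks, num_workers=4):
    """Run (task_id, func, arg) tuples; workers pull tasks until the pool is empty."""
    pool = queue.Queue()
    for task in tasks:
        pool.put(task)  # the "master" fills the pool up front

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # Pull model: each worker fetches its next task itself.
                task_id, func, arg = pool.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker exits
            value = func(arg)
            with lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

Because workers pull rather than receive assigned work, a faster worker simply takes more tasks, which is the property that makes the scheme dynamic. Changing the number of workers while the pool is being drained, as the abstract describes, would need extra coordination that this sketch leaves out.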
For large planar adaptive arrays, the computational load and terabyte-scale data transmission are two challenging problems in implementing an adaptive beamforming system. This paper considers the problem of designing an efficient parallel adaptive algorithm for a two-dimensional (2D) planar array. It extends an efficient parallel adaptive beamforming algorithm based on LCMV to adaptive 2D sensor array processing. Owing to its parallel structure, the proposed 2D parallel adaptive algorithm can be easily implemented in a distributed parallel processing system. It reduces both data transmission and computing cost.
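As background, the standard textbook form of the linearly constrained minimum variance (LCMV) beamformer is shown below. The abstract does not state its own formulation, so the symbols are assumptions: $\mathbf{R}$ is the covariance matrix of the array snapshots, $\mathbf{C}$ is the constraint matrix, $\mathbf{f}$ is the desired response vector, and $\mathbf{w}$ is the weight vector.

$$
\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
\quad \text{subject to} \quad \mathbf{C}^{H}\mathbf{w} = \mathbf{f},
\qquad
\mathbf{w}_{\mathrm{LCMV}} = \mathbf{R}^{-1}\mathbf{C}\left(\mathbf{C}^{H}\mathbf{R}^{-1}\mathbf{C}\right)^{-1}\mathbf{f}.
$$

The cost of forming and inverting $\mathbf{R}$ grows quickly with the number of array elements, which is one reason a parallel decomposition is attractive for large planar arrays.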
Tree Adjoining Grammar (TAG) is a powerful grammatical formalism for large-scale natural language processing. However, the computational complexity of parsing algorithms for TAG is high. We introduce a new parallel TAG parsing algorithm for MIMD hypercube multicomputers, using large-granularity grammar partitioning, asynchronous communication, and distributed termination detection. We describe our implementation on the nCUBE/2 parallel computer and provide experimental results on both random and English grammars. Our algorithm delivers the best performance of any TAG parsing algorithm to date, yielding a speedup of almost two orders of magnitude and good efficiency on up to 256 processors. TAG parsing is a highly unstructured problem. Based on our experience developing a parallel TAG parser, we draw some general conclusions for solving other unstructured problems.
Introduces a technique for optimizing interprocessor communication in programs for distributed memory multiprocessors. Our basic approach is to combine messages with the explicit goal of reducing the overall execution time, taking into account direct and indirect dependencies among the concurrent units. We first establish that combining messages between a pair of isolated processors is not necessarily useful in reducing the overall execution time of the program, because of complex interprocessor dependencies. The conditions under which message combining is profitable are then established. We then search for such conditions along chains of dependencies that exist across several processors and combine messages that satisfy these conditions.
In large and complex robotic systems such as multiple-robot systems, the real robot controllers are spatially distributed according to their physical structure, so it is desirable to realize hierarchical and distributed control with a suitable parallel processing architecture. Hierarchical Petri net modeling appears well suited to structured and predictable environments such as robotic manufacturing systems. The paper presents a discrete-event, net-based distributed control architecture for multiple robot coordination. By exploiting the parallelism among the robotic tasks in the system, the parallel and distributed architecture of the control system is used to achieve high-performance control.
This paper first presents an analysis of a parallel connection of one synchronous generator and one self-excited induction generator, each coupled to its own DC machine, together with a capacitor bank and a resistive load connected to the system. In this stage, the voltage and frequency adjustments were made manually. In the second stage, a similar circuit was implemented with a voltage control loop and a speed control loop applied to the synchronous generator. These controls kept the system voltage and frequency stable. The induction generator supplies part of the active power required by the load. The synchronous generator controls the frequency and supplies the reactive power and the additional active power. The experiment demonstrated the feasibility and stable operation of this alternative parallel arrangement of generators.