The availability of low-cost commodity multiprocessor machines is changing the nature of mainstream programming. To remain competitive, the discipline must embrace small-scale dual- and quad-processor machines. These small-scale parallel systems require software engineering principles capable of encapsulating the complex issues of parallel programming. This paper discusses a technique that provides a simple model for incorporating parallel programming in a scheduler. The model can dynamically adjust to single-processor and small-scale multiprocessor environments.
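The abstract's idea of a scheduler that adapts to single- and multi-processor machines can be sketched minimally: size the worker pool from the detected processor count, so the same code degenerates to near-sequential execution on a uniprocessor. This is an illustrative sketch, not the paper's scheduler; the function name `run_tasks` is hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def run_tasks(tasks):
    """Run independent tasks on a pool sized to the machine.

    On a uniprocessor this behaves almost sequentially; on a dual or
    quad machine the identical code spreads work across all cores.
    """
    workers = os.cpu_count() or 1  # adapt to the detected processor count
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: f(), tasks))

print(run_tasks([lambda: 2 + 2, lambda: 3 * 3]))  # [4, 9]
```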
The data-flow graph (DFG) of a parallel application is frequently used to make scheduling decisions, based on the information that it models (dependencies among tasks and the volume of exchanged data). In the case of MPI-based programs, the DFG may be built at run time by overloading the data exchange primitives. This article presents a library that enables the generation of the DFG of an MPI program, and its use to analyze network contention on a test application: the Linpack benchmark. It is a first step towards automatic mapping of an MPI program onto a distributed architecture.
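The core mechanism described, intercepting the data exchange primitives to accumulate weighted DFG edges, can be sketched without MPI itself. The class below is a hypothetical stand-in; a real implementation would interpose on MPI_Send/MPI_Recv (for example through the PMPI profiling interface), not on a Python object.

```python
from collections import defaultdict

class TracingComm:
    """Hypothetical communicator wrapper that logs each point-to-point
    transfer as a weighted edge (src, dst) -> total bytes, building the
    run-time data-flow graph as the program executes."""
    def __init__(self):
        self.edges = defaultdict(int)

    def send(self, src, dst, payload: bytes):
        # Record the transfer before (conceptually) forwarding it.
        self.edges[(src, dst)] += len(payload)

comm = TracingComm()
comm.send(0, 1, b"xxxx")
comm.send(0, 1, b"yy")
comm.send(1, 2, b"z")
print(dict(comm.edges))  # {(0, 1): 6, (1, 2): 1}
```

The accumulated edge weights are exactly the per-link traffic volumes one would feed into a contention analysis.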
Programming and design skills in parallel computing related to systems on chip (SOC) will become increasingly important, since future SOCs will have multiple processors interconnected via on-chip networks (NOC). Unfortunately, no easy-to-use tools exist for learning and experimenting with multiprocessor (MP)SOCs/NOCs; one must use ad hoc combinations of tools, methodologies, and sample applications from very different sources. In this paper we introduce a parallel computing learning set (Parle) for configurable shared memory MPSOCs/NOCs and the corresponding theoretical parallel random access machines (PRAM). The learning set consists of an experimental optimizing compiler for the high-level parallel programming language e, an assembler, linker, loader, and simulator with a graphical user interface and statistical tools, and sample e/assembler code. Using the set, a student or designer can easily write simple parallel programs, compile and load them onto a configurable MPSOC/NOC platform, execute and debug them, gather statistics, and explore performance, utilization, and gate count estimates under different architectural parameters. The learning set runs on Mac OS X systems and is available for non-profit educational purposes.
Although deadlock is not completely avoidable in distributed and parallel programming, we here describe theory and practice of a system that allows us to limit deadlock to situations in which there are true circular data dependences or failure of processes that compute data needed at other processes. This allows us to guarantee absence of deadlock in SPMD computations absent process failure. Our system guarantees optimal ordering of communication statements. We gratefully acknowledge the support of the US National Science Foundation under Award CISE EIA 9810708 without which this work would not have been possible.
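The condition the system reduces deadlock to, a true circular data dependence, is precisely a cycle in the wait-for graph. A minimal cycle check over such a graph can be sketched as follows; this is an illustrative detector, not the paper's ordering mechanism.

```python
def has_true_cycle(waits_for):
    """Detect a circular data dependence in a wait-for graph, given as
    {process: set of processes whose data it waits on}. Absence of such
    a cycle is the precondition for deadlock-free SPMD execution."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in waits_for}

    def dfs(p):
        color[p] = GRAY  # on the current dependence chain
        for q in waits_for.get(p, ()):
            if color.get(q, WHITE) == GRAY:
                return True  # back edge: circular dependence
            if color.get(q, WHITE) == WHITE and q in waits_for and dfs(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and dfs(p) for p in waits_for)

print(has_true_cycle({0: {1}, 1: {2}, 2: {0}}))    # True
print(has_true_cycle({0: {1}, 1: {2}, 2: set()}))  # False
```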
ISBN: (print) 0769515738
Summary form only given, as follows. This talk briefly reviews some of the most popular high-level and low-level parallel programming languages used for scientific computing. We will report our experiences of using these languages in our research and compare the performance of several parallel scientific equation solvers implemented in different parallel languages. Major features and comparisons of these languages will be discussed. Some insights into when and where these languages should be used will be provided.
ISBN: (print) 1595930582
While past research discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) architectures over complex single-core-based systems, from both area utilization and design verification perspectives, compilation issues for these architectures have received relatively little attention. Programming MPSOCs can be challenging, as several potentially conflicting issues such as data locality, parallelism, and load balance across processors must be considered simultaneously. Most compilation techniques discussed in the literature for parallel architectures (not necessarily for MPSOCs) are loop based, i.e., they consider each loop nest in isolation. A key problem with such loop-based techniques is that they fail to capture the interactions between the different loop nests in the application. This paper takes a more global approach to the problem and proposes a compiler-driven data locality optimization strategy in the context of embedded MPSOCs. An important characteristic of the proposed approach is that, in deciding the workloads of the processors (i.e., in parallelizing the application), it considers all the loop nests in the application simultaneously. The authors' experimental evaluation with eight embedded applications showed that the global scheme brings significant power/performance benefits over the conventional loop-based scheme.
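The "global" idea, fixing one assignment of data to processors and reusing it across every loop nest instead of repartitioning per nest, can be sketched in miniature. This is a toy block-ownership function, not the paper's compiler algorithm; the name `global_partition` is hypothetical.

```python
def global_partition(n_elems, n_procs):
    """Assign each data element an owner once; under owner-computes,
    every loop nest then uses the same mapping, so data touched by
    successive nests stays local to the same processor."""
    block = (n_elems + n_procs - 1) // n_procs  # ceiling division
    return [min(i // block, n_procs - 1) for i in range(n_elems)]

owners = global_partition(8, 4)
print(owners)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

A loop-based scheme would instead recompute a (possibly different) partition for each nest, losing inter-nest locality.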
In order to improve the performance of applications on OpenMP/JIAJIA, we present a new abstraction, the Array Relation Vector (ARV), to describe the relation between the data elements of two consistent shared arrays accessed in one computation phase. Based on the ARV, we use array grouping to eliminate the pseudo distribution of small shared data and improve page locality. Experimental results show that ARV-based array grouping can greatly improve the performance of applications with non-contiguous data access and strict access affinity on an OpenMP/JIAJIA cluster. For applications with small shared arrays, array grouping improves performance noticeably when the processor count is small.
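The array-grouping step can be illustrated in miniature: small arrays that would otherwise be scattered across pages are packed back-to-back into one contiguous buffer, with descriptors recording where each original array lives. This is a sketch of the grouping idea only; the ARV analysis that decides *which* arrays to group is not modeled.

```python
def group_arrays(arrays):
    """Pack several small shared arrays into one contiguous buffer so
    related elements share pages. Returns the buffer plus
    (offset, length) descriptors for locating each original array."""
    buf, slots = [], []
    for a in arrays:
        slots.append((len(buf), len(a)))
        buf.extend(a)
    return buf, slots

buf, slots = group_arrays([[1, 2], [3, 4, 5]])
print(buf, slots)  # [1, 2, 3, 4, 5] [(0, 2), (2, 3)]
```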
The purpose of this research was to construct an adaptive test on the computer. Adaptive testing is a new evaluation strategy for computer-assisted learning and e-learning. It provides more efficient test administration and intelligent learning evaluation, and is expected to increase the accuracy of estimating a learner's true ability while administering fewer, more appropriately selected questions to each individual. Item response theory (IRT) is the main theoretical basis for making tests adaptive and feasible. Adaptive testing requires high-speed calculation to process the complicated IRT functions, which is fortunately a strength of computers.
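The standard IRT mechanism behind adaptive item selection can be sketched with the two-parameter logistic (2PL) model: each item's Fisher information is evaluated at the current ability estimate, and the most informative item is administered next. The item pool below is hypothetical; a real system would also update the ability estimate after each response.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta),
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, items):
    """Adaptive selection: pick the item most informative at the
    current ability estimate."""
    return max(items, key=lambda ab: item_information(theta, *ab))

pool = [(1.0, -1.0), (1.2, 0.0), (0.8, 2.0)]
print(next_item(0.0, pool))  # (1.2, 0.0): difficulty matches ability
```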
In this paper, TOPAS, a new parallel programming environment for distributed systems, is presented. TOPAS automatically analyzes data dependences among tasks and synchronizes data, which reduces the time needed for parallel program development. TOPAS also provides support for scheduling, dynamic load balancing, and fault tolerance. Experiments show the simplicity and efficiency of parallel programming in the TOPAS environment with fault-tolerant integration, which provides graceful performance degradation and quick reconfiguration time for application recovery.