This paper presents the design and implementation of a parallelization framework and OpenMP runtime support in Intel (R) C++ & Fortran compilers for exploiting nested parallelism in applications using OpenMP pragmas or directives. We conduct the performance evaluation of two multimedia applications parallelized with OpenMP pragmas and compiled with the Intel C++ compiler on Hyper-Threading Technology (HT) enabled multiprocessor systems. The performance results show that the multithreaded code generated by the Intel compiler achieved a speedup of up to 4.69 on 4 processors with HT enabled for five different input video sequences for the H.264 encoder workload, and a 1.28 speedup on an HT-enabled single-CPU system and a 1.99 speedup on an HT-enabled dual-CPU system for the audio visual speech recognition workload. The performance gain due to exploiting nested parallelism for leveraging Hyper-Threading Technology is up to 70% for the two multimedia workloads under different multiprocessor system configurations. These results demonstrate that hyper-threading benefits can be achieved by exploiting nested parallelism through Intel compiler and runtime system support for OpenMP programs. (c) 2005 Elsevier B.V. All rights reserved.
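To put the reported speedups in perspective, the short sketch below converts them into parallel efficiency (speedup divided by the number of logical processors). It assumes that Hyper-Threading exposes two logical processors per physical CPU; the workload labels and the `efficiency` helper are illustrative names, not taken from the paper.

```python
# Assumption: Hyper-Threading exposes 2 logical processors per physical CPU.
LOGICAL_PER_CPU = 2

# (workload, physical CPUs, speedup) as reported in the abstract above.
reported = [
    ("H.264 encoder", 4, 4.69),
    ("Audio visual speech recognition", 1, 1.28),
    ("Audio visual speech recognition", 2, 1.99),
]


def efficiency(speedup, cpus, logical_per_cpu=LOGICAL_PER_CPU):
    """Return speedup divided by the number of logical processors."""
    return speedup / (cpus * logical_per_cpu)


# Efficiency per configuration, in the same order as `reported`.
results = [(name, cpus, efficiency(s, cpus)) for name, cpus, s in reported]

for name, cpus, eff in results:
    print(f"{name}: {cpus} CPU(s) with HT -> efficiency {eff:.2f}")
```

Under this assumption, an efficiency well below 1.0 is expected, since the two logical processors of an HT-enabled CPU share one set of execution resources.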
Authors: Li, XH; Cao, JN; He, YX
Hong Kong Polytech Univ, Dept Comp, Internet & Mobile Comp Lab, Kowloon, Hong Kong, Peoples R China
Wuhan Univ, State Key Lab Software Engn, Data & Knowledge Engn Lab, Wuhan 430072, Hubei, Peoples R China
Wuhan Univ, Sch Comp, Parallel & Distributed Comp Lab, Wuhan 430072, Hubei, Peoples R China
Mobile agent technology has been applied to develop solutions for various kinds of parallel and distributed computing problems. However, performance evaluation of mobile agent algorithms remains a difficult task, mainly due to characteristics of mobile agents such as distributed and asynchronous execution, autonomy, and mobility. This paper proposes a general approach based on direct execution simulation for evaluating the performance of mobile agent algorithms by collecting and analyzing information about the agents during their execution. We describe the proposed generic simulation model, named MADES, the architecture of a software environment based on MADES, and a prototype implementation. A mobile agent-based distributed load balancing algorithm has been used for experiments with the prototype.
Cluster computers have become the vehicle of choice to build high performance computing environments. To fully exploit the computing power of these environments, one must utilize high performance network and protocol technologies, since the communication patterns of parallel applications running on clusters require low latency and high throughput, not achievable by using off-the-shelf network technologies. We have developed a technology to build high performance network equipment, called Maestro2. This paper describes the novel techniques used by Maestro2 to extract maximum performance from the physical medium and studies the impact of software-level parameters. The results obtained clearly show that Maestro2 is a promising technology, presenting very good results both in terms of latency and throughput. The results also show the large impact of software overhead on the overall performance of the system and validate the need for optimized communication libraries for high performance computing.
A high performance communication facility, called the GigaE PM, has been designed and implemented for parallel applications on clusters of computers using a Gigabit Ethernet. The GigaE PM not only provides reliable, high bandwidth, low latency communication, but also supports existing network protocols such as TCP/IP. A reliable communication mechanism for a parallel application is implemented in the firmware on a NIC, while existing network protocols are handled by an operating system kernel. A prototype system has been implemented using an Essential Communications Gigabit Ethernet card. The performance results show that a 58.3 µs round trip time for a four-byte user message and 56.7 MBytes/sec bandwidth for a 1,468-byte message have been achieved on Intel Pentium II 400 MHz PCs. We have implemented MPICH-PM on top of the GigaE PM, and evaluated the NAS parallel benchmark performance. The results show that the IS class S performance on the GigaE PM is 1.8 times faster than that on TCP/IP.
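As a rough reading of these figures, the sketch below derives a one-way latency estimate, the fraction of the nominal link rate in use, and the time per message. It assumes a symmetric round trip, that "MBytes" means 10^6 bytes, and a nominal Gigabit Ethernet rate of 10^9 bit/s; none of these assumptions is stated in the abstract, and the variable names are illustrative.

```python
# Figures reported in the abstract above.
ROUND_TRIP_US = 58.3      # round trip time for a four-byte message, in microseconds
BANDWIDTH_MB_S = 56.7     # bandwidth for a 1,468-byte message, in MBytes/sec
MESSAGE_BYTES = 1468

# Assumptions (not stated in the abstract).
BYTES_PER_MB = 1e6        # decimal megabytes
LINE_RATE_BIT_S = 1e9     # nominal Gigabit Ethernet rate

# One-way latency, assuming both directions take equal time.
one_way_us = ROUND_TRIP_US / 2

# Fraction of the nominal link rate that the reported bandwidth represents.
bandwidth_bit_s = BANDWIDTH_MB_S * BYTES_PER_MB * 8
link_utilization = bandwidth_bit_s / LINE_RATE_BIT_S

# Average time spent per 1,468-byte message at the reported bandwidth.
per_message_us = MESSAGE_BYTES / (BANDWIDTH_MB_S * BYTES_PER_MB) * 1e6

print(f"one-way latency   ~ {one_way_us:.2f} us")
print(f"link utilization  ~ {link_utilization:.1%}")
print(f"time per message  ~ {per_message_us:.2f} us")
```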
ISBN: (print) 1581132654
This paper gives denotational models for three logic programming languages of progressive complexity, adopting the "logic programming without logic" approach. The first language is the control flow kernel of sequential Prolog, featuring sequential composition and backtracking. A committed-choice concurrent logic language with parallel composition (parallel AND) and don't care nondeterminism is studied next. The third language is the core of Warren's basic Andorra model, combining parallel composition and don't care nondeterminism with two forms of don't know nondeterminism (interpreted as sequential and parallel OR) and favoring deterministic over nondeterministic computation. We show that continuations are a valuable tool in the analysis and design of semantic models for both sequential and parallel logic programming. Instead of using mathematical notation, we use the functional programming language Haskell as a metalanguage for our denotational semantics, and employ monads in order to facilitate the transition from one language under study to another.
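To illustrate how continuations can model the sequential Prolog kernel (sequential composition plus backtracking), here is a minimal sketch using success and failure continuations. The paper uses Haskell and monads as its metalanguage; this Python version is only an illustration under simplifying assumptions, and the names `emit`, `fail_stmt`, `seq`, `choice`, and `run` are hypothetical rather than taken from the paper.

```python
# A statement is a function stmt(succ, fail) returning the list of observed outputs.
#   fail: () -> outputs          what to do when the statement fails (backtrack)
#   succ: (fail) -> outputs      what to do next when the statement succeeds


def emit(x):
    """Observable action: record x, then continue with the success continuation."""
    return lambda succ, fail: [x] + succ(fail)


def fail_stmt(succ, fail):
    """Failure: abandon the current branch and backtrack."""
    return fail()


def seq(a, b):
    """Sequential composition: run a, then run b as a's success continuation."""
    return lambda succ, fail: a(lambda f: b(succ, f), fail)


def choice(a, b):
    """Backtracking choice: try a; if the computation later fails, try b."""
    return lambda succ, fail: a(succ, lambda: b(succ, fail))


def run(stmt, all_solutions=True):
    """Run a statement; on success either backtrack for more answers or stop."""
    top_succ = (lambda fail: fail()) if all_solutions else (lambda fail: [])
    return stmt(top_succ, lambda: [])


letters = choice(emit("a"), emit("b"))
digits = choice(emit("1"), emit("2"))

print(run(seq(letters, digits)))                       # explore every alternative
print(run(seq(letters, digits), all_solutions=False))  # stop at the first success
print(run(seq(letters, fail_stmt)))                    # failure forces backtracking
```

The failure continuation plays the role of the backtrack stack: `choice` pushes the alternative onto it, and `fail_stmt` pops it. Passing the failure continuation through the success continuation is what lets a later failure resume an earlier choice point.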
ISBN: (print) 3540653872
We present a C++ template run-time library, PROMOTER, and discuss run-time support for data-parallel applications. The PROMOTER run-time library provides a uniform framework for data-parallel applications, covering a broad spectrum of granularity, regularity and dynamicity. It supports user-defined data structures ranging from dense to sparse arrays, regular to irregular index structures and data distributions. The object-oriented design and implementation of the PROMOTER run-time library not only provides an easy data-parallel programming environment, but also leads to an efficient implementation of data-parallel applications through object reuse and object specialization.