Mobile computing is a rapidly emerging trend in distributed computing. The new mobile computing environment presents many challenges due to the mobile nature of the hosts. The authors present some fault-tolerant data ...
ISBN: 9781728190747 (print)
Quality Critical Decentralised Applications (QCDApp) have high requirements for system performance and service quality, involve heterogeneous infrastructures (Clouds, Fogs, Edges and IoT), and rely on trustworthy collaboration among participants such as data sources and infrastructure providers to deliver their business value. The development of QCDApp has to tackle the low performance of current blockchain technologies, which stems from the low efficiency of collaboration among distributed peers during consensus. On the other hand, the resilience of the Cloud has enabled significant advances in software-defined storage, networking, infrastructure, and every technology; however, this rich programmability of infrastructure (in particular, the new hardware accelerators in the infrastructures) still cannot be effectively utilised for QCDApp due to the lack of a suitable architecture and programming model.
ISBN: 9780769536804 (print)
With the ever-increasing demand for high-quality 3D image processing in markets such as cinema and gaming, the capabilities of graphics processing units (GPUs) have advanced tremendously. Although GPU-based cluster computing, which uses GPUs as the processing units, is one of the most promising high-performance parallel computing platforms, there is currently no programming environment, interface, or library designed to use these multiple computing resources to compute tasks in parallel. This paper proposes CaravelaMPI, a new message passing interface targeted at GPU cluster computing, which provides a unified and transparent interface to manage both communication and GPU execution. Experimental results show that the transparent interface of CaravelaMPI allows GPU-based clusters to be programmed efficiently, not only decreasing the required programming effort but also increasing the performance of GPU-based cluster computing platforms.
ISBN: 0818678763 (print)
This paper presents a new rapid thread replacement mechanism, which is important in multithreading technology. Analysis of the memory system indicates that memory utilization decreases as the cache hit ratio increases. By analyzing their working processes, we find parallelism between thread computation and thread replacement. Based on these observations, we propose a rapid multithread replacement mechanism that overlaps thread replacement with thread computation. In particular, with a finite number of hardware contexts, this mechanism can play the same role as infinite contexts by tolerating the replacement overhead. By modifying the general thread switching model, we build a thread replacement model and evaluate the mechanism both theoretically and experimentally. Finally, we discuss the hardware implementation and identify problems to be resolved in future work.
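To make the overlap argument concrete, here is a toy timing model; it is not the authors' thread replacement model. It assumes each thread computes for `compute` cycles, that replacing (loading) a thread's context costs `replace` cycles, and that one spare hardware context lets the next thread be loaded while the current one computes. The function names and cycle counts are hypothetical.

```python
def sequential_time(n_threads: int, compute: int, replace: int) -> int:
    """Replacement and computation never overlap: every thread pays both costs."""
    return n_threads * (replace + compute)


def overlapped_time(n_threads: int, compute: int, replace: int) -> int:
    """Two-stage pipeline (load, then compute) with one spare context.

    The first load cannot be hidden; after that, a new thread starts every
    max(compute, replace) cycles, so replacement is fully hidden when
    replace <= compute.
    """
    if n_threads == 0:
        return 0
    return replace + compute + (n_threads - 1) * max(compute, replace)


if __name__ == "__main__":
    for replace in (40, 150):
        seq = sequential_time(10, compute=100, replace=replace)
        ovl = overlapped_time(10, compute=100, replace=replace)
        print(f"replace={replace}: sequential={seq}, overlapped={ovl}")
```

With `replace=40` the model gives 1400 cycles sequentially versus 1040 overlapped: only the first load is exposed. In this sense a finite number of contexts behaves like an unlimited supply. When replacement is slower than computation (`replace=150`: 2500 versus 1600), the pipeline becomes bound by replacement instead.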
For solving linear systems of equations with unsymmetric coefficient matrices, we have proposed an improved version of the quasi-minimal residual (IQMR) method [Proceedings of the International Conference on High Performance Computing and Networking (HPCN-97) (1997); IEICE Trans. Inform. Syst. E80-D (9) (1997) 919]. It uses the Lanczos process as a major component, combining elements of numerical stability and parallel algorithm design. For the Lanczos process, stability is obtained by a coupled two-term procedure that generates Lanczos vectors scaled to unit length. The algorithm is derived so that all inner products and matrix-vector multiplications of a single iteration step are independent, and the communication time required for inner products can be efficiently overlapped with computation time. In this paper, a theoretical model of the computation and communication phases is presented, which allows a quantitative analysis of the parallel performance on a two-dimensional grid topology. The efficiency, speed-up, and runtime are expressed as functions of the number of processors scaled by the number of processors that gives the minimal runtime for the given problem size. The model not only effectively evaluates the performance improvements due to communication reduction by overlapping, but also provides useful insight into the scalability of the IQMR method. The theoretical performance results are demonstrated by experimental timing results obtained on a massively parallel distributed-memory Parsytec system. © 2002 Published by Elsevier Science Ltd.
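The following derivation illustrates how runtime can be expressed in terms of the number of processors scaled by the runtime-minimizing count. It uses a simplified per-iteration model that is an assumption, not the paper's model. Computation time $A/p$ divides evenly over $p$ processors. The global reductions for inner products on a $\sqrt{p}\times\sqrt{p}$ grid cost time proportional to the grid diameter, $B\sqrt{p}$. No communication is overlapped.

```latex
\begin{aligned}
T(p) &= \frac{A}{p} + B\sqrt{p},\\
\frac{dT}{dp} &= -\frac{A}{p^{2}} + \frac{B}{2\sqrt{p}} = 0
  \;\Longrightarrow\; p_{\min} = \left(\frac{2A}{B}\right)^{2/3},\\
T(p) &= B\sqrt{p_{\min}}\left(\frac{1}{2x} + \sqrt{x}\right),
  \qquad x = \frac{p}{p_{\min}},\\
\frac{T(p)}{T(p_{\min})} &= \frac{1}{3x} + \frac{2}{3}\sqrt{x}.
\end{aligned}
```

In this scaled form the runtime curve is the same for every problem size. It is minimal at $x = 1$, and using twice the optimal number of processors ($x = 2$) gives $T/T(p_{\min}) = 1/6 + (2/3)\sqrt{2} \approx 1.11$. Overlapping inner-product communication with computation, as IQMR is designed to do, effectively shrinks the $B\sqrt{p}$ term and therefore pushes $p_{\min}$ upward.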
ISBN: 1932415262 (print)
One of the most exciting and challenging research areas in our modern world is the design and implementation of intelligent agents. In this paper, we outline the specifications for applying recent advances in parallel processing and distributed computing technology to the design and analysis of parallel algorithms for creating the clusters necessary for the efficient operation of distributed intelligent agents. Intelligent agents, whose applications span the spectrum from internal combustion engines to remote robotic control, must be able to make decisions, act autonomously, and exhibit real-time behavior in a potentially hostile environment. Recent advances in parallel processing, including fault tolerance in programming languages, provide an arena in which to explore and apply this technology to the construction of robotic systems that exhibit decision-making capability and respond in real time.
ISBN: 0818678763 (print)
Fast and efficient communication is one of the major design goals not only for parallel systems but also for clusters of workstations. The proposed model of the high-performance communication device ATOLL (1) features very low latency for starting communication operations and reduces the software overhead of communication-specific functions. To close the gap between off-the-shelf microprocessors and the communication system, a highly sophisticated processor interface implements atomic start of communication, MMU support, and a flexible event scheduling scheme. ATOLL's interconnectivity, provided by four independent network ports combined with cut-through routing, allows a large variety of network topologies to be configured. A software-transparent error correction mechanism significantly reduces the required protocol overhead. The presented simulation results promise high-performance, low-latency communication.
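The latency benefit of cut-through routing can be seen from the standard first-order latency model, sketched below in Python. This is a generic textbook comparison, not ATOLL's simulation model. The per-hop router delay, link bandwidth, and message size in the example are hypothetical values.

```python
def store_and_forward_us(hops: int, msg_bytes: int, bw_bytes_per_us: float,
                         router_delay_us: float) -> float:
    """Each switch buffers the whole message before forwarding it,
    so the full transmission time is paid on every hop."""
    return hops * (router_delay_us + msg_bytes / bw_bytes_per_us)


def cut_through_us(hops: int, msg_bytes: int, bw_bytes_per_us: float,
                   router_delay_us: float) -> float:
    """Only the header pays the per-hop routing delay; the body streams
    behind it, so the transmission time is paid once (no contention assumed)."""
    return hops * router_delay_us + msg_bytes / bw_bytes_per_us


if __name__ == "__main__":
    # Hypothetical parameters: 4 hops, 1 KiB message, 256 B/us links, 0.1 us per router.
    args = (4, 1024, 256.0, 0.1)
    print(f"store-and-forward: {store_and_forward_us(*args):.1f} us")
    print(f"cut-through:       {cut_through_us(*args):.1f} us")
```

With these numbers the model gives about 16.4 µs versus 4.4 µs. The gap grows with the hop count, so low per-hop latency matters more as the configured topology grows.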
ISBN: 0818678763 (print)
Because it is very difficult to keep the load balanced among processors for nonuniform loops at compile time, while dynamic methods come at the price of extra overhead, this paper proposes an adaptive hybrid scheduling scheme. In this scheme, the distribution of loop iterations is divided into several rounds, and the block size in each round is determined adaptively according to the average overhead of dynamic scheduling. Experimental results also expose the effect of the scheduling parameter, which programmers can select according to the probability that a fetching processor will not perform an additional task fetch.
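The round-based idea can be sketched as follows. This is a factoring-style scheme with an assumed overhead-driven floor on the block size, not the authors' algorithm. Each round hands out half of the remaining iterations, split evenly across the processors. No block is allowed to be so small that the measured per-fetch overhead (`fetch_overhead`, in units of one iteration's cost) exceeds `target_fraction` of the block's work. Both parameter names are hypothetical.

```python
import math


def round_chunks(n_iters: int, n_procs: int, fetch_overhead: float,
                 target_fraction: float = 0.05) -> list[list[int]]:
    """Return the block sizes handed out in each scheduling round.

    Hypothetical sketch: blocks shrink round by round (half of the remaining
    iterations per round), but never below the size at which one dynamic
    fetch costs more than target_fraction of the block's work.
    """
    min_chunk = max(1, math.ceil(fetch_overhead / target_fraction))
    remaining = n_iters
    rounds: list[list[int]] = []
    while remaining > 0:
        # Block size for this round: half the remaining work over n_procs,
        # clamped from below by the overhead-driven floor.
        size = max(min_chunk, math.ceil(remaining / (2 * n_procs)))
        chunks: list[int] = []
        for _ in range(n_procs):  # one fetch per processor in this round
            if remaining == 0:
                break
            block = min(size, remaining)
            chunks.append(block)
            remaining -= block
        rounds.append(chunks)
    return rounds


if __name__ == "__main__":
    for r in round_chunks(1000, 4, fetch_overhead=0.5):
        print(r)
```

For 1000 iterations on 4 processors with `fetch_overhead=0.5` and the default `target_fraction=0.05`, the minimum block is 10 iterations. The rounds come out as blocks of 125, 63, 31, 16, and then 10. A larger measured overhead raises the floor and therefore reduces the number of fetches, at the cost of coarser load balancing near the end of the loop.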
Although MPEG-1 Video is a promising moving-picture compression standard, and the most widely used one, it requires substantial computational resources to encode moving pictures at a reasonable frame size and quality. In this paper, we propose and implement an efficient scheme for parallelizing the MPEG-1 Video encoding algorithm on Ethernet-connected workstations, which are the most widely available computing environment today. In this scheme, slice-level, frame-level, and GOP (Group of Pictures)-level parallelism are identified as the attractive forms of parallelism that can be exploited on Ethernet-connected workstations. Three efficient parallel implementation schemes that take the communication characteristics of Ethernet-connected workstations into account are also proposed and evaluated experimentally. A series of experiments using thirty workstations shows that the MPEG-1 Video encoding time can be reduced in proportion to the number of workstations used for encoding, although the speedup curves reach a saturation point.
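GOP-level parallelism is the coarsest of the three forms. If each GOP can be encoded independently (an assumption that holds for closed GOPs), whole GOPs can be dealt out to workstations. The sketch below shows only that partitioning step, with hypothetical helper names. It also computes an idealized speedup bound that ignores Ethernet communication. This bound shows one simple way, though not necessarily the paper's explanation, for a speedup curve to saturate: once there are more workstations than GOPs, the extra machines sit idle.

```python
import math


def gop_partition(n_frames: int, gop_size: int,
                  n_workers: int) -> dict[int, list[tuple[int, int]]]:
    """Split frames [0, n_frames) into GOPs of gop_size frames and deal
    them to workers round-robin. Each GOP is a half-open (start, stop) range."""
    assignment: dict[int, list[tuple[int, int]]] = {w: [] for w in range(n_workers)}
    for i, start in enumerate(range(0, n_frames, gop_size)):
        stop = min(start + gop_size, n_frames)
        assignment[i % n_workers].append((start, stop))
    return assignment


def ideal_gop_speedup(n_gops: int, n_workers: int) -> float:
    """Speedup if every GOP costs the same and communication is free:
    the busiest worker encodes ceil(n_gops / n_workers) GOPs."""
    return n_gops / math.ceil(n_gops / n_workers)


if __name__ == "__main__":
    print(gop_partition(n_frames=45, gop_size=15, n_workers=2))
    for w in (4, 10, 20, 30):
        print(w, ideal_gop_speedup(n_gops=20, n_workers=w))
```

Consider 20 GOPs, for example 300 frames at a hypothetical GOP size of 15. The bound grows to 20 at 20 workstations and stays there at 30. Real encoders also pay communication costs that this bound leaves out.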
Computer numerical simulation is widely applied in engineering and social fields, where it has proven highly valuable. Small-scale simulation applications can be processed on traditional simulation computers, but as the problem size increases, sequential simulation processing can no longer meet the requirements. Dynamic real-time simulation and super-real-time simulation require high-performance simulation computers. In this paper, we first analyse the structure of the AD-100, a classical simulation computer developed by ADI Inc.; we then propose a novel simulation computer structure that adopts MPP technology. Finally, an experimental result is given to test the feasibility of parallel simulation processing.