ISBN (print): 9781457706783
The rapid development of web service technology brings up a number of crucial requirements for designing a service computing runtime, such as supporting multiple message exchange patterns, switching among different transports, integrating various extended web service protocols, and achieving robust performance under high concurrency. Based on the staged event-driven architecture, we propose a novel architecture for an adaptive web-service-centric service computing runtime, named SEDA4SC. In SEDA4SC, the processing of basic and extended web service protocols is divided into four primary event-driven stages to enable system independence and module isolation. Moreover, this architecture allows messages to be handled in two independent pipelines: the input pipeline and the output pipeline. Arbitrary message exchange patterns can be supported through a combination of the two pipelines. With SEDA4SC, we design and implement a service computing runtime system. The performance evaluation results show that our system exhibits robust performance under high concurrency.
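The abstract's two-pipeline idea can be illustrated with a minimal sketch of staged, queue-decoupled message processing. This is an assumption-laden illustration, not SEDA4SC's actual design: the stage names below are invented, and the real system's four event-driven stages are not specified in the abstract.

```python
from queue import Queue

def run_pipeline(stages, message):
    """Push a message through a chain of stages linked by event queues,
    as in a staged event-driven architecture: each queue decouples one
    stage from the next."""
    q = Queue()
    q.put(message)
    for stage in stages:
        q.put(stage(q.get()))
    return q.get()

# Hypothetical input pipeline: transport receive -> protocol decode
input_pipeline = [lambda m: f"recv({m})", lambda m: f"decode({m})"]
# Hypothetical output pipeline: protocol encode -> transport send
output_pipeline = [lambda m: f"encode({m})", lambda m: f"send({m})"]

# A request-response exchange combines both pipelines; a one-way
# exchange would use the input pipeline alone.
req = run_pipeline(input_pipeline, "msg")
resp = run_pipeline(output_pipeline, req)
print(resp)  # send(encode(decode(recv(msg))))
```

Combining the two independent pipelines in different orders and counts is what lets such an architecture express arbitrary message exchange patterns.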
ISBN (print): 0780393686
Transactional memory (TM) provides an easy-to-use and high-performance parallel programming model for upcoming chip-multiprocessor systems. Several researchers have proposed alternative hardware and software TM implementations. However, the lack of transaction-based programs makes it difficult to understand the merits of each proposal and to tune future TM implementations to the common-case behavior of real applications. This work addresses this problem by analyzing the common-case transactional behavior of 35 multithreaded programs from a wide range of application domains. We identify transactions within the source code by mapping existing primitives for parallelism and synchronization management to transaction boundaries. The analysis covers basic characteristics such as transaction length, the distribution of read-set and write-set sizes, and the frequency of nesting and I/O operations. The measured characteristics provide key insights into the design of efficient TM systems for both nonblocking synchronization and speculative parallelization.
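The kind of measurement the abstract describes can be sketched as follows: treat a lock-protected critical section as one transaction and record the addresses it reads and writes, yielding read-set and write-set sizes. This is a simplified illustration of the general idea, not the paper's actual instrumentation; the `Transaction` class and the addresses are invented for the example.

```python
class Transaction:
    """Records the read set and write set of one critical section
    reinterpreted as a transaction (lock acquire = begin,
    lock release = commit)."""

    def __init__(self):
        self.read_set = set()
        self.write_set = set()

    def read(self, addr, memory):
        self.read_set.add(addr)
        return memory.get(addr, 0)

    def write(self, addr, value, memory):
        self.write_set.add(addr)
        memory[addr] = value

memory = {"x": 1, "y": 2}
txn = Transaction()                      # lock acquire -> transaction begin
total = txn.read("x", memory) + txn.read("y", memory)
txn.write("sum", total, memory)          # lock release -> transaction commit

print(len(txn.read_set), len(txn.write_set))  # 2 1
```

Aggregating such per-transaction counts across a whole program run gives the read-set/write-set size distributions the study reports.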
ISBN (print): 9781467303057; 9781467303064
The solution of large and complex coupled electromechanical problems requires high-performance computing resources. In recent years, the use of Graphics Processing Units (GPUs) has gained increasing popularity in scientific computing because of their low cost and parallel architecture. In this paper the authors report the main results of a GPU approach to parallelizing a research code for electromagnetic launcher analysis. Programming a GPU-based environment poses a number of critical issues that have to be carefully addressed in order to fully exploit the potential of the system. Data have to be properly organized to fit the Single Instruction Multiple Data scheme; the data transfer between the host and the device, as well as the memory management of the GPU, deserve careful programming. Two examples of application of the parallelized code are reported to show the performance improvements that can be obtained in the numerical analysis of both rail and induction launchers.
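One concrete instance of "organizing data to fit the Single Instruction Multiple Data scheme" is converting array-of-structures data into structure-of-arrays form, so that parallel threads read consecutive elements of a single array. The sketch below shows the reorganization itself; the field names (coordinates and a per-element quantity `j`) are invented for illustration and are not taken from the paper's launcher code.

```python
def aos_to_soa(records, fields):
    """Flatten a list of per-element records (array of structures) into
    one contiguous list per field (structure of arrays), the layout that
    suits coalesced SIMD/GPU memory access."""
    return {f: [r[f] for r in records] for f in fields}

# Hypothetical mesh elements with coordinates and one field value each.
elements = [
    {"x": 0.0, "y": 0.0, "j": 1.5},
    {"x": 0.1, "y": 0.0, "j": 1.2},
    {"x": 0.2, "y": 0.0, "j": 0.9},
]
soa = aos_to_soa(elements, ["x", "y", "j"])
print(soa["j"])  # [1.5, 1.2, 0.9]
```

On a real GPU, each contiguous per-field array would then be copied to device memory in one host-to-device transfer, which is exactly the transfer-and-layout concern the abstract highlights.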
ISBN (print): 9788395918384
Machine learning is one of the hottest topics in the IT industry as well as in academia. Some IT leaders and scientists believe that it is going to totally revolutionise the industry. This transformation is happening on two fronts: one is the application and software paradigm, the other is the hardware and system level. At the same time, the high-performance computing segment is striving to achieve Exascale performance. Reaching that level of performance while keeping system cost and power consumption at a reasonable level is not a trivial task. In this article, we look at a potential solution to these problems and discuss a new approach to building systems and software that meet these challenges and the growing need for computing power in HPC systems, while also being ready for new types of workloads, including Artificial Intelligence applications.
ISBN (print): 9781538637906
In recent years, the Deep Neural Network (DNN) has been successfully used in image classification. Most existing DNNs need to learn a very large set of parameters, which requires a huge amount of computational resources and time to train via gradient descent and back-propagation. To address this issue, PCANet was developed for the efficient design and training of DNNs. Compared with traditional DNNs, PCANet has a simpler structure and better performance, which makes it attractive for hardware design. To overcome the limitations of PCANet and significantly improve its performance, we have proposed a novel model named Constrained High Dispersal Network (CHDNet), a variant of PCANet. In this paper, we implement CHDNet on a Xilinx ZYNQ FPGA to ensure the real-time performance of the system at lower power than a personal computer requires, by taking advantage of the algorithmic parallelism and the ZYNQ architecture. Our experimental results on two major datasets, the MNIST dataset for handwritten digit recognition and the Extended Yale B dataset for face recognition, demonstrate that our FPGA implementation is more than 15x faster than a software implementation on a PC (Intel i7-4720HQ, 2.6 GHz).
Due to advances in cloud computing technology, research on outsourced databases has attracted growing attention. Consequently, it is becoming more important to guarantee the correctness and completeness of query result...
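The abstract is truncated, so the scheme it proposes is unknown; the sketch below only illustrates the general idea behind correctness verification of outsourced query results: the data owner publishes a digest per record, and the client re-hashes what the untrusted server returns. All names here are invented, and real schemes need additional machinery (e.g., signatures and authenticated or chained structures) to also prove completeness, which this sketch omits.

```python
import hashlib

def digest(record):
    """Collision-resistant digest of one record."""
    return hashlib.sha256(repr(record).encode()).hexdigest()

# Data owner publishes digests alongside the outsourced records.
records = [("alice", 30), ("bob", 25)]
published = {r: digest(r) for r in records}

# Client verifies a query result returned by the untrusted server:
# a tampered record would fail to match its published digest.
result = ("alice", 30)
print(digest(result) == published[result])  # True
```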
Dynamically Allocated Multi Queues (DAMQ) is an effective mechanism to achieve Virtual Channel (VC) based flow control with maximum buffer utilization in multi-core Network-on-Chip (NoC) systems. We present a new meth...
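The DAMQ idea the preview names can be sketched as a shared buffer pool with per-virtual-channel queues: flits from different VCs share one set of slots instead of each VC owning a fixed partition, which is what raises buffer utilization. The abstract is truncated, so the class below is a generic illustration with invented names, not the paper's mechanism.

```python
from collections import deque

class DAMQBuffer:
    """Dynamically allocated multi-queue buffer: one shared slot pool,
    one FIFO of occupied slots per virtual channel."""

    def __init__(self, slots):
        self.free = deque(range(slots))   # shared pool of buffer slots
        self.vc_queues = {}               # per-VC FIFO of slot indices
        self.store = {}                   # slot index -> flit payload

    def enqueue(self, vc, flit):
        if not self.free:
            return False                  # buffer full for every VC
        slot = self.free.popleft()        # any VC may take any free slot
        self.store[slot] = flit
        self.vc_queues.setdefault(vc, deque()).append(slot)
        return True

    def dequeue(self, vc):
        q = self.vc_queues.get(vc)
        if not q:
            return None
        slot = q.popleft()                # slot returns to the shared pool
        self.free.append(slot)
        return self.store.pop(slot)

buf = DAMQBuffer(slots=4)
buf.enqueue(0, "A0"); buf.enqueue(1, "B0"); buf.enqueue(0, "A1")
print(buf.dequeue(0), buf.dequeue(1))  # A0 B0
```

With statically partitioned buffers, a burst on one VC could stall while slots reserved for idle VCs sit empty; here any VC can claim any free slot.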
Signatures have been proposed in Hardware Transactional Memory (HTM) to represent read and write sets of transactions and decouple transaction conflict detection from private caches. Generally, signatures are implemen...
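The preview is cut off mid-word, but HTM signatures of read and write sets are commonly described as Bloom filters, so a minimal sketch of that representation may clarify the idea; this is an assumption about the general technique, not this paper's specific design, and the parameters below are arbitrary.

```python
class Signature:
    """Bloom-filter-style signature over addresses: inserting sets bits,
    membership tests may give false positives (spurious conflicts) but
    never false negatives (missed conflicts)."""

    def __init__(self, bits=64, num_hashes=2):
        self.bits = bits
        self.num_hashes = num_hashes
        self.field = 0                    # the bit field itself

    def _hashes(self, addr):
        # Deterministic for ints; stands in for hardware hash functions.
        return [hash((addr, i)) % self.bits for i in range(self.num_hashes)]

    def insert(self, addr):
        for h in self._hashes(addr):
            self.field |= 1 << h

    def test(self, addr):
        return all(self.field & (1 << h) for h in self._hashes(addr))

sig = Signature()
sig.insert(0x1000)
print(sig.test(0x1000))  # True: inserted addresses always hit
```

Conflict detection between two transactions then reduces to testing one signature's addresses against the other (or intersecting the bit fields), with any aliasing showing up only as extra, conservative aborts.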
ISBN (print): 0769516866
Discovery Net is an application layer for providing grid-based knowledge discovery services. These services allow scientists to create and manage complex knowledge discovery workflows that integrate data and analysis routines provided as remote services. They also allow scientists to store, share and execute these workflows, as well as publish them as new services. Discovery Net provides a higher level of abstraction of the Grid for knowledge discovery activities, thus separating end-users from resource management issues already handled by existing and emerging standards.
A description is given of the Parallel Unification Machine (PLUM), a Prolog processor that exploits fine-grain parallelism using multiple function units executing in parallel. In most cases the execution of bookkeeping instructions is almost completely overlapped by unification, and the performance of the processor is limited only by the available unification parallelism. Measurements from a register-transfer-level simulator of PLUM are presented. The results show that PLUM with three unification units achieves an average speedup of approximately 3.4 over the Berkeley VLSI-PLM, which is usually regarded as the current highest-performance special-purpose, pipelined Prolog processor. Measurements that show the effects of multiple unification units and memory access time on performance are also presented.